From mboxrd@z Thu Jan 1 00:00:00 1970
From: Valentin Schneider <vschneid@redhat.com>
To: Nadav Amit
Cc: Linux Kernel Mailing List, linux-trace-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm, bpf,
 the arch/x86 maintainers, rcu@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Steven Rostedt, Masami Hiramatsu,
 Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 Dave Hansen, "H. Peter Anvin", Paolo Bonzini, Wanpeng Li,
 Vitaly Kuznetsov, Andy Lutomirski, Peter Zijlstra, Frederic Weisbecker,
 "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes, Josh Triplett,
 Boqun Feng, Mathieu Desnoyers, Lai Jiangshan, Zqiang, Andrew Morton,
 Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, Josh Poimboeuf,
 Jason Baron, Kees Cook, Sami Tolvanen, Ard Biesheuvel, Nicholas Piggin,
 Juerg Haefliger, Nicolas Saenz Julienne, "Kirill A. Shutemov",
 Dan Carpenter, Chuang Wang, Yang Jihong, Petr Mladek,
Donenfeld" , Song Liu , Julian Pidancet , Tom Lendacky , Dionna Glaze , Thomas =?utf-8?Q?Wei=C3=9Fschuh?= , Juri Lelli , Daniel Bristot de Oliveira , Marcelo Tosatti , Yair Podemsky Subject: Re: [RFC PATCH v2 20/20] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs In-Reply-To: <188AEA79-10E6-4DFF-86F4-FE624FD1880F@vmware.com> References: <20230720163056.2564824-1-vschneid@redhat.com> <20230720163056.2564824-21-vschneid@redhat.com> <188AEA79-10E6-4DFF-86F4-FE624FD1880F@vmware.com> Date: Mon, 24 Jul 2023 12:32:38 +0100 Message-ID: MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 6DFA210001C X-Stat-Signature: 3ijpied6651pyw4ybwp6g8jozrkcgwrc X-Rspam-User: X-HE-Tag: 1690198365-805406 X-HE-Meta: U2FsdGVkX18Ar6E2N/H3SDZFunxeD2pPONv4h5CD7AtgT9nr44YRZPHTCsh6zkt/ZZQY2gSXl2fmR+HM9pb/8YiuolRK9Lil2Dle0gIh70AJiR/GEFWlJr/JJTwmM3UkGT9xH/Okvm1zbLBZiaZzs3wBOXu1jjX1yUjqPKnb1MUmBnOmWJVxlt7Z35Qpz7dCNU/FoHR1sJYHzD6skWYPI17Ej8ungYCFB7i5Q6Npv5RtVwOI65EhEQukhhXeS+pFVtxsaWiSvxMjs7x4sSTVvu3dYIQjYfJv8p2glB8XUmSNptzmhkJeJ4mXgCMtaSzM+A/KAIignvwiBBcHHKwIpOambj1hc3+jE9OdmJO1Pww9YhjHv1GGvIShyFIizsub3vEwutKz7RqqHiGOU4X6wfsHsYzoNP7BYuagkFzkhdbEl50TJlti9mfm4BFzFjZQes7AT4w8YkAqS7GbILCBbLDfuwYWkTz+eCxuDONG0CSNYlfw2CdcY8cHmkPYEIUcikDAcvL04A74Q0Ik7kL2DcBRa303vRwxyDrnEd4luBxPp/usCG1v1tK2RPdDtDmeMLGheev3x+bAlzWQTGDw4XnyXclUUiNCsPuGEnHliavuXwofx2VrZi6wCh3o0rMQbpkgfsCF0umbPKXIK1nB6E0/O7vXbjXSfHphFSJ1yfLbE2iTZ6DdgKozfVqO6l+wSkouQmeIat8xC3OAJ6/Md7c6CbWyJmkm9iQDN4bYZJERP+cmHky5y5z9ABnSEw0l/LaOtHFjr9wRmbnNAgoRnqvKy09CWdM7f4C1SeF4Ur7C1acB/yZIwmZe74Jo603vjqD2TZl3c2ZfDR1hCFliyZXa15sBDPqGnWusith3mc2pBzWMXaok5Whmbh2MA27eXovwLFihD4abCEHp9WH4Ttj4wCDmEh+uTfURfgJNIlgF83rYSChrC96RokI/oxXXvCvel9pWNIcah4qxQgG kLQOKT+A nUcn3AreiK5s72/5RSBwYFI/nAuGU14o+IEbM5E7S6x0CQaI+p+GNr9RKHMMNewGkeJYAdxkO1iiMWpYPg18kOmUM9kz8mkeO6lkprD83oEvzD0UNDvEyJXCteAEBagymAKCn40dRw7+P2CJ94xcfAiTXQawu5C5B6DXGr5GAlMstNIEALjkxTXTIS5LPRSkbCeqAZy+U5rzwG32ZWel2y6ixl7ySpeY2HSEbtj+4LbQE77WxUK8V4R4zHHjSM59Qrnb8b5AGrbPLjbbT0qO9BEr3EljhfS/mdRK1ahnLQMIrn1+keT5eccsqWkl0E20cF4qEBHvrwxZdKLXJjxbVui2nDsXsAg9xAK/kOqCTrQ5dBfDDhOmuFZ3OPjEL2ftpv1upY2JqzNPWnmhFFEyJMSEBA0VFAwJAU8RHDFXj343X1O6zTNVmZgm4JFLEwc8FU4GPDRs8xfqln1mn2DzWvWpnMxfDTMgNDJxscBOLyapD5ebSQ/DWpF1urhifS4AdwsVOxeJtETF1b/d0UoLxtW4GKEHtNrHHRgrtz9V/QgQhsm9IZHHMnQ2kFw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 21/07/23 18:15, Nadav Amit wrote: >> On Jul 20, 2023, at 9:30 AM, Valentin Schneider wr= ote: >> >> vunmap()'s issued from housekeeping CPUs are a relatively common source = of >> interference for isolated NOHZ_FULL CPUs, as they are hit by the >> flush_tlb_kernel_range() IPIs. >> >> Given that CPUs executing in userspace do not access data in the vmalloc >> range, these IPIs could be deferred until their next kernel entry. > > So I think there are a few assumptions here that it seems suitable to con= firm > and acknowledge the major one in the commit log (assuming they hold). > > There is an assumption that VMAP page-tables are not freed. I actually > never paid attention to that, but skimming the code it does seem so. 
> To clarify the issue: if page-tables were freed and their pages were reused,
> there would be a problem that page-walk caches for instance would be used
> and "junk" entries from the reused pages would be used. See [1].
>

Thanks for looking into this and sharing context. This is an area I don't
have much experience with, so help is much appreciated!

Indeed, accessing addresses that should be impacted by a TLB flush *before*
executing the deferred flush is an issue. Deferring sync_core() for
instruction patching is a similar problem - it's all in the shape of
"access @addr impacted by @operation during kernel entry, before actually
executing @operation".

AFAICT the only reasonable way to go about the deferral is to prove that no
such access happens before the deferred @operation is done. We did just that
for the sync_core() deferral, cf. PATCH 18.

I'd like to reason about it for deferring vunmap TLB flushes: what addresses
in the VMAP range, other than the stack, can early entry code access? Yes,
the ranges can be checked at runtime, but is there any chance of figuring
this out e.g. at build-time?

> I would also assume that memory hot-unplug of some sort is not an issue
> (i.e., you cannot have a stale TLB entry pointing to memory that was
> unplugged).
>
> I also think that there might be speculative code execution using stale
> TLB entries that would point to memory that has been reused and perhaps
> controllable by the user. If somehow the CPU/OS is tricked into using the
> stale executable TLB entries early enough on kernel entry, that might be
> an issue. I guess it is probably a theoretical issue, but it would be
> helpful to confirm.
>
> In general, deferring TLB flushes can be done safely. This patch, I think,
> takes it one step forward and allows the reuse of the memory before the TLB
> flush is actually done. This is more dangerous.
>
> [1] https://lore.kernel.org/lkml/tip-b956575bed91ecfb136a8300742ecbbf451471ab@git.kernel.org/
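
As an aside, for readers following along, here is a minimal sketch of the
deferral shape the commit log describes - not the actual mechanism used by
this series, and the names (kernel_tlb_flush_deferred,
flush_deferred_kernel_tlb(), defer_kernel_range_flush()) are made up for
illustration; noinstr/instrumentation constraints are glossed over. The
correctness argument is exactly the one discussed above: the entry hook must
run before early entry code can touch any vmalloc'd address.

  #include <linux/percpu.h>
  #include <asm/tlbflush.h>

  /* Set instead of sending the flush IPI to a CPU executing in userspace. */
  static DEFINE_PER_CPU(bool, kernel_tlb_flush_deferred);

  static void defer_kernel_range_flush(int cpu)
  {
          per_cpu(kernel_tlb_flush_deferred, cpu) = true;
  }

  /* Hypothetical hook, run very early on kernel entry on the isolated CPU. */
  void noinstr flush_deferred_kernel_tlb(void)
  {
          if (this_cpu_read(kernel_tlb_flush_deferred)) {
                  this_cpu_write(kernel_tlb_flush_deferred, false);
                  /* Flush everything; per-range tracking is omitted here. */
                  __flush_tlb_all();
          }
  }

The sketch trades precision for simplicity by flushing the whole kernel TLB
on entry rather than tracking the deferred ranges; either way, the window
between the vunmap() and the deferred flush is where the reuse concerns
raised above apply.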