From: Valentin Schneider
To: Uladzislau Rezki
Cc: Jann Horn, linux-kernel@vger.kernel.org, x86@kernel.org,
 virtualization@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
 loongarch@lists.linux.dev, linux-riscv@lists.infradead.org,
 linux-perf-users@vger.kernel.org, xen-devel@lists.xenproject.org,
 kvm@vger.kernel.org, linux-arch@vger.kernel.org, rcu@vger.kernel.org,
 linux-hardening@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org, bpf@vger.kernel.org,
 bcm-kernel-feedback-list@broadcom.com, Juergen Gross, Ajay Kaher,
 Alexey Makhalov, Russell King, Catalin Marinas, Will Deacon, Huacai Chen,
 WANG Xuerui, Paul Walmsley, Palmer Dabbelt, Albert Ou, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
 Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, "Liang, Kan",
 Boris Ostrovsky, Josh Poimboeuf, Pawan Gupta, Sean Christopherson,
 Paolo Bonzini, Andy Lutomirski, Arnd Bergmann, Frederic Weisbecker,
 "Paul E. McKenney", Jason Baron, Steven Rostedt, Ard Biesheuvel,
 Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
 Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang, Juri Lelli,
 Clark Williams, Yair Podemsky, Tomas Glozar, Vincent Guittot,
 Dietmar Eggemann, Ben Segall, Mel Gorman, Kees Cook, Andrew Morton,
 Christoph Hellwig, Shuah Khan, Sami Tolvanen, Miguel Ojeda, Alice Ryhl,
 "Mike Rapoport (Microsoft)", Samuel Holland, Rong Xu,
 Nicolas Saenz Julienne, Geert Uytterhoeven, Yosry Ahmed,
 "Kirill A. Shutemov", "Masami Hiramatsu (Google)", Jinghao Jia,
 Luis Chamberlain, Randy Dunlap, Tiezhu Yang
Subject: Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
References: <20250114175143.81438-1-vschneid@redhat.com> <20250114175143.81438-30-vschneid@redhat.com>
Date: Fri, 17 Jan 2025 18:00:30 +0100

On 17/01/25 17:11, Uladzislau Rezki wrote:
> On Fri, Jan 17, 2025 at 04:25:45PM +0100, Valentin Schneider wrote:
>> On 14/01/25 19:16, Jann Horn wrote:
>> > On Tue, Jan 14, 2025 at 6:51 PM Valentin Schneider wrote:
>> >> vunmap()'s issued from housekeeping CPUs are a relatively common source of
>> >> interference for isolated NOHZ_FULL CPUs, as they are hit by the
>> >> flush_tlb_kernel_range() IPIs.
>> >>
>> >> Given that CPUs executing in userspace do not access data in the vmalloc
>> >> range, these IPIs could be deferred until their next kernel entry.
>> >>
>> >> Deferral vs early entry danger zone
>> >> ===================================
>> >>
>> >> This requires a guarantee that nothing in the vmalloc range can be vunmap'd
>> >> and then accessed in early entry code.
>> >
>> > In other words, it needs a guarantee that no vmalloc allocations that
>> > have been created in the vmalloc region while the CPU was idle can
>> > then be accessed during early entry, right?
>>
>> I'm not sure if that would be a problem (not an mm expert, please do
>> correct me) - looking at vmap_pages_range(), flush_cache_vmap() isn't
>> deferred anyway.
>>
>> So after vmapping something, I wouldn't expect isolated CPUs to have
>> invalid TLB entries for the newly vmapped page.
>>
>> However, upon vunmap'ing something, the TLB flush is deferred, and thus
>> stale TLB entries can and will remain on isolated CPUs, up until they
>> execute the deferred flush themselves (IOW for the entire duration of the
>> "danger zone").
>>
>> Does that make sense?
>>
> Probably I am missing something and need to have a look at your patches,
> but how do you guarantee that no one maps the same area that you defer for TLB
> flushing?
>

That's the cool part: I don't :')

For deferring instruction patching IPIs, I (well Josh really) managed to get
instrumentation to back me up and catch any problematic area.

I looked into getting something similar for vmalloc region access in
.noinstr code, but I didn't get anywhere. I even tried using emulated
watchpoints on QEMU to watch the whole vmalloc range, but that went about
as well as you could expect.

That left me with staring at code. AFAICT the only vmap'd thing that is
accessed during early entry is the task stack (CONFIG_VMAP_STACK), which
itself cannot be freed until the task exits - thus can't be subject to
invalidation when a task is entering kernelspace.

If you have any tracing/instrumentation suggestions, I'm all ears (eyes?).

> As noted by Jann, we already defer TLB flushing by backing freed areas
> until a certain threshold and just after we cross it we do a flush.
>
> --
> Uladzislau Rezki
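
For anyone trying to picture the deferral and its "danger zone", here is a
rough userspace toy model in C. It is only a sketch under assumptions: every
name in it (cpu_state, try_defer_flush(), kernel_entry(), ...) is made up for
illustration and is not taken from the series, and a counter stands in for
the actual TLB flush.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model only: all names below are hypothetical, not from the patches. */
#define NR_CPUS		4
#define WORK_TLB_FLUSH	0x1u		/* a kernel-range flush is pending */
#define IN_KERNEL	0x80000000u	/* CPU is currently in the kernel */

/* Per-CPU word: top bit is the "in kernel" state, low bits are deferred work. */
static _Atomic unsigned int cpu_state[NR_CPUS];
static int flushes_done[NR_CPUS];	/* stand-in for real TLB flushes */
static int ipis_sent[NR_CPUS];		/* stand-in for real IPIs */

static void do_local_flush(int cpu)
{
	flushes_done[cpu]++;
}

/* Housekeeping side: queue the flush if the target CPU is in userspace. */
static bool try_defer_flush(int cpu)
{
	unsigned int old = atomic_load(&cpu_state[cpu]);

	do {
		if (old & IN_KERNEL)
			return false;	/* target is in the kernel: cannot defer */
	} while (!atomic_compare_exchange_weak(&cpu_state[cpu], &old,
					       old | WORK_TLB_FLUSH));
	return true;
}

/* Housekeeping side: flush on one remote CPU, deferring when possible. */
static void flush_kernel_range_on(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!try_defer_flush(cpu)) {
		ipis_sent[cpu]++;	/* fall back to interrupting the CPU */
		do_local_flush(cpu);
	}
}

/*
 * Isolated CPU side: on kernel entry, run whatever was deferred. In a real
 * kernel, anything executed before this point still sees stale TLB entries;
 * that window is the "danger zone" discussed in this thread.
 */
static void kernel_entry(int cpu)
{
	unsigned int old;

	assert(cpu >= 0 && cpu < NR_CPUS);
	old = atomic_exchange(&cpu_state[cpu], IN_KERNEL);
	if (old & WORK_TLB_FLUSH)
		do_local_flush(cpu);
}

/* Isolated CPU side: going back to userspace clears the "in kernel" bit. */
static void kernel_exit(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	atomic_store(&cpu_state[cpu], 0);
}
```

In this model, calling kernel_exit(1) followed by flush_kernel_range_on(1)
sends no IPI and performs no flush; the flush only happens on the next
kernel_entry(1). A flush requested while the CPU is marked as in-kernel
falls back to the IPI path.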