From: Jann Horn <jannh@google.com>
Date: Fri, 17 Jan 2025 16:52:19 +0100
Subject: Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
To: Valentin Schneider
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, virtualization@lists.linux.dev,
    linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
    linux-riscv@lists.infradead.org, linux-perf-users@vger.kernel.org,
    xen-devel@lists.xenproject.org, kvm@vger.kernel.org, linux-arch@vger.kernel.org,
    rcu@vger.kernel.org, linux-hardening@vger.kernel.org, linux-mm@kvack.org,
    linux-kselftest@vger.kernel.org, bpf@vger.kernel.org,
    bcm-kernel-feedback-list@broadcom.com, Juergen Gross, Ajay Kaher,
    Alexey Makhalov, Russell King, Catalin Marinas, Will Deacon, Huacai Chen,
    WANG Xuerui, Paul Walmsley, Palmer Dabbelt, Albert Ou, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Peter Zijlstra,
    Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin,
    Jiri Olsa, Ian Rogers, Adrian Hunter, "Liang, Kan", Boris Ostrovsky,
    Josh Poimboeuf, Pawan Gupta, Sean Christopherson, Paolo Bonzini,
    Andy Lutomirski, Arnd Bergmann, Frederic Weisbecker, "Paul E. McKenney",
    Jason Baron, Steven Rostedt, Ard Biesheuvel, Neeraj Upadhyay, Joel Fernandes,
    Josh Triplett, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan,
    Zqiang, Juri Lelli, Clark Williams, Yair Podemsky, Tomas Glozar,
    Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman, Kees Cook,
    Andrew Morton, Christoph Hellwig, Shuah Khan, Sami Tolvanen, Miguel Ojeda,
    Alice Ryhl, "Mike Rapoport (Microsoft)", Samuel Holland, Rong Xu,
    Nicolas Saenz Julienne, Geert Uytterhoeven, Yosry Ahmed, "Kirill A. Shutemov",
    "Masami Hiramatsu (Google)", Jinghao Jia, Luis Chamberlain, Randy Dunlap,
    Tiezhu Yang
References: <20250114175143.81438-1-vschneid@redhat.com>
    <20250114175143.81438-30-vschneid@redhat.com>

On Fri, Jan 17, 2025 at 4:25 PM Valentin Schneider wrote:
> On 14/01/25 19:16, Jann Horn wrote:
> > On Tue,
Jan 14, 2025 at 6:51 PM Valentin Schneider wrote:
> >> vunmap()'s issued from housekeeping CPUs are a relatively common source of
> >> interference for isolated NOHZ_FULL CPUs, as they are hit by the
> >> flush_tlb_kernel_range() IPIs.
> >>
> >> Given that CPUs executing in userspace do not access data in the vmalloc
> >> range, these IPIs could be deferred until their next kernel entry.
> >>
> >> Deferral vs early entry danger zone
> >> ===================================
> >>
> >> This requires a guarantee that nothing in the vmalloc range can be vunmap'd
> >> and then accessed in early entry code.
> >
> > In other words, it needs a guarantee that no vmalloc allocations that
> > have been created in the vmalloc region while the CPU was idle can
> > then be accessed during early entry, right?
>
> I'm not sure if that would be a problem (not an mm expert, please do
> correct me) - looking at vmap_pages_range(), flush_cache_vmap() isn't
> deferred anyway.

flush_cache_vmap() is about flushing data caches on architectures with
virtually indexed caches; it does not do TLB maintenance. If you look up
its definition on x86 or arm64, you'll see that both use the generic
implementation, which is simply an empty inline function.

> So after vmapping something, I wouldn't expect isolated CPUs to have
> invalid TLB entries for the newly vmapped page.
>
> However, upon vunmap'ing something, the TLB flush is deferred, and thus
> stale TLB entries can and will remain on isolated CPUs, up until they
> execute the deferred flush themselves (IOW for the entire duration of the
> "danger zone").
>
> Does that make sense?
The design idea wrt TLB flushes in the vmap code is that you don't do
TLB flushes when you unmap stuff or when you map stuff, because doing
TLB flushes across the entire system on every vmap/vunmap would be a
bit costly; instead you just do batched TLB flushes in between, in
__purge_vmap_area_lazy().

In other words, the basic idea is that you can keep calling vmap() and
vunmap() a bunch of times without ever doing TLB flushes until you run
out of virtual memory in the vmap region; then you do one big TLB
flush, and afterwards you can reuse the free virtual address space for
new allocations again.

So if you "defer" that batched TLB flush for CPUs that are not
currently running in the kernel, I think the consequence is that those
CPUs may end up with incoherent TLB state after a reallocation of the
virtual address space.

Actually, I think this would mean that your optimization is disallowed
at least on arm64 - I'm not sure about the exact wording, but arm64 has
a "break before make" rule that forbids conflicting writable address
translations or something like that.

(I said "until you run out of virtual memory in the vmap region", but
that's not actually true - see the comment above lazy_max_pages() for
an explanation of the actual heuristic. You might be able to tune that
a bit if you'd be significantly happier with less frequent
interruptions, or something along those lines.)