From: Valentin Schneider
To: Jann Horn
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
 virtualization@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
 loongarch@lists.linux.dev, linux-riscv@lists.infradead.org,
 linux-perf-users@vger.kernel.org, xen-devel@lists.xenproject.org,
 kvm@vger.kernel.org, linux-arch@vger.kernel.org, rcu@vger.kernel.org,
 linux-hardening@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org, bpf@vger.kernel.org,
 bcm-kernel-feedback-list@broadcom.com, Juergen Gross, Ajay Kaher,
 Alexey Makhalov, Russell King, Catalin Marinas, Will Deacon, Huacai Chen,
 WANG Xuerui, Paul Walmsley, Palmer Dabbelt, Albert Ou, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
 Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, "Liang, Kan",
 Boris Ostrovsky, Josh Poimboeuf, Pawan Gupta, Sean Christopherson,
 Paolo Bonzini, Andy Lutomirski, Arnd Bergmann, Frederic Weisbecker,
 "Paul E. McKenney", Jason Baron, Steven Rostedt, Ard Biesheuvel,
 Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
 Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang, Juri Lelli,
 Clark Williams, Yair Podemsky, Tomas Glozar, Vincent Guittot,
 Dietmar Eggemann, Ben Segall, Mel Gorman, Kees Cook, Andrew Morton,
 Christoph Hellwig, Shuah Khan, Sami Tolvanen, Miguel Ojeda, Alice Ryhl,
 "Mike Rapoport (Microsoft)", Samuel Holland, Rong Xu,
 Nicolas Saenz Julienne, Geert Uytterhoeven, Yosry Ahmed,
 "Kirill A. Shutemov", "Masami Hiramatsu (Google)", Jinghao Jia,
 Luis Chamberlain, Randy Dunlap, Tiezhu Yang
Subject: Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
References: <20250114175143.81438-1-vschneid@redhat.com> <20250114175143.81438-30-vschneid@redhat.com>
Date: Fri, 17 Jan 2025 17:53:33 +0100

On 17/01/25 16:52, Jann Horn wrote:
> On Fri, Jan 17, 2025 at 4:25 PM Valentin Schneider wrote:
>> On 14/01/25 19:16, Jann Horn wrote:
>> > On Tue, Jan 14, 2025 at 6:51 PM Valentin Schneider wrote:
>> >> vunmap()'s issued from housekeeping CPUs are a relatively common source of
>> >> interference for isolated NOHZ_FULL CPUs, as they are hit by the
>> >> flush_tlb_kernel_range() IPIs.
>> >>
>> >> Given that CPUs executing in userspace do not access data in the vmalloc
>> >> range, these IPIs could be deferred until their next kernel entry.
>> >>
>> >> Deferral vs early entry danger zone
>> >> ===================================
>> >>
>> >> This requires a guarantee that nothing in the vmalloc range can be vunmap'd
>> >> and then accessed in early entry code.
>> >
>> > In other words, it needs a guarantee that no vmalloc allocations that
>> > have been created in the vmalloc region while the CPU was idle can
>> > then be accessed during early entry, right?
>>
>> I'm not sure if that would be a problem (not an mm expert, please do
>> correct me) - looking at vmap_pages_range(), flush_cache_vmap() isn't
>> deferred anyway.
>
> flush_cache_vmap() is about stuff like flushing data caches on
> architectures with virtually indexed caches; that doesn't do TLB
> maintenance. When you look for its definition on x86 or arm64, you'll
> see that they use the generic implementation which is simply an empty
> inline function.
>
>> So after vmapping something, I wouldn't expect isolated CPUs to have
>> invalid TLB entries for the newly vmapped page.
>>
>> However, upon vunmap'ing something, the TLB flush is deferred, and thus
>> stale TLB entries can and will remain on isolated CPUs, up until they
>> execute the deferred flush themselves (IOW for the entire duration of the
>> "danger zone").
>>
>> Does that make sense?
>
> The design idea wrt TLB flushes in the vmap code is that you don't do
> TLB flushes when you unmap stuff or when you map stuff, because doing
> TLB flushes across the entire system on every vmap/vunmap would be a
> bit costly; instead you just do batched TLB flushes in between, in
> __purge_vmap_area_lazy().
>
> In other words, the basic idea is that you can keep calling vmap() and
> vunmap() a bunch of times without ever doing TLB flushes until you run
> out of virtual memory in the vmap region; then you do one big TLB
> flush, and afterwards you can reuse the free virtual address space for
> new allocations again.
>
> So if you "defer" that batched TLB flush for CPUs that are not
> currently running in the kernel, I think the consequence is that those
> CPUs may end up with incoherent TLB state after a reallocation of the
> virtual address space.
>

Ah, gotcha, thank you for laying this out!
In which case yes, any vmalloc that occurred while an isolated CPU was
NOHZ_FULL can be an issue if said CPU accesses it during early entry.

> Actually, I think this would mean that your optimization is disallowed
> at least on arm64 - I'm not sure about the exact wording, but arm64
> has a "break before make" rule that forbids conflicting writable
> address translations or something like that.
>

On the bright side of things, arm64 is not as bad as x86 when it comes to
IPI'ing isolated CPUs :-) I'll add that to my notes, thanks!

> (I said "until you run out of virtual memory in the vmap region", but
> that's not actually true - see the comment above lazy_max_pages() for
> an explanation of the actual heuristic. You might be able to tune that
> a bit if you'd be significantly happier with less frequent
> interruptions, or something along those lines.)
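
For illustration only, the failure mode discussed above (batched purge, flush
deferred for a CPU sitting in userspace, virtual address reused, early entry
access) can be sketched as a toy userspace model. This is not kernel code:
every toy_* name, the fixed-size arrays and the page numbers 100/200 are
hypothetical, and the model assumes a TLB that simply keeps whatever
translation it last cached until it is explicitly flushed.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS   2
#define NR_VADDRS 4
#define NO_PAGE   (-1)

/* Toy model only: none of these names are kernel APIs. */
static int  page_table[NR_VADDRS];   /* vaddr -> page, shared by all CPUs */
static int  tlb[NR_CPUS][NR_VADDRS]; /* per-CPU cached translations */
static bool lazy_free[NR_VADDRS];    /* vunmap'd, not yet purged */
static bool flush_pending[NR_CPUS];  /* flush deferred to next kernel entry */
static bool in_userspace[NR_CPUS];

static void toy_flush_local(int cpu)
{
	for (int v = 0; v < NR_VADDRS; v++)
		tlb[cpu][v] = NO_PAGE;
	flush_pending[cpu] = false;
}

static void toy_init(void)
{
	for (int v = 0; v < NR_VADDRS; v++) {
		page_table[v] = NO_PAGE;
		lazy_free[v] = false;
	}
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		toy_flush_local(cpu);
		in_userspace[cpu] = false;
	}
}

/* Use the cached translation if there is one, else walk the page table. */
static int toy_access(int cpu, int vaddr)
{
	if (tlb[cpu][vaddr] == NO_PAGE)
		tlb[cpu][vaddr] = page_table[vaddr];
	return tlb[cpu][vaddr];
}

static void toy_vmap(int vaddr, int page)
{
	assert(!lazy_free[vaddr]); /* address space is reusable only after a purge */
	page_table[vaddr] = page;
}

/* Unmapping clears the page table entry but does not flush any TLB. */
static void toy_vunmap(int vaddr)
{
	page_table[vaddr] = NO_PAGE;
	lazy_free[vaddr] = true;
}

/* Batched purge: flush every CPU, or only mark userspace CPUs as pending. */
static void toy_purge(bool defer_isolated)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (defer_isolated && in_userspace[cpu])
			flush_pending[cpu] = true;
		else
			toy_flush_local(cpu);
	}
	for (int v = 0; v < NR_VADDRS; v++)
		lazy_free[v] = false;
}

/* Early entry code touches vaddr before the deferred flush has run. */
static int toy_early_entry_access(int cpu, int vaddr)
{
	in_userspace[cpu] = false;
	int seen = toy_access(cpu, vaddr);
	if (flush_pending[cpu])
		toy_flush_local(cpu);
	return seen;
}

/* Returns the page CPU1 observes at vaddr 0 during early entry. */
static int toy_scenario(bool defer_isolated)
{
	toy_init();
	toy_vmap(0, 100);
	toy_access(1, 0);          /* CPU1 caches vaddr 0 -> page 100 */
	in_userspace[1] = true;    /* CPU1 goes back to userspace */
	toy_vunmap(0);
	toy_purge(defer_isolated); /* housekeeping CPU0 runs the batched purge */
	toy_vmap(0, 200);          /* vaddr 0 is handed out again */
	return toy_early_entry_access(1, 0);
}
```

In this model, toy_scenario(false) returns 200 (CPU1 was flushed during the
purge and walks the updated page table), whereas toy_scenario(true) returns
100: CPU1 still uses the translation of the old mapping for an address that
now belongs to a new allocation, which is the incoherent state described
above.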