From: Valentin Schneider
To: Joel Fernandes
Cc: Jann Horn, linux-kernel@vger.kernel.org, x86@kernel.org,
 virtualization@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
 loongarch@lists.linux.dev, linux-riscv@lists.infradead.org,
 linux-perf-users@vger.kernel.org, xen-devel@lists.xenproject.org,
 kvm@vger.kernel.org, linux-arch@vger.kernel.org, rcu@vger.kernel.org,
 linux-hardening@vger.kernel.org, linux-mm@kvack.org,
 linux-kselftest@vger.kernel.org, bpf@vger.kernel.org,
 bcm-kernel-feedback-list@broadcom.com, Juergen Gross, Ajay Kaher,
 Alexey Makhalov, Russell King, Catalin Marinas, Will Deacon, Huacai Chen,
 WANG Xuerui, Paul Walmsley, Palmer Dabbelt, Albert Ou, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
 Peter Zijlstra, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
 Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, "Liang, Kan",
 Boris Ostrovsky, Josh Poimboeuf, Pawan Gupta, Sean Christopherson,
 Paolo Bonzini, Andy Lutomirski, Arnd Bergmann, Frederic Weisbecker,
 "Paul E. McKenney", Jason Baron, Steven Rostedt, Ard Biesheuvel,
 Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
 Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang, Juri Lelli,
 Clark Williams, Yair Podemsky, Tomas Glozar, Vincent Guittot,
 Dietmar Eggemann, Ben Segall, Mel Gorman, Kees Cook, Andrew Morton,
 Christoph Hellwig, Shuah Khan, Sami Tolvanen, Miguel Ojeda, Alice Ryhl,
 "Mike Rapoport (Microsoft)", Samuel Holland, Rong Xu,
 Nicolas Saenz Julienne, Geert Uytterhoeven, Yosry Ahmed,
 "Kirill A. Shutemov", "Masami Hiramatsu (Google)", Jinghao Jia,
 Luis Chamberlain, Randy Dunlap, Tiezhu Yang
Subject: Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
In-Reply-To: <20250219145302.GA480110@joelnvbox>
References: <20250114175143.81438-1-vschneid@redhat.com>
 <20250114175143.81438-30-vschneid@redhat.com>
 <20250219145302.GA480110@joelnvbox>
Date: Wed, 19 Feb 2025 17:18:14 +0100

On 19/02/25 10:05, Joel Fernandes wrote:
> On Fri, Jan 17, 2025 at 05:53:33PM +0100, Valentin Schneider wrote:
>> On 17/01/25 16:52, Jann Horn wrote:
>> > On Fri, Jan 17, 2025 at 4:25 PM Valentin Schneider wrote:
>> >> On 14/01/25 19:16, Jann Horn wrote:
>> >> > On Tue, Jan 14, 2025 at 6:51 PM Valentin Schneider wrote:
>> >> >> vunmap()'s issued from housekeeping CPUs are a relatively common source of
>> >> >> interference for isolated NOHZ_FULL CPUs, as they are hit by the
>> >> >> flush_tlb_kernel_range() IPIs.
>> >> >>
>> >> >> Given that CPUs executing in userspace do not access data in the vmalloc
>> >> >> range, these IPIs could be deferred until their next kernel entry.
>> >> >>
>> >> >> Deferral vs early entry danger zone
>> >> >> ===================================
>> >> >>
>> >> >> This requires a guarantee that nothing in the vmalloc range can be vunmap'd
>> >> >> and then accessed in early entry code.
>> >> >
>> >> > In other words, it needs a guarantee that no vmalloc allocations that
>> >> > have been created in the vmalloc region while the CPU was idle can
>> >> > then be accessed during early entry, right?
>> >>
>> >> I'm not sure if that would be a problem (not an mm expert, please do
>> >> correct me) - looking at vmap_pages_range(), flush_cache_vmap() isn't
>> >> deferred anyway.
>> >
>> > flush_cache_vmap() is about stuff like flushing data caches on
>> > architectures with virtually indexed caches; that doesn't do TLB
>> > maintenance. When you look for its definition on x86 or arm64, you'll
>> > see that they use the generic implementation which is simply an empty
>> > inline function.
>> >
>> >> So after vmapping something, I wouldn't expect isolated CPUs to have
>> >> invalid TLB entries for the newly vmapped page.
>> >>
>> >> However, upon vunmap'ing something, the TLB flush is deferred, and thus
>> >> stale TLB entries can and will remain on isolated CPUs, up until they
>> >> execute the deferred flush themselves (IOW for the entire duration of the
>> >> "danger zone").
>> >>
>> >> Does that make sense?
>> >
>> > The design idea wrt TLB flushes in the vmap code is that you don't do
>> > TLB flushes when you unmap stuff or when you map stuff, because doing
>> > TLB flushes across the entire system on every vmap/vunmap would be a
>> > bit costly; instead you just do batched TLB flushes in between, in
>> > __purge_vmap_area_lazy().
>> >
>> > In other words, the basic idea is that you can keep calling vmap() and
>> > vunmap() a bunch of times without ever doing TLB flushes until you run
>> > out of virtual memory in the vmap region; then you do one big TLB
>> > flush, and afterwards you can reuse the free virtual address space for
>> > new allocations again.
>> >
>> > So if you "defer" that batched TLB flush for CPUs that are not
>> > currently running in the kernel, I think the consequence is that those
>> > CPUs may end up with incoherent TLB state after a reallocation of the
>> > virtual address space.
>> >
>>
>> Ah, gotcha, thank you for laying this out!
>> In which case yes, any vmalloc that occurred while an isolated CPU was
>> NOHZ-FULL can be an issue if said CPU accesses it during early entry;
>
> So the issue is:
>
> CPU1: unmaps vmalloc page X which was previously mapped to physical page
> P1.
>
> CPU2: does a whole bunch of vmalloc and vfree eventually crossing some lazy
> threshold and sending out IPIs. It then goes ahead and does an allocation
> that maps the same virtual page X to physical page P2.
>
> CPU3 is isolated and executes some early entry code before receiving said IPIs
> which are supposedly deferred by Valentin's patches.
>
> It does not receive the IPI because it is deferred, thus access by early
> entry code to page X on this CPU results in a UAF access to P1.
>
> Is that the issue?
>

Pretty much so yeah. That is, *if* there is such a vmalloc'd address access in
early entry code - testing says it's not the case, but I haven't found a
way to instrumentally verify this.
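
For anyone following along, the CPU1/CPU2/CPU3 sequence above can be mimicked
with a toy model. This is only an illustrative sketch in Python, not kernel
code and not what the patches do internally: ToyCPU, flush_kernel_range() and
run_scenario() are made-up names, the TLB is modelled as a plain dictionary,
and "early entry" is reduced to a single access made before the deferred
flush has run.

```python
"""Toy model of the deferred kernel TLB flush scenario (not kernel code)."""


class ToyCPU:
    def __init__(self, name, isolated=False):
        self.name = name
        self.isolated = isolated    # stands in for a NOHZ_FULL CPU in userspace
        self.tlb = {}               # cached virtual page -> physical page
        self.flush_pending = False  # stands in for a deferred flush

    def access(self, page_table, vpage):
        """Translate vpage, preferring a cached (possibly stale) TLB entry."""
        if vpage not in self.tlb:
            self.tlb[vpage] = page_table[vpage]
        return self.tlb[vpage]

    def flush(self):
        self.tlb.clear()
        self.flush_pending = False


def flush_kernel_range(cpus, defer_isolated):
    """Batched flush: hit every CPU now, or defer it for isolated CPUs."""
    for cpu in cpus:
        if defer_isolated and cpu.isolated:
            cpu.flush_pending = True
        else:
            cpu.flush()


def run_scenario(defer_isolated, flush_before_early_access):
    """Return the physical page CPU3 ends up touching through virtual page X."""
    page_table = {"X": "P1"}
    cpu3 = ToyCPU("CPU3", isolated=True)
    cpus = [ToyCPU("CPU1"), ToyCPU("CPU2"), cpu3]

    cpu3.access(page_table, "X")              # CPU3 caches X -> P1, then isolates
    del page_table["X"]                       # CPU1: vunmap of X, no flush yet
    flush_kernel_range(cpus, defer_isolated)  # CPU2: lazy threshold crossed
    page_table["X"] = "P2"                    # CPU2: X reused for a new mapping

    # CPU3 enters the kernel; the deferred flush may or may not have run yet.
    if cpu3.flush_pending and flush_before_early_access:
        cpu3.flush()
    return cpu3.access(page_table, "X")


if __name__ == "__main__":
    print(run_scenario(defer_isolated=False, flush_before_early_access=False))  # P2
    print(run_scenario(defer_isolated=True, flush_before_early_access=False))   # P1
    print(run_scenario(defer_isolated=True, flush_before_early_access=True))    # P2
```

Only the middle case returns P1, i.e. the stale translation: the flush was
deferred and the access happened before it ran. Whether any early entry code
actually performs such an access is exactly the open question above; the
model says nothing about that.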