From: Uladzislau Rezki
Date: Mon, 20 Jan 2025 12:15:20 +0100
To: Valentin Schneider
Cc: Uladzislau Rezki, Jann Horn, linux-kernel@vger.kernel.org,
	x86@kernel.org, virtualization@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, loongarch@lists.linux.dev,
	linux-riscv@lists.infradead.org, linux-perf-users@vger.kernel.org,
	xen-devel@lists.xenproject.org, kvm@vger.kernel.org,
	linux-arch@vger.kernel.org, rcu@vger.kernel.org,
	linux-hardening@vger.kernel.org, linux-mm@kvack.org,
	linux-kselftest@vger.kernel.org, bpf@vger.kernel.org,
	bcm-kernel-feedback-list@broadcom.com, Juergen Gross, Ajay Kaher,
	Alexey Makhalov, Russell King, Catalin Marinas, Will Deacon,
	Huacai Chen, WANG Xuerui, Paul Walmsley, Palmer Dabbelt, Albert Ou,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	"H. Peter Anvin", Peter Zijlstra, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, "Liang, Kan", Boris Ostrovsky,
	Josh Poimboeuf, Pawan Gupta, Sean Christopherson, Paolo Bonzini,
	Andy Lutomirski, Arnd Bergmann, Frederic Weisbecker,
	"Paul E. McKenney", Jason Baron, Steven Rostedt, Ard Biesheuvel,
	Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Juri Lelli,
	Clark Williams, Yair Podemsky, Tomas Glozar, Vincent Guittot,
	Dietmar Eggemann, Ben Segall, Mel Gorman, Kees Cook, Andrew Morton,
	Christoph Hellwig, Shuah Khan, Sami Tolvanen, Miguel Ojeda,
	Alice Ryhl, "Mike Rapoport (Microsoft)", Samuel Holland, Rong Xu,
	Nicolas Saenz Julienne, Geert Uytterhoeven, Yosry Ahmed,
	"Kirill A. Shutemov", "Masami Hiramatsu (Google)", Jinghao Jia,
	Luis Chamberlain, Randy Dunlap, Tiezhu Yang
Subject: Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
References: <20250114175143.81438-1-vschneid@redhat.com>
	<20250114175143.81438-30-vschneid@redhat.com>

On Fri, Jan 17, 2025 at 06:00:30PM +0100, Valentin Schneider wrote:
> On 17/01/25 17:11, Uladzislau Rezki wrote:
> > On Fri, Jan 17, 2025 at 04:25:45PM +0100, Valentin Schneider wrote:
> >> On 14/01/25 19:16, Jann Horn wrote:
> >> > On Tue, Jan 14, 2025 at 6:51 PM Valentin Schneider wrote:
> >> >> vunmap()'s issued from housekeeping CPUs are a relatively common source of
> >> >> interference for isolated NOHZ_FULL CPUs, as they are hit by the
> >> >> flush_tlb_kernel_range() IPIs.
> >> >>
> >> >> Given that CPUs executing in userspace do not access data in the vmalloc
> >> >> range, these IPIs could be deferred until their next kernel entry.
> >> >>
> >> >> Deferral vs early entry danger zone
> >> >> ===================================
> >> >>
> >> >> This requires a guarantee that nothing in the vmalloc range can be vunmap'd
> >> >> and then accessed in early entry code.
> >> >
> >> > In other words, it needs a guarantee that no vmalloc allocations that
> >> > have been created in the vmalloc region while the CPU was idle can
> >> > then be accessed during early entry, right?
> >>
> >> I'm not sure if that would be a problem (not an mm expert, please do
> >> correct me) - looking at vmap_pages_range(), flush_cache_vmap() isn't
> >> deferred anyway.
> >>
> >> So after vmapping something, I wouldn't expect isolated CPUs to have
> >> invalid TLB entries for the newly vmapped page.
> >>
> >> However, upon vunmap'ing something, the TLB flush is deferred, and thus
> >> stale TLB entries can and will remain on isolated CPUs, up until they
> >> execute the deferred flush themselves (IOW for the entire duration of the
> >> "danger zone").
> >>
> >> Does that make sense?
> >>
> > Probably I am missing something and need to have a look at your patches,
> > but how do you guarantee that no one maps the same area that you defer for
> > TLB flushing?
> >
>
> That's the cool part: I don't :')
>
Indeed, sounds unsafe :) Then we just do not need to free areas.

> For deferring instruction patching IPIs, I (well Josh really) managed to
> get instrumentation to back me up and catch any problematic area.
>
> I looked into getting something similar for vmalloc region access in
> .noinstr code, but I didn't get anywhere. I even tried using emulated
> watchpoints on QEMU to watch the whole vmalloc range, but that went about
> as well as you could expect.
>
> That left me with staring at code. AFAICT the only vmap'd thing that is
> accessed during early entry is the task stack (CONFIG_VMAP_STACK), which
> itself cannot be freed until the task exits - thus can't be subject to
> invalidation when a task is entering kernelspace.
>
> If you have any tracing/instrumentation suggestions, I'm all ears (eyes?).
>
As noted before, we already defer flushing for vmalloc. We have a lazy
threshold which can be exposed (if you need it) over sysfs for tuning. So,
we can add it.

--
Uladzislau Rezki
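
For readers unfamiliar with the lazy threshold mentioned above, here is a
minimal userspace sketch of the idea: vunmap'd pages are accumulated and the
expensive flush is only issued once the backlog exceeds a threshold. This is
a toy model, not kernel code. The class and method names are hypothetical,
the 4 KiB page size is an assumption, and the threshold formula is only
loosely modeled on how mm/vmalloc.c scales its limit with the number of
online CPUs. The real code also drains asynchronously rather than inline.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def lazy_max_pages(online_cpus: int) -> int:
    """Toy threshold: fls(ncpus) * (32 MiB worth of pages).

    int.bit_length() gives the same result as fls() for positive integers.
    """
    return online_cpus.bit_length() * (32 * 1024 * 1024 // PAGE_SIZE)


class LazyVmapPurger:
    """Hypothetical model of deferring TLB flushes for vunmap'd ranges."""

    def __init__(self, online_cpus: int) -> None:
        self.threshold = lazy_max_pages(online_cpus)
        self.lazy_pages = 0  # pages unmapped but not yet flushed
        self.flushes = 0     # number of (modeled) global flushes issued

    def vunmap(self, nr_pages: int) -> bool:
        """Queue nr_pages lazily; return True if this call triggered a flush."""
        self.lazy_pages += nr_pages
        if self.lazy_pages > self.threshold:
            self.purge()
            return True
        return False

    def purge(self) -> None:
        """Model one flush covering the whole backlog, then reset it."""
        self.flushes += 1
        self.lazy_pages = 0
```

A tunable threshold trades memory held in the lazy backlog against how often
all CPUs (including isolated ones) are interrupted for a flush, which is why
exposing it for tuning is relevant to this thread.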