From: Doug Anderson
Date: Thu, 8 Aug 2024 09:09:55 -0700
Subject: Re: [RFC PATCH 2/4] arm64: use IPIs to pause/resume remote CPUs
To: Yu Zhao
Cc: Catalin Marinas, Will Deacon, Andrew Morton, David Rientjes,
    Frank van der Linden, Mark Rutland, Muchun Song, Nanyong Sun,
    Yang Shi, linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
In-Reply-To: <20240806022114.3320543-3-yuzhao@google.com>

Hi,

On Mon, Aug 5, 2024 at 7:21 PM Yu Zhao wrote:
>
> Use pseudo-NMI IPIs to pause remote CPUs for a short period of time,
> and then reliably resume them when the local CPU exits critical
> sections that preclude the execution of remote CPUs.
>
> A typical example of such critical sections is BBM on kernel PTEs.
> HugeTLB Vmemmap Optimization (HVO) on arm64 was disabled by commit
> 060a2c92d1b6 ("arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP")
> due to the following reason:
>
>   This is deemed UNPREDICTABLE by the Arm architecture without a
>   break-before-make sequence (make the PTE invalid, TLBI, write the
>   new valid PTE). However, such sequence is not possible since the
>   vmemmap may be concurrently accessed by the kernel.
>
> Supporting BBM on kernel PTEs is one of the approaches that can
> potentially make arm64 support HVO.
>
> Signed-off-by: Yu Zhao
> ---
>  arch/arm64/include/asm/smp.h |   3 +
>  arch/arm64/kernel/smp.c      | 110 +++++++++++++++++++++++++++++++++++
>  2 files changed, 113 insertions(+)

I'm a bit curious how your approach is reliable / performant in all
cases. As far as I understand it:

1. Patch #4 in your series unconditionally turns on
"ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP" for arm64.

2. In order for it to work reliably, you need the "pause all CPUs"
functionality introduced in this patch.

3. In order for the "pause all CPUs" functionality to be performant
you need NMI or, at least, pseudo-NMI to be used to pause all CPUs.

4. Even when you configure the kernel for pseudo-NMI it's not 100%
guaranteed that pseudo-NMI will be turned on. Specifically:

4a) There's an extra kernel command line parameter you need to
actually enable pseudo-NMI. We can debate the inability to turn on
pseudo-NMI without the command line parameter, but at the moment it's
there because pseudo-NMI has some performance implications.
Apparently these performance implications are more significant on
some early arm64 CPUs.

4b) Even if we changed it so that the command-line parameter wasn't
needed, there are still some boards out there that are known not to
be able to enable pseudo-NMI.
There are certainly some Mediatek Chromebooks that have a BIOS bug
making pseudo-NMI unreliable. See the
`mediatek,broken-save-restore-fw` device tree property.

...and even if you ignore the Mediatek Chromebooks, there's at least
one more system I know of that's broken with pseudo-NMI. Since you're
at Google, you could look at b/308278090 for details but the quick
summary is that some devices running a TEE are hanging when pseudo-NMI
is enabled.

...and, even if that's fixed, it feels somewhat likely that there are
other systems where pseudo-NMI won't be usable.

Unless I'm misunderstanding, it feels like anything you have that
relies on NMI/pseudo-NMI needs to fall back safely/reliably if
NMI/pseudo-NMI isn't there.

> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> index 2510eec026f7..cffb0cfed961 100644
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -133,6 +133,9 @@ bool cpus_are_stuck_in_kernel(void);
>  extern void crash_smp_send_stop(void);
>  extern bool smp_crash_stop_failed(void);
>
> +void pause_remote_cpus(void);
> +void resume_remote_cpus(void);
> +
>  #endif /* ifndef __ASSEMBLY__ */
>
>  #endif /* ifndef __ASM_SMP_H */
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 5e18fbcee9a2..aa80266e5c9d 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -68,16 +68,25 @@ enum ipi_msg_type {
>         IPI_RESCHEDULE,
>         IPI_CALL_FUNC,
>         IPI_CPU_STOP,
> +       IPI_CPU_PAUSE,
> +#ifdef CONFIG_KEXEC_CORE
>         IPI_CPU_CRASH_STOP,
> +#endif
> +#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
>         IPI_TIMER,
> +#endif
> +#ifdef CONFIG_IRQ_WORK
>         IPI_IRQ_WORK,
> +#endif

I assume all these "ifdefs" are there because this adds up to more
than 8 IPIs. That means that someone wouldn't be able to enable all of
these things, right? Feels like we'd want to solve this before landing
things.
At the very least, it would be good if this built upon:

https://lore.kernel.org/r/20240625160718.v2.1.Id4817adef610302554b8aa42b090d57270dc119c@changeid/

...and then maybe we could figure out if there are other ways to
consolidate NMIs. Previously, for instance, we had the "KGDB" and
"backtrace" IPIs combined into one but we split them upon review
feedback. If necessary they would probably be easy to re-combine.
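
To make the budget concern concrete, here is a rough user-space sketch
(not kernel code) that counts how many IPI slots a configuration would
need. Everything in it is an assumption for illustration: the budget of
8 comes from my "more than 8 IPIs" guess above, the feature bits are
hypothetical stand-ins for the Kconfig options in the quoted hunk, and
the "backtrace" and "KGDB" entries are assumed to occupy one slot each
since the quoted hunk doesn't show them.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed budget, taken from the "more than 8 IPIs" remark above. */
#define IPI_BUDGET 8

/* IPIs in the quoted hunk that are not wrapped in an #ifdef:
 * IPI_RESCHEDULE, IPI_CALL_FUNC, IPI_CPU_STOP, IPI_CPU_PAUSE. */
#define IPI_ALWAYS 4

/* Hypothetical feature bits standing in for the Kconfig options. */
enum sketch_feature {
	F_KEXEC_CORE  = 1 << 0, /* IPI_CPU_CRASH_STOP */
	F_CLOCK_BCAST = 1 << 1, /* IPI_TIMER */
	F_IRQ_WORK    = 1 << 2, /* IPI_IRQ_WORK */
	F_BACKTRACE   = 1 << 3, /* assumed "backtrace" IPI */
	F_KGDB        = 1 << 4, /* assumed "KGDB" IPI */
	F_ALL         = (1 << 5) - 1,
};

/* Number of IPI slots a given configuration would need. If merge_debug
 * is non-zero, the backtrace and KGDB IPIs share a single slot. */
static int ipi_slots_needed(unsigned int features, int merge_debug)
{
	int n = IPI_ALWAYS;
	unsigned int bit;

	assert((features & ~(unsigned int)F_ALL) == 0);

	/* One slot per enabled optional IPI. */
	for (bit = 1; bit <= (unsigned int)F_KGDB; bit <<= 1)
		if (features & bit)
			n++;

	/* Re-combining the two debug IPIs saves one slot when both are on. */
	if (merge_debug && (features & F_BACKTRACE) && (features & F_KGDB))
		n--;

	return n;
}

/* Non-zero if the configuration fits in the assumed budget. */
static int ipi_config_fits(unsigned int features, int merge_debug)
{
	return ipi_slots_needed(features, merge_debug) <= IPI_BUDGET;
}

/* Print a one-line summary for a configuration. */
static void ipi_report(unsigned int features, int merge_debug)
{
	printf("slots=%d budget=%d fits=%d\n",
	       ipi_slots_needed(features, merge_debug), IPI_BUDGET,
	       ipi_config_fits(features, merge_debug));
}
```

Calling ipi_report(F_ALL, 0) and ipi_report(F_ALL, 1) compares the
"everything enabled" case with and without the two debug IPIs merged.
Under these assumptions the merge is what brings the full configuration
back inside the budget, which is why re-combining them might be worth a
look.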