Date: Tue, 22 Oct 2024 17:15:03 +0100
Message-ID: <868qug3yig.wl-maz@kernel.org>
From: Marc Zyngier
To: Yu Zhao
Cc: Andrew Morton, Catalin Marinas,
	Muchun Song, Thomas Gleixner, Will Deacon, Douglas Anderson,
	Mark Rutland, Nanyong Sun, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v1 4/6] arm64: broadcast IPIs to pause remote CPUs
In-Reply-To: <20241021042218.746659-5-yuzhao@google.com>
References: <20241021042218.746659-1-yuzhao@google.com>
	<20241021042218.746659-5-yuzhao@google.com>

On Mon, 21 Oct 2024 05:22:16 +0100,
Yu Zhao wrote:
>
> Broadcast pseudo-NMI IPIs to pause remote CPUs for a short period of
> time, and then reliably resume them when the local CPU exits critical
> sections that preclude the execution of remote CPUs.
>
> A typical example of such critical sections is BBM on kernel PTEs.
> HugeTLB Vmemmap Optimization (HVO) on arm64 was disabled by
> commit 060a2c92d1b6 ("arm64: mm: hugetlb: Disable
> HUGETLB_PAGE_OPTIMIZE_VMEMMAP") due to the following reason:
>
>     This is deemed UNPREDICTABLE by the Arm architecture without a
>     break-before-make sequence (make the PTE invalid, TLBI, write the
>     new valid PTE). However, such sequence is not possible since the
>     vmemmap may be concurrently accessed by the kernel.
>
> Supporting BBM on kernel PTEs is one of the approaches that can make
> HVO theoretically safe on arm64.

Is the safety only theoretical? I would have expected that we'd use an
approach that is absolutely rock-solid.

>
> Note that it is still possible for the paused CPUs to perform
> speculative translations. Such translations would cause spurious
> kernel PFs, which should be properly handled by
> is_spurious_el1_translation_fault().

Speculative translation faults are never reported; that'd be a CPU
bug. *Spurious* translation faults can be reported if the CPU doesn't
implement FEAT_ETS2, for example, and that has to do with the ordering
of memory accesses wrt page-table walking for the purpose of
translations.
>
> Signed-off-by: Yu Zhao
> ---
>  arch/arm64/include/asm/smp.h |  3 ++
>  arch/arm64/kernel/smp.c      | 92 +++++++++++++++++++++++++++++++++---
>  2 files changed, 88 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> index 2510eec026f7..cffb0cfed961 100644
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -133,6 +133,9 @@ bool cpus_are_stuck_in_kernel(void);
>  extern void crash_smp_send_stop(void);
>  extern bool smp_crash_stop_failed(void);
>
> +void pause_remote_cpus(void);
> +void resume_remote_cpus(void);
> +
>  #endif /* ifndef __ASSEMBLY__ */
>
>  #endif /* ifndef __ASM_SMP_H */
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 3b3f6b56e733..68829c6de1b1 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -85,7 +85,12 @@ static int ipi_irq_base __ro_after_init;
>  static int nr_ipi __ro_after_init = NR_IPI;
>  static struct irq_desc *ipi_desc[MAX_IPI] __ro_after_init;
>
> -static bool crash_stop;
> +enum {
> +	SEND_STOP = BIT(0),
> +	CRASH_STOP = BIT(1),
> +};
> +
> +static unsigned long stop_in_progress;
>
>  static void ipi_setup(int cpu);
>
> @@ -917,6 +922,79 @@ static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs
>  #endif
>  }
>
> +static DEFINE_SPINLOCK(cpu_pause_lock);

PREEMPT_RT will turn this into a sleeping lock. Is it safe to sleep as
you are dealing with kernel mappings?

> +static cpumask_t paused_cpus;
> +static cpumask_t resumed_cpus;
> +
> +static void pause_local_cpu(void)
> +{
> +	int cpu = smp_processor_id();
> +
> +	cpumask_clear_cpu(cpu, &resumed_cpus);
> +	/*
> +	 * Paired with pause_remote_cpus() to confirm that this CPU not only
> +	 * will be paused but also can be reliably resumed.
> +	 */
> +	smp_wmb();
> +	cpumask_set_cpu(cpu, &paused_cpus);
> +	/* paused_cpus must be set before waiting on resumed_cpus. */
> +	barrier();

I'm not sure what this is trying to enforce.
Yes, the compiler won't reorder the set and the test. But your
comment seems to indicate that you also need to make sure the CPU
preserves that ordering, and short of a DMB, the test below could be
reordered.

> +	while (!cpumask_test_cpu(cpu, &resumed_cpus))
> +		cpu_relax();
> +	/* A typical example for sleep and wake-up functions. */

I'm not sure this is "typical"...

> +	smp_mb();
> +	cpumask_clear_cpu(cpu, &paused_cpus);
> +}
> +
> +void pause_remote_cpus(void)
> +{
> +	cpumask_t cpus_to_pause;
> +
> +	lockdep_assert_cpus_held();
> +	lockdep_assert_preemption_disabled();
> +
> +	cpumask_copy(&cpus_to_pause, cpu_online_mask);
> +	cpumask_clear_cpu(smp_processor_id(), &cpus_to_pause);

This bitmap is manipulated outside of your cpu_pause_lock. What
guarantees you can't have two CPUs stepping on each other here?

> +
> +	spin_lock(&cpu_pause_lock);
> +
> +	WARN_ON_ONCE(!cpumask_empty(&paused_cpus));
> +
> +	smp_cross_call(&cpus_to_pause, IPI_CPU_STOP_NMI);
> +
> +	while (!cpumask_equal(&cpus_to_pause, &paused_cpus))
> +		cpu_relax();

This can be a lot of things to compare, especially since you are
explicitly mentioning large systems. Why can't this be implemented as
a counter instead?

Overall, this looks like stop_machine() in disguise. Why can't this
use the existing infrastructure?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
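
For illustration only, the counter suggestion above could look roughly
like the following. This is a minimal userspace sketch using C11
atomics, not kernel code and not what the patch does: struct
pause_state, ack_pause(), ack_resume(), all_paused() and
demo_pause_count() are hypothetical names, and the "remote CPUs" are
simulated sequentially rather than by real IPIs. The point it shows is
that the waiter compares one word, whatever the number of CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the paused_cpus cpumask in the patch. */
struct pause_state {
	atomic_uint nr_paused;	/* targets that have acknowledged the pause */
};

/* Target side: acknowledge entering the paused state. */
static void ack_pause(struct pause_state *s)
{
	/* Release: earlier stores are visible to whoever observes the count. */
	atomic_fetch_add_explicit(&s->nr_paused, 1, memory_order_release);
}

/* Target side: acknowledge leaving the paused state. */
static void ack_resume(struct pause_state *s)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(&s->nr_paused, 1, memory_order_release);
	assert(old > 0);	/* a resume without a matching pause is a bug */
	(void)old;
}

/* Initiator side: a single-word comparison instead of a bitmap compare. */
static bool all_paused(struct pause_state *s, unsigned int nr_targets)
{
	return atomic_load_explicit(&s->nr_paused,
				    memory_order_acquire) == nr_targets;
}

/*
 * Sequential walk-through of one pause/resume cycle. Returns how many
 * acknowledgements were needed before all_paused() became true.
 */
static unsigned int demo_pause_count(unsigned int nr_targets)
{
	struct pause_state s;
	unsigned int acks = 0;

	atomic_init(&s.nr_paused, 0);

	/* Initiator polls until every target has acknowledged. */
	while (!all_paused(&s, nr_targets)) {
		ack_pause(&s);	/* stands in for one remote CPU pausing */
		acks++;
	}

	/* Every target leaves the paused state again. */
	for (unsigned int i = 0; i < nr_targets; i++)
		ack_resume(&s);
	assert(all_paused(&s, 0));

	return acks;
}
```

One trade-off of this sketch: a bare counter no longer says *which*
CPU failed to respond, so a real implementation that wants that
diagnostic would presumably keep a mask for debugging only.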