From: Mathieu Desnoyers
To: Ingo Molnar, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Steven Rostedt,
    Vincent Guittot, Juri Lelli, Dietmar Eggemann, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider, "levi . yun",
    Catalin Marinas, Mark Rutland, Will Deacon, Aaron Lu, Thomas Gleixner,
    Borislav Petkov, Dave Hansen, "H. Peter Anvin", Arnd Bergmann,
    Andrew Morton, linux-arch@vger.kernel.org, linux-mm@kvack.org,
    x86@kernel.org, stable@vger.kernel.org
Subject: [PATCH 1/2] sched: Add missing memory barrier in switch_mm_cid
Date: Mon, 15 Apr 2024 11:21:13 -0400
Message-Id: <20240415152114.59122-2-mathieu.desnoyers@efficios.com>
In-Reply-To: <20240415152114.59122-1-mathieu.desnoyers@efficios.com>
References: <20240415152114.59122-1-mathieu.desnoyers@efficios.com>

Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb()
which the core scheduler code has depended upon since commit:

    commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid")

If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can
unset the actively used cid when it fails to observe active task after it
sets lazy_put.

There *is* a memory barrier between storing to rq->curr and _return to
userspace_ (as required by membarrier), but the rseq mm_cid has stricter
requirements: the barrier needs to be issued between store to rq->curr
and switch_mm_cid(), which happens earlier than:

  - spin_unlock(),
  - switch_to().

So it's fine when the architecture switch_mm() happens to have that
barrier already, but less so when the architecture only provides the
full barrier in switch_to() or spin_unlock().

It is a bug in the rseq switch_mm_cid() implementation.
All architectures that don't have memory barriers in switch_mm(), but
rather have the full barrier either in finish_lock_switch() or
switch_to() have them too late for the needs of switch_mm_cid().

Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
generic barrier.h header, and use it in switch_mm_cid() for scheduler
transitions where switch_mm() is expected to provide a memory barrier.

Architectures can override smp_mb__after_switch_mm() if their
switch_mm() implementation provides an implicit memory barrier.
Override it with a no-op on x86, which implicitly provides this memory
barrier by writing to CR3.

Link: https://lore.kernel.org/lkml/20240305145335.2696125-1-yeoreum.yun@arm.com/
Reported-by: levi.yun
Signed-off-by: Mathieu Desnoyers
Reviewed-by: Catalin Marinas # for arm64
Acked-by: Dave Hansen # for x86
Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Cc: # 6.4.x
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Steven Rostedt
Cc: Vincent Guittot
Cc: Juri Lelli
Cc: Dietmar Eggemann
Cc: Ben Segall
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Valentin Schneider
Cc: levi.yun
Cc: Mathieu Desnoyers
Cc: Catalin Marinas
Cc: Mark Rutland
Cc: Will Deacon
Cc: Aaron Lu
Cc: Thomas Gleixner
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: "H. Peter Anvin"
Cc: Arnd Bergmann
Cc: Andrew Morton
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: x86@kernel.org
---
 arch/x86/include/asm/barrier.h |  3 +++
 include/asm-generic/barrier.h  |  8 ++++++++
 kernel/sched/sched.h           | 20 ++++++++++++++------
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index fe1e7e3cc844..63bdc6b85219 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -79,6 +79,9 @@ do { \
 #define __smp_mb__before_atomic()	do { } while (0)
 #define __smp_mb__after_atomic()	do { } while (0)
 
+/* Writing to CR3 provides a full memory barrier in switch_mm(). */
+#define smp_mb__after_switch_mm()	do { } while (0)
+
 #include
 
 #endif /* _ASM_X86_BARRIER_H */
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 0c0695763bea..dc32b96140c1 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -294,5 +294,13 @@ do { \
 #define io_stop_wc() do { } while (0)
 #endif
 
+/*
+ * Architectures that guarantee an implicit smp_mb() in switch_mm()
+ * can override smp_mb__after_switch_mm.
+ */
+#ifndef smp_mb__after_switch_mm
+#define smp_mb__after_switch_mm()	smp_mb()
+#endif
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d2242679239e..d2895d264196 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -79,6 +79,8 @@
 # include
 #endif
 
+#include
+
 #include "cpupri.h"
 #include "cpudeadline.h"
 
@@ -3445,13 +3447,19 @@ static inline void switch_mm_cid(struct rq *rq,
 		 * between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
 		 * Provide it here.
 		 */
-		if (!prev->mm)				// from kernel
+		if (!prev->mm) {			// from kernel
 			smp_mb();
-		/*
-		 * user -> user transition guarantees a memory barrier through
-		 * switch_mm() when current->mm changes. If current->mm is
-		 * unchanged, no barrier is needed.
-		 */
+		} else {				// from user
+			/*
+			 * user -> user transition relies on an implicit
+			 * memory barrier in switch_mm() when
+			 * current->mm changes. If the architecture
+			 * switch_mm() does not have an implicit memory
+			 * barrier, it is emitted here. If current->mm
+			 * is unchanged, no barrier is needed.
+			 */
+			smp_mb__after_switch_mm();
+		}
 	}
 	if (prev->mm_cid_active) {
 		mm_cid_snapshot_time(rq, prev->mm);
-- 
2.39.2
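
For readers who want to experiment with the override pattern outside
the kernel tree, here is a minimal userspace sketch. It is not kernel
code: every demo_* name is hypothetical, demo_smp_mb() stands in for
smp_mb() using a C11 sequentially consistent fence, and only the
"to user" branch touched by this patch is modeled. The barrier counter
exists purely so the behaviour can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Demo-only bookkeeping: counts how many full barriers were issued. */
static unsigned int demo_barriers_issued;

/* Hypothetical userspace stand-in for the kernel's smp_mb(). */
static inline void demo_smp_mb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	demo_barriers_issued++;
}

/*
 * Same shape as the asm-generic hunk: an "architecture" whose
 * switch_mm() already implies a full barrier would define this macro
 * as a no-op before this point. Otherwise a real barrier is used.
 */
#ifndef demo_smp_mb__after_switch_mm
#define demo_smp_mb__after_switch_mm()	demo_smp_mb()
#endif

/* Hypothetical, heavily reduced task: only the mm pointer matters. */
struct demo_task {
	const void *mm;	/* NULL stands for a kernel thread */
};

/* Models only the "to user" branch; "to kernel" is not modeled. */
static void demo_switch_mm_cid_to_user(const struct demo_task *prev)
{
	assert(prev != NULL);

	if (!prev->mm)			/* from kernel */
		demo_smp_mb();
	else				/* from user */
		demo_smp_mb__after_switch_mm();
}

/* Runs one kernel -> user and one user -> user transition. */
static unsigned int demo_count_barriers(void)
{
	static const int fake_mm = 0;
	const struct demo_task kthread = { .mm = NULL };
	const struct demo_task user = { .mm = &fake_mm };
	unsigned int before = demo_barriers_issued;

	demo_switch_mm_cid_to_user(&kthread);
	demo_switch_mm_cid_to_user(&user);
	return demo_barriers_issued - before;
}
```

With the generic fallback, demo_count_barriers() reports one barrier
per transition. Defining demo_smp_mb__after_switch_mm() as
"do { } while (0)" ahead of the #ifndef, as the x86 hunk does for the
real macro, would leave only the kernel -> user barrier in this sketch.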