Date: Fri, 12 Apr 2024 12:22:56 +0200
From: Ingo Molnar
To: Mathieu Desnoyers
Cc: Ingo Molnar, Peter Zijlstra, Andrew Morton, linux-kernel@vger.kernel.org,
    "levi . yun", Catalin Marinas, Dave Hansen, stable@vger.kernel.org,
    Steven Rostedt, Vincent Guittot, Juri Lelli, Dietmar Eggemann,
    Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
    Mark Rutland, Will Deacon, Aaron Lu, Thomas Gleixner, Borislav Petkov,
Peter Anvin" , Arnd Bergmann , linux-arch@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org Subject: Re: [PATCH] sched: Add missing memory barrier in switch_mm_cid Message-ID: References: <20240411174302.353889-1-mathieu.desnoyers@efficios.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240411174302.353889-1-mathieu.desnoyers@efficios.com> X-Rspamd-Queue-Id: 2938E40014 X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: n8zwxz3rsu6q1y874hhtd711cdagk3eq X-HE-Tag: 1712917381-163792 X-HE-Meta: U2FsdGVkX18KkyQx7xtnDAUbRPe5TpQxRgkaqUuAsQ4wqL/Fu4UKHNuFeeVEGZtQTnOoodEYkb5R9YZnr7AgZztsgDYotDWLfMEIhX6fdILSGkqMZ65P42Rv64ntr/1sXaGyVpBReW4yzSyeKRAhxrYa0pVuVPZjwnnIaC2h/y9IspVkqzBT4tUJgHD5FF0QNtygTMSM6jbFXK6zVqJVMXH54vyS2m6aOy+EvWF467ve4CtitxmLVXcDkICIcBD30aYSZSlxAOQfL0jvCXcX2LdC8j617SFcuwKPNU+Urym7lN8fow+BGSaKt88sc3y4rCs8eMmEyhC480botYbm0FgJK8KZ2vnUJNsKQGqplKatkgW/TyZdCLJYGe1PE7VKk3yFDkkG/DFW+nCl4+QLcxyxIaD9kTqivPHlk4hQfCbFR2DV0ZFskz7781gsYDZMPy2kx5xQzaIDgKiKa1/uxfmjjyQWA+3is9rrlBi/yon0iWrn8GUBD5SQfvhRfdlqkXjMfm9VAwMvH98Mwy3WWL1VclDW80DdiAeoqiEZUPFKes737xYSW8YCemOMj0SRz1GIC/T0B0SCSbN0Om8QZ2JRxDZHc7l1rPY233fPu/VmamezUtQbOEGdVOFiuw5oUyvzXLLuY7Lj52kf+Fn003ieA3KDw6ZtjaJRjjosRFus0azTvT74m0yi6hjW+mAQeeOsen9uHCkZoqUtDaJmhZtQc9FRLD1GXJxzdA45BgyKb+8jN9Y8WreXTzN1Lc6t9aoh5ZVuSRO+G6IkDwDytgolfluHYhOzLxx3kZmozEWSfSiuWwB10lmcRCFR0ZFFW+WRYPAvGHdY5dCSpGNnNoNtKw+XB7j2Ofoj7ZNLFr6F/B08w6cRwTRFMzeBZCUPXSP3b2sd5CY6S0F53hI10Fy5NP5SuwCI/4ywxzKWBrobnFycO2hGV/eIjzahMgdph/L0cS1ZNraz8NpqpLH wXeqirzy IZdZ9SqwT0KfjDjGYaiv6Y48VB6qAw2hNimDDKAfBNKcY0bIUrKp59LfohaCnTOHlILlSMN+5QPUh8XhIhKP0pC8HBnDNXwogfHr5DAqU9fVgoVXwo8g4oLAXty5MvdBzB6rmEA9tY2e134FElvYPH7GTOoOEPOKvIkJ7bp+6CFV+wFy/44gOZRNUvXIOpNYjt7D0OmngqMqG6uXXKOO3MiOq+moc6Q0tjwXtbyvWSG+bCr9+ruQrCYr04tDPsssarJ3XO88xibYgnWrmkiuIHvKeEvh7Zw+N98XsD/kG6MsLsggm5+kbxoHa1GNqlZ4fUBDVrOcrCt9uMcqmt09NTAp1Kg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: * Mathieu Desnoyers wrote: > Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb() > which the core scheduler code has depended upon since commit: > > commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid") > > If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can > unset the actively used cid when it fails to observe active task after it > sets lazy_put. > > There *is* a memory barrier between storing to rq->curr and _return to > userspace_ (as required by membarrier), but the rseq mm_cid has stricter > requirements: the barrier needs to be issued between store to rq->curr > and switch_mm_cid(), which happens earlier than: > > - spin_unlock(), > - switch_to(). > > So it's fine when the architecture switch_mm() happens to have that > barrier already, but less so when the architecture only provides the > full barrier in switch_to() or spin_unlock(). > > It is a bug in the rseq switch_mm_cid() implementation. All architectures > that don't have memory barriers in switch_mm(), but rather have the full > barrier either in finish_lock_switch() or switch_to() have them too late > for the needs of switch_mm_cid(). > > Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the > generic barrier.h header, and use it in switch_mm_cid() for scheduler > transitions where switch_mm() is expected to provide a memory barrier. 
>
> Architectures can override smp_mb__after_switch_mm() if their
> switch_mm() implementation provides an implicit memory barrier.
> Override it with a no-op on x86, which implicitly provides this
> memory barrier by writing to CR3.
>
> Link: https://lore.kernel.org/lkml/20240305145335.2696125-1-yeoreum.yun@arm.com/
> Reported-by: levi.yun
> Signed-off-by: Mathieu Desnoyers
> Reviewed-by: Catalin Marinas # for arm64
> Acked-by: Dave Hansen # for x86
> Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
> Cc: stable@vger.kernel.org # 6.4.x
> Cc: Ingo Molnar
> Cc: Peter Zijlstra
> Cc: Steven Rostedt
> Cc: Vincent Guittot
> Cc: Juri Lelli
> Cc: Dietmar Eggemann
> Cc: Ben Segall
> Cc: Mel Gorman
> Cc: Daniel Bristot de Oliveira
> Cc: Valentin Schneider
> Cc: levi.yun
> Cc: Mathieu Desnoyers
> Cc: Catalin Marinas
> Cc: Mark Rutland
> Cc: Will Deacon
> Cc: Aaron Lu
> Cc: Thomas Gleixner
> Cc: Borislav Petkov
> Cc: Dave Hansen
> Cc: "H. Peter Anvin"
> Cc: Arnd Bergmann
> Cc: Andrew Morton
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: x86@kernel.org
> ---
>  arch/x86/include/asm/barrier.h |  3 +++
>  include/asm-generic/barrier.h  |  8 ++++++++
>  kernel/sched/sched.h           | 20 ++++++++++++++------
>  3 files changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 0216f63a366b..d0795b5fab46 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -79,6 +79,9 @@ do { \
>  #define __smp_mb__before_atomic()	do { } while (0)
>  #define __smp_mb__after_atomic()	do { } while (0)
>  
> +/* Writing to CR3 provides a full memory barrier in switch_mm(). */
> +#define smp_mb__after_switch_mm()	do { } while (0)
> +
>  #include <asm-generic/barrier.h>
>  
>  #endif /* _ASM_X86_BARRIER_H */
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index 961f4d88f9ef..5a6c94d7a598 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -296,5 +296,13 @@ do { \
>  #define io_stop_wc() do { } while (0)
>  #endif
>  
> +/*
> + * Architectures that guarantee an implicit smp_mb() in switch_mm()
> + * can override smp_mb__after_switch_mm.
> + */
> +#ifndef smp_mb__after_switch_mm
> +#define smp_mb__after_switch_mm()	smp_mb()
> +#endif
> +
>  #endif /* !__ASSEMBLY__ */
>  #endif /* __ASM_GENERIC_BARRIER_H */
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 001fe047bd5d..35717359d3ca 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -79,6 +79,8 @@
>  # include <asm/paravirt_api_clock.h>
>  #endif
>  
> +#include <asm-generic/barrier.h>
> +
>  #include "cpupri.h"
>  #include "cpudeadline.h"
>  
> @@ -3445,13 +3447,19 @@ static inline void switch_mm_cid(struct rq *rq,
>  		 * between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
>  		 * Provide it here.
>  		 */
> -		if (!prev->mm)				// from kernel
> +		if (!prev->mm) {			// from kernel
>  			smp_mb();
> -		/*
> -		 * user -> user transition guarantees a memory barrier through
> -		 * switch_mm() when current->mm changes. If current->mm is
> -		 * unchanged, no barrier is needed.
> -		 */
> +		} else {				// from user
> +			/*
> +			 * user -> user transition relies on an implicit
> +			 * memory barrier in switch_mm() when
> +			 * current->mm changes. If the architecture
> +			 * switch_mm() does not have an implicit memory
> +			 * barrier, it is emitted here. If current->mm
> +			 * is unchanged, no barrier is needed.
> +			 */
> +			smp_mb__after_switch_mm();
> +		}
>  	}
>  	if (prev->mm_cid_active) {
>  		mm_cid_snapshot_time(rq, prev->mm);

Please move switch_mm_cid() from sched.h to core.c, where its only user
resides.

Thanks,

	Ingo
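For reference, the opt-out pattern for other architectures mirrors the x86
hunk above. A hypothetical architecture whose switch_mm() already executes
a full barrier would define the macro to a no-op in its asm/barrier.h
before pulling in the generic header, so the #ifndef fallback there is
never taken (arch/foo is an illustration, not a real port):

/* Hypothetical arch/foo/include/asm/barrier.h fragment -- illustration only. */

/* switch_mm() on this architecture already executes a full barrier. */
#define smp_mb__after_switch_mm()	do { } while (0)

/*
 * The generic header defines the smp_mb() fallback under
 * #ifndef smp_mb__after_switch_mm, so it must come after the override.
 */
#include <asm-generic/barrier.h>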