From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yosry Ahmed <yosryahmed@google.com>
Date: Wed, 17 Apr 2024 19:19:23 -0700
Subject: Re: [PATCH v1 2/3] cgroup/rstat: convert cgroup_rstat_lock back to mutex
To: Jesper Dangaard Brouer
Cc: tj@kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com,
	cgroups@vger.kernel.org, longman@redhat.com, netdev@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, shakeel.butt@linux.dev,
	kernel-team@cloudflare.com, Arnaldo Carvalho de Melo,
	Sebastian Andrzej Siewior, mhocko@kernel.org
References: <171328983017.3930751.9484082608778623495.stgit@firesoul>
	<171328989335.3930751.3091577850420501533.stgit@firesoul>
In-Reply-To: <171328989335.3930751.3091577850420501533.stgit@firesoul>
Content-Type: text/plain; charset="UTF-8"
On Tue, Apr 16, 2024 at 10:51 AM Jesper Dangaard Brouer wrote:
>
> Since kernel v4.18, cgroup_rstat_lock has been an IRQ-disabling spinlock,
> as introduced by commit 0fa294fb1985 ("cgroup: Replace cgroup_rstat_mutex
> with a spinlock").
>
> Despite efforts in cgroup_rstat_flush_locked() to yield the lock when
> necessary during the collection of per-CPU stats, this approach has led
> to several scaling issues observed in production environments.
> Holding
> this IRQ lock has caused starvation of other critical kernel functions,
> such as softirq (e.g., timers and netstack). Although kernel v6.8
> introduced optimizations in this area, we continue to observe instances
> where the spin_lock is held for 64-128 ms in production.
>
> This patch converts cgroup_rstat_lock back to being a mutex lock. This
> change is made possible thanks to the significant effort by Yosry Ahmed
> to eliminate all atomic context use-cases through multiple commits,
> ending in 0a2dc6ac3329 ("cgroup: remove cgroup_rstat_flush_atomic()"),
> included in kernel v6.5.
>
> After this patch, lock contention will be less obvious, as converting this
> to a mutex avoids multiple CPUs spinning while waiting for the lock, but
> it doesn't remove the lock contention. It is recommended to use the
> tracepoints to diagnose this.

I will keep the high-level conversation about using the mutex here in
the cover letter thread, but I am wondering why we are keeping the lock
dropping logic here with the mutex? If this is to reduce lock contention,
why does it depend on need_resched()? spin_needbreak() is a good
indicator for lock contention, but need_resched() isn't, right?

Also, how was this tested? When I did previous changes to the flushing
logic I used to make sure that userspace read latency was not impacted,
as well as in-kernel flushers (e.g. reclaim). We should make sure there
are no regressions on both fronts.
>
> Signed-off-by: Jesper Dangaard Brouer
> ---
>  kernel/cgroup/rstat.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> index ff68c904e647..a90d68a7c27f 100644
> --- a/kernel/cgroup/rstat.c
> +++ b/kernel/cgroup/rstat.c
> @@ -9,7 +9,7 @@
>
>  #include
>
> -static DEFINE_SPINLOCK(cgroup_rstat_lock);
> +static DEFINE_MUTEX(cgroup_rstat_lock);
>  static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);
>
>  static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu);
>
> @@ -238,10 +238,10 @@ static inline void __cgroup_rstat_lock(struct cgroup *cgrp, int cpu_in_loop)
>  {
>         bool contended;
>
> -       contended = !spin_trylock_irq(&cgroup_rstat_lock);
> +       contended = !mutex_trylock(&cgroup_rstat_lock);
>         if (contended) {
>                 trace_cgroup_rstat_lock_contended(cgrp, cpu_in_loop, contended);
> -               spin_lock_irq(&cgroup_rstat_lock);
> +               mutex_lock(&cgroup_rstat_lock);
>         }
>         trace_cgroup_rstat_locked(cgrp, cpu_in_loop, contended);
>  }
>
> @@ -250,7 +250,7 @@ static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop)
>         __releases(&cgroup_rstat_lock)
>  {
>         trace_cgroup_rstat_unlock(cgrp, cpu_in_loop, false);
> -       spin_unlock_irq(&cgroup_rstat_lock);
> +       mutex_unlock(&cgroup_rstat_lock);
>  }
>
>  /* see cgroup_rstat_flush() */
>
> @@ -278,7 +278,7 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
>         }
>
>         /* play nice and yield if necessary */
> -       if (need_resched()) {
> +       if (need_resched()) {
>                 __cgroup_rstat_unlock(cgrp, cpu);
>                 if (!cond_resched())
>                         cpu_relax();