From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F0B94C197BF for ; Thu, 27 Feb 2025 22:52:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 70718280008; Thu, 27 Feb 2025 17:52:41 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6B6DF280007; Thu, 27 Feb 2025 17:52:41 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 57DE6280008; Thu, 27 Feb 2025 17:52:41 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 39FF2280007 for ; Thu, 27 Feb 2025 17:52:41 -0500 (EST) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id CE348B7245 for ; Thu, 27 Feb 2025 22:52:40 +0000 (UTC) X-FDA: 83167225680.26.DC9DF1C Received: from out-173.mta1.migadu.com (out-173.mta1.migadu.com [95.215.58.173]) by imf05.hostedemail.com (Postfix) with ESMTP id DEC6C100004 for ; Thu, 27 Feb 2025 22:52:38 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=EO4zcRQG; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf05.hostedemail.com: domain of shakeel.butt@linux.dev designates 95.215.58.173 as permitted sender) smtp.mailfrom=shakeel.butt@linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740696759; a=rsa-sha256; cv=none; b=RALiSx8XH5VaSDC+73Q0knLc7b/qDGzhOnlEjVW2atn+4/9HVv+CEzB5dD2mCbSwxpQvVs /uhRVkrlSKBPBvwZvzneTJ5QyYxkU4QFamrHcUJq5rYiUqWfe3ecnw10Uy85GX/exwXF7j /LOMr05bv4p94/ZrtEdOJo1dfFkCMHY= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=EO4zcRQG; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf05.hostedemail.com: domain of shakeel.butt@linux.dev designates 95.215.58.173 as permitted sender) smtp.mailfrom=shakeel.butt@linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740696759; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=EzrrEOdP7tby5qwkwQSlCjJg0w9qK7j+XV3Tszu1ts4=; b=dtD65hY6tIkycaHxmB8+oszIMTu815ZKvcITzpHn2baTx2CdIxcu6jvnOvkh3Kqha5mWeR 09P8ana8Hjs2VabUMZfx+4O54kqfXtTNEgmYGezpgDgYFUGshw5zZ2XrI6kj2XJc1w0iFr ckHo38bMa6u8nsRZGXDMtyBxBc6vadI= Date: Thu, 27 Feb 2025 14:52:29 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1740696756; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=EzrrEOdP7tby5qwkwQSlCjJg0w9qK7j+XV3Tszu1ts4=; b=EO4zcRQGRiF4DR9jSr+DaXP5BduNdf1oWUCGZrJAl8B4N5YWdzB/PrwT+/psRK4yzhPau9 V7PWgDIWQ9EGd20xNxRDz4ux/WF8I81fuAF+XcWRdUCy0RuzWVfJAt+wDD8x6E/tAOCt4g rRZVBofg6Idn3uUpmoI2dNmt1gxqfxw= X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Shakeel Butt To: inwardvessel Cc: tj@kernel.org, yosryahmed@google.com, mhocko@kernel.org, hannes@cmpxchg.org, akpm@linux-foundation.org, linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com Subject: Re: [PATCH 3/4 v2] cgroup: separate rstat locks for subsystems Message-ID: References: <20250227215543.49928-1-inwardvessel@gmail.com> <20250227215543.49928-4-inwardvessel@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20250227215543.49928-4-inwardvessel@gmail.com> X-Migadu-Flow: FLOW_OUT X-Rspamd-Queue-Id: DEC6C100004 X-Stat-Signature: y7s1tfx9hogfqpu4uwe3gi7o9jzmsgcu X-Rspam-User: X-Rspamd-Server: rspam05 X-HE-Tag: 1740696758-33906 X-HE-Meta: U2FsdGVkX18iRNc/HXbepKckh1em1OfeoU11Da0gz1sbCu03dzSH6mCW5lJcGfUWZFdg4Fp0mc039+i9iMzeJRVnJvfK8CqoSQUyK1rlIAgbyVqql6A53AB70Ve44Qa+crP6hpDSznKOUXESmpZHV5WXVEkAvohI5HBQLpCOW49vac5IV/ButpQWj+lhhddup1Yr1S3RFWzeIcM3odk/+dTmeJ3VToAVuGeYkUe0qICRFWmUuwl1Yc/7Cx2dqrkeLpRAGmXdCnU36HAYK0dkl8A2X1Z+la8sRL6ENzOze9e7Jvis1bl8r1kfEj2eACc4AHueunDK+L69hEeLHOoIvg48HOWyN2knavVurEMulGZs787n5k75oaKppAbNyYVmsr9q5EvVrRJOONnwtsEPPRiln3xx8CxdggaTzhWB3APHmxuSWs+z9iihzbesUAeRtraOgzXBq6301lNvet88YCVw/btx68pCve5EbVKAAPDLia1EXPkwD6JUB3qzrQJXn7bwIJuOJL7D2tEPUsh9GygSAHW3/rZu9cxKB2LdG4GLWVphgDhzPzdHVWE46BPuxAYHha32GxcA/6x3/ohro6kdsM6EPuT5Y9xWhm41ksTVu2UUn/ZDVHhfHtMPjLI8I6LcUwaAMtWf1sWaHo3D8DINhkWUi0a87B1ZizdeE1CFuz3oOWHRcn6XBaPr5cm3R5yLRX5A+J456K4IIjiZkylW+W3XWfAHjqZJ5jjltT5d+GhTIeGIfWkO7EotZ3Gwg3DuT6JCzz8+FpfiU9vlANfNlK4VXWgGDXyCEoDHCp2V8k2zt+N3qjYI8iRAn1PrsWA5gOTeGARDqSG6mWw1tuW80X+bM20RBtpNKCrQnhTDYrGqSmPFsCI3P6R9rrApp8KTKmiM90WhZV1uno01CKDKwS77DlDE9tYzPLEAr6OPL5vmVa8OCRs54U3jaPSmrUESXxzdFJYgNGWTJ3D TIXuy/hO w7+iyRs8xRLWSvG/52eLd6VsHRDaGPYh0wuGToxCLSmNfMxy9cl1hsshkKBqLovdMTTg0dV25/MI/+v3FOBZS5cCdeFazGs34TfsyeE/K//trA9UdV7StMED+ztYMvo4/u3ZOybHD5JDGgfsezcXTnLWaFA12+QS33kuvT/UYk8X/Yy3SJZnkCzBZVFMRizRtVgwLyE7BOnW7L5E= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Feb 27, 2025 at 01:55:42PM -0800, inwardvessel wrote: > From: JP Kobryn > > Let the existing locks be dedicated to the base stats and rename them as > such. Also add new rstat locks for each enabled subsystem. When handling > cgroup subsystem states, distinguish between formal subsystems (memory, > io, etc) and the base stats subsystem state (represented by > cgroup::self) to decide on which locks to take. This change is made to > prevent contention between subsystems when updating/flushing stats. > > Signed-off-by: JP Kobryn Couple of nits below otherwise: Reviewed-by: Shakeel Butt > --- > kernel/cgroup/rstat.c | 93 +++++++++++++++++++++++++++++++++---------- > 1 file changed, 72 insertions(+), 21 deletions(-) > > diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c > index 88908ef9212d..b3eaefc1fd07 100644 > --- a/kernel/cgroup/rstat.c > +++ b/kernel/cgroup/rstat.c > @@ -9,8 +9,12 @@ > > #include > > -static DEFINE_SPINLOCK(cgroup_rstat_lock); > -static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock); > +static DEFINE_SPINLOCK(cgroup_rstat_base_lock); > +static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_base_cpu_lock); > + > +static spinlock_t cgroup_rstat_subsys_lock[CGROUP_SUBSYS_COUNT]; > +static DEFINE_PER_CPU(raw_spinlock_t, > + cgroup_rstat_subsys_cpu_lock[CGROUP_SUBSYS_COUNT]); > The name of these locks are too long and you had to go multi-line. Please just reduce the size of the names. These are local to this file, so maybe you can drop cgroup_rstat_ from them or keep the rstat. > static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu); > > @@ -20,8 +24,13 @@ static struct cgroup_rstat_cpu *cgroup_rstat_cpu( > return per_cpu_ptr(css->rstat_cpu, cpu); > } > > +static inline bool is_base_css(struct cgroup_subsys_state *css) > +{ > + return css->ss == NULL; > +} > + > /* > - * Helper functions for rstat per CPU lock (cgroup_rstat_cpu_lock). > + * Helper functions for rstat per CPU locks. > * > * This makes it easier to diagnose locking issues and contention in > * production environments. The parameter @fast_path determine the > @@ -36,12 +45,12 @@ unsigned long _cgroup_rstat_cpu_lock(raw_spinlock_t *cpu_lock, int cpu, > bool contended; > > /* > - * The _irqsave() is needed because cgroup_rstat_lock is > - * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring > - * this lock with the _irq() suffix only disables interrupts on > - * a non-PREEMPT_RT kernel. The raw_spinlock_t below disables > - * interrupts on both configurations. The _irqsave() ensures > - * that interrupts are always disabled and later restored. > + * The _irqsave() is needed because the locks used for flushing are > + * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring this lock > + * with the _irq() suffix only disables interrupts on a non-PREEMPT_RT > + * kernel. The raw_spinlock_t below disables interrupts on both > + * configurations. The _irqsave() ensures that interrupts are always > + * disabled and later restored. > */ > contended = !raw_spin_trylock_irqsave(cpu_lock, flags); > if (contended) { > @@ -87,7 +96,7 @@ __bpf_kfunc void cgroup_rstat_updated( > struct cgroup_subsys_state *css, int cpu) > { > struct cgroup *cgrp = css->cgroup; > - raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu); > + raw_spinlock_t *cpu_lock; > unsigned long flags; > > /* > @@ -101,6 +110,12 @@ __bpf_kfunc void cgroup_rstat_updated( > if (data_race(cgroup_rstat_cpu(css, cpu)->updated_next)) > return; > > + if (is_base_css(css)) > + cpu_lock = per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu); > + else > + cpu_lock = per_cpu_ptr(cgroup_rstat_subsys_cpu_lock, cpu) + > + css->ss->id; Use the array index here like cgroup_rstat_subsys_cpu_lock[css->ss->id]. > + > flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, true); > > /* put @css and all ancestors on the corresponding updated lists */ > @@ -208,11 +223,17 @@ static struct cgroup_subsys_state *cgroup_rstat_updated_list( > struct cgroup_subsys_state *root, int cpu) > { > struct cgroup *cgrp = root->cgroup; > - raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu); > struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(root, cpu); > struct cgroup_subsys_state *head = NULL, *parent, *child; > + raw_spinlock_t *cpu_lock; > unsigned long flags; > > + if (is_base_css(root)) > + cpu_lock = per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu); > + else > + cpu_lock = per_cpu_ptr(cgroup_rstat_subsys_cpu_lock, cpu) + > + root->ss->id; Same here. > + > flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, false); > > /* Return NULL if this subtree is not on-list */ > @@ -315,7 +336,7 @@ static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css, > struct cgroup *cgrp = css->cgroup; > int cpu; > > - lockdep_assert_held(&cgroup_rstat_lock); > + lockdep_assert_held(&lock); > > for_each_possible_cpu(cpu) { > struct cgroup_subsys_state *pos; > @@ -356,12 +377,18 @@ static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css, > __bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css) > { > struct cgroup *cgrp = css->cgroup; > + spinlock_t *lock; > + > + if (is_base_css(css)) > + lock = &cgroup_rstat_base_lock; > + else > + lock = &cgroup_rstat_subsys_lock[css->ss->id]; > > might_sleep(); > > - __cgroup_rstat_lock(&cgroup_rstat_lock, cgrp, -1); > - cgroup_rstat_flush_locked(css, &cgroup_rstat_lock); > - __cgroup_rstat_unlock(&cgroup_rstat_lock, cgrp, -1); > + __cgroup_rstat_lock(lock, cgrp, -1); > + cgroup_rstat_flush_locked(css, lock); > + __cgroup_rstat_unlock(lock, cgrp, -1); > } > > /** > @@ -376,10 +403,16 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css) > void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css) > { > struct cgroup *cgrp = css->cgroup; > + spinlock_t *lock; > + > + if (is_base_css(css)) > + lock = &cgroup_rstat_base_lock; > + else > + lock = &cgroup_rstat_subsys_lock[css->ss->id]; > > might_sleep(); > - __cgroup_rstat_lock(&cgroup_rstat_lock, cgrp, -1); > - cgroup_rstat_flush_locked(css, &cgroup_rstat_lock); > + __cgroup_rstat_lock(lock, cgrp, -1); > + cgroup_rstat_flush_locked(css, lock); > } > > /** > @@ -389,7 +422,14 @@ void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css) > void cgroup_rstat_flush_release(struct cgroup_subsys_state *css) > { > struct cgroup *cgrp = css->cgroup; > - __cgroup_rstat_unlock(&cgroup_rstat_lock, cgrp, -1); > + spinlock_t *lock; > + > + if (is_base_css(css)) > + lock = &cgroup_rstat_base_lock; > + else > + lock = &cgroup_rstat_subsys_lock[css->ss->id]; > + > + __cgroup_rstat_unlock(lock, cgrp, -1); > } > > int cgroup_rstat_init(struct cgroup_subsys_state *css) > @@ -435,10 +475,21 @@ void cgroup_rstat_exit(struct cgroup_subsys_state *css) > > void __init cgroup_rstat_boot(void) > { > - int cpu; > + struct cgroup_subsys *ss; > + int cpu, ssid; > > - for_each_possible_cpu(cpu) > - raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu)); > + for_each_subsys(ss, ssid) { > + spin_lock_init(&cgroup_rstat_subsys_lock[ssid]); > + } > + > + for_each_possible_cpu(cpu) { > + raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu)); > + > + for_each_subsys(ss, ssid) { > + raw_spin_lock_init( > + per_cpu_ptr(cgroup_rstat_subsys_cpu_lock, cpu) + ssid); Same here. > + } > + } > } > > /* > -- > 2.43.5 >