From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DEDFEC282D0 for ; Fri, 28 Feb 2025 17:37:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 79185280006; Fri, 28 Feb 2025 12:37:15 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7420B280001; Fri, 28 Feb 2025 12:37:15 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5E199280006; Fri, 28 Feb 2025 12:37:15 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 43468280001 for ; Fri, 28 Feb 2025 12:37:15 -0500 (EST) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id ECFFC1C8CC4 for ; Fri, 28 Feb 2025 17:37:14 +0000 (UTC) X-FDA: 83170059588.06.7B9387C Received: from mail-pl1-f173.google.com (mail-pl1-f173.google.com [209.85.214.173]) by imf07.hostedemail.com (Postfix) with ESMTP id 06AE840002 for ; Fri, 28 Feb 2025 17:37:12 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=lkGrEU1K; spf=pass (imf07.hostedemail.com: domain of inwardvessel@gmail.com designates 209.85.214.173 as permitted sender) smtp.mailfrom=inwardvessel@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1740764233; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=NdMrZTu+JrCHgIRKXV4NO2YpO5ndQ1lOEadMR4OgnYw=; b=s79TstklQYI4kUTezPV3IdsDjGp8EbSGsdeA3JsoJ15y+g5HBTRPSrcqkFMiHld05eA1Pe Mvy36k2r3+CqYy/723G4n/DPc8kzV5fZRFS3J7Qa1LmGSWBelwEFHlu66erwHSmYXDcocq KZZYb3JVSaNcjafRL05HYTl3TOywWRo= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=lkGrEU1K; spf=pass (imf07.hostedemail.com: domain of inwardvessel@gmail.com designates 209.85.214.173 as permitted sender) smtp.mailfrom=inwardvessel@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1740764233; a=rsa-sha256; cv=none; b=GuMvxUpfuLS5DFPRlSN2IRDyCbd1vDd2plyvIJ9zKkJp4WROm5GgUHSXS/bQe3pgw/yg82 CW9KOPbseoF8ab1bOBpEABqwS5lG1+92apGggqpKcWWbrEybshNnHUcyzyzB2h5V4O7KR7 SJ+wbeyHB79a44OYkASQBl2H7o7mxC0= Received: by mail-pl1-f173.google.com with SMTP id d9443c01a7336-223785beedfso13807555ad.1 for ; Fri, 28 Feb 2025 09:37:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1740764232; x=1741369032; darn=kvack.org; h=content-transfer-encoding:in-reply-to:content-language:references :cc:to:from:subject:user-agent:mime-version:date:message-id:from:to :cc:subject:date:message-id:reply-to; bh=NdMrZTu+JrCHgIRKXV4NO2YpO5ndQ1lOEadMR4OgnYw=; b=lkGrEU1KRa+rpbiSRDG/nFWWt+dLk+AS/Q68yNS+5XMl/3nNNkRB06X4G37YZJ4LUc 8uWD/k6aKONhrpSP8kYutjrk64K/fZPz2/G+hl5lm/ZlMnhjT65jr77mWzd9I9xU/NPY H1mDch0FP5TDKe/VOn9KhRCp5ukHXYSnLGJplyjz7JdFWRZv/4Sq1TblUt0rBvL0GObj XgJlU4YUpWnmA5G+CF9T5liquwBQxy3qPuNh8X7gmPTmhvXPc4jbIoZ0EBGbSNy+0nQK /TGxn7F6+knX+ezVaMaNJLQnbFCT0PAYCmXXPCbE3F3K9sNkRNpAnZgYwqKB+MrL5v/r Vuqw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1740764232; x=1741369032; h=content-transfer-encoding:in-reply-to:content-language:references :cc:to:from:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=NdMrZTu+JrCHgIRKXV4NO2YpO5ndQ1lOEadMR4OgnYw=; b=WPDq43e8hX+IwpZ149ptWeb8weOAREsTxqjCK7vOg+PoGvZwiO0hVTPN+vejiNtsSF miD0J5nnP/QzNV+ObdW0RA9CtULDnxVxCZ2rtO/9q8OEjwk6wxGxAtXm8LaCQPKFB3fQ CmFWkaTeMpVR/KWYRbMpii7xsKWOBwwUj5E+mow/EbCoZ+SeXdu0sQgV8CskyoWfnq4m 0l6jV5KNKUti8NzSrXhTCwQkjhjIzm0XHfsQX1aqVm1CKVNZhp1yqZJLMIOW6Rb0seHM ydu8DFjLcS8t3LXxkSaUekPILpkclny2MKFYm5U0nIn7Ms4a3CWOi3BTlZ6rCiDFZKW3 +qXQ== X-Gm-Message-State: AOJu0YxuvCs+uL2SGmFJi1JMmizCSiN4R0A8z1aCWAzBM3k6+FDE0LZt DPmLvwKL6Z/7WD+QCQ4L2+gE1eplQpINsN8FgRblNdcu7htMQBnt X-Gm-Gg: ASbGncvQpljeGyKKBMuVbiFv3Ve1Yy4A0WSpPpPHQvg2cUVPFx7S+la5z78BCO2nyhu 7Z3er1gs1kyB+0TsT0QsrC12qwjTRODEyfEc0rZLIvWPuMWssF5LLqe4+gMhFc8cV92Pc6rfkf3 1YDsuSp49H/NKNIi8BTVL2pUL/tBzG+jVr6rF1Dv83kQh2otfTkkzXZeAvqra18Y1dsTmKtavj2 IBvCUHZIlKHrvRLiyjllqJ++d6zqu04UIqZnB/zqSJlF5N/z/4m2IQuD7U81mGkVBrswUq074g1 pOclGO8A1tTEEyAiArzHpLtuAyrxX61skwT0Up4VczsCpXK1V0jU188xbSksUxojuqrIeypE X-Google-Smtp-Source: AGHT+IF8c/xBz9EwykaYLYCvdYKWfCWhuBb+uHTXMxEGYLh/lB4lJDw6IVU4KDMDM+ZawePIRoEaJg== X-Received: by 2002:a17:90b:4a47:b0:2ee:c2b5:97a0 with SMTP id 98e67ed59e1d1-2febabf41a4mr6436141a91.25.1740764231848; Fri, 28 Feb 2025 09:37:11 -0800 (PST) Received: from [192.168.2.117] (c-67-188-127-15.hsd1.ca.comcast.net. [67.188.127.15]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-2fea67530d0sm4077625a91.10.2025.02.28.09.37.10 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 28 Feb 2025 09:37:11 -0800 (PST) Message-ID: <084e5bc1-d2cd-4b3d-82ee-7cd83d2462e0@gmail.com> Date: Fri, 28 Feb 2025 09:37:09 -0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 3/4 v2] cgroup: separate rstat locks for subsystems From: JP Kobryn To: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com, mhocko@kernel.org, hannes@cmpxchg.org, akpm@linux-foundation.org Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com References: <20250227215543.49928-1-inwardvessel@gmail.com> <20250227215543.49928-4-inwardvessel@gmail.com> Content-Language: en-US In-Reply-To: <20250227215543.49928-4-inwardvessel@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Stat-Signature: b3xb3jx9pbkyc7j6ppcng5m66d5opaae X-Rspamd-Queue-Id: 06AE840002 X-Rspamd-Server: rspam06 X-Rspam-User: X-HE-Tag: 1740764232-881031 X-HE-Meta: U2FsdGVkX18/lF17YAY2vJOcX5UpQL9BLdzaniR/GHqwE4yx/SzDuniR9Y8rlOuslwGjkDlirxqoKXyDw5Kn+D+dOFK+6okU1IPHlw38j8l5fN3Z0via7VUYLq0fJrVyhDjClKfriHWlguAvuNUabgljAM6edbReNt86BHz3LYtCqd3sC3y0Oc243aQgy1wmNGYpQsFbuFqJ+ltph4t/NzZlSo3f9YZDFhmWkW0lgrfnsR8pcEg1wP2Po7LPIsLd1KDUHm2XggEGuRH5vjFDYnbs6TN3KmlCC5Nipm6ZDqNXwAuWlxeCubrqmNqRvfi3Hv17P1PL6Lwih7qWEWdQ1Y1ieGVkET3ZfUZPqvd9fT9+Bu18K3fQyhnU+MtPH+IuyJ7kEi7ioEaYtcBcmatnFw6w9Dwf5MqTiev2vBPw8/v/RtluAl3h4PpP4citHE0ROoKkX7eCCnKWyh+5qPrtVumI6KSfqxXN/3A8W1cWLib1ZPqJjkH96mnzZpjpgbBECsjyJN6v2umlj8o5Dmsi5oMI7x1/1vnibpUyN4xwnPlKpVMk9lzpKx9AyPEJ5enVL9NJlDmTOIFcWFIBE4ndc4Mb2nrRhPx954m0lYIDfTLFJoUK5DK/g//8Cu5unnwt1wKWufR+/LKV9o2q39gC/EQ2hXx24EnGTyyJyBRfSOcNmL8CArxk9WVCM+gd1YNpwQGLwIsUIiCdLXHdSBBf3fnLhhbBXLZOrDUH1zrbo0/YKIgNu3WvludP9KEjumluwRqFSkZn2ezH+qA1LrZN8luOYYOJFjKzbNC1wYsrC1WjWIVl6zH07m0geOMACJfar1PlnIhXh4/IFp4yIk7q4LoAAcaT8x1OAb6wjTew3bbWcJL5Yfxm3hl5KJ9CboOFHWHsXVCHGq0F6zWnT21Lxz+adlbXgZxNT1vuFjyWiSJ0CsthmzM5W0bZyFRerMdE0NHUINCG4TJAZg9Xzd9 P4yiOW4R cQtluSxB57pGh6rgG21HXX151gzggonyjy7OBWU6SjyUBpUPGquMEU3LdngxcYwFP+cBNp93MkzeL3NV8Q3/FJUnCpeKD2d8Z9rBjD0J85VbbGHBB+WzOHICCH5y6hBaajxTKxMdio0K7/PXnfEKuiseXFmurgDBbP0qfUXrxlvq3t2QUS1+d6JEA+obsdoiGFW/mCAvA7++csn1iI3dohOj0L1UMNZnWCG2o6AOeIQQ7gVyIo+eSZMDs8ocM+3ORm5VKJOZTZza/IgeqXnZ2AmBoE2dq8hgjFyy+A6hGQ77Q6yTSDVUe9+N3pLTuu1h83MPsLno5SDjgVBG6/Nu3HAlUenvx/CFwTcJ1/RctUSKd2aJuvMj2Ua/SSylEDka4a8vV7K10ayR2XTR4j9hYCwUGOUOLNozCbrs57SVWCt6sqL0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000002, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2/27/25 1:55 PM, inwardvessel wrote: > From: JP Kobryn > > Let the existing locks be dedicated to the base stats and rename them as > such. Also add new rstat locks for each enabled subsystem. When handling > cgroup subsystem states, distinguish between formal subsystems (memory, > io, etc) and the base stats subsystem state (represented by > cgroup::self) to decide on which locks to take. This change is made to > prevent contention between subsystems when updating/flushing stats. > > Signed-off-by: JP Kobryn > --- > kernel/cgroup/rstat.c | 93 +++++++++++++++++++++++++++++++++---------- > 1 file changed, 72 insertions(+), 21 deletions(-) > > diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c > index 88908ef9212d..b3eaefc1fd07 100644 > --- a/kernel/cgroup/rstat.c > +++ b/kernel/cgroup/rstat.c > @@ -9,8 +9,12 @@ > > #include > > -static DEFINE_SPINLOCK(cgroup_rstat_lock); > -static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock); > +static DEFINE_SPINLOCK(cgroup_rstat_base_lock); > +static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_base_cpu_lock); > + > +static spinlock_t cgroup_rstat_subsys_lock[CGROUP_SUBSYS_COUNT]; > +static DEFINE_PER_CPU(raw_spinlock_t, > + cgroup_rstat_subsys_cpu_lock[CGROUP_SUBSYS_COUNT]); > > static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu); > > @@ -20,8 +24,13 @@ static struct cgroup_rstat_cpu *cgroup_rstat_cpu( > return per_cpu_ptr(css->rstat_cpu, cpu); > } > > +static inline bool is_base_css(struct cgroup_subsys_state *css) > +{ > + return css->ss == NULL; > +} > + > /* > - * Helper functions for rstat per CPU lock (cgroup_rstat_cpu_lock). > + * Helper functions for rstat per CPU locks. > * > * This makes it easier to diagnose locking issues and contention in > * production environments. The parameter @fast_path determine the > @@ -36,12 +45,12 @@ unsigned long _cgroup_rstat_cpu_lock(raw_spinlock_t *cpu_lock, int cpu, > bool contended; > > /* > - * The _irqsave() is needed because cgroup_rstat_lock is > - * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring > - * this lock with the _irq() suffix only disables interrupts on > - * a non-PREEMPT_RT kernel. The raw_spinlock_t below disables > - * interrupts on both configurations. The _irqsave() ensures > - * that interrupts are always disabled and later restored. > + * The _irqsave() is needed because the locks used for flushing are > + * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring this lock > + * with the _irq() suffix only disables interrupts on a non-PREEMPT_RT > + * kernel. The raw_spinlock_t below disables interrupts on both > + * configurations. The _irqsave() ensures that interrupts are always > + * disabled and later restored. > */ > contended = !raw_spin_trylock_irqsave(cpu_lock, flags); > if (contended) { > @@ -87,7 +96,7 @@ __bpf_kfunc void cgroup_rstat_updated( > struct cgroup_subsys_state *css, int cpu) > { > struct cgroup *cgrp = css->cgroup; > - raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu); > + raw_spinlock_t *cpu_lock; > unsigned long flags; > > /* > @@ -101,6 +110,12 @@ __bpf_kfunc void cgroup_rstat_updated( > if (data_race(cgroup_rstat_cpu(css, cpu)->updated_next)) > return; > > + if (is_base_css(css)) > + cpu_lock = per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu); > + else > + cpu_lock = per_cpu_ptr(cgroup_rstat_subsys_cpu_lock, cpu) + > + css->ss->id; > + > flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, true); > > /* put @css and all ancestors on the corresponding updated lists */ > @@ -208,11 +223,17 @@ static struct cgroup_subsys_state *cgroup_rstat_updated_list( > struct cgroup_subsys_state *root, int cpu) > { > struct cgroup *cgrp = root->cgroup; > - raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu); > struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(root, cpu); > struct cgroup_subsys_state *head = NULL, *parent, *child; > + raw_spinlock_t *cpu_lock; > unsigned long flags; > > + if (is_base_css(root)) > + cpu_lock = per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu); > + else > + cpu_lock = per_cpu_ptr(cgroup_rstat_subsys_cpu_lock, cpu) + > + root->ss->id; > + > flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, false); > > /* Return NULL if this subtree is not on-list */ > @@ -315,7 +336,7 @@ static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css, > struct cgroup *cgrp = css->cgroup; > int cpu; > > - lockdep_assert_held(&cgroup_rstat_lock); > + lockdep_assert_held(&lock); I need to remove the ampersand since the variable is already a pointer. > > for_each_possible_cpu(cpu) { > struct cgroup_subsys_state *pos; > @@ -356,12 +377,18 @@ static void cgroup_rstat_flush_locked(struct cgroup_subsys_state *css, > __bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css) > { > struct cgroup *cgrp = css->cgroup; > + spinlock_t *lock; > + > + if (is_base_css(css)) > + lock = &cgroup_rstat_base_lock; > + else > + lock = &cgroup_rstat_subsys_lock[css->ss->id]; > > might_sleep(); > > - __cgroup_rstat_lock(&cgroup_rstat_lock, cgrp, -1); > - cgroup_rstat_flush_locked(css, &cgroup_rstat_lock); > - __cgroup_rstat_unlock(&cgroup_rstat_lock, cgrp, -1); > + __cgroup_rstat_lock(lock, cgrp, -1); > + cgroup_rstat_flush_locked(css, lock); > + __cgroup_rstat_unlock(lock, cgrp, -1); > } > > /** > @@ -376,10 +403,16 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup_subsys_state *css) > void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css) > { > struct cgroup *cgrp = css->cgroup; > + spinlock_t *lock; > + > + if (is_base_css(css)) > + lock = &cgroup_rstat_base_lock; > + else > + lock = &cgroup_rstat_subsys_lock[css->ss->id]; > > might_sleep(); > - __cgroup_rstat_lock(&cgroup_rstat_lock, cgrp, -1); > - cgroup_rstat_flush_locked(css, &cgroup_rstat_lock); > + __cgroup_rstat_lock(lock, cgrp, -1); > + cgroup_rstat_flush_locked(css, lock); > } > > /** > @@ -389,7 +422,14 @@ void cgroup_rstat_flush_hold(struct cgroup_subsys_state *css) > void cgroup_rstat_flush_release(struct cgroup_subsys_state *css) > { > struct cgroup *cgrp = css->cgroup; > - __cgroup_rstat_unlock(&cgroup_rstat_lock, cgrp, -1); > + spinlock_t *lock; > + > + if (is_base_css(css)) > + lock = &cgroup_rstat_base_lock; > + else > + lock = &cgroup_rstat_subsys_lock[css->ss->id]; > + > + __cgroup_rstat_unlock(lock, cgrp, -1); > } > > int cgroup_rstat_init(struct cgroup_subsys_state *css) > @@ -435,10 +475,21 @@ void cgroup_rstat_exit(struct cgroup_subsys_state *css) > > void __init cgroup_rstat_boot(void) > { > - int cpu; > + struct cgroup_subsys *ss; > + int cpu, ssid; > > - for_each_possible_cpu(cpu) > - raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu)); > + for_each_subsys(ss, ssid) { > + spin_lock_init(&cgroup_rstat_subsys_lock[ssid]); > + } > + > + for_each_possible_cpu(cpu) { > + raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_base_cpu_lock, cpu)); > + > + for_each_subsys(ss, ssid) { > + raw_spin_lock_init( > + per_cpu_ptr(cgroup_rstat_subsys_cpu_lock, cpu) + ssid); > + } > + } > } > > /*