From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6BEB7C369D5 for ; Fri, 25 Apr 2025 00:18:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 07FC76B002B; Thu, 24 Apr 2025 20:18:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 02DD86B002C; Thu, 24 Apr 2025 20:18:07 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E37C26B002D; Thu, 24 Apr 2025 20:18:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id C69796B002B for ; Thu, 24 Apr 2025 20:18:07 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 953CD5C5F4 for ; Fri, 25 Apr 2025 00:18:07 +0000 (UTC) X-FDA: 83370653814.02.E2BC33A Received: from mail-pl1-f176.google.com (mail-pl1-f176.google.com [209.85.214.176]) by imf27.hostedemail.com (Postfix) with ESMTP id A87C140005 for ; Fri, 25 Apr 2025 00:18:05 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=nej7xa7o; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf27.hostedemail.com: domain of inwardvessel@gmail.com designates 209.85.214.176 as permitted sender) smtp.mailfrom=inwardvessel@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1745540285; a=rsa-sha256; cv=none; b=7uuFDxVYpg4mZ1mGi2CkjaGqBCj1oKf9n2oUO3jcAKw1zmMX7NEIYlsNSP73qeo7JW3YLS MuUZCSEZ+haEE9hodAfY+4vMSsA7xdDnooZLuMRRVjj301Etymz91dooUGczNpqkcas64g lJjlkV/Ir4x+09fxLHRm3fXeMn8+cpc= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=nej7xa7o; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf27.hostedemail.com: domain of inwardvessel@gmail.com designates 209.85.214.176 as permitted sender) smtp.mailfrom=inwardvessel@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1745540285; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=HRDr9PeSGcFtwz0oZ2PUUdR/OCyAfw+nh1U48kUyY2E=; b=g9R3CW7fjPe4rFmLfXyINR57Uq3rfLVpx7ihxNbN//za3CyJEra/Rb+4GjdBvBiRi8RKB7 X8m8T9DsAPBnEn+SMsbSImVdWWDTRyX6Um9/w4rILoeNMDj1gn73R9cqkh5RNsEPgio98k 2aaMUX/rvYzW4ng35IRRnDgFjTaXL+c= Received: by mail-pl1-f176.google.com with SMTP id d9443c01a7336-22409077c06so25766765ad.1 for ; Thu, 24 Apr 2025 17:18:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1745540284; x=1746145084; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=HRDr9PeSGcFtwz0oZ2PUUdR/OCyAfw+nh1U48kUyY2E=; b=nej7xa7o3Z2l+IiWfyygopiXx3xP2TsKO6+03Z+FLsuepvqdWYEJiKbcjMN4h49dBU UPAGYnVe55vGNyAzG+tJ/DKK9ZC2Y5Hmlh3fWMO7K1udL61MLU+BMg25xZPzGqqbnBTm U5Ze97Fbf6knyksvhF3zXsqkrJHavHJDyNWvAKULwkg8MQdwfeT0lKeXdV7gTEzjp7Hi gLW7LOSJlvgIqB2EK4tYjVRQmoxdiu+6m22t0MkB5Rdwx5CoqXRWLta4e51hrzw1MJbZ b0KZ7hrY3/pecND1QGOfzBo8b3q0huF0/MWX8BwOCBb3j8NmPNUys+pcZZdVHrUcEUym pWHQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745540284; x=1746145084; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=HRDr9PeSGcFtwz0oZ2PUUdR/OCyAfw+nh1U48kUyY2E=; b=CyDudOFZaYiQPAHg/aZsI0lLOOX9G0jDOB9Iye8RG+MROj4u6a5BT64I49QBVblSK/ 8wy/unJ4KfwgCyhwWPIuhN2Chjz1kCQvxSkD9zl0OSa//Nt4hijrm5P5LUCB9I0jeJvf +WlbuORly8HlhUHKz+97Imo+MSKuFIgf/1l8LmWuFARmHjBjzhpVsaTijU4EoHEAkF89 uWgCuJH63qxwmDVzEaYzTnXSrN/ZLDZMUU2yGFdFl34GPF7WyPiZaDqcmiupedbb8dyq WbVdbeE4I2MeqPhFmRFwE9DiECREVwF2lZPn+QhGhf3IFEr9+EpRP1bBUGLQkgrSiTgi rtXg== X-Forwarded-Encrypted: i=1; AJvYcCUyX2raxEKgdN17xpgV/ulJxhLH5y0fc53qIU+u1C6SS0UQmcow45lsHyz6LbHiTWfKKNuPCprbPQ==@kvack.org X-Gm-Message-State: AOJu0YwWh4Je5R3Q0SsAPjy3bCXzU/ZMXPMYnirqE5AOg7sjbcDNYjtM Tg6aVPdqf4K2GlyLjRP7rKd1EbzpcmeU3PLeJ4n7WyBxqz3fHxhd X-Gm-Gg: ASbGncuCL8v6VrzGEhPKznKd2rDLGO0UJSi/qVaSiz+jxbUL1Os9sjJUpiJoAnfkI16 2S8vJHL0TX2XSj+yJlFOJqH0PIXRsDmv3o7INpiUEsfQgA1Z4cWRLJFB7eWM8yEaxGiumZdCwla ow3HhrrIRa2i7xXc7C/S4T/IxqulQ2nNwnzelgUQ5dlEtFvLWUxWwBsMN6bPQPDuae0w9hGBEO3 nUhUF5RP9WLjNpE1k6LxKhJa75/RNBrL2sJtDqKj6GvHY17R2k+Yfb7uYL2kk2FO+uaYiV9iSU3 Z9P4pdJnUVUM9E5+fvDRYRtLhbEN9y6b/B5KnmuCr9bjqXX8IanHYn/aiA5XFlV8URA10cNT X-Google-Smtp-Source: AGHT+IHRKmQAD3A5aGZ76IlaEDszcu2IoUYRj8gVhbMRFBOPSHJkXzVXq/4Zg5EmL+uBpbtYJtR22g== X-Received: by 2002:a17:902:ef50:b0:22d:b2c9:7fd7 with SMTP id d9443c01a7336-22dbf5ef7fcmr4819875ad.21.1745540284473; Thu, 24 Apr 2025 17:18:04 -0700 (PDT) Received: from ?IPV6:2a03:83e0:1151:15:fd6c:bb6:36da:5926? ([2620:10d:c090:500::5:5d68]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-22db5216998sm19907915ad.229.2025.04.24.17.18.03 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 24 Apr 2025 17:18:03 -0700 (PDT) Message-ID: <9d261429-a1bc-4ab0-b68b-d18ecdfd0766@gmail.com> Date: Thu, 24 Apr 2025 17:18:02 -0700 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v4 5/5] cgroup: use subsystem-specific rstat locks to avoid contention To: Yosry Ahmed Cc: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com, mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org, linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com References: <20250404011050.121777-1-inwardvessel@gmail.com> <20250404011050.121777-6-inwardvessel@gmail.com> <6807a132.df0a0220.28dc80.a1f0SMTPIN_ADDED_BROKEN@mx.google.com> Content-Language: en-US From: JP Kobryn In-Reply-To: <6807a132.df0a0220.28dc80.a1f0SMTPIN_ADDED_BROKEN@mx.google.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: A87C140005 X-Stat-Signature: qspb5ftksw4746onxwh1x6xzd6wu8ehg X-HE-Tag: 1745540285-581206 X-HE-Meta: U2FsdGVkX1/2cnN73s7/uM4i67U7OgCJEhGSntocN6nw+dzm2L5dNUT6VNHxwuKlVJj5eF/LmSUXoyZZLjuWQl2B0eUAej+1CJH7Ve5+M+4/SQVW38qi0ZVZ3ePeP5nSvloKrWqjkjGmq9fc/hsSm6fNfTs96/vdMEKnDkvk691dZ8wNi+jzVyJVPqq6+BCfB2oJT9xl2DdIlswlm0SO/yANTweZBxNOOAPqqObsARjwF/duirJDUrHgQPBfr1S0JsQ50RYexeesJkdZ9L6/9hEGYq0r136bMVPFmse5davDVzdfX/gL4LDoT/GHxTApgZawSd3Sl2kwy42+FTkQyeoV4E7qmU3iv4HKCFTW9S/SlO61jRCt0p3QYIDlMJlN1v1ENZGMBe1AzTLIq7dGfLN3g1fPaSIDuj+YBWL3G7It11XQ0UHh32ybhYT/16yPV8ExamI2ss+27XfZ1VUkuEjyCpq39DJwrR2H5c+f6GL6qBp2EZmHtnt18ewbapxb5cRfe8WI5ZRr57x/O2sonk91yQeLJq6WjG6u/PB46/2EcjWSxk7W65piq0k4ofuV+RLjqf97P3j+Mm3swCx5QsSgt5PSwHL/bAoz7p8E955QLBUa8UuYGGzWdOD53U1RpcippNDiYylPee0h6To67McnmLvkNv8+M2/4m6ZCYtlx29gZ3LVVWz5uPmxSWZe/rRKSIyB/kvIYeneCLUZpSyhTg0IdF5U8LoCF8YLlaaB5pXblxxuPNle5FVtHrD9ltgsB8jzqyYrwbH+qon4pS5s07q2smU1giN65ohImrcmpfIGf4nD2csHzmR7Ji3ljMMAOWS9qqJEID0uPsF25kFak8QB14ifsnsy/+AYHsUbQC0bhtAEk24hD7ytPZHIPJAZZ/iTSFeyqiPt5z2/PZJnuR/eCkN8Xrem63fbVvcfOwMTeKkM15DjKJctRYeVz14+JpwIU4fVTX2s9nvZ 7m2w41iQ yqXUCXwtYVr23pOYqt1s9nboS6i2rJ8QWroaVeElwOI4t53mC8KuG4TsVH481evz8Ijy/ORVgi0uWORLmRTYaNOrV8T1adhjyg7LeT7F/DgW1CH5lrexNyojKujgprLxOrgzOfuAO6z6D/Eluze5ohfuVxusvzBg2w6+apDlMU7c8MzFdh1n92xP+VkiZirUE2PP77b/r0yd670+XwErfyK3T3uhG3USeCh9XZKidRExc2h4UozDMni1oqoz+mjIZhOVK66ThEIP+XnuMDPFk+jAiinPknCoqOn3frwkoeWlZ1La3m2qPeLyOcelj0k7jrGq06eqe8vQ9kFKotL8scl33MPTLrsK3gkKkbvnzwEiOQahF3sHdVtmXzf3BEnePfA7ZiNjE2SFH3mtk0jux+eARSEC8Z42630LzPE/X15g7sBiQbbpRaRc6q3x8RGXh3m8z6X7IoYPvGt0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 4/22/25 7:01 AM, Yosry Ahmed wrote: > On Thu, Apr 03, 2025 at 06:10:50PM -0700, JP Kobryn wrote: >> It is possible to eliminate contention between subsystems when >> updating/flushing stats by using subsystem-specific locks. Let the existing >> rstat locks be dedicated to the cgroup base stats and rename them to >> reflect that. Add similar locks to the cgroup_subsys struct for use with >> individual subsystems. >> >> Lock initialization is done in the new function ss_rstat_init(ss) which >> replaces cgroup_rstat_boot(void). If NULL is passed to this function, the >> global base stat locks will be initialized. Otherwise, the subsystem locks >> will be initialized. >> >> Change the existing lock helper functions to accept a reference to a css. >> Then within these functions, conditionally select the appropriate locks >> based on the subsystem affiliation of the given css. Add helper functions >> for this selection routine to avoid repeated code. >> >> Signed-off-by: JP Kobryn >> --- >> block/blk-cgroup.c | 2 +- >> include/linux/cgroup-defs.h | 16 +++-- >> include/trace/events/cgroup.h | 12 +++- >> kernel/cgroup/cgroup-internal.h | 2 +- >> kernel/cgroup/cgroup.c | 10 +++- >> kernel/cgroup/rstat.c | 101 +++++++++++++++++++++++--------- >> 6 files changed, 103 insertions(+), 40 deletions(-) >> >> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c >> index 0560ea402856..62d0bf1e1a04 100644 >> --- a/block/blk-cgroup.c >> +++ b/block/blk-cgroup.c >> @@ -1074,7 +1074,7 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu) >> /* >> * For covering concurrent parent blkg update from blkg_release(). >> * >> - * When flushing from cgroup, cgroup_rstat_lock is always held, so >> + * When flushing from cgroup, the subsystem lock is always held, so >> * this lock won't cause contention most of time. >> */ >> raw_spin_lock_irqsave(&blkg_stat_lock, flags); >> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h >> index c58c21c2110a..bb5a355524d6 100644 >> --- a/include/linux/cgroup-defs.h >> +++ b/include/linux/cgroup-defs.h >> @@ -223,7 +223,10 @@ struct cgroup_subsys_state { >> /* >> * A singly-linked list of css structures to be rstat flushed. >> * This is a scratch field to be used exclusively by >> - * css_rstat_flush_locked() and protected by cgroup_rstat_lock. >> + * css_rstat_flush_locked(). >> + * >> + * Protected by rstat_base_lock when css is cgroup::self. >> + * Protected by css->ss->rstat_ss_lock otherwise. >> */ >> struct cgroup_subsys_state *rstat_flush_next; >> }; >> @@ -359,11 +362,11 @@ struct css_rstat_cpu { >> * are linked on the parent's ->updated_children through >> * ->updated_next. >> * >> - * In addition to being more compact, singly-linked list pointing >> - * to the cgroup makes it unnecessary for each per-cpu struct to >> - * point back to the associated cgroup. >> + * In addition to being more compact, singly-linked list pointing to >> + * the css makes it unnecessary for each per-cpu struct to point back >> + * to the associated css. >> * >> - * Protected by per-cpu cgroup_rstat_cpu_lock. >> + * Protected by per-cpu css->ss->rstat_ss_cpu_lock. >> */ >> struct cgroup_subsys_state *updated_children; /* terminated by self cgroup */ > > This rename belongs in the previous patch, also the comment about > updated_children should probably say "self css" now. Thanks, will adjust in next rev. > >> struct cgroup_subsys_state *updated_next; /* NULL iff not on the list */ >> @@ -793,6 +796,9 @@ struct cgroup_subsys { >> * specifies the mask of subsystems that this one depends on. >> */ >> unsigned int depends_on; >> + >> + spinlock_t rstat_ss_lock; >> + raw_spinlock_t __percpu *rstat_ss_cpu_lock; > > Can we use local_lock_t here instead? I guess it would be annoying > because we won't be able to have common code for locking/unlocking. It's > annoying because the local lock is a spinlock under the hood for non-RT > kernels anyway.. > >> }; >> >> extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem; > [..] >> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c >> index 37d9e5012b2d..bcc253aec774 100644 >> --- a/kernel/cgroup/rstat.c >> +++ b/kernel/cgroup/rstat.c >> @@ -9,8 +9,8 @@ >> >> #include >> >> -static DEFINE_SPINLOCK(cgroup_rstat_lock); >> -static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock); >> +static DEFINE_SPINLOCK(rstat_base_lock); >> +static DEFINE_PER_CPU(raw_spinlock_t, rstat_base_cpu_lock); > > Can we do something like this (not sure the macro usage is correct): > > static DEFINE_PER_CPU(raw_spinlock_t, rstat_base_cpu_lock) = __SPIN_LOCK_UNLOCKED(rstat_base_cpu_lock); > > This should initialize the per-CPU spinlocks the same way > DEFINE_SPINLOCK does IIUC. Yes, this could work. There was some earlier discussion and it was decided against though [0]. > >> >> static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu); >> > [..] >> @@ -422,12 +443,36 @@ void css_rstat_exit(struct cgroup_subsys_state *css) >> css->rstat_cpu = NULL; >> } >> >> -void __init cgroup_rstat_boot(void) >> +/** >> + * ss_rstat_init - subsystem-specific rstat initialization >> + * @ss: target subsystem >> + * >> + * If @ss is NULL, the static locks associated with the base stats >> + * are initialized. If @ss is non-NULL, the subsystem-specific locks >> + * are initialized. >> + */ >> +int __init ss_rstat_init(struct cgroup_subsys *ss) >> { >> int cpu; >> >> + if (!ss) { >> + spin_lock_init(&rstat_base_lock); > > IIUC locks defined with DEFINE_SPINLOCK() do not need to be initialized, > and I believe we can achieve the same for the per-CPU locks as I > described above and eliminate this branch completely. True. I'll remove in next rev, but will keep the initialization for the per-cpu locks based on [0]. [0] https://lore.kernel.org/all/jtqkpzqv4xqy4vajm6fljin6ospot37qg252dfk3yldzq6aubu@icg3ndtg3k7i/