From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 42A12C282C6 for ; Mon, 3 Mar 2025 19:50:32 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D0016280006; Mon, 3 Mar 2025 14:50:31 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id CB0C2280003; Mon, 3 Mar 2025 14:50:31 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B77A2280006; Mon, 3 Mar 2025 14:50:31 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 99E0E280003 for ; Mon, 3 Mar 2025 14:50:31 -0500 (EST) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 4B7C81C8DBE for ; Mon, 3 Mar 2025 19:50:31 +0000 (UTC) X-FDA: 83181281862.25.23A2F0E Received: from out-171.mta0.migadu.com (out-171.mta0.migadu.com [91.218.175.171]) by imf26.hostedemail.com (Postfix) with ESMTP id 49605140016 for ; Mon, 3 Mar 2025 19:50:29 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=mSTW6dm0; spf=pass (imf26.hostedemail.com: domain of yosry.ahmed@linux.dev designates 91.218.175.171 as permitted sender) smtp.mailfrom=yosry.ahmed@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1741031429; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Q3fB94flygGoLPiPIqeiergtpOByBNOlrdyc4hB6M+Y=; b=n/Tt7puqKCP01j3mVAulyDZb1Yakqnu//AkyYDanNlnY4Nxt3x9G6MHbrfDOWcGG2uIIQa bokXTjjkvDizw/gCQHpIHI/UmHs2XOs6BD0ywHsgc6w4C5zHLa7CxUdctq8fq73iT/ClEb 6lJtTl0jhOwnmG4DrQ10aMD70r5krNs= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=mSTW6dm0; spf=pass (imf26.hostedemail.com: domain of yosry.ahmed@linux.dev designates 91.218.175.171 as permitted sender) smtp.mailfrom=yosry.ahmed@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1741031429; a=rsa-sha256; cv=none; b=WTCkeflzQHgpAwXmyILnHcPZJPQE4hcPb1LKn8BWCGt8mU3x3p2WgIAxHCvJaJzVH9DO5k xgRU4oGLQvcFGOukubf6ECQAPsxYLq0xTvapiH7HWKk0FAFcju32NgEjL5N+74+ONpfscT 8JduCb5sAFk+sOLrg28thUB3es+wabg= Date: Mon, 3 Mar 2025 19:50:20 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1741031425; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Q3fB94flygGoLPiPIqeiergtpOByBNOlrdyc4hB6M+Y=; b=mSTW6dm0PfRZGDt8QUW6jKXrSGRmZeOCERlxq6fbRDZBujkqfDXww33bheGf8317dP02nc ek8x2N4xmYP+FYVXAnhGItZAVk3KnqaQf0jfMMZy71/5JNRxIOnP0Soro9+zxoJPI8KBTz IVA0IwwnIjNhlNaaVl8KAMw1139J8Xc= X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Yosry Ahmed To: Shakeel Butt Cc: Michal =?utf-8?Q?Koutn=C3=BD?= , inwardvessel , tj@kernel.org, mhocko@kernel.org, hannes@cmpxchg.org, akpm@linux-foundation.org, linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com Subject: Re: [PATCH 3/4 v2] cgroup: separate rstat locks for subsystems Message-ID: References: <20250227215543.49928-1-inwardvessel@gmail.com> <20250227215543.49928-4-inwardvessel@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Migadu-Flow: FLOW_OUT X-Stat-Signature: u7rdpwabi55uz66rrweorwynany1cn17 X-Rspamd-Queue-Id: 49605140016 X-Rspamd-Server: rspam06 X-Rspam-User: X-HE-Tag: 1741031429-777260 X-HE-Meta: U2FsdGVkX1+qZb8hF8cCORb8nfdNRIhYXmAL0y2Df6aug2P1ckbRyp5iuBkTfsApu0stEBaBEgRkY4cAXpNxh5TNokI/djbQnVmSBujkQgFTh/mzw5EaOjGMTtq4ga0JIrHElNBBBhQq0KMZhn9uX/QOR1o2XqmEEE1anY66WspV6JPR/ZSETcMYm7Sv2Mr+Ckv341uIJ8bwEbgEvSEo2wUT3Zm23mx6nnum408JOgz9/6GgUSsgWnGrtPd9ylG3XxbxSnEn3jQI35ud34CKPVfYSOye3Y9KP2vIryF5K9PYPB6sCirG6Xy2jWcBk7rCeq0JiFJ+XFBuqUgyXvGVdGgV1FzkDEG2X4T8UjeaqVcFOgCduuwqg8DqMIS84oT/Y4ZL5YQbB58RQfjwlDT/CK/MBBGL62bY9xpAOYk5Nhg0mUci13ueiecT2XTKP/LVuyD3nE8R8fDEk+ciyOeoT5bkyKHrb2s3rJ0I36UmJ+GMJqbzHiJKTtVzonE5N2BdOdxtzptrNAq8P1g/eJr2sRfzeJuYdGBAjhhGaQOOKmnXD6YWxSD/2MKTZ+GbHKwQZDIT/Y1qy3D3BqNugd89BAso8Y3GkO6IXRWSfOvHwZQ9ldyWPOk6eqXFKP9qbPC8KhxlU+pGxp70FBuhHiQ3jkIlG2fL99ii2COgeEuPr4oLVOwdGaHq/JkzytacakEediSwUzveWcBUbaJxWu4xvG25XVeLAjBu4MRP7eeohKIJZYfmIe/LyI5lVyBHe9c2rOwV4O6BkTGWxHbxvUKDMI2PDSD+kv4pt1zYLgEl/x71/iBaA7zqrwl1bmAPkct6XZ5CyFqgeZBMYtuob89ap7IWnUAmArUFSCha/JAfo5YqT0YjGg9X93jfD/0hXEk74If4rQy63HKyZZmpqjQvKxQTE9wbDVbPng6ID0z3BU8X5qql0iKvRHldZUyKJRxWxYfcEWBGVNNBt6ksWcl LCY1dKpk mXF6pmvx5hR73+8Bi1x1JHn9px9g4x63CbmM3nNv+t2xVu/aksO8ZLxtIkNnXCEBIy2k417SYepNy7xwYNVVmz5eSzQc0aTYF/gwW6lYlXQYLpXBF4wNj7pcL/3w3yjVLHgnsO/JoWdKjIUkTMORHWlhMu2XKxIZ1+qWTScOgqCSp8KyR5zPEXDoP1uTuHQkCKp9rKI8H0FNmhUEMjL9cnSd9aJ5M0zu4mlr5OcueRCNmuD6aNYEwSAP5Kg8vXzi+MOnPGqZ3q2n3JJ4= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Mar 03, 2025 at 10:40:45AM -0800, Shakeel Butt wrote: > On Mon, Mar 03, 2025 at 06:29:53PM +0000, Yosry Ahmed wrote: > > On Mon, Mar 03, 2025 at 04:22:42PM +0100, Michal Koutný wrote: > > > On Thu, Feb 27, 2025 at 01:55:42PM -0800, inwardvessel wrote: > > > > From: JP Kobryn > > > ... > > > > +static inline bool is_base_css(struct cgroup_subsys_state *css) > > > > +{ > > > > + return css->ss == NULL; > > > > +} > > > > > > Similar predicate is also used in cgroup.c (various cgroup vs subsys > > > lifecycle functions, e.g. css_free_rwork_fn()). I think it'd better > > > unified, i.e. open code the predicate here or use the helper in both > > > cases (css_is_cgroup() or similar). > > > > > > > void __init cgroup_rstat_boot(void) > > > > { > > > > - int cpu; > > > > + struct cgroup_subsys *ss; > > > > + int cpu, ssid; > > > > > > > > - for_each_possible_cpu(cpu) > > > > - raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu)); > > > > + for_each_subsys(ss, ssid) { > > > > + spin_lock_init(&cgroup_rstat_subsys_lock[ssid]); > > > > + } > > > > > > Hm, with this loop I realize it may be worth putting this lock into > > > struct cgroup_subsys_state and initializing them in > > > cgroup_init_subsys() to keep all per-subsys data in one pack. > > > > I thought about this, but this would have unnecessary memory overhead as > > we only need one lock per-subsystem. So having a lock in every single > > css is wasteful. > > > > Maybe we can put the lock in struct cgroup_subsys? Then we can still > > initialize them in cgroup_init_subsys(). > > > > Actually one of things I was thinking about if we can just not have > per-subsystem lock at all. At the moment, it is protecting > rstat_flush_next field (today in cgroup and JP's series it is in css). > What if we make it a per-cpu then we don't need the per-subsystem lock > all? Let me know if I missed something which is being protected by this > lock. I think it protects more than that. I remember locking into this before, and the thing I remember is that stats themselves. Looking at mem_cgroup_stat_aggregate(), we aggregate the stats into the per-cgroup counters non-atomically. This is only protected by the rstat lock (currently global, per-subsystem with the series) AFAICT. Not sure if only the memory subsystem has this dependency or if others do as well. I remember looking into switching these counters to atomics to remove the global lock, but it performed worse IIRC. I remember also looking into partioning the lock into a per-cgroup (or per-css now?) lock, and only holding locks of the parent and child cgroups as we flush each cgroup. I don't remember if I actually implemented this, but it introduces complexity. Perhaps we can defer the locking into the subsystem, if only the memory controller requires it. In this case mem_cgroup_css_rstat_flush() (or mem_cgroup_stat_aggregate()) can hold a memcg-specific spinlock only while aggregating the stats. There could be other things protected by the lock, but that's what I remember.