From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A3282C3ABC0 for ; Wed, 7 May 2025 09:24:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 01CB46B000A; Wed, 7 May 2025 05:24:56 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id F0EA36B0085; Wed, 7 May 2025 05:24:55 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DD7176B0089; Wed, 7 May 2025 05:24:55 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id BAB586B000A for ; Wed, 7 May 2025 05:24:55 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id B325C1A17C3 for ; Wed, 7 May 2025 09:24:56 +0000 (UTC) X-FDA: 83415577392.17.1EB8CDF Received: from out-178.mta1.migadu.com (out-178.mta1.migadu.com [95.215.58.178]) by imf08.hostedemail.com (Postfix) with ESMTP id B58D816000B for ; Wed, 7 May 2025 09:24:54 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=JHwuDU1D; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf08.hostedemail.com: domain of yosry.ahmed@linux.dev designates 95.215.58.178 as permitted sender) smtp.mailfrom=yosry.ahmed@linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1746609895; a=rsa-sha256; cv=none; b=lnUEvwv9Kd2kLgwWpVGWDWs8ZpfC72BJkrkLcM5q23N3wpycZqCtmtlHU8cB4CtEPN8/e7 TXZqTq+tt3PgswqJmGoLWxwCEGqMbnzyyWJLRuxYhSn6wUno33F6o+EE82ntNM01iQSp+P YGNwDVN5m5S8NYaei3/Wczy/DdmUHq4= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=JHwuDU1D; dmarc=pass (policy=none) header.from=linux.dev; spf=pass (imf08.hostedemail.com: domain of yosry.ahmed@linux.dev designates 95.215.58.178 as permitted sender) smtp.mailfrom=yosry.ahmed@linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1746609895; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=dStHY5fpNcqDWW2Pio95dd+616z2TJmryOCg4BVYuC4=; b=tJW+g2htfeUE3gzPjTwmgYduv2+IPQdaGGkdtE/bse7IUo+k558IvS+47j8fZRV5e0YCFa 14Hz6yHzzetWDoFdfJw0QB+9vqSM1PKoGxB7At3LuLlWmJLrTc5JyiOFgVdik6n00ek4DU 2D4HPXmmzOZb92XQu8u5lKJ4Uv87rog= Date: Wed, 7 May 2025 09:24:43 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1746609892; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=dStHY5fpNcqDWW2Pio95dd+616z2TJmryOCg4BVYuC4=; b=JHwuDU1DP2ERZm/8ZDfb8sA+09tL10Y7R8K0jlLl9Ys2x852/kTbWta+Ng/gnzToPVsQBS w0aX2fjgE4H6638OkeEOSDokBwczgyWMOM536gTBXuqmP5vnrH4Mh22oS3BHVinklEoLcM 46OzfcYzABU/IoH1rLWgrsO8Y77nQ9M= X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Yosry Ahmed To: JP Kobryn Cc: tj@kernel.org, shakeel.butt@linux.dev, mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org, linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com Subject: Re: [PATCH v5 2/5] cgroup: use separate rstat trees for each subsystem Message-ID: References: <20250503001222.146355-1-inwardvessel@gmail.com> <20250503001222.146355-3-inwardvessel@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20250503001222.146355-3-inwardvessel@gmail.com> X-Migadu-Flow: FLOW_OUT X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: B58D816000B X-Stat-Signature: f3yok8j5bytewnwyidkri3u5mkxwmosx X-Rspam-User: X-HE-Tag: 1746609894-429138 X-HE-Meta: U2FsdGVkX1+cHcEPbx4+cfoQ51zQE6EnDj88U/2UprKQtszqwhwpkaiBe3WOb+LA0Y1DRN7ZrnJbM870hNxGFy/b7QGxa/0wl+ZvPzz1jVx3pQjU+sMZp8zqLo3VuYEOCWjCEsGFXwgNyG0FTJ3SUXVeduJlP9R3HfEKQGjKjkP/kHb8v2bks92QFgikgmxGnbmAacCTD1jcWBdV+Xe4L6bCZDlDyWy+rW4+VtT3ZaWPi4cH3idMCYJaNr4frNNkIzoWJ1DApNe41IFnY7I/Is4WZcxHlyoUKYG6l8v8cYG0IPGuTbA1ehzsa20wXZOzPuFuVnf290Y5jz5EB+XEii3Q4zuOyN8hgtFhQgIzNekJTMInpsO6tXK8jhPHRfLkgfsvJzmToFH8mK6OuV4DYFiZGRXwNSO0Pj/2BEvCO00gwPbd009gptsPFzklk9hg8Ko7KsTOLlp3OaBYXBV9NAYdPhjdvTEdMfU/M7AwOuJTBY8I8qPif5a+boEOJnbxDzYH7iYi9ZGfRP8L41dGg892Yc5/QdlwmneLQL4fNBOmUX0f6lQChTW1NyW3kMff1vo6y+bapvzMzhPYzZKdNRdgaU6jG5dIe8qF2uKGky0XwCTSE+DiMxKBxcWtuzTrTAp/b2UPNWQilMYNKxGQC/TYBvgKkYe+t1NlpbQajUDRP1yW7nJbJxXD7xAtK4UE5KOVYAe5O40VhW5wCpEdjFfpH27UOMUtzEtVMs748hk79JG9yWJ9XMbaDrwB1Sn5zwqyn0zik0aGoutdbVEPdm+0H+WZJPDXmyA/hMawlScequExouBcvO+1LNLm7lAD8V2pb0jUfkIOv3gYEY6sJLGnQkTgdpwFRIyeLfWVQq5LQR+zVbysKwg9kWGntqOf3K+mUvanAQVjKNdgHURdCpy+urIeaNmMILm6C05ocCXCnjKWpjF6KZk9BTMeF5XgS2RJsM4CsKUv6Uj2xSf SVCHlCAq 5rxbo0KRtzRGKBUfU4hTzL8Q9zmRdhONgIEv+ablH/Yw5OduXgSYY3N3/plsAdsqL3lIwdNpjC73kxx4LvnpgnJkiwppXz/qUq5z85lDzFc+L9+IW9Bn2aCDHAO64eI7z7/XRuQA17rOPRHMgwk5htkGqpkbjhpLuYOEfPxyRp1edZc/lQNS8ZduRxzcGvWZfww4FD8F1nkak5pt0o6k+ABfVI+NDEzGpNZfJwgbI4AD9o3c= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, May 02, 2025 at 05:12:19PM -0700, JP Kobryn wrote: > Different subsystems may call cgroup_rstat_updated() within the same > cgroup, resulting in a tree of pending updates from multiple subsystems. > When one of these subsystems is flushed via cgroup_rstat_flushed(), all > other subsystems with pending updates on the tree will also be flushed. > > Change the paradigm of having a single rstat tree for all subsystems to > having separate trees for each subsystem. This separation allows for > subsystems to perform flushes without the side effects of other subsystems. > As an example, flushing the cpu stats will no longer cause the memory stats > to be flushed and vice versa. > > In order to achieve subsystem-specific trees, change the tree node type > from cgroup to cgroup_subsys_state pointer. Then remove those pointers from > the cgroup and instead place them on the css. Finally, change update/flush > functions to make use of the different node type (css). These changes allow > a specific subsystem to be associated with an update or flush. Separate > rstat trees will now exist for each unique subsystem. > > Since updating/flushing will now be done at the subsystem level, there is > no longer a need to keep track of updated css nodes at the cgroup level. > The list management of these nodes done within the cgroup (rstat_css_list > and related) has been removed accordingly. > > Conditional guards for checking validity of a given css were placed within > css_rstat_updated/flush() to prevent undefined behavior occuring from kfunc > usage in bpf programs. Guards were also placed within css_rstat_init/exit() > in order to help consolidate calls to them. At call sites for all four > functions, the existing guards were removed. > > Signed-off-by: JP Kobryn > --- > include/linux/cgroup-defs.h | 46 ++-- > kernel/cgroup/cgroup.c | 34 +-- > kernel/cgroup/rstat.c | 200 ++++++++++-------- > .../selftests/bpf/progs/btf_type_tag_percpu.c | 18 +- > 4 files changed, 160 insertions(+), 138 deletions(-) [..] > @@ -6101,6 +6087,8 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early) > } else { > css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL); > BUG_ON(css->id < 0); > + > + BUG_ON(css_rstat_init(css)); We call css_rstat_init() here for subsys css's that are not early initialized, and in cgroup_setup_root() self css's. We can probably move both calls into cgroup_init() as I mentioned earlier? Also, I think this version just skips calling css_rstat_init() for early initialized subsys css's, without adding the patch that you talked about earlier which protects against early initialized subsystems using rstat. > } > > /* Update the init_css_set to contain a subsys [..] > @@ -217,31 +225,32 @@ static struct cgroup *cgroup_rstat_push_children(struct cgroup *head, > } > > /** > - * cgroup_rstat_updated_list - return a list of updated cgroups to be flushed > - * @root: root of the cgroup subtree to traverse > + * css_rstat_updated_list - return a list of updated cgroups to be flushed css's? > + * @root: root of the css subtree to traverse > * @cpu: target cpu > * Return: A singly linked list of cgroups to be flushed > * > * Walks the updated rstat_cpu tree on @cpu from @root. During traversal, > - * each returned cgroup is unlinked from the updated tree. > + * each returned css is unlinked from the updated tree. > * > * The only ordering guarantee is that, for a parent and a child pair > * covered by a given traversal, the child is before its parent in > * the list. > * > * Note that updated_children is self terminated and points to a list of > - * child cgroups if not empty. Whereas updated_next is like a sibling link > - * within the children list and terminated by the parent cgroup. An exception > + * child css's if not empty. Whereas updated_next is like a sibling link > + * within the children list and terminated by the parent css. An exception > * here is the cgroup root whose updated_next can be self terminated. > */ [..] > @@ -383,32 +395,45 @@ __bpf_kfunc void css_rstat_flush(struct cgroup_subsys_state *css) > > int css_rstat_init(struct cgroup_subsys_state *css) > { > - struct cgroup *cgrp = css->cgroup; > + struct cgroup *cgrp; > int cpu; > + bool is_cgroup = css_is_cgroup(css); > > - /* the root cgrp has rstat_cpu preallocated */ > - if (!cgrp->rstat_cpu) { > - cgrp->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu); > - if (!cgrp->rstat_cpu) > - return -ENOMEM; > - } > + if (is_cgroup) { > + cgrp = css->cgroup; You can keep 'cgrp' initialized at the top of the function to avoid the extra level of indentation here, right? > > - if (!cgrp->rstat_base_cpu) { > - cgrp->rstat_base_cpu = alloc_percpu(struct cgroup_rstat_base_cpu); > + /* the root cgrp has rstat_base_cpu preallocated */ > if (!cgrp->rstat_base_cpu) { > - free_percpu(cgrp->rstat_cpu); > + cgrp->rstat_base_cpu = alloc_percpu(struct cgroup_rstat_base_cpu); > + if (!cgrp->rstat_base_cpu) > + return -ENOMEM; > + } > + } else if (css->ss->css_rstat_flush == NULL) > + return 0; We can probably just do this at the beginning of the function to be able to use the helper: if (!css_is_cgroup(css) && css->ss->css_rstat_flush == NULL) return 0; Also, when the return value of css_is_cgroup() is cached as is_cgroup it makes me hate the function name even more, because 'is_cgroup' is very confusing for a css in my opinion since they all represent cgroups. I really think this should be css_is_self() (or css_is_self_cgroup()) and the variable names would be 'is_self'. > + > + /* the root cgrp's self css has rstat_cpu preallocated */ > + if (!css->rstat_cpu) { > + css->rstat_cpu = alloc_percpu(struct css_rstat_cpu); > + if (!css->rstat_cpu) { > + if (is_cgroup) > + free_percpu(cgrp->rstat_base_cpu); > + > return -ENOMEM; > } > } > > /* ->updated_children list is self terminated */ > for_each_possible_cpu(cpu) { > - struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu); > - struct cgroup_rstat_base_cpu *rstatbc = > - cgroup_rstat_base_cpu(cgrp, cpu); > + struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu); > > - rstatc->updated_children = cgrp; > - u64_stats_init(&rstatbc->bsync); > + rstatc->updated_children = css; > + > + if (is_cgroup) { > + struct cgroup_rstat_base_cpu *rstatbc; > + > + rstatbc = cgroup_rstat_base_cpu(cgrp, cpu); > + u64_stats_init(&rstatbc->bsync); > + } > } > > return 0; [..]