From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: JP Kobryn <inwardvessel@gmail.com>
Cc: tj@kernel.org, shakeel.butt@linux.dev, mkoutny@suse.com,
hannes@cmpxchg.org, akpm@linux-foundation.org,
linux-mm@kvack.org, cgroups@vger.kernel.org,
kernel-team@meta.com
Subject: Re: [PATCH v5 2/5] cgroup: use separate rstat trees for each subsystem
Date: Wed, 7 May 2025 09:24:43 +0000 [thread overview]
Message-ID: <aBsm22A8qWjGJgY9@google.com> (raw)
In-Reply-To: <20250503001222.146355-3-inwardvessel@gmail.com>
On Fri, May 02, 2025 at 05:12:19PM -0700, JP Kobryn wrote:
> Different subsystems may call cgroup_rstat_updated() within the same
> cgroup, resulting in a tree of pending updates from multiple subsystems.
> When one of these subsystems is flushed via cgroup_rstat_flushed(), all
> other subsystems with pending updates on the tree will also be flushed.
>
> Change the paradigm of having a single rstat tree for all subsystems to
> having separate trees for each subsystem. This separation allows for
> subsystems to perform flushes without the side effects of other subsystems.
> As an example, flushing the cpu stats will no longer cause the memory stats
> to be flushed and vice versa.
>
> In order to achieve subsystem-specific trees, change the tree node type
> from cgroup to cgroup_subsys_state pointer. Then remove those pointers from
> the cgroup and instead place them on the css. Finally, change update/flush
> functions to make use of the different node type (css). These changes allow
> a specific subsystem to be associated with an update or flush. Separate
> rstat trees will now exist for each unique subsystem.
>
> Since updating/flushing will now be done at the subsystem level, there is
> no longer a need to keep track of updated css nodes at the cgroup level.
> The list management of these nodes done within the cgroup (rstat_css_list
> and related) has been removed accordingly.
>
> Conditional guards for checking validity of a given css were placed within
> css_rstat_updated/flush() to prevent undefined behavior occuring from kfunc
> usage in bpf programs. Guards were also placed within css_rstat_init/exit()
> in order to help consolidate calls to them. At call sites for all four
> functions, the existing guards were removed.
>
> Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
> ---
> include/linux/cgroup-defs.h | 46 ++--
> kernel/cgroup/cgroup.c | 34 +--
> kernel/cgroup/rstat.c | 200 ++++++++++--------
> .../selftests/bpf/progs/btf_type_tag_percpu.c | 18 +-
> 4 files changed, 160 insertions(+), 138 deletions(-)
[..]
> @@ -6101,6 +6087,8 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
> } else {
> css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
> BUG_ON(css->id < 0);
> +
> + BUG_ON(css_rstat_init(css));
We call css_rstat_init() here for subsys css's that are not early
initialized, and in cgroup_setup_root() self css's. We can probably move
both calls into cgroup_init() as I mentioned earlier?
Also, I think this version just skips calling css_rstat_init() for early
initialized subsys css's, without adding the patch that you talked about
earlier which protects against early initialized subsystems using rstat.
> }
>
> /* Update the init_css_set to contain a subsys
[..]
> @@ -217,31 +225,32 @@ static struct cgroup *cgroup_rstat_push_children(struct cgroup *head,
> }
>
> /**
> - * cgroup_rstat_updated_list - return a list of updated cgroups to be flushed
> - * @root: root of the cgroup subtree to traverse
> + * css_rstat_updated_list - return a list of updated cgroups to be flushed
css's?
> + * @root: root of the css subtree to traverse
> * @cpu: target cpu
> * Return: A singly linked list of cgroups to be flushed
> *
> * Walks the updated rstat_cpu tree on @cpu from @root. During traversal,
> - * each returned cgroup is unlinked from the updated tree.
> + * each returned css is unlinked from the updated tree.
> *
> * The only ordering guarantee is that, for a parent and a child pair
> * covered by a given traversal, the child is before its parent in
> * the list.
> *
> * Note that updated_children is self terminated and points to a list of
> - * child cgroups if not empty. Whereas updated_next is like a sibling link
> - * within the children list and terminated by the parent cgroup. An exception
> + * child css's if not empty. Whereas updated_next is like a sibling link
> + * within the children list and terminated by the parent css. An exception
> * here is the cgroup root whose updated_next can be self terminated.
> */
[..]
> @@ -383,32 +395,45 @@ __bpf_kfunc void css_rstat_flush(struct cgroup_subsys_state *css)
>
> int css_rstat_init(struct cgroup_subsys_state *css)
> {
> - struct cgroup *cgrp = css->cgroup;
> + struct cgroup *cgrp;
> int cpu;
> + bool is_cgroup = css_is_cgroup(css);
>
> - /* the root cgrp has rstat_cpu preallocated */
> - if (!cgrp->rstat_cpu) {
> - cgrp->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu);
> - if (!cgrp->rstat_cpu)
> - return -ENOMEM;
> - }
> + if (is_cgroup) {
> + cgrp = css->cgroup;
You can keep 'cgrp' initialized at the top of the function to avoid the
extra level of indentation here, right?
>
> - if (!cgrp->rstat_base_cpu) {
> - cgrp->rstat_base_cpu = alloc_percpu(struct cgroup_rstat_base_cpu);
> + /* the root cgrp has rstat_base_cpu preallocated */
> if (!cgrp->rstat_base_cpu) {
> - free_percpu(cgrp->rstat_cpu);
> + cgrp->rstat_base_cpu = alloc_percpu(struct cgroup_rstat_base_cpu);
> + if (!cgrp->rstat_base_cpu)
> + return -ENOMEM;
> + }
> + } else if (css->ss->css_rstat_flush == NULL)
> + return 0;
We can probably just do this at the beginning of the function to be able
to use the helper:
if (!css_is_cgroup(css) && css->ss->css_rstat_flush == NULL)
return 0;
Also, when the return value of css_is_cgroup() is cached as is_cgroup it
makes me hate the function name even more, because 'is_cgroup' is very
confusing for a css in my opinion since they all represent cgroups.
I really think this should be css_is_self() (or css_is_self_cgroup())
and the variable names would be 'is_self'.
> +
> + /* the root cgrp's self css has rstat_cpu preallocated */
> + if (!css->rstat_cpu) {
> + css->rstat_cpu = alloc_percpu(struct css_rstat_cpu);
> + if (!css->rstat_cpu) {
> + if (is_cgroup)
> + free_percpu(cgrp->rstat_base_cpu);
> +
> return -ENOMEM;
> }
> }
>
> /* ->updated_children list is self terminated */
> for_each_possible_cpu(cpu) {
> - struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
> - struct cgroup_rstat_base_cpu *rstatbc =
> - cgroup_rstat_base_cpu(cgrp, cpu);
> + struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
>
> - rstatc->updated_children = cgrp;
> - u64_stats_init(&rstatbc->bsync);
> + rstatc->updated_children = css;
> +
> + if (is_cgroup) {
> + struct cgroup_rstat_base_cpu *rstatbc;
> +
> + rstatbc = cgroup_rstat_base_cpu(cgrp, cpu);
> + u64_stats_init(&rstatbc->bsync);
> + }
> }
>
> return 0;
[..]
next prev parent reply other threads:[~2025-05-07 9:24 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-03 0:12 [PATCH v5 0/5] cgroup: separate rstat trees JP Kobryn
2025-05-03 0:12 ` [PATCH v5 1/5] cgroup: use helper for distingushing css in callbacks JP Kobryn
2025-05-06 0:52 ` Shakeel Butt
2025-05-07 9:02 ` Yosry Ahmed
2025-05-09 21:46 ` JP Kobryn
2025-05-03 0:12 ` [PATCH v5 2/5] cgroup: use separate rstat trees for each subsystem JP Kobryn
2025-05-07 9:24 ` Yosry Ahmed [this message]
2025-05-09 17:53 ` JP Kobryn
2025-05-12 17:30 ` JP Kobryn
2025-05-03 0:12 ` [PATCH v5 3/5] cgroup: use subsystem-specific rstat locks to avoid contention JP Kobryn
2025-05-07 9:37 ` Yosry Ahmed
2025-05-03 0:12 ` [PATCH v5 4/5] cgroup: helper for checking rstat participation of css JP Kobryn
2025-05-07 9:38 ` Yosry Ahmed
2025-05-03 0:12 ` [PATCH v5 5/5] cgroup: document the rstat per-cpu initialization JP Kobryn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aBsm22A8qWjGJgY9@google.com \
--to=yosry.ahmed@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=inwardvessel@gmail.com \
--cc=kernel-team@meta.com \
--cc=linux-mm@kvack.org \
--cc=mkoutny@suse.com \
--cc=shakeel.butt@linux.dev \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox