linux-mm.kvack.org archive mirror
From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: JP Kobryn <inwardvessel@gmail.com>
Cc: tj@kernel.org, shakeel.butt@linux.dev, mkoutny@suse.com,
	hannes@cmpxchg.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	kernel-team@meta.com
Subject: Re: [PATCH v5 2/5] cgroup: use separate rstat trees for each subsystem
Date: Wed, 7 May 2025 09:24:43 +0000
Message-ID: <aBsm22A8qWjGJgY9@google.com>
In-Reply-To: <20250503001222.146355-3-inwardvessel@gmail.com>

On Fri, May 02, 2025 at 05:12:19PM -0700, JP Kobryn wrote:
> Different subsystems may call cgroup_rstat_updated() within the same
> cgroup, resulting in a tree of pending updates from multiple subsystems.
> When one of these subsystems is flushed via cgroup_rstat_flush(), all
> other subsystems with pending updates on the tree will also be flushed.
> 
> Change the paradigm of having a single rstat tree for all subsystems to
> having separate trees for each subsystem. This separation allows for
> subsystems to perform flushes without the side effects of other subsystems.
> As an example, flushing the cpu stats will no longer cause the memory stats
> to be flushed and vice versa.
> 
> In order to achieve subsystem-specific trees, change the tree node type
> from cgroup to cgroup_subsys_state pointer. Then remove those pointers from
> the cgroup and instead place them on the css. Finally, change update/flush
> functions to make use of the different node type (css). These changes allow
> a specific subsystem to be associated with an update or flush. Separate
> rstat trees will now exist for each unique subsystem.
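
To illustrate the separation described above, here is a toy userspace
model, not kernel code. It flattens each per-subsystem tree into a simple
pending list, and every name in it (SS_CPU, SS_MEMORY, struct node,
rstat_updated(), rstat_flush()) is made up for the example. The only point
is that flushing one subsystem leaves the other's pending updates alone.

```c
#include <stddef.h>

enum { SS_CPU, SS_MEMORY, SS_COUNT };	/* hypothetical subsystem ids */

struct node {
	struct node *next;	/* link in one subsystem's pending list */
	int pending;		/* delta not yet flushed */
	int total;		/* flushed total */
	int queued;		/* already on a pending list? */
};

/* one pending list per subsystem, instead of a single shared one */
static struct node *pending_head[SS_COUNT];

/* record an update and queue the node on its subsystem's list */
static void rstat_updated(int ss, struct node *n, int delta)
{
	n->pending += delta;
	if (!n->queued) {
		n->queued = 1;
		n->next = pending_head[ss];
		pending_head[ss] = n;
	}
}

/* flush only the list that belongs to @ss; returns nodes flushed */
static int rstat_flush(int ss)
{
	struct node *n;
	int flushed = 0;

	while ((n = pending_head[ss]) != NULL) {
		pending_head[ss] = n->next;
		n->total += n->pending;
		n->pending = 0;
		n->queued = 0;
		n->next = NULL;
		flushed++;
	}
	return flushed;
}
```

With a single shared list, the flush loop would drain nodes queued by every
subsystem. Indexing the list head by subsystem is what confines the work.
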
> 
> Since updating/flushing will now be done at the subsystem level, there is
> no longer a need to keep track of updated css nodes at the cgroup level.
> The list management of these nodes done within the cgroup (rstat_css_list
> and related) has been removed accordingly.
> 
> Conditional guards for checking validity of a given css were placed within
> css_rstat_updated/flush() to prevent undefined behavior occurring from kfunc
> usage in bpf programs. Guards were also placed within css_rstat_init/exit()
> in order to help consolidate calls to them. At call sites for all four
> functions, the existing guards were removed.
> 
> Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
> ---
>  include/linux/cgroup-defs.h                   |  46 ++--
>  kernel/cgroup/cgroup.c                        |  34 +--
>  kernel/cgroup/rstat.c                         | 200 ++++++++++--------
>  .../selftests/bpf/progs/btf_type_tag_percpu.c |  18 +-
>  4 files changed, 160 insertions(+), 138 deletions(-)
[..]
> @@ -6101,6 +6087,8 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
>  	} else {
>  		css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
>  		BUG_ON(css->id < 0);
> +
> +		BUG_ON(css_rstat_init(css));

We call css_rstat_init() here for subsys css's that are not early
initialized, and in cgroup_setup_root() for self css's. We can probably
move both calls into cgroup_init(), as I mentioned earlier?

Also, I think this version just skips calling css_rstat_init() for early
initialized subsys css's, without adding the patch that you talked about
earlier which protects against early initialized subsystems using rstat.

>  	}
>  
>  	/* Update the init_css_set to contain a subsys
[..]
> @@ -217,31 +225,32 @@ static struct cgroup *cgroup_rstat_push_children(struct cgroup *head,
>  }
>  
>  /**
> - * cgroup_rstat_updated_list - return a list of updated cgroups to be flushed
> - * @root: root of the cgroup subtree to traverse
> + * css_rstat_updated_list - return a list of updated cgroups to be flushed

Should this say css's rather than cgroups?

> + * @root: root of the css subtree to traverse
>   * @cpu: target cpu
>   * Return: A singly linked list of cgroups to be flushed
>   *
>   * Walks the updated rstat_cpu tree on @cpu from @root.  During traversal,
> - * each returned cgroup is unlinked from the updated tree.
> + * each returned css is unlinked from the updated tree.
>   *
>   * The only ordering guarantee is that, for a parent and a child pair
>   * covered by a given traversal, the child is before its parent in
>   * the list.
>   *
>   * Note that updated_children is self terminated and points to a list of
> - * child cgroups if not empty. Whereas updated_next is like a sibling link
> - * within the children list and terminated by the parent cgroup. An exception
> + * child css's if not empty. Whereas updated_next is like a sibling link
> + * within the children list and terminated by the parent css. An exception
>   * here is the cgroup root whose updated_next can be self terminated.
>   */
[..]
> @@ -383,32 +395,45 @@ __bpf_kfunc void css_rstat_flush(struct cgroup_subsys_state *css)
>  
>  int css_rstat_init(struct cgroup_subsys_state *css)
>  {
> -	struct cgroup *cgrp = css->cgroup;
> +	struct cgroup *cgrp;
>  	int cpu;
> +	bool is_cgroup = css_is_cgroup(css);
>  
> -	/* the root cgrp has rstat_cpu preallocated */
> -	if (!cgrp->rstat_cpu) {
> -		cgrp->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu);
> -		if (!cgrp->rstat_cpu)
> -			return -ENOMEM;
> -	}
> +	if (is_cgroup) {
> +		cgrp = css->cgroup;

You can keep 'cgrp' initialized at the top of the function to avoid the
extra level of indentation here, right?

>  
> -	if (!cgrp->rstat_base_cpu) {
> -		cgrp->rstat_base_cpu = alloc_percpu(struct cgroup_rstat_base_cpu);
> +		/* the root cgrp has rstat_base_cpu preallocated */
>  		if (!cgrp->rstat_base_cpu) {
> -			free_percpu(cgrp->rstat_cpu);
> +			cgrp->rstat_base_cpu = alloc_percpu(struct cgroup_rstat_base_cpu);
> +			if (!cgrp->rstat_base_cpu)
> +				return -ENOMEM;
> +		}
> +	} else if (css->ss->css_rstat_flush == NULL)
> +		return 0;

We can probably just do this at the beginning of the function to be able
to use the helper:

	if (!css_is_cgroup(css) && css->ss->css_rstat_flush == NULL)
		return 0;

Also, when the return value of css_is_cgroup() is cached as is_cgroup it
makes me hate the function name even more, because 'is_cgroup' is very
confusing for a css in my opinion since they all represent cgroups.

I really think this should be css_is_self() (or css_is_self_cgroup())
and the variable names would be 'is_self'.
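
Putting the suggestions above together, the function could read roughly
like the sketch below. This is a userspace mock so that it compiles on its
own: the struct layouts, NR_CPUS, calloc()/free() standing in for
alloc_percpu()/free_percpu(), the bsync_ready flag standing in for
u64_stats_init(), and the css_is_self() name and body are all assumptions
for illustration, not the kernel's definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4	/* assumption: stand-in for the possible-CPU count */

struct css;
struct css_rstat_cpu { struct css *updated_children; };
struct base_cpu { int bsync_ready; };	/* mock of cgroup_rstat_base_cpu */
struct subsys { void (*css_rstat_flush)(struct css *css, int cpu); };
struct cgroup { struct base_cpu *rstat_base_cpu; };
struct css {
	struct cgroup *cgroup;
	struct subsys *ss;	/* assumed NULL for a cgroup's self css */
	struct css_rstat_cpu *rstat_cpu;
};

/* hypothetical helper name proposed in this review */
static bool css_is_self(struct css *css)
{
	return css->ss == NULL;
}

static int css_rstat_init(struct css *css)
{
	struct cgroup *cgrp = css->cgroup;
	bool is_self = css_is_self(css);
	int cpu;

	/* a subsystem without a flush callback has nothing to set up */
	if (!is_self && css->ss->css_rstat_flush == NULL)
		return 0;

	/* may be preallocated, as for the root cgroup in the patch */
	if (is_self && !cgrp->rstat_base_cpu) {
		cgrp->rstat_base_cpu = calloc(NR_CPUS, sizeof(*cgrp->rstat_base_cpu));
		if (!cgrp->rstat_base_cpu)
			return -ENOMEM;
	}

	if (!css->rstat_cpu) {
		css->rstat_cpu = calloc(NR_CPUS, sizeof(*css->rstat_cpu));
		if (!css->rstat_cpu) {
			/* mirrors the patch's error path; assumes calloc()ed memory */
			if (is_self) {
				free(cgrp->rstat_base_cpu);
				cgrp->rstat_base_cpu = NULL;
			}
			return -ENOMEM;
		}
	}

	/* ->updated_children list is self terminated */
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		css->rstat_cpu[cpu].updated_children = css;
		if (is_self)
			cgrp->rstat_base_cpu[cpu].bsync_ready = 1;
	}

	return 0;
}
```

The early return removes the else branch, and initializing 'cgrp' at the
top removes one level of indentation from the base-stats allocation.
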

> +
> +	/* the root cgrp's self css has rstat_cpu preallocated */
> +	if (!css->rstat_cpu) {
> +		css->rstat_cpu = alloc_percpu(struct css_rstat_cpu);
> +		if (!css->rstat_cpu) {
> +			if (is_cgroup)
> +				free_percpu(cgrp->rstat_base_cpu);
> +
>  			return -ENOMEM;
>  		}
>  	}
>  
>  	/* ->updated_children list is self terminated */
>  	for_each_possible_cpu(cpu) {
> -		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
> -		struct cgroup_rstat_base_cpu *rstatbc =
> -			cgroup_rstat_base_cpu(cgrp, cpu);
> +		struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
>  
> -		rstatc->updated_children = cgrp;
> -		u64_stats_init(&rstatbc->bsync);
> +		rstatc->updated_children = css;
> +
> +		if (is_cgroup) {
> +			struct cgroup_rstat_base_cpu *rstatbc;
> +
> +			rstatbc = cgroup_rstat_base_cpu(cgrp, cpu);
> +			u64_stats_init(&rstatbc->bsync);
> +		}
>  	}
>  
>  	return 0;
[..]

