From: Ihor Solodrai <ihor.solodrai@linux.dev>
To: JP Kobryn <inwardvessel@gmail.com>,
tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org,
kernel-team@meta.com, bpf@vger.kernel.org
Subject: Re: [PATCH v4 1/5] cgroup: move rstat base stat objects into their own struct
Date: Thu, 29 May 2025 11:58:17 -0700
Message-ID: <6f688f2e-7d26-423a-9029-d1b1ef1c938a@linux.dev>
In-Reply-To: <20250404011050.121777-2-inwardvessel@gmail.com>
On 4/3/25 6:10 PM, JP Kobryn wrote:
> This non-functional change serves as preparation for moving to
> subsystem-based rstat trees. The base stats are not an actual subsystem,
> but in future commits they will have exclusive rstat trees just as other
> subsystems will.
>
> Moving the base stat objects into a new struct allows the cgroup_rstat_cpu
> struct to become more compact since it now only contains the minimum amount
> of pointers needed for rstat participation. Subsystems will (in future
> commits) make use of the compact cgroup_rstat_cpu struct while avoiding the
> memory overhead of the base stat objects which they will not use.
>
> An instance of the new struct cgroup_rstat_base_cpu was placed on the
> cgroup struct so it can retain ownership of these base stats common to all
> cgroups. A helper function was added for looking up the cpu-specific base
> stats of a given cgroup. Finally, initialization and variable names were
> adjusted where applicable.
>
> Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
> ---
> include/linux/cgroup-defs.h | 38 ++++++++++-------
> kernel/cgroup/cgroup.c | 8 +++-
> kernel/cgroup/rstat.c | 84 ++++++++++++++++++++++---------------
> 3 files changed, 79 insertions(+), 51 deletions(-)
>
Hi everyone.
BPF CI started failing after recent upstream merges (tip: 90b83efa6701).
I bisected the issue to this patch; see a log snippet below [1]:
##[error]#44/9 btf_tag/btf_type_tag_percpu_vmlinux_helper
load_btfs:PASS:could not load vmlinux BTF 0 nsec
test_btf_type_tag_vmlinux_percpu:PASS:btf_type_tag_percpu 0 nsec
libbpf: prog 'test_percpu_helper': BPF program load failed: -EACCES
libbpf: prog 'test_percpu_helper': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; int BPF_PROG(test_percpu_helper, struct cgroup *cgrp, const char *path) @ btf_type_tag_percpu.c:58
0: (79) r1 = *(u64 *)(r1 +0)
func 'cgroup_mkdir' arg0 has btf_id 437 type STRUCT 'cgroup'
1: R1_w=trusted_ptr_cgroup()
; cpu = bpf_get_smp_processor_id(); @ btf_type_tag_percpu.c:63
1: (7b) *(u64 *)(r10 -8) = r1 ; R1_w=trusted_ptr_cgroup() R10=fp0 fp-8_w=trusted_ptr_cgroup()
2: (85) call bpf_get_smp_processor_id#8 ; R0_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
3: (79) r1 = *(u64 *)(r10 -8) ; R1_w=trusted_ptr_cgroup() R10=fp0 fp-8_w=trusted_ptr_cgroup()
; cgrp->self.rstat_cpu, cpu); @ btf_type_tag_percpu.c:65
4: (79) r1 = *(u64 *)(r1 +32) ; R1_w=percpu_ptr_css_rstat_cpu()
; rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr( @ btf_type_tag_percpu.c:64
5: (bc) w2 = w0 ; R0_w=scalar(id=1,smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1)) R2_w=scalar(id=1,smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
6: (85) call bpf_per_cpu_ptr#153 ; R0=ptr_or_null_css_rstat_cpu(id=2)
; if (rstat) { @ btf_type_tag_percpu.c:66
7: (15) if r0 == 0x0 goto pc+1 ; R0=ptr_css_rstat_cpu()
; *(volatile int *)rstat; @ btf_type_tag_percpu.c:68
8: (61) r1 = *(u32 *)(r0 +0)
cannot access ptr member updated_children with moff 0 in struct css_rstat_cpu with off 0 size 4
processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
-- END PROG LOAD LOG --
libbpf: prog 'test_percpu_helper': failed to load: -EACCES
libbpf: failed to load object 'btf_type_tag_percpu'
libbpf: failed to load BPF skeleton 'btf_type_tag_percpu': -EACCES
test_btf_type_tag_vmlinux_percpu:FAIL:btf_type_tag_percpu_helper unexpected error: -13 (errno 13)
The test in question [2]:
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_percpu_helper, struct cgroup *cgrp, const char *path)
{
struct cgroup_rstat_cpu *rstat;
__u32 cpu;
cpu = bpf_get_smp_processor_id();
rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr(cgrp->rstat_cpu, cpu);
if (rstat) {
/* READ_ONCE */
*(volatile int *)rstat; // BPF verification fails here
}
return 0;
}
My reading of the log, an assumption on my part: the rejected instruction is the 4-byte read at offset 0 of the per-cpu struct. Before this patch that offset fell inside the bsync/bstat data, so a partial read was allowed; after the split it holds the ->updated_children pointer, and the verifier refuses partial loads of pointer members, hence the "cannot access ptr member updated_children" error.

Any ideas about how to properly fix this?
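In case it helps, below is a minimal, untested sketch of how the selftest could be adapted on top of this series: read through the new rstat_base_cpu percpu pointer instead, since struct cgroup_rstat_base_cpu still starts with plain stat data rather than pointers. (The struct and field names are taken from this patch; that a read at offset 0 of the base struct passes the verifier is my assumption.)

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_percpu_helper, struct cgroup *cgrp, const char *path)
{
	struct cgroup_rstat_base_cpu *rstat_base;
	__u32 cpu;

	cpu = bpf_get_smp_processor_id();
	rstat_base = (struct cgroup_rstat_base_cpu *)
			bpf_per_cpu_ptr(cgrp->rstat_base_cpu, cpu);
	if (rstat_base) {
		/* READ_ONCE */
		*(volatile int *)rstat_base;
	}

	return 0;
}

If the test must keep loading on kernels without this series, gating on bpf_core_field_exists(cgrp->rstat_base_cpu) would be one way to support both layouts.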
Thanks.
[1] https://github.com/kernel-patches/bpf/actions/runs/15316839796/job/43125242673
[2] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c#n68
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index 485b651869d9..6d177f770d28 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -344,10 +344,29 @@ struct cgroup_base_stat {
> * frequency decreases the cost of each read.
> *
> * This struct hosts both the fields which implement the above -
> - * updated_children and updated_next - and the fields which track basic
> - * resource statistics on top of it - bsync, bstat and last_bstat.
> + * updated_children and updated_next.
> */
> struct cgroup_rstat_cpu {
> + /*
> + * Child cgroups with stat updates on this cpu since the last read
> + * are linked on the parent's ->updated_children through
> + * ->updated_next.
> + *
> + * In addition to being more compact, singly-linked list pointing
> + * to the cgroup makes it unnecessary for each per-cpu struct to
> + * point back to the associated cgroup.
> + *
> + * Protected by per-cpu cgroup_rstat_cpu_lock.
> + */
> + struct cgroup *updated_children; /* terminated by self cgroup */
> + struct cgroup *updated_next; /* NULL iff not on the list */
> +};
> +
> +/*
> + * This struct hosts the fields which track basic resource statistics on
> + * top of it - bsync, bstat and last_bstat.
> + */
> +struct cgroup_rstat_base_cpu {
> /*
> * ->bsync protects ->bstat. These are the only fields which get
> * updated in the hot path.
> @@ -374,20 +393,6 @@ struct cgroup_rstat_cpu {
> * deltas to propagate to the per-cpu subtree_bstat.
> */
> struct cgroup_base_stat last_subtree_bstat;
> -
> - /*
> - * Child cgroups with stat updates on this cpu since the last read
> - * are linked on the parent's ->updated_children through
> - * ->updated_next.
> - *
> - * In addition to being more compact, singly-linked list pointing
> - * to the cgroup makes it unnecessary for each per-cpu struct to
> - * point back to the associated cgroup.
> - *
> - * Protected by per-cpu cgroup_rstat_cpu_lock.
> - */
> - struct cgroup *updated_children; /* terminated by self cgroup */
> - struct cgroup *updated_next; /* NULL iff not on the list */
> };
>
> struct cgroup_freezer_state {
> @@ -518,6 +523,7 @@ struct cgroup {
>
> /* per-cpu recursive resource statistics */
> struct cgroup_rstat_cpu __percpu *rstat_cpu;
> + struct cgroup_rstat_base_cpu __percpu *rstat_base_cpu;
> struct list_head rstat_css_list;
>
> /*
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index ac2db99941ca..77349d07b117 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -161,10 +161,14 @@ static struct static_key_true *cgroup_subsys_on_dfl_key[] = {
> };
> #undef SUBSYS
>
> -static DEFINE_PER_CPU(struct cgroup_rstat_cpu, cgrp_dfl_root_rstat_cpu);
> +static DEFINE_PER_CPU(struct cgroup_rstat_cpu, root_rstat_cpu);
> +static DEFINE_PER_CPU(struct cgroup_rstat_base_cpu, root_rstat_base_cpu);
>
> /* the default hierarchy */
> -struct cgroup_root cgrp_dfl_root = { .cgrp.rstat_cpu = &cgrp_dfl_root_rstat_cpu };
> +struct cgroup_root cgrp_dfl_root = {
> + .cgrp.rstat_cpu = &root_rstat_cpu,
> + .cgrp.rstat_base_cpu = &root_rstat_base_cpu,
> +};
> EXPORT_SYMBOL_GPL(cgrp_dfl_root);
>
> /*
> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> index 4bb587d5d34f..a20e3ab3f7d3 100644
> --- a/kernel/cgroup/rstat.c
> +++ b/kernel/cgroup/rstat.c
> @@ -19,6 +19,12 @@ static struct cgroup_rstat_cpu *cgroup_rstat_cpu(struct cgroup *cgrp, int cpu)
> return per_cpu_ptr(cgrp->rstat_cpu, cpu);
> }
>
> +static struct cgroup_rstat_base_cpu *cgroup_rstat_base_cpu(
> + struct cgroup *cgrp, int cpu)
> +{
> + return per_cpu_ptr(cgrp->rstat_base_cpu, cpu);
> +}
> +
> /*
> * Helper functions for rstat per CPU lock (cgroup_rstat_cpu_lock).
> *
> @@ -351,12 +357,22 @@ int cgroup_rstat_init(struct cgroup *cgrp)
> return -ENOMEM;
> }
>
> + if (!cgrp->rstat_base_cpu) {
> + cgrp->rstat_base_cpu = alloc_percpu(struct cgroup_rstat_base_cpu);
> + if (!cgrp->rstat_base_cpu) {
> + free_percpu(cgrp->rstat_cpu);
> + return -ENOMEM;
> + }
> + }
> +
> /* ->updated_children list is self terminated */
> for_each_possible_cpu(cpu) {
> struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
> + struct cgroup_rstat_base_cpu *rstatbc =
> + cgroup_rstat_base_cpu(cgrp, cpu);
>
> rstatc->updated_children = cgrp;
> - u64_stats_init(&rstatc->bsync);
> + u64_stats_init(&rstatbc->bsync);
> }
>
> return 0;
> @@ -379,6 +395,8 @@ void cgroup_rstat_exit(struct cgroup *cgrp)
>
> free_percpu(cgrp->rstat_cpu);
> cgrp->rstat_cpu = NULL;
> + free_percpu(cgrp->rstat_base_cpu);
> + cgrp->rstat_base_cpu = NULL;
> }
>
> void __init cgroup_rstat_boot(void)
> @@ -419,9 +437,9 @@ static void cgroup_base_stat_sub(struct cgroup_base_stat *dst_bstat,
>
> static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
> {
> - struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
> + struct cgroup_rstat_base_cpu *rstatbc = cgroup_rstat_base_cpu(cgrp, cpu);
> struct cgroup *parent = cgroup_parent(cgrp);
> - struct cgroup_rstat_cpu *prstatc;
> + struct cgroup_rstat_base_cpu *prstatbc;
> struct cgroup_base_stat delta;
> unsigned seq;
>
> @@ -431,15 +449,15 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
>
> /* fetch the current per-cpu values */
> do {
> - seq = __u64_stats_fetch_begin(&rstatc->bsync);
> - delta = rstatc->bstat;
> - } while (__u64_stats_fetch_retry(&rstatc->bsync, seq));
> + seq = __u64_stats_fetch_begin(&rstatbc->bsync);
> + delta = rstatbc->bstat;
> + } while (__u64_stats_fetch_retry(&rstatbc->bsync, seq));
>
> /* propagate per-cpu delta to cgroup and per-cpu global statistics */
> - cgroup_base_stat_sub(&delta, &rstatc->last_bstat);
> + cgroup_base_stat_sub(&delta, &rstatbc->last_bstat);
> cgroup_base_stat_add(&cgrp->bstat, &delta);
> - cgroup_base_stat_add(&rstatc->last_bstat, &delta);
> - cgroup_base_stat_add(&rstatc->subtree_bstat, &delta);
> + cgroup_base_stat_add(&rstatbc->last_bstat, &delta);
> + cgroup_base_stat_add(&rstatbc->subtree_bstat, &delta);
>
> /* propagate cgroup and per-cpu global delta to parent (unless that's root) */
> if (cgroup_parent(parent)) {
> @@ -448,73 +466,73 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
> cgroup_base_stat_add(&parent->bstat, &delta);
> cgroup_base_stat_add(&cgrp->last_bstat, &delta);
>
> - delta = rstatc->subtree_bstat;
> - prstatc = cgroup_rstat_cpu(parent, cpu);
> - cgroup_base_stat_sub(&delta, &rstatc->last_subtree_bstat);
> - cgroup_base_stat_add(&prstatc->subtree_bstat, &delta);
> - cgroup_base_stat_add(&rstatc->last_subtree_bstat, &delta);
> + delta = rstatbc->subtree_bstat;
> + prstatbc = cgroup_rstat_base_cpu(parent, cpu);
> + cgroup_base_stat_sub(&delta, &rstatbc->last_subtree_bstat);
> + cgroup_base_stat_add(&prstatbc->subtree_bstat, &delta);
> + cgroup_base_stat_add(&rstatbc->last_subtree_bstat, &delta);
> }
> }
>
> -static struct cgroup_rstat_cpu *
> +static struct cgroup_rstat_base_cpu *
> cgroup_base_stat_cputime_account_begin(struct cgroup *cgrp, unsigned long *flags)
> {
> - struct cgroup_rstat_cpu *rstatc;
> + struct cgroup_rstat_base_cpu *rstatbc;
>
> - rstatc = get_cpu_ptr(cgrp->rstat_cpu);
> - *flags = u64_stats_update_begin_irqsave(&rstatc->bsync);
> - return rstatc;
> + rstatbc = get_cpu_ptr(cgrp->rstat_base_cpu);
> + *flags = u64_stats_update_begin_irqsave(&rstatbc->bsync);
> + return rstatbc;
> }
>
> static void cgroup_base_stat_cputime_account_end(struct cgroup *cgrp,
> - struct cgroup_rstat_cpu *rstatc,
> + struct cgroup_rstat_base_cpu *rstatbc,
> unsigned long flags)
> {
> - u64_stats_update_end_irqrestore(&rstatc->bsync, flags);
> + u64_stats_update_end_irqrestore(&rstatbc->bsync, flags);
> cgroup_rstat_updated(cgrp, smp_processor_id());
> - put_cpu_ptr(rstatc);
> + put_cpu_ptr(rstatbc);
> }
>
> void __cgroup_account_cputime(struct cgroup *cgrp, u64 delta_exec)
> {
> - struct cgroup_rstat_cpu *rstatc;
> + struct cgroup_rstat_base_cpu *rstatbc;
> unsigned long flags;
>
> - rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
> - rstatc->bstat.cputime.sum_exec_runtime += delta_exec;
> - cgroup_base_stat_cputime_account_end(cgrp, rstatc, flags);
> + rstatbc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
> + rstatbc->bstat.cputime.sum_exec_runtime += delta_exec;
> + cgroup_base_stat_cputime_account_end(cgrp, rstatbc, flags);
> }
>
> void __cgroup_account_cputime_field(struct cgroup *cgrp,
> enum cpu_usage_stat index, u64 delta_exec)
> {
> - struct cgroup_rstat_cpu *rstatc;
> + struct cgroup_rstat_base_cpu *rstatbc;
> unsigned long flags;
>
> - rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
> + rstatbc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
>
> switch (index) {
> case CPUTIME_NICE:
> - rstatc->bstat.ntime += delta_exec;
> + rstatbc->bstat.ntime += delta_exec;
> fallthrough;
> case CPUTIME_USER:
> - rstatc->bstat.cputime.utime += delta_exec;
> + rstatbc->bstat.cputime.utime += delta_exec;
> break;
> case CPUTIME_SYSTEM:
> case CPUTIME_IRQ:
> case CPUTIME_SOFTIRQ:
> - rstatc->bstat.cputime.stime += delta_exec;
> + rstatbc->bstat.cputime.stime += delta_exec;
> break;
> #ifdef CONFIG_SCHED_CORE
> case CPUTIME_FORCEIDLE:
> - rstatc->bstat.forceidle_sum += delta_exec;
> + rstatbc->bstat.forceidle_sum += delta_exec;
> break;
> #endif
> default:
> break;
> }
>
> - cgroup_base_stat_cputime_account_end(cgrp, rstatc, flags);
> + cgroup_base_stat_cputime_account_end(cgrp, rstatbc, flags);
> }
>
> /*