linux-mm.kvack.org archive mirror
From: Ihor Solodrai <ihor.solodrai@linux.dev>
To: JP Kobryn <inwardvessel@gmail.com>,
	tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
	mkoutny@suse.com, hannes@cmpxchg.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org,
	kernel-team@meta.com, bpf@vger.kernel.org
Subject: Re: [PATCH v4 1/5] cgroup: move rstat base stat objects into their own struct
Date: Thu, 29 May 2025 11:58:17 -0700	[thread overview]
Message-ID: <6f688f2e-7d26-423a-9029-d1b1ef1c938a@linux.dev> (raw)
In-Reply-To: <20250404011050.121777-2-inwardvessel@gmail.com>

On 4/3/25 6:10 PM, JP Kobryn wrote:
> This non-functional change serves as preparation for moving to
> subsystem-based rstat trees. The base stats are not an actual subsystem,
> but in future commits they will have exclusive rstat trees just as other
> subsystems will.
> 
> Moving the base stat objects into a new struct allows the cgroup_rstat_cpu
> struct to become more compact since it now only contains the minimum amount
> of pointers needed for rstat participation. Subsystems will (in future
> commits) make use of the compact cgroup_rstat_cpu struct while avoiding the
> memory overhead of the base stat objects which they will not use.
> 
> An instance of the new struct cgroup_rstat_base_cpu was placed on the
> cgroup struct so it can retain ownership of these base stats common to all
> cgroups. A helper function was added for looking up the cpu-specific base
> stats of a given cgroup. Finally, initialization and variable names were
> adjusted where applicable.
> 
> Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
> ---
>   include/linux/cgroup-defs.h | 38 ++++++++++-------
>   kernel/cgroup/cgroup.c      |  8 +++-
>   kernel/cgroup/rstat.c       | 84 ++++++++++++++++++++++---------------
>   3 files changed, 79 insertions(+), 51 deletions(-)
> 

Hi everyone.

BPF CI started failing after recent upstream merges (tip: 90b83efa6701).
I bisected the failure to this patch; see a log snippet below [1]:

     ##[error]#44/9 btf_tag/btf_type_tag_percpu_vmlinux_helper
     load_btfs:PASS:could not load vmlinux BTF 0 nsec
     test_btf_type_tag_vmlinux_percpu:PASS:btf_type_tag_percpu 0 nsec
     libbpf: prog 'test_percpu_helper': BPF program load failed: -EACCES
     libbpf: prog 'test_percpu_helper': -- BEGIN PROG LOAD LOG --
     0: R1=ctx() R10=fp0
     ; int BPF_PROG(test_percpu_helper, struct cgroup *cgrp, const char *path) @ btf_type_tag_percpu.c:58
     0: (79) r1 = *(u64 *)(r1 +0)
     func 'cgroup_mkdir' arg0 has btf_id 437 type STRUCT 'cgroup'
     1: R1_w=trusted_ptr_cgroup()
     ; cpu = bpf_get_smp_processor_id(); @ btf_type_tag_percpu.c:63
     1: (7b) *(u64 *)(r10 -8) = r1         ; R1_w=trusted_ptr_cgroup() R10=fp0 fp-8_w=trusted_ptr_cgroup()
     2: (85) call bpf_get_smp_processor_id#8       ; R0_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
     3: (79) r1 = *(u64 *)(r10 -8)         ; R1_w=trusted_ptr_cgroup() R10=fp0 fp-8_w=trusted_ptr_cgroup()
     ; cgrp->self.rstat_cpu, cpu); @ btf_type_tag_percpu.c:65
     4: (79) r1 = *(u64 *)(r1 +32)         ; R1_w=percpu_ptr_css_rstat_cpu()
     ; rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr( @ btf_type_tag_percpu.c:64
     5: (bc) w2 = w0                       ; R0_w=scalar(id=1,smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1)) R2_w=scalar(id=1,smin=smin32=0,smax=umax=smax32=umax32=1,var_off=(0x0; 0x1))
     6: (85) call bpf_per_cpu_ptr#153      ; R0=ptr_or_null_css_rstat_cpu(id=2)
     ; if (rstat) { @ btf_type_tag_percpu.c:66
     7: (15) if r0 == 0x0 goto pc+1        ; R0=ptr_css_rstat_cpu()
     ; *(volatile int *)rstat; @ btf_type_tag_percpu.c:68
     8: (61) r1 = *(u32 *)(r0 +0)
     cannot access ptr member updated_children with moff 0 in struct css_rstat_cpu with off 0 size 4
     processed 9 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
     -- END PROG LOAD LOG --
     libbpf: prog 'test_percpu_helper': failed to load: -EACCES
     libbpf: failed to load object 'btf_type_tag_percpu'
     libbpf: failed to load BPF skeleton 'btf_type_tag_percpu': -EACCES
     test_btf_type_tag_vmlinux_percpu:FAIL:btf_type_tag_percpu_helper unexpected error: -13 (errno 13)

The test in question [2]:

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_percpu_helper, struct cgroup *cgrp, const char *path)
{
	struct cgroup_rstat_cpu *rstat;
	__u32 cpu;

	cpu = bpf_get_smp_processor_id();
	rstat = (struct cgroup_rstat_cpu *)bpf_per_cpu_ptr(cgrp->rstat_cpu, cpu);
	if (rstat) {
		/* READ_ONCE */
		*(volatile int *)rstat; // BPF verification fails here
	}

	return 0;
}
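
If the selftest is supposed to follow the split, one possible adaptation would be to target the new per-cpu base-stat struct instead. This is an untested sketch: the struct and field names come from this patch, but the cast and the first-member read are assumptions, and whether the verifier accepts that read depends on the new struct's layout:

```c
SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_percpu_helper, struct cgroup *cgrp, const char *path)
{
	struct cgroup_rstat_base_cpu *rstatbc;
	__u32 cpu;

	cpu = bpf_get_smp_processor_id();
	/* read the split-out base-stat struct rather than rstat_cpu,
	 * whose first member is now a cgroup pointer */
	rstatbc = (struct cgroup_rstat_base_cpu *)
			bpf_per_cpu_ptr(cgrp->rstat_base_cpu, cpu);
	if (rstatbc)
		*(volatile int *)rstatbc;

	return 0;
}
```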

Any ideas about how to properly fix this?

Thanks.

[1] https://github.com/kernel-patches/bpf/actions/runs/15316839796/job/43125242673
[2] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/btf_type_tag_percpu.c#n68

> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index 485b651869d9..6d177f770d28 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -344,10 +344,29 @@ struct cgroup_base_stat {
>    * frequency decreases the cost of each read.
>    *
>    * This struct hosts both the fields which implement the above -
> - * updated_children and updated_next - and the fields which track basic
> - * resource statistics on top of it - bsync, bstat and last_bstat.
> + * updated_children and updated_next.
>    */
>   struct cgroup_rstat_cpu {
> +	/*
> +	 * Child cgroups with stat updates on this cpu since the last read
> +	 * are linked on the parent's ->updated_children through
> +	 * ->updated_next.
> +	 *
> +	 * In addition to being more compact, singly-linked list pointing
> +	 * to the cgroup makes it unnecessary for each per-cpu struct to
> +	 * point back to the associated cgroup.
> +	 *
> +	 * Protected by per-cpu cgroup_rstat_cpu_lock.
> +	 */
> +	struct cgroup *updated_children;	/* terminated by self cgroup */
> +	struct cgroup *updated_next;		/* NULL iff not on the list */
> +};
> +
> +/*
> + * This struct hosts the fields which track basic resource statistics on
> + * top of it - bsync, bstat and last_bstat.
> + */
> +struct cgroup_rstat_base_cpu {
>   	/*
>   	 * ->bsync protects ->bstat.  These are the only fields which get
>   	 * updated in the hot path.
> @@ -374,20 +393,6 @@ struct cgroup_rstat_cpu {
>   	 * deltas to propagate to the per-cpu subtree_bstat.
>   	 */
>   	struct cgroup_base_stat last_subtree_bstat;
> -
> -	/*
> -	 * Child cgroups with stat updates on this cpu since the last read
> -	 * are linked on the parent's ->updated_children through
> -	 * ->updated_next.
> -	 *
> -	 * In addition to being more compact, singly-linked list pointing
> -	 * to the cgroup makes it unnecessary for each per-cpu struct to
> -	 * point back to the associated cgroup.
> -	 *
> -	 * Protected by per-cpu cgroup_rstat_cpu_lock.
> -	 */
> -	struct cgroup *updated_children;	/* terminated by self cgroup */
> -	struct cgroup *updated_next;		/* NULL iff not on the list */
>   };
>   
>   struct cgroup_freezer_state {
> @@ -518,6 +523,7 @@ struct cgroup {
>   
>   	/* per-cpu recursive resource statistics */
>   	struct cgroup_rstat_cpu __percpu *rstat_cpu;
> +	struct cgroup_rstat_base_cpu __percpu *rstat_base_cpu;
>   	struct list_head rstat_css_list;
>   
>   	/*
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index ac2db99941ca..77349d07b117 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -161,10 +161,14 @@ static struct static_key_true *cgroup_subsys_on_dfl_key[] = {
>   };
>   #undef SUBSYS
>   
> -static DEFINE_PER_CPU(struct cgroup_rstat_cpu, cgrp_dfl_root_rstat_cpu);
> +static DEFINE_PER_CPU(struct cgroup_rstat_cpu, root_rstat_cpu);
> +static DEFINE_PER_CPU(struct cgroup_rstat_base_cpu, root_rstat_base_cpu);
>   
>   /* the default hierarchy */
> -struct cgroup_root cgrp_dfl_root = { .cgrp.rstat_cpu = &cgrp_dfl_root_rstat_cpu };
> +struct cgroup_root cgrp_dfl_root = {
> +	.cgrp.rstat_cpu = &root_rstat_cpu,
> +	.cgrp.rstat_base_cpu = &root_rstat_base_cpu,
> +};
>   EXPORT_SYMBOL_GPL(cgrp_dfl_root);
>   
>   /*
> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> index 4bb587d5d34f..a20e3ab3f7d3 100644
> --- a/kernel/cgroup/rstat.c
> +++ b/kernel/cgroup/rstat.c
> @@ -19,6 +19,12 @@ static struct cgroup_rstat_cpu *cgroup_rstat_cpu(struct cgroup *cgrp, int cpu)
>   	return per_cpu_ptr(cgrp->rstat_cpu, cpu);
>   }
>   
> +static struct cgroup_rstat_base_cpu *cgroup_rstat_base_cpu(
> +		struct cgroup *cgrp, int cpu)
> +{
> +	return per_cpu_ptr(cgrp->rstat_base_cpu, cpu);
> +}
> +
>   /*
>    * Helper functions for rstat per CPU lock (cgroup_rstat_cpu_lock).
>    *
> @@ -351,12 +357,22 @@ int cgroup_rstat_init(struct cgroup *cgrp)
>   			return -ENOMEM;
>   	}
>   
> +	if (!cgrp->rstat_base_cpu) {
> +		cgrp->rstat_base_cpu = alloc_percpu(struct cgroup_rstat_base_cpu);
> +		if (!cgrp->rstat_base_cpu) {
> +			free_percpu(cgrp->rstat_cpu);
> +			return -ENOMEM;
> +		}
> +	}
> +
>   	/* ->updated_children list is self terminated */
>   	for_each_possible_cpu(cpu) {
>   		struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
> +		struct cgroup_rstat_base_cpu *rstatbc =
> +			cgroup_rstat_base_cpu(cgrp, cpu);
>   
>   		rstatc->updated_children = cgrp;
> -		u64_stats_init(&rstatc->bsync);
> +		u64_stats_init(&rstatbc->bsync);
>   	}
>   
>   	return 0;
> @@ -379,6 +395,8 @@ void cgroup_rstat_exit(struct cgroup *cgrp)
>   
>   	free_percpu(cgrp->rstat_cpu);
>   	cgrp->rstat_cpu = NULL;
> +	free_percpu(cgrp->rstat_base_cpu);
> +	cgrp->rstat_base_cpu = NULL;
>   }
>   
>   void __init cgroup_rstat_boot(void)
> @@ -419,9 +437,9 @@ static void cgroup_base_stat_sub(struct cgroup_base_stat *dst_bstat,
>   
>   static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
>   {
> -	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
> +	struct cgroup_rstat_base_cpu *rstatbc = cgroup_rstat_base_cpu(cgrp, cpu);
>   	struct cgroup *parent = cgroup_parent(cgrp);
> -	struct cgroup_rstat_cpu *prstatc;
> +	struct cgroup_rstat_base_cpu *prstatbc;
>   	struct cgroup_base_stat delta;
>   	unsigned seq;
>   
> @@ -431,15 +449,15 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
>   
>   	/* fetch the current per-cpu values */
>   	do {
> -		seq = __u64_stats_fetch_begin(&rstatc->bsync);
> -		delta = rstatc->bstat;
> -	} while (__u64_stats_fetch_retry(&rstatc->bsync, seq));
> +		seq = __u64_stats_fetch_begin(&rstatbc->bsync);
> +		delta = rstatbc->bstat;
> +	} while (__u64_stats_fetch_retry(&rstatbc->bsync, seq));
>   
>   	/* propagate per-cpu delta to cgroup and per-cpu global statistics */
> -	cgroup_base_stat_sub(&delta, &rstatc->last_bstat);
> +	cgroup_base_stat_sub(&delta, &rstatbc->last_bstat);
>   	cgroup_base_stat_add(&cgrp->bstat, &delta);
> -	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
> -	cgroup_base_stat_add(&rstatc->subtree_bstat, &delta);
> +	cgroup_base_stat_add(&rstatbc->last_bstat, &delta);
> +	cgroup_base_stat_add(&rstatbc->subtree_bstat, &delta);
>   
>   	/* propagate cgroup and per-cpu global delta to parent (unless that's root) */
>   	if (cgroup_parent(parent)) {
> @@ -448,73 +466,73 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
>   		cgroup_base_stat_add(&parent->bstat, &delta);
>   		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
>   
> -		delta = rstatc->subtree_bstat;
> -		prstatc = cgroup_rstat_cpu(parent, cpu);
> -		cgroup_base_stat_sub(&delta, &rstatc->last_subtree_bstat);
> -		cgroup_base_stat_add(&prstatc->subtree_bstat, &delta);
> -		cgroup_base_stat_add(&rstatc->last_subtree_bstat, &delta);
> +		delta = rstatbc->subtree_bstat;
> +		prstatbc = cgroup_rstat_base_cpu(parent, cpu);
> +		cgroup_base_stat_sub(&delta, &rstatbc->last_subtree_bstat);
> +		cgroup_base_stat_add(&prstatbc->subtree_bstat, &delta);
> +		cgroup_base_stat_add(&rstatbc->last_subtree_bstat, &delta);
>   	}
>   }
>   
> -static struct cgroup_rstat_cpu *
> +static struct cgroup_rstat_base_cpu *
>   cgroup_base_stat_cputime_account_begin(struct cgroup *cgrp, unsigned long *flags)
>   {
> -	struct cgroup_rstat_cpu *rstatc;
> +	struct cgroup_rstat_base_cpu *rstatbc;
>   
> -	rstatc = get_cpu_ptr(cgrp->rstat_cpu);
> -	*flags = u64_stats_update_begin_irqsave(&rstatc->bsync);
> -	return rstatc;
> +	rstatbc = get_cpu_ptr(cgrp->rstat_base_cpu);
> +	*flags = u64_stats_update_begin_irqsave(&rstatbc->bsync);
> +	return rstatbc;
>   }
>   
>   static void cgroup_base_stat_cputime_account_end(struct cgroup *cgrp,
> -						 struct cgroup_rstat_cpu *rstatc,
> +						 struct cgroup_rstat_base_cpu *rstatbc,
>   						 unsigned long flags)
>   {
> -	u64_stats_update_end_irqrestore(&rstatc->bsync, flags);
> +	u64_stats_update_end_irqrestore(&rstatbc->bsync, flags);
>   	cgroup_rstat_updated(cgrp, smp_processor_id());
> -	put_cpu_ptr(rstatc);
> +	put_cpu_ptr(rstatbc);
>   }
>   
>   void __cgroup_account_cputime(struct cgroup *cgrp, u64 delta_exec)
>   {
> -	struct cgroup_rstat_cpu *rstatc;
> +	struct cgroup_rstat_base_cpu *rstatbc;
>   	unsigned long flags;
>   
> -	rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
> -	rstatc->bstat.cputime.sum_exec_runtime += delta_exec;
> -	cgroup_base_stat_cputime_account_end(cgrp, rstatc, flags);
> +	rstatbc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
> +	rstatbc->bstat.cputime.sum_exec_runtime += delta_exec;
> +	cgroup_base_stat_cputime_account_end(cgrp, rstatbc, flags);
>   }
>   
>   void __cgroup_account_cputime_field(struct cgroup *cgrp,
>   				    enum cpu_usage_stat index, u64 delta_exec)
>   {
> -	struct cgroup_rstat_cpu *rstatc;
> +	struct cgroup_rstat_base_cpu *rstatbc;
>   	unsigned long flags;
>   
> -	rstatc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
> +	rstatbc = cgroup_base_stat_cputime_account_begin(cgrp, &flags);
>   
>   	switch (index) {
>   	case CPUTIME_NICE:
> -		rstatc->bstat.ntime += delta_exec;
> +		rstatbc->bstat.ntime += delta_exec;
>   		fallthrough;
>   	case CPUTIME_USER:
> -		rstatc->bstat.cputime.utime += delta_exec;
> +		rstatbc->bstat.cputime.utime += delta_exec;
>   		break;
>   	case CPUTIME_SYSTEM:
>   	case CPUTIME_IRQ:
>   	case CPUTIME_SOFTIRQ:
> -		rstatc->bstat.cputime.stime += delta_exec;
> +		rstatbc->bstat.cputime.stime += delta_exec;
>   		break;
>   #ifdef CONFIG_SCHED_CORE
>   	case CPUTIME_FORCEIDLE:
> -		rstatc->bstat.forceidle_sum += delta_exec;
> +		rstatbc->bstat.forceidle_sum += delta_exec;
>   		break;
>   #endif
>   	default:
>   		break;
>   	}
>   
> -	cgroup_base_stat_cputime_account_end(cgrp, rstatc, flags);
> +	cgroup_base_stat_cputime_account_end(cgrp, rstatbc, flags);
>   }
>   
>   /*


