linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: JP Kobryn <inwardvessel@gmail.com>
Cc: tj@kernel.org, shakeel.butt@linux.dev, mkoutny@suse.com,
	hannes@cmpxchg.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	kernel-team@meta.com
Subject: Re: [PATCH v4 4/5] cgroup: use separate rstat trees for each subsystem
Date: Tue, 6 May 2025 09:37:42 +0000	[thread overview]
Message-ID: <aBnYZqJtZLDMKQYU@google.com> (raw)
In-Reply-To: <e8de82fe-feea-4be2-93eb-620b8ece6748@gmail.com>

On Wed, Apr 30, 2025 at 04:43:41PM -0700, JP Kobryn wrote:
> On 4/22/25 6:33 AM, Yosry Ahmed wrote:
> > On Thu, Apr 03, 2025 at 06:10:49PM -0700, JP Kobryn wrote:
> [..]
> > > @@ -5425,6 +5417,9 @@ static void css_free_rwork_fn(struct work_struct *work)
> > >   		struct cgroup_subsys_state *parent = css->parent;
> > >   		int id = css->id;
> > > +		if (ss->css_rstat_flush)
> > > +			css_rstat_exit(css);
> > > +
> > 
> > This call now exists in both branches (self css or not), so it's
> > probably best to pull it outside. We should probably also pull the call
> > in cgroup_destroy_root() outside into css_free_rwork_fn() so that we end
> > up with a single call to css_rstat_exit() (apart from failure paths).
> 
> This can be done if css_rstat_exit() is modified to guard against
> invalid css's like being a subsystem css and not having implemented
> css_rstat_flush.
> 
> > 
> > We can probably also use css_is_cgroup() here instead of 'if (ss)' for
> > consistency.
> > 
> > >   		ss->css_free(css);
> > >   		cgroup_idr_remove(&ss->css_idr, id);
> > >   		cgroup_put(cgrp);
> > [..]
> > > @@ -5659,6 +5647,12 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
> > >   		goto err_free_css;
> > >   	css->id = err;
> > > +	if (ss->css_rstat_flush) {
> > > +		err = css_rstat_init(css);
> > > +		if (err)
> > > +			goto err_free_css;
> > > +	}
> > > +
> > >   	/* @css is ready to be brought online now, make it visible */
> > >   	list_add_tail_rcu(&css->sibling, &parent_css->children);
> > >   	cgroup_idr_replace(&ss->css_idr, css, css->id);
> > > @@ -5672,7 +5666,6 @@ static struct cgroup_subsys_state *css_create(struct cgroup *cgrp,
> > >   err_list_del:
> > >   	list_del_rcu(&css->sibling);
> > >   err_free_css:
> > > -	list_del_rcu(&css->rstat_css_node);
> > >   	INIT_RCU_WORK(&css->destroy_rwork, css_free_rwork_fn);
> > >   	queue_rcu_work(cgroup_destroy_wq, &css->destroy_rwork);
> > >   	return ERR_PTR(err);
> > > @@ -6104,11 +6097,17 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
> > >   	css->flags |= CSS_NO_REF;
> > >   	if (early) {
> > > -		/* allocation can't be done safely during early init */
> > > +		/*
> > > +		 * Allocation can't be done safely during early init.
> > > +		 * Defer IDR and rstat allocations until cgroup_init().
> > > +		 */
> > >   		css->id = 1;
> > >   	} else {
> > >   		css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
> > >   		BUG_ON(css->id < 0);
> > > +
> > > +		if (ss->css_rstat_flush)
> > > +			BUG_ON(css_rstat_init(css));
> > >   	}
> > >   	/* Update the init_css_set to contain a subsys
> > > @@ -6207,9 +6206,17 @@ int __init cgroup_init(void)
> > >   			struct cgroup_subsys_state *css =
> > >   				init_css_set.subsys[ss->id];
> > > +			/*
> > > +			 * It is now safe to perform allocations.
> > > +			 * Finish setting up subsystems that previously
> > > +			 * deferred IDR and rstat allocations.
> > > +			 */
> > >   			css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2,
> > >   						   GFP_KERNEL);
> > >   			BUG_ON(css->id < 0);
> > > +
> > > +			if (ss->css_rstat_flush)
> > > +				BUG_ON(css_rstat_init(css));
> > 
> > The calls to css_rstat_init() are really difficult to track. Let's
> > recap, before this change we had two calls:
> > - In cgroup_setup_root(), for root cgroups.
> > - In cgroup_create(), for non-root cgroups.
> > 
> > This patch adds 3 more, so we end up with 5 calls as follows:
> > - In cgroup_setup_root(), for root self css's.
> > - In cgroup_create(), for non-root self css's.
> > - In cgroup_subsys_init(), for root subsys css's without early
> >    initialization.
> > - In cgroup_init(), for root subsys css's with early
> >    initialization, as we cannot call it from cgroup_subsys_init() early
> >    as allocations are not allowed during early init.
> > - In css_create(), for non-root non-self css's.
> > 
> > We should try to consolidate as much as possible. For example:
> > - Can we always make the call for root subsys css's in cgroup_init(),
> >    regardless of early initialization status? Is there a need to make the
> >    call early for subsystems that use early in cgroup_subsys_init()
> >    initialization?
> > 
> > - Can we always make the call for root css's in cgroup_init(),
> >    regardless of whether the css is a self css or a subsys css? I imagine
> >    we'd still need two separate calls, one outside the loop for the self
> >    css's, and one in the loop for subsys css's, but having them in the
> >    same place should make things easier.
> 
> The answer might be the same for the two questions above. I think at
> best, we can eliminate the single call below to css_rstat_init():
> 
> cgroup_init()
> 	for_each_subsys(ss, ssid)
> 		if (ss->early_init)
> 			css_rstat_init(css) /* remove */
> 
> I'm not sure if it's by design and also an undocumented constraint but I
> find that it is not possible to have a cgroup subsystem that is
> designated for "early init" participate in rstat at the same time. The
> necessary ordering of actions should be:
> 
> init_and_link_css(css, ss, ...)
> css_rstat_init(css)
> css_online(css)
> 
> This sequence occurs within cgroup_init_subsys() when ss->early_init is
> false. However, when early_init is true, the last two calls are
> effectively inverted:
> 
> css_online(css)
> css_rstat_init(css) /* too late */
> 
> This needs to be avoided or else failures will occur during boot.
> 
> Note that even before this series, this constraint seems to have
> existed. Using branch for-6.16 as a base, I experimented with a minimal
> custom test cgroup in which I implement css_rstat_flush while early_init
> is on. The system fails during boot because online_css() is called
> before cgroup_rstat_init().
> 
> cgroup_init_early()
> 	for_each_subsys(ss, ssid)
> 		if (ss->early)
> 			cgroup_init_subsys(ss, true)
> 				css = ss->css_alloc()
> 				online_css(css)
> cgroup_init()
> 	cgroup_setup_root()
> 		cgroup_rstat_init(root_cgrp) /* too late */

I am looking at online_css() and cgroup_rstat_init() and I cannot
immediately see the ordering requirement.

Is the idea that once a css is online an rstat flush can occur, and it
would crash if cgroup_rstat_init() hadn't been called yet? I am
wondering when would the flush occur before cgroup_init() is called.

Anyway, if that is possible I think enforcing this is probably
important.

> 
> Unless I"m missing something, do you think this constraint is worthy of
> a separate patch? Something like this that prevents the combination of
> rstat with early_init:
> 
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -6130,7 +6130,8 @@ int __init cgroup_init_early(void)
>     for_each_subsys(ss, i) {
> -       WARN(!ss->css_alloc || !ss->css_free || ss->name || ss->id,
> +       WARN(!ss->css_alloc || !ss->css_free || ss->name || ss->id ||
> +           (ss->early_init && ss->css_rstat_flush),
> 
> > 
> > Ideally if we can do both the above, we'd end up with 3 calling
> > functions only:
> > - cgroup_init() -> for all root css's.
> > - cgroup_create() -> for non-root self css's.
> > - css_create() -> for non-root subsys css's.
> > 
> > Also, we should probably document all the different call paths for
> > css_rstat_init() and css_rstat_exit() somewhere.
> 
> Will do.


  reply	other threads:[~2025-05-06  9:37 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-04  1:10 [PATCH v4 0/5] cgroup: separate rstat trees JP Kobryn
2025-04-04  1:10 ` [PATCH v4 1/5] cgroup: move rstat base stat objects into their own struct JP Kobryn
2025-04-15 17:16   ` Michal Koutný
2025-04-22 12:13   ` Yosry Ahmed
2025-05-29 18:58   ` Ihor Solodrai
2025-05-29 19:11     ` Yonghong Song
2025-04-04  1:10 ` [PATCH v4 2/5] cgroup: add helper for checking when css is cgroup::self JP Kobryn
2025-04-22 12:19   ` Yosry Ahmed
     [not found]   ` <68078968.5d0a0220.2c3c35.bab3SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-24 16:59     ` JP Kobryn
2025-04-04  1:10 ` [PATCH v4 3/5] cgroup: change rstat function signatures from cgroup-based to css-based JP Kobryn
2025-04-04 20:00   ` Tejun Heo
2025-04-04 20:09     ` Tejun Heo
2025-04-04 21:21       ` JP Kobryn
2025-04-22 12:35   ` Yosry Ahmed
2025-04-22 12:39   ` Yosry Ahmed
     [not found]   ` <68078d3c.050a0220.3d37e.6d82SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-24 17:10     ` JP Kobryn
2025-04-04  1:10 ` [PATCH v4 4/5] cgroup: use separate rstat trees for each subsystem JP Kobryn
2025-04-15 17:15   ` Michal Koutný
2025-04-16 21:43     ` JP Kobryn
2025-04-17  9:26       ` Michal Koutný
2025-04-17 19:05         ` JP Kobryn
2025-04-17 20:10           ` JP Kobryn
2025-04-21 18:18   ` JP Kobryn
2025-04-22 13:33   ` Yosry Ahmed
     [not found]   ` <68079aa7.df0a0220.30a1a0.cbb2SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-30 23:43     ` JP Kobryn
2025-05-06  9:37       ` Yosry Ahmed [this message]
2025-04-04  1:10 ` [PATCH v4 5/5] cgroup: use subsystem-specific rstat locks to avoid contention JP Kobryn
2025-04-04 20:28   ` Tejun Heo
2025-04-11  3:31     ` JP Kobryn
2025-04-15 17:15   ` Michal Koutný
2025-04-15 19:30     ` Tejun Heo
2025-04-16  9:50       ` Michal Koutný
2025-04-16 18:10         ` JP Kobryn
2025-04-16 18:14           ` Tejun Heo
2025-04-16 18:01     ` JP Kobryn
2025-04-22 14:01   ` Yosry Ahmed
2025-04-24 17:25     ` Shakeel Butt
     [not found]   ` <6807a132.df0a0220.28dc80.a1f0SMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-25  0:18     ` JP Kobryn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aBnYZqJtZLDMKQYU@google.com \
    --to=yosry.ahmed@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=inwardvessel@gmail.com \
    --cc=kernel-team@meta.com \
    --cc=linux-mm@kvack.org \
    --cc=mkoutny@suse.com \
    --cc=shakeel.butt@linux.dev \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox