Hello Qi.

On Thu, Mar 05, 2026 at 07:52:48PM +0800, Qi Zheng wrote:
> To ensure that these non-hierarchical stats work properly, we need to
> reparent these non-hierarchical stats after reparenting LRU folios. To
> this end, this commit makes the following preparations:
>
> 1. implement reparent_state_local() to reparent non-hierarchical stats
> 2. make css_killed_work_fn() to be called in rcu work, and implement
>    get_non_dying_memcg_start() and get_non_dying_memcg_end() to avoid
>    race between mod_memcg_state()/mod_memcg_lruvec_state() and
>    reparent_state_local()

css_free_rwork_fn() already has RCU deferral, but we discussed some
reasons why stats reparenting cannot be done from there. IIUC something
like:

| reparent_state_local() must be done already in css_killed_work_fn()
| because waiting until css_free_rwork_fn() would mean the
| non-hierarchical stats of the surrogate ancestor are outdated, e.g.
| underflown.
| And the waiting until css_free_rwork_fn() may still be indefinite due
| to non-folio references to the offlined memcg.

Could this be captured in the commit message (if it's correct)?
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -6044,8 +6044,9 @@ int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode)
>   */
>  static void css_killed_work_fn(struct work_struct *work)
>  {
> -	struct cgroup_subsys_state *css =
> -		container_of(work, struct cgroup_subsys_state, destroy_work);
> +	struct cgroup_subsys_state *css;
> +
> +	css = container_of(to_rcu_work(work), struct cgroup_subsys_state, destroy_rwork);
>
>  	cgroup_lock();
>
> @@ -6066,8 +6067,8 @@ static void css_killed_ref_fn(struct percpu_ref *ref)
>  		container_of(ref, struct cgroup_subsys_state, refcnt);
>
>  	if (atomic_dec_and_test(&css->online_cnt)) {
> -		INIT_WORK(&css->destroy_work, css_killed_work_fn);
> -		queue_work(cgroup_offline_wq, &css->destroy_work);
> +		INIT_RCU_WORK(&css->destroy_rwork, css_killed_work_fn);
> +		queue_rcu_work(cgroup_offline_wq, &css->destroy_rwork);
>  	}
>  }

Could this be

#ifdef CONFIG_MEMCG_V1
	/* See get_non_dying_memcg_start(), get_non_dying_memcg_end() */
	INIT_RCU_WORK(&css->destroy_rwork, css_killed_work_fn);
	queue_rcu_work(cgroup_offline_wq, &css->destroy_rwork);
#else
	INIT_WORK(&css->destroy_work, css_killed_work_fn);
	queue_work(cgroup_offline_wq, &css->destroy_work);
#endif

? IOW, there's no need to make the cgroup release path even more
asynchronous without CONFIG_MEMCG_V1 (all this seems CONFIG_MEMCG_V1
specific).
(+ not so beautiful css_killed_work_fn() ifdefing, but given there are
the variants of _start, _end)

> +#ifdef CONFIG_MEMCG_V1
> +/*
> + * Used in mod_memcg_state() and mod_memcg_lruvec_state() to avoid race with
> + * reparenting of non-hierarchical state_locals.
> + */
> +static inline struct mem_cgroup *get_non_dying_memcg_start(struct mem_cgroup *memcg)

Nit: I think the idiomatic names are begin + end (in the meaning of
paired parentheses, don't look at css_task_iter_start ;-).