* Re: [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq [not found] <20230524011935.719659-1-ming.lei@redhat.com> @ 2023-05-24 2:06 ` Yosry Ahmed 2023-05-24 2:37 ` Ming Lei 2023-05-24 4:04 ` Waiman Long 0 siblings, 2 replies; 10+ messages in thread From: Yosry Ahmed @ 2023-05-24 2:06 UTC (permalink / raw) To: Ming Lei, Linux-MM, Michal Hocko, Shakeel Butt, Johannes Weiner, Roman Gushchin, Muchun Song Cc: Jens Axboe, linux-block, Waiman Long, cgroups, Tejun Heo, mkoutny Hi Ming, On Tue, May 23, 2023 at 6:21 PM Ming Lei <ming.lei@redhat.com> wrote: > > As noted by Michal, the blkg_iostat_set's in the lockless list > hold reference to blkg's to protect against their removal. Those > blkg's hold reference to blkcg. When a cgroup is being destroyed, > cgroup_rstat_flush() is only called at css_release_work_fn() which > is called when the blkcg reference count reaches 0. This circular > dependency will prevent blkcg and some blkgs from being freed after > they are made offline. I am not at all familiar with blkcg, but does calling cgroup_rstat_flush() in offline_css() fix the problem? or can items be added to the lockless list(s) after the blkcg is offlined? > > It is less a problem if the cgroup to be destroyed also has other > controllers like memory that will call cgroup_rstat_flush() which will > clean up the reference count. If block is the only controller that uses > rstat, these offline blkcg and blkgs may never be freed leaking more > and more memory over time. > > To prevent this potential memory leak: > > - a new cgroup_rstat_css_cpu_flush() function is added to flush stats for > a given css and cpu. This new function will be called in __blkg_release(). > > - don't grab bio->bi_blkg when adding the stats into blkcg's per-cpu > stat list, and this kind of handling is the most fragile part of > original patch > > Based on Waiman's patch: > > https://lore.kernel.org/linux-block/20221215033132.230023-3-longman@redhat.com/ > > Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()") > Cc: Waiman Long <longman@redhat.com> > Cc: cgroups@vger.kernel.org > Cc: Tejun Heo <tj@kernel.org> > Cc: mkoutny@suse.com > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > block/blk-cgroup.c | 15 +++++++++++++-- > include/linux/cgroup.h | 1 + > kernel/cgroup/rstat.c | 18 ++++++++++++++++++ > 3 files changed, 32 insertions(+), 2 deletions(-) > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c > index 0ce64dd73cfe..5437b6af3955 100644 > --- a/block/blk-cgroup.c > +++ b/block/blk-cgroup.c > @@ -163,10 +163,23 @@ static void blkg_free(struct blkcg_gq *blkg) > static void __blkg_release(struct rcu_head *rcu) > { > struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head); > + struct blkcg *blkcg = blkg->blkcg; > + int cpu; > > #ifdef CONFIG_BLK_CGROUP_PUNT_BIO > WARN_ON(!bio_list_empty(&blkg->async_bios)); > #endif > + /* > + * Flush all the non-empty percpu lockless lists before releasing > + * us. Meantime no new bio can refer to this blkg any more given > + * the refcnt is killed. 
> + */ > + for_each_possible_cpu(cpu) { > + struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu); > + > + if (!llist_empty(lhead)) > + cgroup_rstat_css_cpu_flush(&blkcg->css, cpu); > + } > > /* release the blkcg and parent blkg refs this blkg has been holding */ > css_put(&blkg->blkcg->css); > @@ -991,7 +1004,6 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu) > if (parent && parent->parent) > blkcg_iostat_update(parent, &blkg->iostat.cur, > &blkg->iostat.last); > - percpu_ref_put(&blkg->refcnt); > } > > out: > @@ -2075,7 +2087,6 @@ void blk_cgroup_bio_start(struct bio *bio) > > llist_add(&bis->lnode, lhead); > WRITE_ONCE(bis->lqueued, true); > - percpu_ref_get(&bis->blkg->refcnt); > } > > u64_stats_update_end_irqrestore(&bis->sync, flags); > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > index 885f5395fcd0..97d4764d8e6a 100644 > --- a/include/linux/cgroup.h > +++ b/include/linux/cgroup.h > @@ -695,6 +695,7 @@ void cgroup_rstat_flush(struct cgroup *cgrp); > void cgroup_rstat_flush_atomic(struct cgroup *cgrp); > void cgroup_rstat_flush_hold(struct cgroup *cgrp); > void cgroup_rstat_flush_release(void); > +void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu); > > /* > * Basic resource stats. > diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c > index 9c4c55228567..96e7a4e6da72 100644 > --- a/kernel/cgroup/rstat.c > +++ b/kernel/cgroup/rstat.c > @@ -281,6 +281,24 @@ void cgroup_rstat_flush_release(void) > spin_unlock_irq(&cgroup_rstat_lock); > } > > +/** > + * cgroup_rstat_css_cpu_flush - flush stats for the given css and cpu > + * @css: target css to be flush > + * @cpu: the cpu that holds the stats to be flush > + * > + * A lightweight rstat flush operation for a given css and cpu. > + * Only the cpu_lock is being held for mutual exclusion, the cgroup_rstat_lock > + * isn't used. (Adding linux-mm and memcg maintainers) +Linux-MM +Michal Hocko +Shakeel Butt +Johannes Weiner +Roman Gushchin +Muchun Song I don't think flushing the stats without holding cgroup_rstat_lock is safe for memcg stats flushing. mem_cgroup_css_rstat_flush() modifies some non-percpu data (e.g. memcg->vmstats->state, memcg->vmstats->state_pending). Perhaps have this be a separate callback than css_rstat_flush() (e.g. css_rstat_flush_cpu() or something), so that it's clear what subsystems support this? In this case, only blkcg would implement this callback. > + */ > +void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu) > +{ > + raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu); > + > + raw_spin_lock_irq(cpu_lock); > + css->ss->css_rstat_flush(css, cpu); I think we need to check that css_rstat_flush() (or a new callback) is implemented before calling it here. > + raw_spin_unlock_irq(cpu_lock); > +} > + > int cgroup_rstat_init(struct cgroup *cgrp) > { > int cpu; > -- > 2.40.1 > ^ permalink raw reply [flat|nested] 10+ messages in thread
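For illustration, Yosry's last point above (calling the flush callback only when the subsystem actually implements it) would fold into the proposed helper roughly as follows. This is a sketch against the patch as quoted, not a posted revision, and it keeps the patch's assumption that only the per-cpu lock is needed:

	void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu)
	{
		raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);

		raw_spin_lock_irq(cpu_lock);
		/* skip subsystems that don't implement a flush callback */
		if (css->ss->css_rstat_flush)
			css->ss->css_rstat_flush(css, cpu);
		raw_spin_unlock_irq(cpu_lock);
	}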
* Re: [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq 2023-05-24 2:06 ` [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq Yosry Ahmed @ 2023-05-24 2:37 ` Ming Lei 2023-05-24 2:43 ` Yosry Ahmed ` (2 more replies) 2023-05-24 4:04 ` Waiman Long 1 sibling, 3 replies; 10+ messages in thread From: Ming Lei @ 2023-05-24 2:37 UTC (permalink / raw) To: Yosry Ahmed Cc: Linux-MM, Michal Hocko, Shakeel Butt, Johannes Weiner, Roman Gushchin, Muchun Song, Jens Axboe, linux-block, Waiman Long, cgroups, Tejun Heo, mkoutny, ming.lei Hi Yosry, On Tue, May 23, 2023 at 07:06:38PM -0700, Yosry Ahmed wrote: > Hi Ming, > > On Tue, May 23, 2023 at 6:21 PM Ming Lei <ming.lei@redhat.com> wrote: > > > > As noted by Michal, the blkg_iostat_set's in the lockless list > > hold reference to blkg's to protect against their removal. Those > > blkg's hold reference to blkcg. When a cgroup is being destroyed, > > cgroup_rstat_flush() is only called at css_release_work_fn() which > > is called when the blkcg reference count reaches 0. This circular > > dependency will prevent blkcg and some blkgs from being freed after > > they are made offline. > > I am not at all familiar with blkcg, but does calling > cgroup_rstat_flush() in offline_css() fix the problem? Except for offline, this list needs to be flushed after the associated disk is deleted. > or can items be > added to the lockless list(s) after the blkcg is offlined? Yeah. percpu_ref_*get(&blkg->refcnt) still can succeed after the percpu refcnt is killed in blkg_destroy() which is called from both offline css and removing disk. > > > > > It is less a problem if the cgroup to be destroyed also has other > > controllers like memory that will call cgroup_rstat_flush() which will > > clean up the reference count. If block is the only controller that uses > > rstat, these offline blkcg and blkgs may never be freed leaking more > > and more memory over time. > > > > To prevent this potential memory leak: > > > > - a new cgroup_rstat_css_cpu_flush() function is added to flush stats for > > a given css and cpu. This new function will be called in __blkg_release(). > > > > - don't grab bio->bi_blkg when adding the stats into blkcg's per-cpu > > stat list, and this kind of handling is the most fragile part of > > original patch > > > > Based on Waiman's patch: > > > > https://lore.kernel.org/linux-block/20221215033132.230023-3-longman@redhat.com/ > > > > Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()") > > Cc: Waiman Long <longman@redhat.com> > > Cc: cgroups@vger.kernel.org > > Cc: Tejun Heo <tj@kernel.org> > > Cc: mkoutny@suse.com > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > --- > > block/blk-cgroup.c | 15 +++++++++++++-- > > include/linux/cgroup.h | 1 + > > kernel/cgroup/rstat.c | 18 ++++++++++++++++++ > > 3 files changed, 32 insertions(+), 2 deletions(-) > > > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c > > index 0ce64dd73cfe..5437b6af3955 100644 > > --- a/block/blk-cgroup.c > > +++ b/block/blk-cgroup.c > > @@ -163,10 +163,23 @@ static void blkg_free(struct blkcg_gq *blkg) > > static void __blkg_release(struct rcu_head *rcu) > > { > > struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head); > > + struct blkcg *blkcg = blkg->blkcg; > > + int cpu; > > > > #ifdef CONFIG_BLK_CGROUP_PUNT_BIO > > WARN_ON(!bio_list_empty(&blkg->async_bios)); > > #endif > > + /* > > + * Flush all the non-empty percpu lockless lists before releasing > > + * us. 
Meantime no new bio can refer to this blkg any more given > > + * the refcnt is killed. > > + */ > > + for_each_possible_cpu(cpu) { > > + struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu); > > + > > + if (!llist_empty(lhead)) > > + cgroup_rstat_css_cpu_flush(&blkcg->css, cpu); > > + } > > > > /* release the blkcg and parent blkg refs this blkg has been holding */ > > css_put(&blkg->blkcg->css); > > @@ -991,7 +1004,6 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu) > > if (parent && parent->parent) > > blkcg_iostat_update(parent, &blkg->iostat.cur, > > &blkg->iostat.last); > > - percpu_ref_put(&blkg->refcnt); > > } > > > > out: > > @@ -2075,7 +2087,6 @@ void blk_cgroup_bio_start(struct bio *bio) > > > > llist_add(&bis->lnode, lhead); > > WRITE_ONCE(bis->lqueued, true); > > - percpu_ref_get(&bis->blkg->refcnt); > > } > > > > u64_stats_update_end_irqrestore(&bis->sync, flags); > > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > > index 885f5395fcd0..97d4764d8e6a 100644 > > --- a/include/linux/cgroup.h > > +++ b/include/linux/cgroup.h > > @@ -695,6 +695,7 @@ void cgroup_rstat_flush(struct cgroup *cgrp); > > void cgroup_rstat_flush_atomic(struct cgroup *cgrp); > > void cgroup_rstat_flush_hold(struct cgroup *cgrp); > > void cgroup_rstat_flush_release(void); > > +void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu); > > > > /* > > * Basic resource stats. > > diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c > > index 9c4c55228567..96e7a4e6da72 100644 > > --- a/kernel/cgroup/rstat.c > > +++ b/kernel/cgroup/rstat.c > > @@ -281,6 +281,24 @@ void cgroup_rstat_flush_release(void) > > spin_unlock_irq(&cgroup_rstat_lock); > > } > > > > +/** > > + * cgroup_rstat_css_cpu_flush - flush stats for the given css and cpu > > + * @css: target css to be flush > > + * @cpu: the cpu that holds the stats to be flush > > + * > > + * A lightweight rstat flush operation for a given css and cpu. > > + * Only the cpu_lock is being held for mutual exclusion, the cgroup_rstat_lock > > + * isn't used. > > (Adding linux-mm and memcg maintainers) > +Linux-MM +Michal Hocko +Shakeel Butt +Johannes Weiner +Roman Gushchin > +Muchun Song > > I don't think flushing the stats without holding cgroup_rstat_lock is > safe for memcg stats flushing. mem_cgroup_css_rstat_flush() modifies > some non-percpu data (e.g. memcg->vmstats->state, > memcg->vmstats->state_pending). > > Perhaps have this be a separate callback than css_rstat_flush() (e.g. > css_rstat_flush_cpu() or something), so that it's clear what > subsystems support this? In this case, only blkcg would implement this > callback. Also I guess cgroup_rstat_flush() can be used here too. BTW, cgroup_rstat_flush() is annotated as might_sleep(), however it won't sleep actually, so can this might_sleep() be removed? > > > + */ > > +void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu) > > +{ > > + raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu); > > + > > + raw_spin_lock_irq(cpu_lock); > > + css->ss->css_rstat_flush(css, cpu); > > I think we need to check that css_rstat_flush() (or a new callback) is > implemented before calling it here. Good catch! Thanks, Ming ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq 2023-05-24 2:37 ` Ming Lei @ 2023-05-24 2:43 ` Yosry Ahmed 2023-05-24 4:10 ` Waiman Long 2023-05-25 14:11 ` Michal Koutný 2 siblings, 0 replies; 10+ messages in thread From: Yosry Ahmed @ 2023-05-24 2:43 UTC (permalink / raw) To: Ming Lei Cc: Linux-MM, Michal Hocko, Shakeel Butt, Johannes Weiner, Roman Gushchin, Muchun Song, Jens Axboe, linux-block, Waiman Long, cgroups, Tejun Heo, mkoutny On Tue, May 23, 2023 at 7:37 PM Ming Lei <ming.lei@redhat.com> wrote: > > Hi Yosry, > > On Tue, May 23, 2023 at 07:06:38PM -0700, Yosry Ahmed wrote: > > Hi Ming, > > > > On Tue, May 23, 2023 at 6:21 PM Ming Lei <ming.lei@redhat.com> wrote: > > > > > > As noted by Michal, the blkg_iostat_set's in the lockless list > > > hold reference to blkg's to protect against their removal. Those > > > blkg's hold reference to blkcg. When a cgroup is being destroyed, > > > cgroup_rstat_flush() is only called at css_release_work_fn() which > > > is called when the blkcg reference count reaches 0. This circular > > > dependency will prevent blkcg and some blkgs from being freed after > > > they are made offline. > > > > I am not at all familiar with blkcg, but does calling > > cgroup_rstat_flush() in offline_css() fix the problem? > > Except for offline, this list needs to be flushed after the associated disk > is deleted. > > > or can items be > > added to the lockless list(s) after the blkcg is offlined? > > Yeah. > > percpu_ref_*get(&blkg->refcnt) still can succeed after the percpu refcnt > is killed in blkg_destroy() which is called from both offline css and > removing disk. I see. > > > > > > > > > It is less a problem if the cgroup to be destroyed also has other > > > controllers like memory that will call cgroup_rstat_flush() which will > > > clean up the reference count. If block is the only controller that uses > > > rstat, these offline blkcg and blkgs may never be freed leaking more > > > and more memory over time. > > > > > > To prevent this potential memory leak: > > > > > > - a new cgroup_rstat_css_cpu_flush() function is added to flush stats for > > > a given css and cpu. This new function will be called in __blkg_release(). 
> > > > > > - don't grab bio->bi_blkg when adding the stats into blkcg's per-cpu > > > stat list, and this kind of handling is the most fragile part of > > > original patch > > > > > > Based on Waiman's patch: > > > > > > https://lore.kernel.org/linux-block/20221215033132.230023-3-longman@redhat.com/ > > > > > > Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()") > > > Cc: Waiman Long <longman@redhat.com> > > > Cc: cgroups@vger.kernel.org > > > Cc: Tejun Heo <tj@kernel.org> > > > Cc: mkoutny@suse.com > > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > > --- > > > block/blk-cgroup.c | 15 +++++++++++++-- > > > include/linux/cgroup.h | 1 + > > > kernel/cgroup/rstat.c | 18 ++++++++++++++++++ > > > 3 files changed, 32 insertions(+), 2 deletions(-) > > > > > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c > > > index 0ce64dd73cfe..5437b6af3955 100644 > > > --- a/block/blk-cgroup.c > > > +++ b/block/blk-cgroup.c > > > @@ -163,10 +163,23 @@ static void blkg_free(struct blkcg_gq *blkg) > > > static void __blkg_release(struct rcu_head *rcu) > > > { > > > struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head); > > > + struct blkcg *blkcg = blkg->blkcg; > > > + int cpu; > > > > > > #ifdef CONFIG_BLK_CGROUP_PUNT_BIO > > > WARN_ON(!bio_list_empty(&blkg->async_bios)); > > > #endif > > > + /* > > > + * Flush all the non-empty percpu lockless lists before releasing > > > + * us. Meantime no new bio can refer to this blkg any more given > > > + * the refcnt is killed. > > > + */ > > > + for_each_possible_cpu(cpu) { > > > + struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu); > > > + > > > + if (!llist_empty(lhead)) > > > + cgroup_rstat_css_cpu_flush(&blkcg->css, cpu); > > > + } > > > > > > /* release the blkcg and parent blkg refs this blkg has been holding */ > > > css_put(&blkg->blkcg->css); > > > @@ -991,7 +1004,6 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu) > > > if (parent && parent->parent) > > > blkcg_iostat_update(parent, &blkg->iostat.cur, > > > &blkg->iostat.last); > > > - percpu_ref_put(&blkg->refcnt); > > > } > > > > > > out: > > > @@ -2075,7 +2087,6 @@ void blk_cgroup_bio_start(struct bio *bio) > > > > > > llist_add(&bis->lnode, lhead); > > > WRITE_ONCE(bis->lqueued, true); > > > - percpu_ref_get(&bis->blkg->refcnt); > > > } > > > > > > u64_stats_update_end_irqrestore(&bis->sync, flags); > > > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > > > index 885f5395fcd0..97d4764d8e6a 100644 > > > --- a/include/linux/cgroup.h > > > +++ b/include/linux/cgroup.h > > > @@ -695,6 +695,7 @@ void cgroup_rstat_flush(struct cgroup *cgrp); > > > void cgroup_rstat_flush_atomic(struct cgroup *cgrp); > > > void cgroup_rstat_flush_hold(struct cgroup *cgrp); > > > void cgroup_rstat_flush_release(void); > > > +void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu); > > > > > > /* > > > * Basic resource stats. > > > diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c > > > index 9c4c55228567..96e7a4e6da72 100644 > > > --- a/kernel/cgroup/rstat.c > > > +++ b/kernel/cgroup/rstat.c > > > @@ -281,6 +281,24 @@ void cgroup_rstat_flush_release(void) > > > spin_unlock_irq(&cgroup_rstat_lock); > > > } > > > > > > +/** > > > + * cgroup_rstat_css_cpu_flush - flush stats for the given css and cpu > > > + * @css: target css to be flush > > > + * @cpu: the cpu that holds the stats to be flush > > > + * > > > + * A lightweight rstat flush operation for a given css and cpu. 
> > > + * Only the cpu_lock is being held for mutual exclusion, the cgroup_rstat_lock > > > + * isn't used. > > > > (Adding linux-mm and memcg maintainers) > > +Linux-MM +Michal Hocko +Shakeel Butt +Johannes Weiner +Roman Gushchin > > +Muchun Song > > > > I don't think flushing the stats without holding cgroup_rstat_lock is > > safe for memcg stats flushing. mem_cgroup_css_rstat_flush() modifies > > some non-percpu data (e.g. memcg->vmstats->state, > > memcg->vmstats->state_pending). > > > > Perhaps have this be a separate callback than css_rstat_flush() (e.g. > > css_rstat_flush_cpu() or something), so that it's clear what > > subsystems support this? In this case, only blkcg would implement this > > callback. > > Also I guess cgroup_rstat_flush() can be used here too. If we don't really care about flushing other subsystems, then yeah that would be a simpler approach. > > BTW, cgroup_rstat_flush() is annotated as might_sleep(), however it won't > sleep actually, so can this might_sleep() be removed? It will actually sleep if may_sleep=true, and I have a change in Andrew's MM tree that removes that *_atomic() variant and makes it always sleepable. In cgroup_rstat_flush_locked(), if need_resched() or spin_needbreak(&cgroup_rstat_lock), we release the lock, reschedule, and then re-acquire it. > > > > > > + */ > > > +void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu) > > > +{ > > > + raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu); > > > + > > > + raw_spin_lock_irq(cpu_lock); > > > + css->ss->css_rstat_flush(css, cpu); > > > > I think we need to check that css_rstat_flush() (or a new callback) is > > implemented before calling it here. > > Good catch! > > > > Thanks, > Ming > ^ permalink raw reply [flat|nested] 10+ messages in thread
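For reference, the yield path Yosry describes in cgroup_rstat_flush_locked() looks roughly like the fragment below. This is a paraphrase of the upstream rstat code at the time, so treat the exact conditions as approximate:

	/* inside cgroup_rstat_flush_locked()'s per-cpu loop */
	if (may_sleep && (need_resched() ||
			  spin_needbreak(&cgroup_rstat_lock))) {
		spin_unlock_irq(&cgroup_rstat_lock);
		if (!cond_resched())
			cpu_relax();
		spin_lock_irq(&cgroup_rstat_lock);
	}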
* Re: [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq 2023-05-24 2:37 ` Ming Lei 2023-05-24 2:43 ` Yosry Ahmed @ 2023-05-24 4:10 ` Waiman Long 2023-05-24 4:21 ` Ming Lei 2023-05-25 14:11 ` Michal Koutný 2 siblings, 1 reply; 10+ messages in thread From: Waiman Long @ 2023-05-24 4:10 UTC (permalink / raw) To: Ming Lei, Yosry Ahmed Cc: Linux-MM, Michal Hocko, Shakeel Butt, Johannes Weiner, Roman Gushchin, Muchun Song, Jens Axboe, linux-block, cgroups, Tejun Heo, mkoutny On 5/23/23 22:37, Ming Lei wrote: > Hi Yosry, > > On Tue, May 23, 2023 at 07:06:38PM -0700, Yosry Ahmed wrote: >> Hi Ming, >> >> On Tue, May 23, 2023 at 6:21 PM Ming Lei <ming.lei@redhat.com> wrote: >>> As noted by Michal, the blkg_iostat_set's in the lockless list >>> hold reference to blkg's to protect against their removal. Those >>> blkg's hold reference to blkcg. When a cgroup is being destroyed, >>> cgroup_rstat_flush() is only called at css_release_work_fn() which >>> is called when the blkcg reference count reaches 0. This circular >>> dependency will prevent blkcg and some blkgs from being freed after >>> they are made offline. >> I am not at all familiar with blkcg, but does calling >> cgroup_rstat_flush() in offline_css() fix the problem? > Except for offline, this list needs to be flushed after the associated disk > is deleted. > >> or can items be >> added to the lockless list(s) after the blkcg is offlined? > Yeah. > > percpu_ref_*get(&blkg->refcnt) still can succeed after the percpu refcnt > is killed in blkg_destroy() which is called from both offline css and > removing disk. As suggested by Tejun, we can use percpu_ref_tryget(&blkg->refcnt) to make sure that we can only take a reference when the blkg is online. I think it is a bit safer to take a percpu refcnt to avoid use after free. My other concern about your patch is that the per cpu list iterations will be done multiple times when a blkcg is destroyed if many blkgs are attached to the blkcg. I still prefer to do it once in blkcg_destroy_blkgs(). I am going to post an updated version tomorrow after some more testings. Cheers, Longman ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq 2023-05-24 4:10 ` Waiman Long @ 2023-05-24 4:21 ` Ming Lei 0 siblings, 0 replies; 10+ messages in thread From: Ming Lei @ 2023-05-24 4:21 UTC (permalink / raw) To: Waiman Long Cc: Yosry Ahmed, Linux-MM, Michal Hocko, Shakeel Butt, Johannes Weiner, Roman Gushchin, Muchun Song, Jens Axboe, linux-block, cgroups, Tejun Heo, mkoutny, ming.lei On Wed, May 24, 2023 at 12:10:55AM -0400, Waiman Long wrote: > > On 5/23/23 22:37, Ming Lei wrote: > > Hi Yosry, > > > > On Tue, May 23, 2023 at 07:06:38PM -0700, Yosry Ahmed wrote: > > > Hi Ming, > > > > > > On Tue, May 23, 2023 at 6:21 PM Ming Lei <ming.lei@redhat.com> wrote: > > > > As noted by Michal, the blkg_iostat_set's in the lockless list > > > > hold reference to blkg's to protect against their removal. Those > > > > blkg's hold reference to blkcg. When a cgroup is being destroyed, > > > > cgroup_rstat_flush() is only called at css_release_work_fn() which > > > > is called when the blkcg reference count reaches 0. This circular > > > > dependency will prevent blkcg and some blkgs from being freed after > > > > they are made offline. > > > I am not at all familiar with blkcg, but does calling > > > cgroup_rstat_flush() in offline_css() fix the problem? > > Except for offline, this list needs to be flushed after the associated disk > > is deleted. > > > > > or can items be > > > added to the lockless list(s) after the blkcg is offlined? > > Yeah. > > > > percpu_ref_*get(&blkg->refcnt) still can succeed after the percpu refcnt > > is killed in blkg_destroy() which is called from both offline css and > > removing disk. > > As suggested by Tejun, we can use percpu_ref_tryget(&blkg->refcnt) to make > sure that we can only take a reference when the blkg is online. I think it > is a bit safer to take a percpu refcnt to avoid use after free. My other blkg_release() does guarantee that no new stat associated with this blkg can be added any more, so what is the use-after-free? > concern about your patch is that the per cpu list iterations will be done > multiple times when a blkcg is destroyed if many blkgs are attached to the Yeah, but is it one big deal? The percpu list should be flushed just in one of blkg's release handler. > blkcg. I still prefer to do it once in blkcg_destroy_blkgs(). I am going to Problem is that new stat still may be added to this list after percpu_ref_kill() returns. To be honest, I really hate to grab/put blkg refcnt for adding/consuming state, and this way is too fragile. Thanks, Ming ^ permalink raw reply [flat|nested] 10+ messages in thread
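As a sketch of the tryget alternative Waiman suggests above, the queueing step in blk_cgroup_bio_start() could take a reference only while the blkg can still accept one, leaving the percpu_ref_put() in blkcg_rstat_flush() as it is today. This is hypothetical, not a posted patch; whether percpu_ref_tryget() or percpu_ref_tryget_live() is the right primitive depends on exactly when blkg->refcnt is killed relative to in-flight bios:

	/* in blk_cgroup_bio_start(), inside the u64_stats_update section */
	if (!READ_ONCE(bis->lqueued) &&
	    percpu_ref_tryget(&bis->blkg->refcnt)) {
		/* a reference was taken, so parking the stat on the llist is safe */
		llist_add(&bis->lnode, lhead);
		WRITE_ONCE(bis->lqueued, true);
	}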
* Re: [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq 2023-05-24 2:37 ` Ming Lei 2023-05-24 2:43 ` Yosry Ahmed 2023-05-24 4:10 ` Waiman Long @ 2023-05-25 14:11 ` Michal Koutný 2023-05-25 15:25 ` Waiman Long 2 siblings, 1 reply; 10+ messages in thread From: Michal Koutný @ 2023-05-25 14:11 UTC (permalink / raw) To: Ming Lei Cc: Yosry Ahmed, Linux-MM, Michal Hocko, Shakeel Butt, Johannes Weiner, Roman Gushchin, Muchun Song, Jens Axboe, linux-block, Waiman Long, cgroups, Tejun Heo On Wed, May 24, 2023 at 10:37:10AM +0800, Ming Lei <ming.lei@redhat.com> wrote: > > I am not at all familiar with blkcg, but does calling > > cgroup_rstat_flush() in offline_css() fix the problem? > > Except for offline, this list needs to be flushed after the associated disk > is deleted. Why the second flush trigger? a) To break another ref-dependency cycle (like on the blkcg side)? b) To avoid stale data upon device removal? (Because b) should be unnecessary, a future reader would flush when needed.) Thanks, Michal ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq 2023-05-25 14:11 ` Michal Koutný @ 2023-05-25 15:25 ` Waiman Long 2023-05-26 21:11 ` Michal Koutný 0 siblings, 1 reply; 10+ messages in thread From: Waiman Long @ 2023-05-25 15:25 UTC (permalink / raw) To: Michal Koutný, Ming Lei Cc: Yosry Ahmed, Linux-MM, Michal Hocko, Shakeel Butt, Johannes Weiner, Roman Gushchin, Muchun Song, Jens Axboe, linux-block, cgroups, Tejun Heo On 5/25/23 10:11, Michal Koutný wrote: > On Wed, May 24, 2023 at 10:37:10AM +0800, Ming Lei <ming.lei@redhat.com> wrote: >>> I am not at all familiar with blkcg, but does calling >>> cgroup_rstat_flush() in offline_css() fix the problem? >> Except for offline, this list needs to be flushed after the associated disk >> is deleted. > Why the second flush trigger? > a) To break another ref-dependency cycle (like on the blkcg side)? > b) To avoid stale data upon device removal? > > (Because b) should be unnecessary, a future reader would flush when > needed.) Since the percpu blkg_iostat_set's that are linked in the lockless list will be freed if the corresponding blkcg_gq is freed, we need to flush the lockless list to avoid potential use-after-free in a future cgroup_rstat_flush*() call. Cheers, Longman ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq 2023-05-25 15:25 ` Waiman Long @ 2023-05-26 21:11 ` Michal Koutný 0 siblings, 0 replies; 10+ messages in thread From: Michal Koutný @ 2023-05-26 21:11 UTC (permalink / raw) To: Waiman Long Cc: Ming Lei, Yosry Ahmed, Linux-MM, Michal Hocko, Shakeel Butt, Johannes Weiner, Roman Gushchin, Muchun Song, Jens Axboe, linux-block, cgroups, Tejun Heo On Thu, May 25, 2023 at 11:25:05AM -0400, Waiman Long <longman@redhat.com> wrote: > Since the percpu blkg_iostat_set's that are linked in the lockless list will > be freed if the corresponding blkcg_gq is freed, we need to flush the > lockless list to avoid potential use-after-free in a future > cgroup_rstat_flush*() call. Ah, so that was meant for the post-patch situation (where the lockless-list entries no longer hold a blkg refcnt). (It sounded like an answer to Yosry's question about calling cgroup_rstat_flush in offline_css in the pre-patch version. Never mind, that would need other adjustments.) Thanks, Michal ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq 2023-05-24 2:06 ` [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq Yosry Ahmed 2023-05-24 2:37 ` Ming Lei @ 2023-05-24 4:04 ` Waiman Long 2023-05-24 4:13 ` Yosry Ahmed 1 sibling, 1 reply; 10+ messages in thread From: Waiman Long @ 2023-05-24 4:04 UTC (permalink / raw) To: Yosry Ahmed, Ming Lei, Linux-MM, Michal Hocko, Shakeel Butt, Johannes Weiner, Roman Gushchin, Muchun Song Cc: Jens Axboe, linux-block, cgroups, Tejun Heo, mkoutny On 5/23/23 22:06, Yosry Ahmed wrote: > Hi Ming, > > On Tue, May 23, 2023 at 6:21 PM Ming Lei <ming.lei@redhat.com> wrote: >> As noted by Michal, the blkg_iostat_set's in the lockless list >> hold reference to blkg's to protect against their removal. Those >> blkg's hold reference to blkcg. When a cgroup is being destroyed, >> cgroup_rstat_flush() is only called at css_release_work_fn() which >> is called when the blkcg reference count reaches 0. This circular >> dependency will prevent blkcg and some blkgs from being freed after >> they are made offline. > I am not at all familiar with blkcg, but does calling > cgroup_rstat_flush() in offline_css() fix the problem? or can items be > added to the lockless list(s) after the blkcg is offlined? > >> It is less a problem if the cgroup to be destroyed also has other >> controllers like memory that will call cgroup_rstat_flush() which will >> clean up the reference count. If block is the only controller that uses >> rstat, these offline blkcg and blkgs may never be freed leaking more >> and more memory over time. >> >> To prevent this potential memory leak: >> >> - a new cgroup_rstat_css_cpu_flush() function is added to flush stats for >> a given css and cpu. This new function will be called in __blkg_release(). >> >> - don't grab bio->bi_blkg when adding the stats into blkcg's per-cpu >> stat list, and this kind of handling is the most fragile part of >> original patch >> >> Based on Waiman's patch: >> >> https://lore.kernel.org/linux-block/20221215033132.230023-3-longman@redhat.com/ >> >> Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()") >> Cc: Waiman Long <longman@redhat.com> >> Cc: cgroups@vger.kernel.org >> Cc: Tejun Heo <tj@kernel.org> >> Cc: mkoutny@suse.com >> Signed-off-by: Ming Lei <ming.lei@redhat.com> >> --- >> block/blk-cgroup.c | 15 +++++++++++++-- >> include/linux/cgroup.h | 1 + >> kernel/cgroup/rstat.c | 18 ++++++++++++++++++ >> 3 files changed, 32 insertions(+), 2 deletions(-) >> >> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c >> index 0ce64dd73cfe..5437b6af3955 100644 >> --- a/block/blk-cgroup.c >> +++ b/block/blk-cgroup.c >> @@ -163,10 +163,23 @@ static void blkg_free(struct blkcg_gq *blkg) >> static void __blkg_release(struct rcu_head *rcu) >> { >> struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head); >> + struct blkcg *blkcg = blkg->blkcg; >> + int cpu; >> >> #ifdef CONFIG_BLK_CGROUP_PUNT_BIO >> WARN_ON(!bio_list_empty(&blkg->async_bios)); >> #endif >> + /* >> + * Flush all the non-empty percpu lockless lists before releasing >> + * us. Meantime no new bio can refer to this blkg any more given >> + * the refcnt is killed. 
>> + */ >> + for_each_possible_cpu(cpu) { >> + struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu); >> + >> + if (!llist_empty(lhead)) >> + cgroup_rstat_css_cpu_flush(&blkcg->css, cpu); >> + } >> >> /* release the blkcg and parent blkg refs this blkg has been holding */ >> css_put(&blkg->blkcg->css); >> @@ -991,7 +1004,6 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu) >> if (parent && parent->parent) >> blkcg_iostat_update(parent, &blkg->iostat.cur, >> &blkg->iostat.last); >> - percpu_ref_put(&blkg->refcnt); >> } >> >> out: >> @@ -2075,7 +2087,6 @@ void blk_cgroup_bio_start(struct bio *bio) >> >> llist_add(&bis->lnode, lhead); >> WRITE_ONCE(bis->lqueued, true); >> - percpu_ref_get(&bis->blkg->refcnt); >> } >> >> u64_stats_update_end_irqrestore(&bis->sync, flags); >> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h >> index 885f5395fcd0..97d4764d8e6a 100644 >> --- a/include/linux/cgroup.h >> +++ b/include/linux/cgroup.h >> @@ -695,6 +695,7 @@ void cgroup_rstat_flush(struct cgroup *cgrp); >> void cgroup_rstat_flush_atomic(struct cgroup *cgrp); >> void cgroup_rstat_flush_hold(struct cgroup *cgrp); >> void cgroup_rstat_flush_release(void); >> +void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu); >> >> /* >> * Basic resource stats. >> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c >> index 9c4c55228567..96e7a4e6da72 100644 >> --- a/kernel/cgroup/rstat.c >> +++ b/kernel/cgroup/rstat.c >> @@ -281,6 +281,24 @@ void cgroup_rstat_flush_release(void) >> spin_unlock_irq(&cgroup_rstat_lock); >> } >> >> +/** >> + * cgroup_rstat_css_cpu_flush - flush stats for the given css and cpu >> + * @css: target css to be flush >> + * @cpu: the cpu that holds the stats to be flush >> + * >> + * A lightweight rstat flush operation for a given css and cpu. >> + * Only the cpu_lock is being held for mutual exclusion, the cgroup_rstat_lock >> + * isn't used. > (Adding linux-mm and memcg maintainers) > +Linux-MM +Michal Hocko +Shakeel Butt +Johannes Weiner +Roman Gushchin > +Muchun Song > > I don't think flushing the stats without holding cgroup_rstat_lock is > safe for memcg stats flushing. mem_cgroup_css_rstat_flush() modifies > some non-percpu data (e.g. memcg->vmstats->state, > memcg->vmstats->state_pending). > > Perhaps have this be a separate callback than css_rstat_flush() (e.g. > css_rstat_flush_cpu() or something), so that it's clear what > subsystems support this? In this case, only blkcg would implement this > callback. That function is added to call blkcg_rstat_flush() only which flush the stat in the blkcg and it should be safe. I agree that we should note that in the comment to list the preconditions for calling it. Cheers, Longman ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq 2023-05-24 4:04 ` Waiman Long @ 2023-05-24 4:13 ` Yosry Ahmed 0 siblings, 0 replies; 10+ messages in thread From: Yosry Ahmed @ 2023-05-24 4:13 UTC (permalink / raw) To: Waiman Long Cc: Ming Lei, Linux-MM, Michal Hocko, Shakeel Butt, Johannes Weiner, Roman Gushchin, Muchun Song, Jens Axboe, linux-block, cgroups, Tejun Heo, mkoutny On Tue, May 23, 2023 at 9:04 PM Waiman Long <longman@redhat.com> wrote: > > On 5/23/23 22:06, Yosry Ahmed wrote: > > Hi Ming, > > > > On Tue, May 23, 2023 at 6:21 PM Ming Lei <ming.lei@redhat.com> wrote: > >> As noted by Michal, the blkg_iostat_set's in the lockless list > >> hold reference to blkg's to protect against their removal. Those > >> blkg's hold reference to blkcg. When a cgroup is being destroyed, > >> cgroup_rstat_flush() is only called at css_release_work_fn() which > >> is called when the blkcg reference count reaches 0. This circular > >> dependency will prevent blkcg and some blkgs from being freed after > >> they are made offline. > > I am not at all familiar with blkcg, but does calling > > cgroup_rstat_flush() in offline_css() fix the problem? or can items be > > added to the lockless list(s) after the blkcg is offlined? > > > >> It is less a problem if the cgroup to be destroyed also has other > >> controllers like memory that will call cgroup_rstat_flush() which will > >> clean up the reference count. If block is the only controller that uses > >> rstat, these offline blkcg and blkgs may never be freed leaking more > >> and more memory over time. > >> > >> To prevent this potential memory leak: > >> > >> - a new cgroup_rstat_css_cpu_flush() function is added to flush stats for > >> a given css and cpu. This new function will be called in __blkg_release(). > >> > >> - don't grab bio->bi_blkg when adding the stats into blkcg's per-cpu > >> stat list, and this kind of handling is the most fragile part of > >> original patch > >> > >> Based on Waiman's patch: > >> > >> https://lore.kernel.org/linux-block/20221215033132.230023-3-longman@redhat.com/ > >> > >> Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()") > >> Cc: Waiman Long <longman@redhat.com> > >> Cc: cgroups@vger.kernel.org > >> Cc: Tejun Heo <tj@kernel.org> > >> Cc: mkoutny@suse.com > >> Signed-off-by: Ming Lei <ming.lei@redhat.com> > >> --- > >> block/blk-cgroup.c | 15 +++++++++++++-- > >> include/linux/cgroup.h | 1 + > >> kernel/cgroup/rstat.c | 18 ++++++++++++++++++ > >> 3 files changed, 32 insertions(+), 2 deletions(-) > >> > >> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c > >> index 0ce64dd73cfe..5437b6af3955 100644 > >> --- a/block/blk-cgroup.c > >> +++ b/block/blk-cgroup.c > >> @@ -163,10 +163,23 @@ static void blkg_free(struct blkcg_gq *blkg) > >> static void __blkg_release(struct rcu_head *rcu) > >> { > >> struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head); > >> + struct blkcg *blkcg = blkg->blkcg; > >> + int cpu; > >> > >> #ifdef CONFIG_BLK_CGROUP_PUNT_BIO > >> WARN_ON(!bio_list_empty(&blkg->async_bios)); > >> #endif > >> + /* > >> + * Flush all the non-empty percpu lockless lists before releasing > >> + * us. Meantime no new bio can refer to this blkg any more given > >> + * the refcnt is killed. 
> >> + */ > >> + for_each_possible_cpu(cpu) { > >> + struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu); > >> + > >> + if (!llist_empty(lhead)) > >> + cgroup_rstat_css_cpu_flush(&blkcg->css, cpu); > >> + } > >> > >> /* release the blkcg and parent blkg refs this blkg has been holding */ > >> css_put(&blkg->blkcg->css); > >> @@ -991,7 +1004,6 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu) > >> if (parent && parent->parent) > >> blkcg_iostat_update(parent, &blkg->iostat.cur, > >> &blkg->iostat.last); > >> - percpu_ref_put(&blkg->refcnt); > >> } > >> > >> out: > >> @@ -2075,7 +2087,6 @@ void blk_cgroup_bio_start(struct bio *bio) > >> > >> llist_add(&bis->lnode, lhead); > >> WRITE_ONCE(bis->lqueued, true); > >> - percpu_ref_get(&bis->blkg->refcnt); > >> } > >> > >> u64_stats_update_end_irqrestore(&bis->sync, flags); > >> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > >> index 885f5395fcd0..97d4764d8e6a 100644 > >> --- a/include/linux/cgroup.h > >> +++ b/include/linux/cgroup.h > >> @@ -695,6 +695,7 @@ void cgroup_rstat_flush(struct cgroup *cgrp); > >> void cgroup_rstat_flush_atomic(struct cgroup *cgrp); > >> void cgroup_rstat_flush_hold(struct cgroup *cgrp); > >> void cgroup_rstat_flush_release(void); > >> +void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu); > >> > >> /* > >> * Basic resource stats. > >> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c > >> index 9c4c55228567..96e7a4e6da72 100644 > >> --- a/kernel/cgroup/rstat.c > >> +++ b/kernel/cgroup/rstat.c > >> @@ -281,6 +281,24 @@ void cgroup_rstat_flush_release(void) > >> spin_unlock_irq(&cgroup_rstat_lock); > >> } > >> > >> +/** > >> + * cgroup_rstat_css_cpu_flush - flush stats for the given css and cpu > >> + * @css: target css to be flush > >> + * @cpu: the cpu that holds the stats to be flush > >> + * > >> + * A lightweight rstat flush operation for a given css and cpu. > >> + * Only the cpu_lock is being held for mutual exclusion, the cgroup_rstat_lock > >> + * isn't used. > > (Adding linux-mm and memcg maintainers) > > +Linux-MM +Michal Hocko +Shakeel Butt +Johannes Weiner +Roman Gushchin > > +Muchun Song > > > > I don't think flushing the stats without holding cgroup_rstat_lock is > > safe for memcg stats flushing. mem_cgroup_css_rstat_flush() modifies > > some non-percpu data (e.g. memcg->vmstats->state, > > memcg->vmstats->state_pending). > > > > Perhaps have this be a separate callback than css_rstat_flush() (e.g. > > css_rstat_flush_cpu() or something), so that it's clear what > > subsystems support this? In this case, only blkcg would implement this > > callback. > > That function is added to call blkcg_rstat_flush() only which flush the > stat in the blkcg and it should be safe. I agree that we should note > that in the comment to list the preconditions for calling it. I think it would be a better API if there's a separate callback for flushing only protected by percpu locks, that would be implemented by blkcg only in this case. Anyway, I see v2 dropped the cgroup changes anyway and is calling blkcg_rstat_flush() directly. > > Cheers, > Longman > ^ permalink raw reply [flat|nested] 10+ messages in thread
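The v2 direction mentioned above, dropping the new cgroup API and flushing the per-cpu lists straight from blk-cgroup, would give __blkg_release() roughly the shape below. This is a guess at that shape rather than the actual v2 patch; it assumes a forward declaration, since blkcg_rstat_flush() is defined later in blk-cgroup.c, and it glosses over how a direct call is serialized against a concurrent rstat flush of the same CPU:

	/* in __blkg_release(), before dropping the blkcg and parent blkg refs */
	for_each_possible_cpu(cpu) {
		struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);

		if (!llist_empty(lhead))
			blkcg_rstat_flush(&blkcg->css, cpu);
	}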
Thread overview: 10+ messages
[not found] <20230524011935.719659-1-ming.lei@redhat.com>
2023-05-24 2:06 ` [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq Yosry Ahmed
2023-05-24 2:37 ` Ming Lei
2023-05-24 2:43 ` Yosry Ahmed
2023-05-24 4:10 ` Waiman Long
2023-05-24 4:21 ` Ming Lei
2023-05-25 14:11 ` Michal Koutný
2023-05-25 15:25 ` Waiman Long
2023-05-26 21:11 ` Michal Koutný
2023-05-24 4:04 ` Waiman Long
2023-05-24 4:13 ` Yosry Ahmed