From: Shakeel Butt <shakeelb@google.com>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
	 Josef Bacik <josef@toxicpanda.com>, Jens Axboe <axboe@kernel.dk>,
	 Zefan Li <lizefan.x@bytedance.com>,
	Michal Hocko <mhocko@kernel.org>,
	 Roman Gushchin <roman.gushchin@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Vasily Averin <vasily.averin@linux.dev>,
	 cgroups@vger.kernel.org, linux-block@vger.kernel.org,
	 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	bpf@vger.kernel.org
Subject: Re: [RFC PATCH 4/7] memcg: sleep during flushing stats in safe contexts
Date: Thu, 23 Mar 2023 12:35:20 -0700
Message-ID: <CALvZod4z6F2Rr3prKdLqBuWUjippOBoLFw3QFFY7Bk=czm5iHg@mail.gmail.com>
In-Reply-To: <CAJD7tkbtHhzOytu3hfN8tjdAyNq0BZXYN8TEipS4NTApUzkL7w@mail.gmail.com>

On Thu, Mar 23, 2023 at 11:08 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Thu, Mar 23, 2023 at 10:27 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Thu, Mar 23, 2023 at 09:01:12AM -0700, Yosry Ahmed wrote:
> > > On Thu, Mar 23, 2023 at 8:56 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > >
> > > > On Thu, Mar 23, 2023 at 04:00:34AM +0000, Yosry Ahmed wrote:
> > > > > @@ -644,26 +644,26 @@ static void __mem_cgroup_flush_stats(void)
> > > > >               return;
> > > > >
> > > > >       flush_next_time = jiffies_64 + 2*FLUSH_TIME;
> > > > > -     cgroup_rstat_flush(root_mem_cgroup->css.cgroup, false);
> > > > > +     cgroup_rstat_flush(root_mem_cgroup->css.cgroup, may_sleep);
> > > >
> > > > How is it safe to call this with may_sleep=true when it's holding the
> > > > stats_flush_lock?
> > >
> > > stats_flush_lock is only ever taken with trylock; it is used today
> > > only so that we can skip flushing if another cpu is already doing a
> > > flush (which is not 100% correct as they may not have finished flushing
> > > yet, but that's orthogonal here). So I think it should be safe to sleep
> > > as no one can be blocked waiting for this spinlock.
> >
> > I see. It still cannot sleep while the lock is held, though, because
> > preemption is disabled. Make sure you have all lock debugging on while
> > testing this.
>
> Thanks for pointing this out, will do.
>
> >
> > > Perhaps it would be better semantically to replace the spinlock with
> > > an atomic test and set, instead of having a lock that can only be used
> > > with trylock?
> >
> > It could be helpful to clarify what stats_flush_lock is protecting
> > first. Keep in mind that locks should protect data, not code paths.
> >
> > Right now it's doing multiple things:
> >
> > 1. It protects updates to stats_flush_threshold
> > 2. It protects updates to flush_next_time
> > 3. It serializes calls to cgroup_rstat_flush() based on those ratelimits
> >
> > However,
> >
> > 1. stats_flush_threshold is already an atomic
> >
> > 2. flush_next_time is not atomic. The writer is locked, but the reader
> >    is lockless. If the reader races with a flush, you could see this:
> >
> >                                         if (time_after(jiffies, flush_next_time))
> >         spin_trylock()
> >         flush_next_time = now + delay
> >         flush()
> >         spin_unlock()
> >                                         spin_trylock()
> >                                         flush_next_time = now + delay
> >                                         flush()
> >                                         spin_unlock()
> >
> >    which means we already can get flushes at a higher frequency than
> >    FLUSH_TIME during races. But it isn't really a problem.
> >
> >    The reader could also see garbled partial updates, so it needs at
> >    least READ_ONCE and WRITE_ONCE protection.
> >
> > 3. Serializing cgroup_rstat_flush() calls against the ratelimit
> >    factors is currently broken because of the race in 2. But the race
> >    is actually harmless: all we might get is the occasional earlier
> >    flush. If there is no delta, the flush won't do much. And if there
> >    is, the flush is justified.
> >
> > In summary, it seems to me the lock can be ditched altogether. All the
> > code needs is READ_ONCE/WRITE_ONCE around flush_next_time.
>
> Thanks a lot for this analysis. I agree that the lock can be removed
> with proper READ_ONCE/WRITE_ONCE, but I think there is another purpose
> of the lock that we are missing here.
>
> I think one other purpose of the lock is avoiding a thundering herd
> problem on cgroup_rstat_lock, particularly from reclaim context, as
> mentioned in the log of commit aa48e47e3906 ("memcg: infrastructure
> to flush memcg stats").
>
> While testing, I did notice that removing this lock indeed causes a
> thundering herd problem if we have a lot of concurrent reclaimers. The
> trylock makes sure we abort immediately if someone else is flushing --
> which is not ideal because that flusher might have just started, and
> we may end up reading stale data anyway.
>
> This is why I suggested replacing the lock with an atomic, and doing
> something like this if we want to maintain the current behavior:
>
> static void __mem_cgroup_flush_stats(void)
> {
>     ...
>     if (atomic_xchg(&ongoing_flush, 1))
>         return;
>     ...
>     atomic_set(&ongoing_flush, 0);
> }
>
> Alternatively, if we want to change the behavior and wait for the
> concurrent flusher to finish flushing, we can maybe spin until
> ongoing_flush goes back to 0 and then return:
>
> static void __mem_cgroup_flush_stats(void)
> {
>     ...
>     if (atomic_xchg(&ongoing_flush, 1)) {
>         /* wait until the ongoing flusher finishes to get updated stats */
>         while (atomic_read(&ongoing_flush))
>             cpu_relax();
>         return;
>     }
>     /* flush the stats ourselves */
>     ...
>     atomic_set(&ongoing_flush, 0);
> }
>
> WDYT?

I would go with your first approach, i.e. no spinning.
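
For illustration only, the non-spinning variant can be modelled in
userspace with C11 atomics. This is a rough sketch under stated
assumptions, not the kernel code: try_flush_stats(), flush_is_due(),
do_flush() and flushes_done are hypothetical names, atomic_exchange()
stands in for atomic_xchg(), relaxed loads and stores stand in for
READ_ONCE/WRITE_ONCE, and a plain comparison stands in for
time_after(), so jiffies wraparound is ignored.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these names are kernel API. */
#define FLUSH_TIME 2u

static atomic_int ongoing_flush;          /* 0 = idle, 1 = flush in progress */
static _Atomic uint64_t flush_next_time;  /* earliest time of the next periodic flush */
static unsigned long flushes_done;        /* only touched while holding the flag */

/* Placeholder for the expensive flush (cgroup_rstat_flush() in the thread). */
static void do_flush(void)
{
    flushes_done++;
}

/* Lockless reader: a relaxed load plays the role of READ_ONCE(). */
static bool flush_is_due(uint64_t now)
{
    return now > atomic_load_explicit(&flush_next_time, memory_order_relaxed);
}

/*
 * Returns true if this caller flushed, false if it skipped because
 * another flusher already holds the flag. It never spins or blocks.
 */
static bool try_flush_stats(uint64_t now)
{
    /* Test-and-set: an old value of 1 means someone else is flushing. */
    if (atomic_exchange(&ongoing_flush, 1))
        return false;

    /* Relaxed store plays the role of WRITE_ONCE(). */
    atomic_store_explicit(&flush_next_time, now + 2 * FLUSH_TIME,
                          memory_order_relaxed);
    do_flush();

    /* Clear the flag so the next caller can flush. */
    atomic_store(&ongoing_flush, 0);
    return true;
}
```

A caller that loses the exchange returns right away, so concurrent
reclaimers do not pile up behind one flusher; the cost is that the
loser may read stats the winner has not finished flushing yet.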

