From: Johannes Weiner <hannes@cmpxchg.org>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Tejun Heo <tj@kernel.org>, Josef Bacik <josef@toxicpanda.com>,
Jens Axboe <axboe@kernel.dk>, Zefan Li <lizefan.x@bytedance.com>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeelb@google.com>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Vasily Averin <vasily.averin@linux.dev>,
cgroups@vger.kernel.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
bpf@vger.kernel.org
Subject: Re: [RFC PATCH 4/7] memcg: sleep during flushing stats in safe contexts
Date: Thu, 23 Mar 2023 13:27:32 -0400
Message-ID: <20230323172732.GE739026@cmpxchg.org>
In-Reply-To: <CAJD7tkZ7Dz9myftc9bg7jhiaOYcn7qJ+V4sxZ_2kfnb+k=zhJQ@mail.gmail.com>

On Thu, Mar 23, 2023 at 09:01:12AM -0700, Yosry Ahmed wrote:
> On Thu, Mar 23, 2023 at 8:56 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Thu, Mar 23, 2023 at 04:00:34AM +0000, Yosry Ahmed wrote:
> > > @@ -644,26 +644,26 @@ static void __mem_cgroup_flush_stats(void)
> > > return;
> > >
> > > flush_next_time = jiffies_64 + 2*FLUSH_TIME;
> > > - cgroup_rstat_flush(root_mem_cgroup->css.cgroup, false);
> > > + cgroup_rstat_flush(root_mem_cgroup->css.cgroup, may_sleep);
> >
> > How is it safe to call this with may_sleep=true when it's holding the
> > stats_flush_lock?
>
> stats_flush_lock is always called with trylock, it is only used today
> so that we can skip flushing if another cpu is already doing a flush
> (which is not 100% correct as they may have not finished flushing yet,
> but that's orthogonal here). So I think it should be safe to sleep as
> no one can be blocked waiting for this spinlock.

I see. It still cannot sleep while the lock is held, though, because
preemption is disabled. Make sure you have all lock debugging on while
testing this.

> Perhaps it would be better semantically to replace the spinlock with
> an atomic test and set, instead of having a lock that can only be used
> with trylock?
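
For reference, the test-and-set idea in the quoted text could be
sketched roughly as below. This is a userspace C11 illustration, not
kernel code: flush_in_progress, fake_flush() and try_flush() are
hypothetical names, and atomic_flag stands in for whatever atomic
primitive the kernel side would actually use.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: set while some thread is in the middle of a flush. */
static atomic_flag flush_in_progress = ATOMIC_FLAG_INIT;

/* Demonstration counter; only touched by the thread that owns the flag. */
static int flushes_done;

/* Stand-in for the real flush work (cgroup_rstat_flush() in the thread). */
static void fake_flush(void)
{
	flushes_done++;
}

/*
 * Flush unless somebody else is already flushing. Returns true if this
 * caller did the flush, false if it was skipped.
 */
static bool try_flush(void)
{
	/* test_and_set returns the previous value: true means "already set". */
	if (atomic_flag_test_and_set_explicit(&flush_in_progress,
					      memory_order_acquire))
		return false;

	/* No spinlock is held here, so nothing forbids blocking work. */
	fake_flush();

	atomic_flag_clear_explicit(&flush_in_progress, memory_order_release);
	return true;
}
```

As with the trylock it would replace, a caller that gets false back only
knows a flush was in flight, not that it has finished.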

It could be helpful to clarify what stats_flush_lock is protecting
first. Keep in mind that locks should protect data, not code paths.

Right now it's doing multiple things:

1. It protects updates to stats_flush_threshold
2. It protects updates to flush_next_time
3. It serializes calls to cgroup_rstat_flush() based on those ratelimits

However,

1. stats_flush_threshold is already an atomic

2. flush_next_time is not atomic. The writer is locked, but the reader
   is lockless. If the reader races with a flush, you could see this:

                                        if (time_after(jiffies, flush_next_time))
        spin_trylock()
        flush_next_time = now + delay
        flush()
        spin_unlock()
                                        spin_trylock()
                                        flush_next_time = now + delay
                                        flush()
                                        spin_unlock()

   which means we already can get flushes at a higher frequency than
   FLUSH_TIME during races. But it isn't really a problem.

   The reader could also see garbled partial updates, so it needs at
   least READ_ONCE and WRITE_ONCE protection.

3. Serializing cgroup_rstat_flush() calls against the ratelimit
   factors is currently broken because of the race in 2. But the race
   is actually harmless, all we might get is the occasional earlier
   flush. If there is no delta, the flush won't do much. And if there
   is, the flush is justified.

In summary, it seems to me the lock can be ditched altogether. All the
code needs is READ_ONCE/WRITE_ONCE around flush_next_time.
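
A minimal userspace sketch of what the lockless version might look like,
under some assumptions: relaxed C11 atomics stand in for the kernel's
READ_ONCE()/WRITE_ONCE(), the time is passed in as a plain counter
instead of jiffies, and FLUSH_TIME, counting_flush() and maybe_flush()
are hypothetical stand-ins rather than the actual memcg code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ratelimit period, in the same units as "now". */
#define FLUSH_TIME 2000ULL

/* Shared deadline. Read and written without any lock. */
static _Atomic uint64_t flush_next_time;

/* Demonstration counter of how many flushes actually ran. */
static atomic_ulong flush_count;

/* Stand-in for cgroup_rstat_flush(); it only counts calls. */
static void counting_flush(void)
{
	atomic_fetch_add_explicit(&flush_count, 1, memory_order_relaxed);
}

/*
 * Flush if the deadline has passed. The relaxed load and store keep each
 * access to flush_next_time whole (no torn reads or writes), but they do
 * not serialize callers: two racing callers can both pass the check and
 * both flush, which is the harmless early flush described above.
 */
static bool maybe_flush(uint64_t now)
{
	uint64_t next = atomic_load_explicit(&flush_next_time,
					     memory_order_relaxed);

	if (now <= next)
		return false;

	atomic_store_explicit(&flush_next_time, now + 2 * FLUSH_TIME,
			      memory_order_relaxed);
	counting_flush();
	return true;
}
```

The sketch compares plain 64-bit counters and so ignores wraparound,
which time_after() handles for jiffies in the kernel.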