From: Yosry Ahmed <yosryahmed@google.com>
To: Shakeel Butt <shakeelb@google.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Hocko" <mhocko@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Muchun Song" <muchun.song@linux.dev>,
"Ivan Babrou" <ivan@cloudflare.com>, "Tejun Heo" <tj@kernel.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Waiman Long" <longman@redhat.com>,
kernel-team@cloudflare.com, "Wei Xu" <weixugc@google.com>,
"Greg Thelen" <gthelen@google.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] mm: memcg: optimize stats flushing for latency and accuracy
Date: Thu, 14 Sep 2023 10:56:52 -0700 [thread overview]
Message-ID: <CAJD7tkbb5zxEPSW6vvijDpQKCZiDcUJqe8SPvLHfo0RX7iHXAw@mail.gmail.com> (raw)
In-Reply-To: <CALvZod6j5E2MXFz463LRcrP6XY8FedzLUKW88c=ZY39B6mYyMA@mail.gmail.com>
On Thu, Sep 14, 2023 at 10:36 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Wed, Sep 13, 2023 at 12:38 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > Stats flushing for memcg currently follows the following rules:
> > - Always flush the entire memcg hierarchy (i.e. flush the root).
> > - Only one flusher is allowed at a time. If someone else tries to flush
> > concurrently, they skip and return immediately.
> > - A periodic flusher flushes all the stats every 2 seconds.
> >
> > The reason this approach is followed is because all flushes are
> > serialized by a global rstat spinlock. On the memcg side, flushing is
> > invoked from userspace reads as well as in-kernel flushers (e.g.
> > reclaim, refault, etc). This approach aims to avoid serializing all
> > flushers on the global lock, which can cause a significant performance
> > hit under high concurrency.
> >
> > This approach has the following problems:
> > - Occasionally a userspace read of the stats of a non-root cgroup will
> > be too expensive as it has to flush the entire hierarchy [1].
>
> This is a real world workload exhibiting the issue which is good.
>
> > - Sometimes the stats accuracy are compromised if there is an ongoing
> > flush, and we skip and return before the subtree of interest is
> > actually flushed. This is more visible when reading stats from
> > userspace, but can also affect in-kernel flushers.
>
> Please provide similar data/justification for the above. In addition:
>
> 1. How much delayed/stale stats have you observed on real world workload?
I am not really sure. We don't have a wide deployment of kernels with
rstat yet. These are problems observed in testing and/or concerns
expressed by our userspace team.
I am trying to solve this now because any problems that result from
this staleness will be very hard to debug and link back to stale
stats.
>
> 2. What is acceptable staleness in the stats for your use-case?
Again, unfortunately I am not sure, but right now it can be O(seconds)
which is not acceptable as we have workloads querying the stats every
1s (and sometimes more frequently).
>
> 3. What is your use-case?
A few use cases we have that may be affected by this:
- System overhead: calculations using memory.usage and some stats from
memory.stat. If one of them is fresh and the other one isn't we have
an inconsistent view of the system.
- Userspace OOM killing: We use some stats in memory.stat to gauge the
amount of memory that will be freed by killing a task as sometimes
memory.usage includes shared resources that wouldn't be freed anyway.
- Proactive reclaim: we read memory.stat in a proactive reclaim
feedback loop, stale stats may cause us to mistakenly think reclaim is
ineffective and prematurely stop.
>
> 4. Does your use-case care about staleness of all the stats in
> memory.stat or some specific stats?
We have multiple use cases that can be affected by this, so I don't
think there are some specific stats. I am also not aware of all
possibly affected use cases.
>
> 5. If some specific stats in memory.stat, does it make sense to
> decouple them from rstat and just pay the price up front to maintain
> them accurately?
>
> Most importantly please please please be concise in your responses.
I try, sometimes I am not sure how much detail is needed. Sorry about that :)
>
> I know I am going back on some of the previous agreements but this
> whole locking back and forth has made in question the original
> motivation.
That's okay. Taking a step back, having flushing being indeterministic
in this way is a time bomb in my opinion. Note that this also affects
in-kernel flushers like reclaim or dirty isolation, which I suspect
will be more affected by staleness. No one complained yet AFAICT, but
I think it's a time bomb. The worst part is that if/when a problem
happens, we won't be able to easily tell what went wrong.
next prev parent reply other threads:[~2023-09-14 17:57 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-13 7:38 [PATCH 0/3] memcg: more sophisticated stats flushing Yosry Ahmed
2023-09-13 7:38 ` [PATCH 1/3] mm: memcg: change flush_next_time to flush_last_time Yosry Ahmed
2023-09-13 7:38 ` [PATCH 2/3] mm: memcg: rename stats_flush_threshold to stats_updates_order Yosry Ahmed
2023-09-13 7:38 ` [PATCH 3/3] mm: memcg: optimize stats flushing for latency and accuracy Yosry Ahmed
2023-09-13 15:37 ` Johannes Weiner
2023-09-13 16:26 ` Yosry Ahmed
2023-09-14 16:06 ` Johannes Weiner
2023-09-14 17:22 ` Yosry Ahmed
2023-09-14 17:26 ` Yosry Ahmed
2023-09-19 5:46 ` Yosry Ahmed
2023-09-14 17:19 ` Waiman Long
2023-09-14 17:23 ` Yosry Ahmed
2023-09-14 17:36 ` Waiman Long
2023-09-14 17:36 ` Shakeel Butt
2023-09-14 17:56 ` Yosry Ahmed [this message]
2023-09-14 22:58 ` Shakeel Butt
2023-09-14 23:30 ` Yosry Ahmed
2023-09-15 1:01 ` Shakeel Butt
2023-09-19 5:29 ` Yosry Ahmed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJD7tkbb5zxEPSW6vvijDpQKCZiDcUJqe8SPvLHfo0RX7iHXAw@mail.gmail.com \
--to=yosryahmed@google.com \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=gthelen@google.com \
--cc=hannes@cmpxchg.org \
--cc=ivan@cloudflare.com \
--cc=kernel-team@cloudflare.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=longman@redhat.com \
--cc=mhocko@kernel.org \
--cc=mkoutny@suse.com \
--cc=muchun.song@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=shakeelb@google.com \
--cc=tj@kernel.org \
--cc=weixugc@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox