From: Shakeel Butt <shakeelb@google.com>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Hocko" <mhocko@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Muchun Song" <muchun.song@linux.dev>,
"Ivan Babrou" <ivan@cloudflare.com>, "Tejun Heo" <tj@kernel.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Waiman Long" <longman@redhat.com>,
kernel-team@cloudflare.com, "Wei Xu" <weixugc@google.com>,
"Greg Thelen" <gthelen@google.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] mm: memcg: optimize stats flushing for latency and accuracy
Date: Fri, 15 Sep 2023 01:01:38 +0000
Message-ID: <20230915010138.knjli6ovpozxbpss@google.com>
In-Reply-To: <CAJD7tkb_pDxqqhB25aZpbfur=YodBRb1nmdpNZt1DB_1ozDpPQ@mail.gmail.com>
On Thu, Sep 14, 2023 at 04:30:56PM -0700, Yosry Ahmed wrote:
[...]
> >
> > I think first you need to show if this (2 sec stale stats) is really a
> > problem.
>
> That's the thing, my main concern is that if this causes a problem, we
> probably won't be able to tell it was because of stale stats. It's
> very hard to make that connection.
>
Please articulate what the problem would look like. You did that in the
use-cases description below, so let's discuss it there.
> Pre-rstat, reading stats would always yield fresh stats (as much as
> possible). Now the stats can be up to 2s stale, and we don't really
> know how this will affect our existing workloads.
>
Pre-rstat, a stat read would traverse the memcg tree. With rstat, a
tradeoff was made between expensive reads and staleness. Yeah, there
might still be memcg update tree traversal, which I would like to remove
completely. However you are saying to
[...]
> >
> > I don't see why userspace OOM killing and proactive reclaim need
> > subsecond accuracy. Please explain.
>
> For proactive reclaim it is not about sub-second accuracy. It is about
> doing the reclaim then reading the stats immediately to see the
> effect. Naturally one would expect that a stat read after reclaim
> would show the system state after reclaim.
>
> For userspace OOM killing I am not really sure. It depends on how
> dynamic the workload is. If a task recently had a spike in memory
> usage causing a threshold to be hit, userspace can kill a different
> task if the stats are stale.
>
Please add the above reasoning to your commit message (I am not
convinced, but let's leave it at that).
> I think the whole point is *not* about the amount of staleness. It is
> more about that you expect a stats read after an event to reflect the
> system state after the event.
The whole point is to understand the tradeoff between accuracy and the
cost of accuracy. I don't think you want to pay the cost of strong
consistency/ordering between stats reading and an event. My worry is
that you are enforcing a tradeoff which *might* be applicable only to
your use-cases. Anyway, this is something that can be changed later.
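
To make the tradeoff above concrete, here is a toy model in Python. It is
only a sketch and not the kernel's rstat code: the class ToyStats, its
methods, and the way the 2 second bound is applied are all hypothetical
names and simplifications chosen for illustration. A "relaxed" read
flushes only when the cached value is older than the staleness bound; a
"strict" read always pays for a flush.

```python
class ToyStats:
    """Toy model (hypothetical): pending updates are folded in on flush."""

    def __init__(self, max_staleness=2.0):
        self.pending = 0          # updates not yet visible to readers
        self.flushed = 0          # value readers see
        self.last_flush = 0.0     # time of the last flush, in seconds
        self.max_staleness = max_staleness
        self.flush_count = 0      # stands in for the cost of accuracy

    def update(self, delta):
        # Cheap path: record the change without touching the visible total.
        self.pending += delta

    def flush(self, now):
        # Expensive path: fold pending updates into the visible total.
        self.flushed += self.pending
        self.pending = 0
        self.last_flush = now
        self.flush_count += 1

    def read_relaxed(self, now):
        # Flush only if the visible total is older than the staleness bound.
        if now - self.last_flush >= self.max_staleness:
            self.flush(now)
        return self.flushed

    def read_strict(self, now):
        # Always flush, so the read reflects every earlier update.
        self.flush(now)
        return self.flushed


stats = ToyStats()
stats.update(100)
print(stats.read_strict(0.0))   # sees the update, costs one flush
stats.update(-40)               # e.g. an event such as reclaim
print(stats.read_relaxed(0.5))  # within the bound: event not yet visible
print(stats.read_relaxed(2.0))  # bound reached: flush, event now visible
```

In this model the relaxed reader trades a bounded window of staleness
for fewer flushes, while the strict reader gets read-after-event
ordering at the price of a flush on every read.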
>
> > Same for system overhead but I can
> > see the complication of two different sources for stats. Can you provide
> > the formula of system overhead? I am wondering why do you need to read
> > stats from memory.stat files. Why not the memory.current of top level
> > cgroups and /proc/meminfo be enough. Something like:
> >
> > Overhead = MemTotal - MemFree - SumOfTopCgroups(memory.current)
>
> We use the amount of compressed memory in zswap from memory.stat,
> which is not accounted as memory usage in cgroup v1.
>
There are zswap stats in /proc/meminfo. Will those work for you?
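
As a rough sketch of the formula quoted above, the following Python
computes Overhead = MemTotal - MemFree - SumOfTopCgroups(memory.current)
from /proc/meminfo-style text. The helper names parse_meminfo and
system_overhead, and all sample numbers, are made up for illustration.
It assumes the Zswap field is present in /proc/meminfo, which depends on
the kernel version and configuration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: bytes} dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        # Most fields are reported in kB; a few are plain counts.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        fields[name.strip()] = value
    return fields


def system_overhead(meminfo, top_cgroup_current):
    """Overhead = MemTotal - MemFree - sum of top-level memory.current."""
    return meminfo["MemTotal"] - meminfo["MemFree"] - sum(top_cgroup_current)


# Sample input with invented values; on a real system the text would come
# from /proc/meminfo and the list from each top-level cgroup's
# memory.current file (already in bytes).
sample = """MemTotal:       1000 kB
MemFree:         200 kB
Zswap:            10 kB
Zswapped:         40 kB
"""
meminfo = parse_meminfo(sample)
overhead = system_overhead(meminfo, [100 * 1024, 300 * 1024])
print(overhead, meminfo.get("Zswap"))
```

If zswap memory has to be counted separately, meminfo.get("Zswap") gives
a system-wide value without reading any memory.stat file.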
[...]
> > Fix the in-kernel flushers separately.
>
> The in-kernel flushers are basically facing the same problem. For
> instance, reclaim would expect a stats read after a reclaim iteration
> to reflect the system state after the reclaim iteration.
>
I have not seen any complaints about memory reclaim recently. Maybe
reclaim does not really need such accuracy :P
> > Also the problem Cloudflare is facing does not need to be tied with this.
>
> When we try to wait for flushing to complete we run into the same
> latency problem of the root flush.
Not sure what waiting for flushing has to do with Cloudflare's report.
They are OK with no sync flushing at all on stat read.