From: Yosry Ahmed <yosryahmed@google.com>
To: Shakeel Butt <shakeelb@google.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Hocko" <mhocko@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Muchun Song" <muchun.song@linux.dev>,
"Ivan Babrou" <ivan@cloudflare.com>, "Tejun Heo" <tj@kernel.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Waiman Long" <longman@redhat.com>,
kernel-team@cloudflare.com, "Wei Xu" <weixugc@google.com>,
"Greg Thelen" <gthelen@google.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] mm: memcg: optimize stats flushing for latency and accuracy
Date: Mon, 18 Sep 2023 22:29:38 -0700 [thread overview]
Message-ID: <CAJD7tka-Rzn77J4cDwVb1jqiMF0XFsTHpJLAVsMbVTTyxZZVew@mail.gmail.com> (raw)
In-Reply-To: <20230915010138.knjli6ovpozxbpss@google.com>
On Thu, Sep 14, 2023 at 6:01 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Thu, Sep 14, 2023 at 04:30:56PM -0700, Yosry Ahmed wrote:
> [...]
> > >
> > > I think first you need to show if this (2 sec stale stats) is really a
> > > problem.
> >
> > That's the thing, my main concern is that if this causes a problem, we
> > probably won't be able to tell it was because of stale stats. It's
> > very hard to make that connection.
> >
>
> Please articulate what the problem would look like (which you did in the
> use-case descriptions below); let's discuss there.
>
> > Pre-rstat, reading stats would always yield fresh stats (as much as
> > possible). Now the stats can be up to 2s stale, and we don't really
> > know how this will affect our existing workloads.
> >
>
> Pre-rstat, the stat read would traverse the memcg tree. With rstat,
> the tradeoff was made between an expensive read and staleness.
> Yeah, there
> might still be a memcg update tree traversal, which I would like to remove
> completely. However you are saying to
I think this sentence is truncated.
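For readers following the tradeoff being debated above, here is a minimal userspace sketch (emphatically not the kernel code; the class and field names are illustrative) of a time-bounded flushing scheme: writes stay cheap by only recording pending deltas, and the expensive fold into the global snapshot happens at most once per flush period, which is exactly what makes a read up to one period stale.

```python
import time

class RatelimitedStats:
    """Illustrative model of periodic stat flushing: pending updates are
    folded into a global snapshot at most once per flush_period, so reads
    between flushes can return values up to flush_period stale."""

    def __init__(self, flush_period=2.0, clock=time.monotonic):
        self.flush_period = flush_period
        self.clock = clock
        self.pending = 0          # updates not yet folded in
        self.snapshot = 0         # last flushed value
        self.last_flush = clock()

    def update(self, delta):
        self.pending += delta     # cheap: no flush on the write path

    def read(self):
        now = self.clock()
        if now - self.last_flush >= self.flush_period:
            # The expensive flush happens at most once per period.
            self.snapshot += self.pending
            self.pending = 0
            self.last_flush = now
        return self.snapshot      # may lag pending updates by < one period
```

With a 2-second period, a read issued right after an update returns the old snapshot; that is the "stats read after an event should reflect the state after the event" concern in a nutshell.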
>
> [...]
> > >
> > > I don't see why userspace OOM killing and proactive reclaim need
> > > subsecond accuracy. Please explain.
> >
> > For proactive reclaim it is not about sub-second accuracy. It is about
> > doing the reclaim then reading the stats immediately to see the
> > effect. Naturally one would expect that a stat read after reclaim
> > would show the system state after reclaim.
> >
> > For userspace OOM killing I am not really sure. It depends on how
> > dynamic the workload is. If a task recently had a spike in memory
> > usage causing a threshold to be hit, userspace can kill a different
> > task if the stats are stale.
> >
>
> Please add the above reasoning to your commit message (though I am not
> convinced, but let's leave it at that).
Will do in the next version, thanks.
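As a concrete illustration of the proactive-reclaim pattern described above, here is a hedged userspace sketch using the standard cgroup v2 `memory.reclaim` and `memory.stat` interface files; the cgroup directory path is the caller's, and the helper name is my own. The point is that with deferred flushing, the `memory.stat` read on the last line may still reflect the pre-reclaim state.

```python
from pathlib import Path

def proactive_reclaim_and_read(cgroup_dir, nr_bytes):
    """Write to memory.reclaim (cgroup v2) to trigger proactive reclaim,
    then read memory.stat immediately. With deferred/ratelimited stat
    flushing, the returned stats may not yet show the reclaim's effect."""
    cg = Path(cgroup_dir)
    # Ask the kernel to reclaim nr_bytes from this cgroup.
    (cg / "memory.reclaim").write_text(f"{nr_bytes}\n")
    # Naturally one would expect this read to show the post-reclaim state.
    stats = {}
    for line in (cg / "memory.stat").read_text().splitlines():
        key, _, value = line.partition(" ")
        stats[key] = int(value)
    return stats
```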
>
> > I think the whole point is *not* about the amount of staleness. It is
> > more about that you expect a stats read after an event to reflect the
> > system state after the event.
>
> The whole point is to understand the tradeoff between accuracy and cost
> of accuracy. I don't think you want to pay the cost of strong
> consistency/ordering between stats reading and an event. My worry is
> that you are enforcing a tradeoff which *might* be just applicable to
> your use-cases. Anyway, this is not something that cannot be changed
> later.
Given the numbers I got with the patch, it doesn't seem like we are
paying a significant cost for the accuracy. Anyway, as you say, it's
not something that cannot be changed. In fact, I have another
proposal that I am currently testing; please see my next response to
Johannes.
>
> >
> > > Same for system overhead but I can
> > > see the complication of two different sources for stats. Can you provide
> > > the formula of system overhead? I am wondering why you need to read
> > > stats from memory.stat files. Wouldn't memory.current of the top-level
> > > cgroups and /proc/meminfo be enough? Something like:
> > >
> > > Overhead = MemTotal - MemFree - SumOfTopCgroups(memory.current)
> >
> > We use the amount of compressed memory in zswap from memory.stat,
> > which is not accounted as memory usage in cgroup v1.
> >
>
> There are zswap stats in /proc/meminfo. Will those work for you?
Yeah this should work for this specific use case, thanks.
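Shakeel's formula above can be sketched in userspace like this. The parsing helper is mine, and the zswap adjustment is an assumption following the exchange above: the `Zswap` line does exist in /proc/meminfo (added in Linux 5.19), and on cgroup v1 compressed zswap memory is not in `memory.current`, so one plausible accounting subtracts it from overhead alongside the cgroup sums.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:  value kB' lines into a dict of kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info

def system_overhead_kb(meminfo, top_cgroup_current_bytes):
    """Overhead = MemTotal - MemFree - SumOfTopCgroups(memory.current),
    additionally treating compressed zswap memory (Zswap in /proc/meminfo)
    as attributed usage rather than overhead -- an assumption for
    illustration, since cgroup v1 does not charge it to memory.current."""
    cgroups_kb = sum(top_cgroup_current_bytes) // 1024  # bytes -> kB
    zswap_kb = meminfo.get("Zswap", 0)  # present since Linux 5.19
    return meminfo["MemTotal"] - meminfo["MemFree"] - cgroups_kb - zswap_kb
```

On a live system the inputs would come from reading /proc/meminfo and each top-level cgroup's memory.current file.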
>
> [...]
> > > Fix the in-kernel flushers separately.
> >
> > The in-kernel flushers are basically facing the same problem. For
> > instance, reclaim would expect a stats read after a reclaim iteration
> > to reflect the system state after the reclaim iteration.
> >
>
> I have not seen any complaints about memory reclaim recently. Maybe
> reclaim does not really need such accuracy :P
Perhaps, it's full of heuristics anyway :)
>
> > > Also the problem Cloudflare is facing does not need to be tied with this.
> >
> > When we try to wait for flushing to complete we run into the same
> > latency problem of the root flush.
>
> Not sure what waiting for flushing has to do with Cloudflare's report. They
> are OK with no sync flushing at all on stat read.
Oh I am not saying the wait benefits their use case. I am saying when
the wait is implemented, we face the same problem (expensive flush of
the entire hierarchy), so we need to mitigate it anyway -- hence the
relevance to Cloudflare's use case.
Anyway, I have an alternative that I will propose shortly in response
to Johannes's reply.
Thread overview: 19+ messages
2023-09-13 7:38 [PATCH 0/3] memcg: more sophisticated stats flushing Yosry Ahmed
2023-09-13 7:38 ` [PATCH 1/3] mm: memcg: change flush_next_time to flush_last_time Yosry Ahmed
2023-09-13 7:38 ` [PATCH 2/3] mm: memcg: rename stats_flush_threshold to stats_updates_order Yosry Ahmed
2023-09-13 7:38 ` [PATCH 3/3] mm: memcg: optimize stats flushing for latency and accuracy Yosry Ahmed
2023-09-13 15:37 ` Johannes Weiner
2023-09-13 16:26 ` Yosry Ahmed
2023-09-14 16:06 ` Johannes Weiner
2023-09-14 17:22 ` Yosry Ahmed
2023-09-14 17:26 ` Yosry Ahmed
2023-09-19 5:46 ` Yosry Ahmed
2023-09-14 17:19 ` Waiman Long
2023-09-14 17:23 ` Yosry Ahmed
2023-09-14 17:36 ` Waiman Long
2023-09-14 17:36 ` Shakeel Butt
2023-09-14 17:56 ` Yosry Ahmed
2023-09-14 22:58 ` Shakeel Butt
2023-09-14 23:30 ` Yosry Ahmed
2023-09-15 1:01 ` Shakeel Butt
2023-09-19 5:29 ` Yosry Ahmed [this message]