Re: [PATCH 3/3] mm: memcg: optimize stats flushing for latency and accuracy

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Yosry Ahmed <yosryahmed@google.com>
To: Shakeel Butt <shakeelb@google.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Roman Gushchin" <roman.gushchin@linux.dev>,
	"Muchun Song" <muchun.song@linux.dev>,
	"Ivan Babrou" <ivan@cloudflare.com>, "Tejun Heo" <tj@kernel.org>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Waiman Long" <longman@redhat.com>,
	kernel-team@cloudflare.com, "Wei Xu" <weixugc@google.com>,
	"Greg Thelen" <gthelen@google.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] mm: memcg: optimize stats flushing for latency and accuracy
Date: Thu, 14 Sep 2023 10:56:52 -0700	[thread overview]
Message-ID: <CAJD7tkbb5zxEPSW6vvijDpQKCZiDcUJqe8SPvLHfo0RX7iHXAw@mail.gmail.com> (raw)
In-Reply-To: <CALvZod6j5E2MXFz463LRcrP6XY8FedzLUKW88c=ZY39B6mYyMA@mail.gmail.com>

On Thu, Sep 14, 2023 at 10:36 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Wed, Sep 13, 2023 at 12:38 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > Stats flushing for memcg currently follows the following rules:
> > - Always flush the entire memcg hierarchy (i.e. flush the root).
> > - Only one flusher is allowed at a time. If someone else tries to flush
> >   concurrently, they skip and return immediately.
> > - A periodic flusher flushes all the stats every 2 seconds.
> >
> > The reason this approach is followed is because all flushes are
> > serialized by a global rstat spinlock. On the memcg side, flushing is
> > invoked from userspace reads as well as in-kernel flushers (e.g.
> > reclaim, refault, etc). This approach aims to avoid serializing all
> > flushers on the global lock, which can cause a significant performance
> > hit under high concurrency.
> >
> > This approach has the following problems:
> > - Occasionally a userspace read of the stats of a non-root cgroup will
> >   be too expensive as it has to flush the entire hierarchy [1].
>
> This is a real world workload exhibiting the issue which is good.
>
> > - Sometimes the stats accuracy are compromised if there is an ongoing
> >   flush, and we skip and return before the subtree of interest is
> >   actually flushed. This is more visible when reading stats from
> >   userspace, but can also affect in-kernel flushers.
>
> Please provide similar data/justification for the above. In addition:
>
> 1. How much delayed/stale stats have you observed on real world workload?

I am not really sure. We don't have a wide deployment of kernels with
rstat yet. These are problems observed in testing and/or concerns
expressed by our userspace team.

I am trying to solve this now because any problems that result from
this staleness will be very hard to debug and link back to stale
stats.

>
> 2. What is acceptable staleness in the stats for your use-case?

Again, unfortunately I am not sure, but right now it can be O(seconds)
which is not acceptable as we have workloads querying the stats every
1s (and sometimes more frequently).

>
> 3. What is your use-case?

A few use cases we have that may be affected by this:
- System overhead: calculations using memory.usage and some stats from
memory.stat. If one of them is fresh and the other one isn't we have
an inconsistent view of the system.
- Userspace OOM killing: We use some stats in memory.stat to gauge the
amount of memory that will be freed by killing a task as sometimes
memory.usage includes shared resources that wouldn't be freed anyway.
- Proactive reclaim: we read memory.stat in a proactive reclaim
feedback loop, stale stats may cause us to mistakenly think reclaim is
ineffective and prematurely stop.

>
> 4. Does your use-case care about staleness of all the stats in
> memory.stat or some specific stats?

We have multiple use cases that can be affected by this, so I don't
think there are some specific stats. I am also not aware of all
possibly affected use cases.

>
> 5. If some specific stats in memory.stat, does it make sense to
> decouple them from rstat and just pay the price up front to maintain
> them accurately?
>
> Most importantly please please please be concise in your responses.

I try, sometimes I am not sure how much detail is needed. Sorry about that :)

>
> I know I am going back on some of the previous agreements but this
> whole locking back and forth has made in question the original
> motivation.

That's okay. Taking a step back, having flushing being indeterministic
in this way is a time bomb in my opinion. Note that this also affects
in-kernel flushers like reclaim or dirty isolation, which I suspect
will be more affected by staleness. No one complained yet AFAICT, but
I think it's a time bomb. The worst part is that if/when a problem
happens, we won't be able to easily tell what went wrong.

next prev parent reply	other threads:[~2023-09-14 17:57 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-13  7:38 [PATCH 0/3] memcg: more sophisticated stats flushing Yosry Ahmed
2023-09-13  7:38 ` [PATCH 1/3] mm: memcg: change flush_next_time to flush_last_time Yosry Ahmed
2023-09-13  7:38 ` [PATCH 2/3] mm: memcg: rename stats_flush_threshold to stats_updates_order Yosry Ahmed
2023-09-13  7:38 ` [PATCH 3/3] mm: memcg: optimize stats flushing for latency and accuracy Yosry Ahmed
2023-09-13 15:37   ` Johannes Weiner
2023-09-13 16:26     ` Yosry Ahmed
2023-09-14 16:06       ` Johannes Weiner
2023-09-14 17:22         ` Yosry Ahmed
2023-09-14 17:26           ` Yosry Ahmed
2023-09-19  5:46           ` Yosry Ahmed
2023-09-14 17:19   ` Waiman Long
2023-09-14 17:23     ` Yosry Ahmed
2023-09-14 17:36       ` Waiman Long
2023-09-14 17:36   ` Shakeel Butt
2023-09-14 17:56     ` Yosry Ahmed [this message]
2023-09-14 22:58       ` Shakeel Butt
2023-09-14 23:30         ` Yosry Ahmed
2023-09-15  1:01           ` Shakeel Butt
2023-09-19  5:29             ` Yosry Ahmed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJD7tkbb5zxEPSW6vvijDpQKCZiDcUJqe8SPvLHfo0RX7iHXAw@mail.gmail.com \
    --to=yosryahmed@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=ivan@cloudflare.com \
    --cc=kernel-team@cloudflare.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=mhocko@kernel.org \
    --cc=mkoutny@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=tj@kernel.org \
    --cc=weixugc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox