linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosryahmed@google.com>
To: "Michal Koutný" <mkoutny@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	 Shakeel Butt <shakeelb@google.com>,
	Muchun Song <muchun.song@linux.dev>,
	 Ivan Babrou <ivan@cloudflare.com>, Tejun Heo <tj@kernel.org>,
	linux-mm@kvack.org,  cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/3] memcg: non-unified flushing for userspace stats
Date: Tue, 22 Aug 2023 08:43:13 -0700	[thread overview]
Message-ID: <CAJD7tkYrbs8Eqb-AbTmb5JeQb+KeLZrU4C2kJk8pjva-6p6bHQ@mail.gmail.com> (raw)
In-Reply-To: <a4opjlmm5it3ujoypcgjfljfamjvr7gu34sc7lrjldfbqzz4lj@6w4lqdcfd3zu>

On Tue, Aug 22, 2023 at 6:01 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hello.
>
> On Mon, Aug 21, 2023 at 08:54:55PM +0000, Yosry Ahmed <yosryahmed@google.com> wrote:
> > For userspace reads, unified flushing leads to non-deterministic stats
> > staleness and reading cost.
>
> I only skimed previous threads but I don't remember if it was resolved:
> a) periodic flushing was too much spaced for user space readers (i.e. 2s
>    delay is too much [1]),
> b) periodic flushing didn't catch up (i.e. full tree flush can
>    occassionaly take more than 2s) leading to extra staleness?

The idea is that having stats anywhere between 0-2 seconds stale is
inconsistent, especially when compared to other values (such as
memory.usage_in_bytes), which can give an inconsistent and stale view
of the system. For some readers, 2s is too spaced (we have readers
that read every second). A time-only bound is scary because on large
systems a lot can happen in a second. There will always be a delay
anyway, but ideally we want to minimize it.

I think 2s is also not a strict bound (e.g. flushing is taking a lot
of time, a flush started but the cgroup you care about hasn't been
flushed yet, etc).

There is also the problem of variable cost of reading.

>
> [1] Assuming that nr_cpus*MEMCG_CHARGE_BATCH error bound is also too
>     much for userspace readers, correct?

I can't tell for sure to be honest, but given the continuously
increased number of cpus on modern systems, and the fact that
nr_cpus*MEMCG_CHARGE_BATCH can be localized in one cgroup or spread
across the hierarchy, I think it's better if we drop it for userspace
reads if possible.

>
> > The cost of userspace reads are now determinstic, and depend on the
> > size of the subtree being read. This should fix both the *sometimes*
> > expensive reads (due to flushing the entire tree) and occasional
> > staless (due to skipping flushing).
>
> This is nice, thanks to the atomic removal in the commit 0a2dc6ac3329
> ("cgroup: remove cgroup_rstat_flush_atomic()"). I think the smaller
> chunks with yielding could be universally better (last words :-).

I was hoping we can remove unified flushing completely, but stress
testing with hundreds of concurrent reclaimers shows a lot of
regression. Maybe it doesn't matter in practice, but I don't want to
pull that trigger :)

FWIW, with unified flushing we are getting great concurrency for
in-kernel flushers like reclaim or refault, but at the expense of
stats staleness. I really wonder what hidden cost we are paying due to
the stale stats. I would imagine any problems that manifest from this
would be difficult to tie back to the stale stats.

>
> Thanks,
> Michal
>


      reply	other threads:[~2023-08-22 15:43 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-21 20:54 Yosry Ahmed
2023-08-21 20:54 ` [PATCH 1/3] mm: memcg: properly name and document unified stats flushing Yosry Ahmed
2023-08-21 20:54 ` [PATCH 2/3] mm: memcg: add a helper for non-unified " Yosry Ahmed
2023-08-22 13:01   ` Michal Koutný
2023-08-22 16:00     ` Yosry Ahmed
2023-08-22 16:35       ` Michal Koutný
2023-08-22 16:48         ` Yosry Ahmed
2023-08-21 20:54 ` [PATCH 3/3] mm: memcg: use non-unified stats flushing for userspace reads Yosry Ahmed
2023-08-22  9:06   ` Michal Hocko
2023-08-22 15:30     ` Yosry Ahmed
2023-08-23  7:33       ` Michal Hocko
2023-08-23 14:55         ` Yosry Ahmed
2023-08-24  7:13           ` Michal Hocko
2023-08-24 18:15             ` Yosry Ahmed
2023-08-24 18:50               ` Yosry Ahmed
2023-08-25  7:05                 ` Michal Hocko
2023-08-25 15:14                   ` Yosry Ahmed
2023-08-25 18:17                     ` Michal Hocko
2023-08-25 18:21                       ` Yosry Ahmed
2023-08-25 18:43                         ` Michal Hocko
2023-08-25 18:44                           ` Michal Hocko
2023-08-28 15:47                     ` Michal Hocko
2023-08-28 16:15                       ` Yosry Ahmed
2023-08-28 17:00                         ` Shakeel Butt
2023-08-28 17:07                           ` Yosry Ahmed
2023-08-28 17:27                             ` Waiman Long
2023-08-28 17:28                               ` Yosry Ahmed
2023-08-28 17:35                                 ` Waiman Long
2023-08-28 17:43                                   ` Waiman Long
2023-08-28 18:35                                     ` Yosry Ahmed
2023-08-29  7:27                               ` Michal Hocko
2023-08-29 15:05                                 ` Waiman Long
2023-08-29 15:17                                   ` Michal Hocko
2023-08-29 16:04                                     ` Yosry Ahmed
2023-08-29 18:44                   ` Tejun Heo
2023-08-29 19:13                     ` Yosry Ahmed
2023-08-29 19:36                       ` Tejun Heo
2023-08-29 19:54                         ` Yosry Ahmed
2023-08-29 20:12                           ` Tejun Heo
2023-08-29 20:20                             ` Yosry Ahmed
2023-08-31  9:05                               ` Michal Hocko
2023-08-22 13:00 ` [PATCH 0/3] memcg: non-unified flushing for userspace stats Michal Koutný
2023-08-22 15:43   ` Yosry Ahmed [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJD7tkYrbs8Eqb-AbTmb5JeQb+KeLZrU4C2kJk8pjva-6p6bHQ@mail.gmail.com \
    --to=yosryahmed@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=ivan@cloudflare.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mkoutny@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox