From: Yosry Ahmed <yosryahmed@google.com>
To: Tejun Heo <tj@kernel.org>
Cc: "Zefan Li" <lizefan.x@bytedance.com>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Hocko" <mhocko@kernel.org>,
"Shakeel Butt" <shakeelb@google.com>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Michal Koutný" <mkoutny@suse.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
Linux-MM <linux-mm@kvack.org>, Cgroups <cgroups@vger.kernel.org>,
"Greg Thelen" <gthelen@google.com>
Subject: Re: [RFC] memcg rstat flushing optimization
Date: Mon, 10 Oct 2022 17:15:33 -0700
Message-ID: <CAJD7tkZZuDwGHDjAsOde0VjDm9YcKWnWUGHg43q79hcffZH5Xw@mail.gmail.com>
In-Reply-To: <CAJD7tkZOw9hrc0jKYqYW1ysGZNjSVDgjhCyownBRmpS+UUCP3A@mail.gmail.com>
On Wed, Oct 5, 2022 at 11:38 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Wed, Oct 5, 2022 at 11:22 AM Tejun Heo <tj@kernel.org> wrote:
> >
> > Hello,
> >
> > On Wed, Oct 05, 2022 at 11:02:23AM -0700, Yosry Ahmed wrote:
> > > > I was thinking more that being done inside the flush function.
> > >
> > > I think the flush function already does that in some sense if
> > > might_sleep is true, right? The problem here is that we are using
> >
> > Oh I forgot about that. Right.
> >
> > ...
> > > I took kdumps from a couple of crashed machines and ran a script
> > > to traverse updated memcgs and check how many cpus have updates
> > > and how many updates there are on each cpu. I found that on
> > > average only a couple of stats are updated per-cpu per-cgroup, and
> > > on less than 25% of cpus (but this is on a large machine, I expect
> > > the number to go higher on smaller machines), which is why I
> > > suggested a bitmask. I understand though that this depends on
> > > whatever workloads were running on those machines, and that in the
> > > case where most stats are updated the bitmask will actually make
> > > things slightly worse.
> >
> > One worry I have about selective flushing is that it's only gonna improve
> > things by some multiples while we can reasonably increase the problem size
> > by orders of magnitude.
>
> I think we would usually want to flush only a few stats (< 5?) in
> irqsafe contexts out of over 100, so I would say the improvement would
> be good, but yeah, the problem size can reasonably increase by more
> than that. It also depends on which stats we selectively flush. If
> they are not in the same cache line we might end up bringing a lot of
> stats into the cpu cache anyway.
>
> >
> > The only real ways out I can think of are:
> >
> > * Implement a periodic flusher which keeps the stats needed in the
> >   irqsafe path acceptably up to date, to avoid flushing with irq
> >   disabled. We can make this adaptive too - no reason to do all this
> >   if the number to flush isn't huge.
>
> We do have a periodic flusher today for memcg stats (see
> flush_memcg_stats_dwork). It calls __mem_cgroup_flush_stats(), which
> only flushes if the total number of updates is over a certain
> threshold.
> mem_cgroup_flush_stats_delayed(), which is called in the page fault
> path, only does a flush if the last flush was a certain while ago. We
> don't use the delayed version in all irqsafe contexts though, and I am
> not the right person to tell if we can.
>
> But I think this is not what you meant. I think you meant only
> flushing the specific stats needed in irqsafe contexts more frequently
> and not invoking a flush at all in irqsafe contexts (or using
> mem_cgroup_flush_stats_delayed()..?). Right?
>
> I am not the right person to judge what is acceptably up-to-date to be
> honest, so I would wait for other memcgs folks to chime in on this.
>
> >
> > * Shift some work to the updaters. e.g. in many cases, propagating
> >   per-cpu updates a couple levels up from the update path will
> >   significantly reduce the fanouts and thus the number of entries
> >   which need to be flushed later. It does add on-going overhead, so
> >   it should probably be adaptive or configurable, hopefully the
> >   former.
>
> If we are adding overhead to the updaters, would it be better to
> maintain a bitmask of updated stats, or do you think it would be more
> effective to propagate updates a couple of levels up? I think that to
> propagate updates up in the updater's context we would need percpu
> versions of the "pending" stats, which would also add memory
> consumption.
>
Any thoughts here, Tejun or anyone?
> >
> > Thanks.
> >
> > --
> > tejun
Thread overview: 12+ messages
2022-10-05 1:17 Yosry Ahmed
2022-10-05 16:30 ` Tejun Heo
2022-10-05 17:20 ` Yosry Ahmed
2022-10-05 17:42 ` Tejun Heo
2022-10-05 18:02 ` Yosry Ahmed
2022-10-05 18:22 ` Tejun Heo
2022-10-05 18:38 ` Yosry Ahmed
2022-10-06 2:13 ` Yosry Ahmed
2022-10-11 0:15 ` Yosry Ahmed [this message]
2022-10-11 0:19 ` Tejun Heo
2022-10-17 18:52 ` Michal Koutný
2022-10-17 21:30 ` Yosry Ahmed