From: Suren Baghdasaryan <surenb@google.com>
To: Luigi Semenzato <semenzato@google.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: PSI vs. CPU overhead for client computing
Date: Wed, 24 Apr 2019 07:49:34 -0700
Message-ID: <CAJuCfpHMEVHYpodjsote2Gp0y_G1=Hi66xzdhXfOgtcMMiiL9g@mail.gmail.com>
In-Reply-To: <CAA25o9Rzcqts7oCpwyRq2yBALkHQVwgzgFDVYv08Z0UUhY+qhw@mail.gmail.com>
On Tue, Apr 23, 2019 at 9:54 PM Luigi Semenzato <semenzato@google.com> wrote:
>
> Thank you very much Suren.
>
> On Tue, Apr 23, 2019 at 3:04 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > Hi Luigi,
> >
> > On Tue, Apr 23, 2019 at 11:58 AM Luigi Semenzato <semenzato@google.com> wrote:
> > >
> > > Colleagues and I are working on improving system behavior under
> > > memory pressure on Chrome OS. We use zram, which swaps to a
> > > statically-configured compressed RAM disk. One challenge is that the
> > > footprint of our workloads is highly variable. With zram, we have to
> > > set the size of the swap partition at boot time. When the (logical)
> > > swap partition is full, we're left with some amount of RAM usable by
> > > file and anonymous pages (we can ignore the rest), and we don't get
> > > to control this amount dynamically. If the workload fits nicely in
> > > that amount, everything works well. If it doesn't, the rate of
> > > anonymous page faults can be quite high, causing large CPU overhead
> > > for compression/decompression (as well as for other parts of the MM).
> > >
> > > In Chrome OS and Android, we have the luxury of being able to reduce
> > > pressure by terminating processes: tab discard in Chrome OS, app
> > > kill in Android (which incidentally also runs alongside Chrome OS on
> > > some chromebooks). To help decide when to reduce pressure, we would
> > > like a reliable and device-independent measure of MM CPU overhead. I
> > > have looked into PSI and have a few questions. I am also open to
> > > alternative suggestions.
> > >
> > > PSI measures the time during which some or all tasks are blocked on
> > > memory allocation. In some experiments, this doesn't seem to
> > > correlate well with CPU overhead (which instead correlates fairly
> > > well with page fault rates). Could this be because it includes
> > > pressure from file page faults?
> >
> > This might be caused by thrashing (see:
> > https://elixir.bootlin.com/linux/v5.1-rc6/source/mm/filemap.c#L1114).
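> >
> > Paraphrasing that code (v5.1, simplified): a wait on a locked page
> > only counts as a memory stall when the page belongs to the
> > workingset, i.e. roughly when it was recently evicted and is being
> > refaulted:
> >
> >     /* mm/filemap.c, wait_on_page_bit_common(), simplified */
> >     if (bit_nr == PG_locked &&
> >         !PageUptodate(page) && PageWorkingset(page)) {
> >             if (!PageSwapBacked(page)) {
> >                     delayacct_thrashing_start();
> >                     delayacct = true;
> >             }
> >             psi_memstall_enter(&pflags);
> >             thrashing = true;
> >     }
> >
> > Note the delayacct part only triggers for file pages (not
> > swap-backed ones), which matters for your next question.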
> >
> > > Is there some way of interpreting PSI
> > > numbers so that the pressure from file pages is ignored?
> >
> > I don't think so, but I might be wrong. Notice here
> > https://elixir.bootlin.com/linux/v5.1-rc6/source/mm/filemap.c#L1111
> > that you could probably use delayacct to distinguish file thrashing.
> > However, remember that PSI takes the number of CPUs and the number of
> > currently non-idle tasks into account in its pressure calculations,
> > so the raw delay numbers might not be very useful here.
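> >
> > (The aggregated results are exported in /proc/pressure/memory as one
> > "some" and one "full" line; a minimal sketch of a reader, just to
> > show the interface:)
> >
> >     /* print avg10 pressure from /proc/pressure/memory (CONFIG_PSI) */
> >     #include <stdio.h>
> >
> >     int main(void)
> >     {
> >             char line[256], kind[8];
> >             float avg10;
> >             FILE *f = fopen("/proc/pressure/memory", "r");
> >
> >             if (!f)
> >                     return 1;
> >             /* lines look like "some avg10=0.12 avg60=... total=..." */
> >             while (fgets(line, sizeof(line), f))
> >                     if (sscanf(line, "%7s avg10=%f", kind, &avg10) == 2)
> >                             printf("%s: %.2f%%\n", kind, avg10);
> >             fclose(f);
> >             return 0;
> >     }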
>
> OK.
>
> > > What is the purpose of "some" and "full" in the PSI measurements? The
> > > chrome browser is a multi-process app and there is a lot of IPC. When
> > > process A is blocked on memory allocation, it cannot respond to IPC
> > > from process B, thus effectively both processes are blocked on
> > > allocation, but we don't see that.
> >
> > I don't think PSI would account for such an indirect stall, where A
> > is waiting for B and B is blocked on memory access. B's stall will be
> > accounted for, but I don't think A's blocked time will go into the
> > PSI calculations. Process inter-dependencies are probably out of
> > scope for PSI.
>
> Right, that's what I was also saying. It would be near impossible to
> figure it out. It may also be that statistically it doesn't matter,
> as long as the workload characteristics don't change dramatically.
> Which unfortunately they might...
>
> > > Also, there are situations in
> > > which some "uninteresting" processes keep running. So it's not clear
> > > we can rely on "full". Or maybe I am misunderstanding? "Some" may be
> > > a better measure, but again it doesn't capture indirect blockage.
> >
> > Johannes explains the SOME and FULL calculations here:
> > https://elixir.bootlin.com/linux/v5.1-rc6/source/kernel/sched/psi.c#L76
> > and includes a couple of examples, the last one showing FULL > 0 with
> > some tasks still running.
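> >
> > For quick reference, the SMP formulas from that comment are roughly:
> >
> >     threads = min(nr_nonidle_tasks, nr_cpus)
> >        SOME = min(nr_delayed_tasks / threads, 1)
> >        FULL = (threads - min(nr_running_tasks, threads)) / threads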
>
> Thank you, yes, those are good explanations. I am still not sure how
> to use this in our case.
>
> I thought about using the page fault rate as a proxy for the
> allocation overhead. Unfortunately it is difficult to establish a
> baseline, because: 1. it is device-dependent (that's not
> insurmountable: we could compute a per-device baseline offline); 2.
> the CPUs can go in and out of turbo mode or thermal throttling, and
> then the notion of a constant "baseline" fails miserably.
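>
> (For concreteness, the kind of sampling I have in mind; an untested
> sketch that reads pgmajfault from /proc/vmstat once per second:)
>
>     /* print the major page fault rate sampled from /proc/vmstat */
>     #include <stdio.h>
>     #include <string.h>
>     #include <unistd.h>
>
>     static unsigned long long read_pgmajfault(void)
>     {
>             char key[64];
>             unsigned long long val;
>             FILE *f = fopen("/proc/vmstat", "r");
>
>             if (!f)
>                     return 0;
>             while (fscanf(f, "%63s %llu", key, &val) == 2) {
>                     if (!strcmp(key, "pgmajfault")) {
>                             fclose(f);
>                             return val;
>                     }
>             }
>             fclose(f);
>             return 0;
>     }
>
>     int main(void)
>     {
>             unsigned long long prev = read_pgmajfault();
>
>             for (;;) {
>                     sleep(1);
>                     unsigned long long cur = read_pgmajfault();
>
>                     printf("pgmajfault/s: %llu\n", cur - prev);
>                     prev = cur;
>             }
>     }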
>
> > > The kernel contains various cpustat measurements, including some
> > > slightly esoteric ones such as CPUTIME_GUEST and CPUTIME_GUEST_NICE.
> > > Would adding a CPUTIME_MEM be out of the question?
>
> Any opinion on CPUTIME_MEM?

I guess some description of how you plan to calculate it would be
helpful. A simple raw delay counter might not be very useful, which is
why PSI performs more elaborate calculations.
Maybe posting a small RFC patch with code would get more attention and
let you collect more feedback.
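
If you do, the enum side would presumably look something like the
below (purely hypothetical, just to make the discussion concrete; the
hard part is deciding where the time actually gets charged):

    /* include/linux/kernel_stat.h, hypothetical addition */
    enum cpu_usage_stat {
            CPUTIME_USER,
            CPUTIME_NICE,
            CPUTIME_SYSTEM,
            CPUTIME_SOFTIRQ,
            CPUTIME_IRQ,
            CPUTIME_IDLE,
            CPUTIME_IOWAIT,
            CPUTIME_STEAL,
            CPUTIME_GUEST,
            CPUTIME_GUEST_NICE,
            CPUTIME_MEM,    /* hypothetical: time spent in reclaim/swap */
            NR_STATS,
    };
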
> Thanks again!
>
> > > Thanks!
> > >
> >
> > Just my 2 cents; Johannes, being the author, might have more to say here.