Re: [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2

From: "Singh, Balbir" <bsingharora@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>,
	surenb@google.com, Vinayak Menon <vinmenon@codeaurora.org>,
	Christoph Lameter <cl@linux.com>, Mike Galbraith <efault@gmx.de>,
	Shakeel Butt <shakeelb@google.com>, linux-mm <linux-mm@kvack.org>,
	cgroups@vger.kernel.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	kernel-team@fb.com
Subject: Re: [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2
Date: Thu, 26 Jul 2018 11:07:32 +1000
Message-ID: <268c2b08-6c90-de2b-d693-1270bb186713@gmail.com>
In-Reply-To: <20180724151519.GA11598@cmpxchg.org>



On 7/25/18 1:15 AM, Johannes Weiner wrote:
> Hi Balbir,
> 
> On Tue, Jul 24, 2018 at 07:14:02AM +1000, Balbir Singh wrote:
>> Does the mechanism scale? I am a little concerned about how frequently
>> this infrastructure is monitored/read/acted upon.
> 
> I expect most users to poll in the frequency ballpark of the running
> averages (10s, 1m, 5m). Our OOMD defaults to 5s polling of the 10s
> average; we collect the 1m average once per minute from our machines
> and cgroups to log the system/workload health trends in our fleet.
> 
> Suren has been experimenting with adaptive polling down to the
> millisecond range on Android.
> 

I think this is a bad way of doing things. Polling only adds overhead;
there needs to be an event-driven mechanism, and the selection of the
events needs to happen in user space.
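
To make the polling pattern under discussion concrete, here is a minimal
user-space sketch. It assumes the pressure file exposes lines of the form
"some avg10=... avg60=... avg300=... total=..." (the layout used by mainline
kernels; the format in this patch revision may differ). The path, the
function names, the threshold, and the interval are hypothetical.

```python
import time


def parse_pressure(text):
    """Parse pressure-file text into {"some": {"avg10": 1.5, ...}, ...}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            values[key] = float(raw)
        result[kind] = values
    return result


def poll_pressure(path="/proc/pressure/memory", interval=5.0,
                  threshold=10.0, rounds=3):
    """Wake up every `interval` seconds, whether or not anything changed."""
    for _ in range(rounds):
        with open(path) as f:
            stats = parse_pressure(f.read())
        if stats["some"]["avg10"] > threshold:
            print("memory pressure above threshold:", stats["some"]["avg10"])
        time.sleep(interval)
```

Every iteration costs a wakeup, a read, and a parse even on an idle system,
which is the overhead an event-driven interface would avoid.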

>> Why aren't existing mechanisms sufficient?
> 
> Our existing stuff gives a lot of indication when something *may* be
> an issue, like the rate of page reclaim, the number of refaults, the
> average number of active processes, one task waiting on a resource.
> 
> But the real difference between an issue and a non-issue is how much
> it affects your overall goal of making forward progress or reacting to
> a request in time. And that's the only thing users really care
> about. It doesn't matter whether my system is doing 2314 or 6723 page
> refaults per minute, or scanned 8495 pages recently. I need to know
> whether I'm losing 1% or 20% of my time on overcommitted memory.
> 
> Delayacct is time-based, so it's a step in the right direction, but it
> doesn't aggregate tasks and CPUs into compound productivity states to
> tell you if only parts of your workload are seeing delays (which is
> often tolerable for the purpose of ensuring maximum HW utilization) or
> your system overall is not making forward progress. That aggregation
> isn't something you can do in userspace with polled delayacct data.
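
The point about compound states in the quoted paragraph can be illustrated
with a toy model. This is a simplified sketch, not the psi implementation:
it ignores CPUs entirely, treats time as equal slices, and borrows the
"some" (at least one non-idle task stalled) and "full" (all non-idle tasks
stalled) terminology from the mainline psi documentation. The function and
variable names are hypothetical.

```python
def pressure_shares(timeline):
    """Return (some, full) as fractions of the timeline.

    timeline: list of slices; each slice maps a task name to
    "run", "stall", or "idle".
    """
    some = full = 0
    for states in timeline:
        active = [s for s in states.values() if s != "idle"]
        stalled = [s for s in active if s == "stall"]
        if stalled:
            some += 1
            # "full" only when no non-idle task makes progress
            if len(stalled) == len(active):
                full += 1
    n = len(timeline)
    return some / n, full / n


# In both timelines each task is stalled for 2 of 4 slices, so
# per-task delay totals are identical.
staggered = [
    {"a": "stall", "b": "run"},
    {"a": "stall", "b": "run"},
    {"a": "run", "b": "stall"},
    {"a": "run", "b": "stall"},
]
overlapping = [
    {"a": "stall", "b": "stall"},
    {"a": "stall", "b": "stall"},
    {"a": "run", "b": "run"},
    {"a": "run", "b": "run"},
]
```

With these inputs, pressure_shares(staggered) gives (1.0, 0.0) and
pressure_shares(overlapping) gives (0.5, 0.5). The per-task totals cannot
tell the two cases apart, because the result depends on how the stalls
overlap in time.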

By aggregation you mean cgroup aggregation?

> 
>> -- why is the avg delay calculation in the kernel?
> 
> For one, as per above, most users will probably be using the standard
> averaging windows, and we already have this highly optimized
> infrastructure from the load average. I don't see why we shouldn't use
> that instead of exporting an obscure number that requires most users
> to have an additional library or copy-paste the loadavg code.
> 
> I also mentioned the OOM killer as a likely in-kernel user of the
> pressure percentages to protect from memory livelocks out of the box,
> in which case we have to do this calculation in the kernel anyway.
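
For readers unfamiliar with the load average style of calculation referred
to above, here is a rough floating-point sketch of an exponentially decaying
running average over the three windows mentioned in this thread (10s, 1m,
5m). The kernel's load average code uses fixed-point arithmetic instead; the
2-second sampling period and all names below are assumptions made for
illustration.

```python
import math


def update_avg(avg, sample, period, window):
    """One exponential-moving-average step for a given window length."""
    decay = math.exp(-period / window)
    return avg * decay + sample * (1.0 - decay)


def run(samples, period=2.0, windows=(10.0, 60.0, 300.0)):
    """Feed samples (e.g. percent of time stalled) through all windows."""
    avgs = [0.0] * len(windows)
    for sample in samples:
        avgs = [update_avg(a, sample, period, w)
                for a, w in zip(avgs, windows)]
    return avgs
```

A short burst moves the 10s average quickly and the 5m average only
slightly, while a sustained level pulls all three toward the same value.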
> 
>> There is no talk about the overhead this introduces in general, maybe
>> the details are in the patches. I'll read through them
> 
> I sent an email on benchmarks and overhead in one of the subthreads, I
> will include that information in the cover letter in v3.
> 
> https://lore.kernel.org/lkml/20180718215644.GB2838@cmpxchg.org/

Thanks, I'll take a look.

Balbir Singh.