From: Johannes Weiner <hannes@cmpxchg.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Tejun Heo <tj@kernel.org>, Suren Baghdasaryan <surenb@google.com>,
Vinayak Menon <vinmenon@codeaurora.org>,
Christopher Lameter <cl@linux.com>,
Mike Galbraith <efault@gmx.de>,
Shakeel Butt <shakeelb@google.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 08/10] psi: pressure stall information for CPU, memory, and IO
Date: Wed, 18 Jul 2018 09:56:33 -0400 [thread overview]
Message-ID: <20180718135633.GA5161@cmpxchg.org> (raw)
In-Reply-To: <20180718124627.GD2476@hirez.programming.kicks-ass.net>
Hi Peter,

thanks for the feedback so far; I'll get to the other emails
later. I'm currently running A/B tests against our production traffic
to get up-to-date numbers, in particular on the optimizations you
suggested for the cacheline packing, time_state(), ffs() etc.
On Wed, Jul 18, 2018 at 02:46:27PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 12, 2018 at 01:29:40PM -0400, Johannes Weiner wrote:
>
> > +static inline void psi_enqueue(struct task_struct *p, u64 now, bool wakeup)
> > +{
> > + int clear = 0, set = TSK_RUNNING;
> > +
> > + if (psi_disabled)
> > + return;
> > +
> > + if (!wakeup || p->sched_psi_wake_requeue) {
> > + if (p->flags & PF_MEMSTALL)
> > + set |= TSK_MEMSTALL;
> > + if (p->sched_psi_wake_requeue)
> > + p->sched_psi_wake_requeue = 0;
> > + } else {
> > + if (p->in_iowait)
> > + clear |= TSK_IOWAIT;
> > + }
> > +
> > + psi_task_change(p, now, clear, set);
> > +}
> > +
> > +static inline void psi_dequeue(struct task_struct *p, u64 now, bool sleep)
> > +{
> > + int clear = TSK_RUNNING, set = 0;
> > +
> > + if (psi_disabled)
> > + return;
> > +
> > + if (!sleep) {
> > + if (p->flags & PF_MEMSTALL)
> > + clear |= TSK_MEMSTALL;
> > + } else {
> > + if (p->in_iowait)
> > + set |= TSK_IOWAIT;
> > + }
> > +
> > + psi_task_change(p, now, clear, set);
> > +}
>
> > +/**
> > + * psi_memstall_enter - mark the beginning of a memory stall section
> > + * @flags: flags to handle nested sections
> > + *
> > + * Marks the calling task as being stalled due to a lack of memory,
> > + * such as waiting for a refault or performing reclaim.
> > + */
> > +void psi_memstall_enter(unsigned long *flags)
> > +{
> > + struct rq_flags rf;
> > + struct rq *rq;
> > +
> > + if (psi_disabled)
> > + return;
> > +
> > + *flags = current->flags & PF_MEMSTALL;
> > + if (*flags)
> > + return;
> > + /*
> > + * PF_MEMSTALL setting & accounting needs to be atomic wrt
> > + * changes to the task's scheduling state, otherwise we can
> > + * race with CPU migration.
> > + */
> > + rq = this_rq_lock_irq(&rf);
> > +
> > + update_rq_clock(rq);
> > +
> > + current->flags |= PF_MEMSTALL;
> > + psi_task_change(current, rq_clock(rq), 0, TSK_MEMSTALL);
> > +
> > + rq_unlock_irq(rq, &rf);
> > +}
>
> I'm confused by this whole MEMSTALL thing... I thought the idea was to
> account the time we were _blocked_ because of memstall, but you seem to
> count the time we're _running_ with PF_MEMSTALL.
Under heavy memory pressure, a lot of active CPU time is spent
scanning and rotating through the LRU lists, which we do want to
capture in the pressure metric. What we really want to know is the
time in which CPU potential goes to waste due to a lack of
resources. That's the CPU going idle due to a memstall, but it's also
a CPU doing *work* that only occurs due to a lack of memory. We want
to know about both to judge how productive the system and workload are.
> And esp. the wait_on_page_bit_common caller seems performance sensitive,
> and the above function is quite expensive.
Right, but we don't call it on every invocation, only when waiting for
the IO to read back a page that was recently deactivated and evicted:
	if (bit_nr == PG_locked &&
	    !PageUptodate(page) && PageWorkingset(page)) {
		if (!PageSwapBacked(page))
			delayacct_thrashing_start();
		psi_memstall_enter(&pflags);
		thrashing = true;
	}
That means the page cache workingset/file active list is thrashing, in
which case the IO itself is our biggest concern, not necessarily a few
additional cycles before going to sleep to wait on its completion.