From: Michal Hocko <mhocko@kernel.org>
To: Daniel Drake <drake@endlessm.com>
Cc: hannes@cmpxchg.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, cgroups@vger.kernel.org, linux@endlessm.com,
linux-block@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linuxfoundation.org>,
Tejun Heo <tj@kernel.org>, Balbir Singh <bsingharora@gmail.com>,
Mike Galbraith <efault@gmx.de>, Oliver Yang <yangoliver@me.com>,
Shakeel Butt <shakeelb@google.com>, xxx xxx <x.qendo@gmail.com>,
Taras Kondratiuk <takondra@cisco.com>,
Daniel Walker <danielwa@cisco.com>,
Vinayak Menon <vinmenon@codeaurora.org>,
Ruslan Ruslichenko <rruslich@cisco.com>,
kernel-team@fb.com
Subject: Re: [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2
Date: Tue, 17 Jul 2018 13:25:15 +0200
Message-ID: <20180717112515.GE7193@dhcp22.suse.cz>
In-Reply-To: <20180716155745.10368-1-drake@endlessm.com>
On Mon 16-07-18 10:57:45, Daniel Drake wrote:
> Hi Johannes,
>
> Thanks for your work on psi!
>
> We have also been investigating the "thrashing problem" on our Endless
> desktop OS. We have seen that systems can easily get into a state where the
> UI becomes unresponsive to input, and the mouse cursor becomes extremely
> slow or stuck when the system is running out of memory. We are working with
> a full GNOME desktop environment on systems with only 2GB RAM, and
> sometimes no real swap (although zram-swap helps mitigate the problem to
> some extent).
>
> My analysis so far indicates that when the system is low on memory and hits
> this condition, the system is spending much of the time under
> __alloc_pages_direct_reclaim. "perf trace -F" shows many many page faults
> in executable code while this is going on. I believe the kernel is
> swapping out executable code in order to satisfy memory allocation
> requests, but then that swapped-out code is needed a moment later so it
> gets swapped in again via the page fault handler, and all this activity
> severely starves the system from being able to respond to user input.
>
> I appreciate the kernel's attempt to keep processes alive, but in the
> desktop case we see that the system rarely recovers from this situation,
> so you have to hard shutdown. In this case we view it as desirable that
> the OOM killer would step in (it is not doing so because direct reclaim
> is not actually failing).
Yes, this is really unfortunate. One thing that could help would be to
consider the thrashing level during reclaim (get_scan_count) and simply
forget about LRUs which are constantly refaulting pages back. We already
have the infrastructure for that. We just need to plumb it in.
--
Michal Hocko
SUSE Labs
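
The suggestion above can be illustrated with a small user-space sketch. This
is not kernel code and not a proposed patch: struct lru_stats, its counters,
lru_is_thrashing(), scan_target(), and the "more than half of recent
evictions refaulted" cutoff are all hypothetical names and values chosen for
illustration. Only the shape of the idea comes from the message: when
deciding how much of an LRU list to scan, skip a list whose evicted pages
keep coming straight back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-LRU counters; this is NOT the kernel's struct lruvec. */
struct lru_stats {
	unsigned long size;      /* pages currently on the list */
	unsigned long evicted;   /* pages recently evicted from the list */
	unsigned long refaulted; /* of those, pages faulted back in shortly after */
};

/*
 * Assumed cutoff: treat a list as thrashing when more than half of its
 * recent evictions were refaulted. The threshold is arbitrary.
 */
static bool lru_is_thrashing(const struct lru_stats *s)
{
	if (s->evicted == 0)
		return false;
	return s->refaulted > s->evicted / 2;
}

/*
 * Pick how many pages to scan from one list. A lower priority value means
 * more reclaim pressure, so a larger share of the list is scanned. A
 * thrashing list gets a target of zero, i.e. reclaim leaves it alone.
 */
static unsigned long scan_target(const struct lru_stats *s,
				 unsigned int priority)
{
	assert(priority < sizeof(unsigned long) * 8);

	if (lru_is_thrashing(s))
		return 0;
	return s->size >> priority;
}
```

With these assumptions, a list of 4096 pages with 100 refaults out of 1000
evictions gets a scan target of 1024 at priority 2, while the same list with
900 refaults out of 1000 evictions gets 0. If every list were skipped this
way, reclaim would stop making progress, which is presumably the condition
under which the OOM killer could step in as the quoted message asks.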