From: Rik van Riel <riel@redhat.com>
To: Johannes Weiner <hannes@cmpxchg.org>,
	ksummit-discuss@lists.linuxfoundation.org
Subject: Re: [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was Re:  Self nomination
Date: Thu, 28 Jul 2016 20:25:45 -0400
Message-ID: <1469751945.13905.6.camel@redhat.com>
In-Reply-To: <20160728185523.GA16390@cmpxchg.org>

On Thu, 2016-07-28 at 14:55 -0400, Johannes Weiner wrote:
> On Mon, Jul 25, 2016 at 01:11:42PM -0400, Johannes Weiner wrote:
> > Most recently I have been working on reviving swap for SSDs and
> > persistent memory devices (https://lwn.net/Articles/690079/) as
> > part of a bigger anti-thrashing effort to make the VM recover
> > swiftly and predictably from load spikes.
> 
> A bit of context, in case we want to discuss this at KS:
> 
> We frequently have machines hang and stop responding indefinitely
> after they experience memory load spikes. On closer look, we find
> most tasks either in page reclaim or majorfaulting parts of an
> executable or library. It's a typical thrashing pattern, where
> everybody cannibalizes everybody else. The problem is that with
> fast storage the cache reloads can be fast enough that there are
> never enough in-flight pages at a time to cause page reclaim to
> fail and trigger the OOM killer. The livelock persists until
> external remediation reboots the box or we get lucky and non-cache
> allocations eventually suck up the remaining page cache and
> trigger the OOM killer.
> 
> To avoid hitting this situation, we currently have to keep a generous
> memory reserve for occasional spikes, which sucks for utilization the
> rest of the time. Swap would be useful here, but the swapout code is
> basically only triggering when memory pressure rises - which again
> doesn't happen - so I've been working on the swap code to balance
> cache reclaim vs. swap based on relative thrashing between the two.
> 
> There is usually some cold/unused anonymous memory lying around
> that can be unloaded into swap during workload spikes, so that
> allows us to drive up the average memory utilization without
> increasing the risk at least. But if we screw up and there are not
> enough unused anon pages, we are back to thrashing - only now it
> involves swapping too.
> 
> So how do we address this?
> 
> A pathological thrashing situation is very obvious to any user,
> but it's not quite clear how to quantify it inside the kernel and
> have it trigger the OOM killer. It might be useful to talk about
> metrics. Could we quantify application progress? Could we quantify
> the amount of time a task or the system spends thrashing, and
> somehow express it as a percentage of overall execution time?
> Maybe something comparable to IO wait time, except tracking the
> time spent performing reclaim and waiting on IO that is refetching
> recently evicted pages?
> 
> This question seems to go beyond the memory subsystem and potentially
> involve the scheduler and the block layer, so it might be a good tech
> topic for KS.

I would like to discuss this topic, as well.

This is a very fundamental issue, one that used to be
handled with hard-coded heuristics in the BSDs (in the
1980s & 1990s), but hard coding is totally inappropriate
with today's memory sizes and the variation in I/O
subsystem speeds.
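
For illustration, the cache-vs-swap balancing Johannes
describes above could derive that split from observed
refaults instead of a fixed tunable. A rough user-space
sketch - the names (file_refaults, anon_refaults) and the
0..200 scale are made up here, similar in spirit to a
dynamic swappiness, and this is not the actual kernel code:

#include <stdio.h>

struct lru_costs {
	unsigned long file_refaults;	/* recently evicted cache read back in */
	unsigned long anon_refaults;	/* pages faulted back in from swap */
};

/*
 * Split reclaim pressure between anon and file on a 0..200 scale:
 * the more the file cache is refaulting, the more pressure shifts
 * toward anon (swap), and vice versa.
 */
static unsigned int anon_scan_share(const struct lru_costs *c)
{
	unsigned long total = c->file_refaults + c->anon_refaults;

	if (!total)
		return 100;	/* no thrashing signal yet: split evenly */

	return (unsigned int)(200UL * c->file_refaults / total);
}

int main(void)
{
	/* cache thrashing hard, anon barely touched: push toward swap */
	struct lru_costs c = { .file_refaults = 900, .anon_refaults = 100 };

	printf("anon share of scan pressure: %u/200\n", anon_scan_share(&c));
	return 0;
}

Whichever side is refaulting harder gets left alone, and
reclaim pressure shifts to the other side.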

Solving this, even if only on the detection side, could
make a real difference in having systems survive load
spikes.
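
On the detection side, the metric Johannes asks about -
time spent in reclaim or stalled on IO refetching recently
evicted pages, as a share of total run time - might look
roughly like this. Again a purely illustrative user-space
sketch with made-up names, not an actual implementation:

#include <stdio.h>
#include <stdint.h>

struct thrash_account {
	uint64_t reclaim_ns;	/* time spent doing page reclaim */
	uint64_t refault_ns;	/* time blocked on refault IO */
	uint64_t total_ns;	/* wall-clock time observed */
};

/* Share of observed time lost to thrashing-related stalls, in percent. */
static unsigned int thrash_percent(const struct thrash_account *a)
{
	if (!a->total_ns)
		return 0;

	return (unsigned int)((a->reclaim_ns + a->refault_ns) * 100 / a->total_ns);
}

int main(void)
{
	/* a one-second window: 150ms reclaiming, 450ms waiting on refaults */
	struct thrash_account a = {
		.reclaim_ns = 150000000ULL,
		.refault_ns = 450000000ULL,
		.total_ns   = 1000000000ULL,
	};

	printf("thrashing: %u%% of run time\n", thrash_percent(&a));
	return 0;
}

Tracked per task and system-wide, a sustained high
percentage like that could be the signal for the OOM
killer, or for userspace, to step in.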

-- 

All Rights Reversed.

Thread overview: 20+ messages
2016-07-25 17:11 [Ksummit-discuss] " Johannes Weiner
2016-07-25 18:15 ` Rik van Riel
2016-07-26 10:56   ` Jan Kara
2016-07-26 13:10     ` Vlastimil Babka
2016-07-28 18:55 ` [Ksummit-discuss] [TECH TOPIC] Memory thrashing, was " Johannes Weiner
2016-07-28 21:41   ` James Bottomley
2016-08-01 15:46     ` Johannes Weiner
2016-08-01 16:06       ` James Bottomley
2016-08-01 16:11         ` Dave Hansen
2016-08-01 16:33           ` James Bottomley
2016-08-01 18:13             ` Rik van Riel
2016-08-01 19:51             ` Dave Hansen
2016-08-01 17:08           ` Johannes Weiner
2016-08-01 18:19             ` Johannes Weiner
2016-07-29  0:25   ` Rik van Riel [this message]
2016-07-29 11:07   ` Mel Gorman
2016-07-29 16:26     ` Luck, Tony
2016-08-01 15:17       ` Rik van Riel
2016-08-01 16:55     ` Johannes Weiner
2016-08-02  9:18   ` Jan Kara
