From: Vlastimil Babka <vbabka@suse.cz>
To: Marinko Catovic <marinko.catovic@gmail.com>,
Michal Hocko <mhocko@suse.com>,
linux-mm@kvack.org, Christopher Lameter <cl@linux.com>
Subject: Caching/buffers become useless after some time
Date: Wed, 31 Oct 2018 14:12:24 +0100
Message-ID: <0a7f039d-0077-9559-cd12-64559b2e43ab@suse.cz>
In-Reply-To: <CADF2uSpiD9t-dF6bp-3-EnqWK9BBEwrfp69=_tcxUOLk_DytUA@mail.gmail.com>
Resending for lists which dropped my mail due to attachments. Sorry.
plots: https://nofile.io/f/ogwbrwhwBU7/plots.tar.bz2
R script:
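# Collect all vmstat snapshots; each file holds "counter value" lines.
files <- Sys.glob("vmstat.1*")
# The first snapshot provides the counter names as row names.
results <- read.table(files[1], row.names=1)
for (file in files[-1]) {
    # Append the value column of each later snapshot.
    tmp2 <- read.table(file)$V2
    results <- cbind(results, tmp2)
}
# Make sure the output directory exists before png() writes into it.
dir.create("plots", showWarnings=FALSE)
# One plot per counter: X is the snapshot index, Y is the counter value.
for (row in row.names(results)) {
    png(paste("plots/", row, ".png", sep=""), width=1900, height=1150)
    plot(t(as.vector(results[row,])), main=row)
    dev.off()
}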
On 10/22/18 3:19 AM, Marinko Catovic wrote:
> On Wed, 29 Aug 2018 at 18:44, Marinko Catovic
> <marinko.catovic@gmail.com> wrote:
>>
>>
>>>> one host is at a healthy state right now, I'd run that over there immediately.
>>>
>>> Let's see what we can get from here.
>>
>>
>> oh well, that went fast. actually with having low values for buffers (around 100MB) with caches
>> around 20G or so, the performance was nevertheless super-low, I really had to drop
>> the caches right now. This is the first time I see it with caches >10G happening, but hopefully
>> this also provides a clue for you.
>>
>> Just after starting the stats I reset from previously defer to madvise - I suspect that this somehow
>> caused the rapid reaction, since a few minutes later I saw that the free RAM jumped from 5GB to 10GB,
>> after that I went afk, returning to the pc since my monitoring systems went crazy telling me about downtime.
>>
>> If you think changing /sys/kernel/mm/transparent_hugepage/defrag back to its default, while it was
>> on defer now for days, was a mistake, then please tell me.
>>
>> here you go: https://nofile.io/f/VqRg644AT01/vmstat.tar.gz
>> trace_pipe: https://nofile.io/f/wFShvZScpvn/trace_pipe.gz
>>
>
> There we go again.
>
> First of all, I have set up this monitoring on 1 host, as a matter of
> fact it did not occur on that single
> one for days and weeks now, so I set this up again on all the hosts
> and it just happened again on another one.
>
> This issue is far from over, even when upgrading to the latest 4.18.12
>
> https://nofile.io/f/z2KeNwJSMDj/vmstat-2.zip
> https://nofile.io/f/5ezPUkFWtnx/trace_pipe-2.gz
I have plotted the vmstat snapshots using the script above, and got the
plots linked above. The X axis is the snapshot index: almost 14k
snapshots, one every 5 seconds, so almost 19 hours. I can see the
following phases:
0 - 2000:
- free memory (nr_free_pages) dropping from 48GB to the minimum allowed
by watermarks
- page cache (nr_file_pages) grows correspondingly
2000 - 6000:
- reclaimable slab (nr_slab_reclaimable) grows up to 40GB, unreclaimable
slab follows the same trend at a much smaller scale
- page cache shrinks correspondingly
- free memory remains at minimum
6000 - 12000:
- slab usage is slowly declining
- page cache slowly growing but there are hiccups
- free pages at minimum, growing after 9000, oscillating between 10000
and 12000
12000 - end:
- free pages growing sharply
- page cache declining sharply
- slab still slowly declining
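To relate the snapshot indexes and page counters above to wall-clock time and memory sizes, here is a minimal Python sketch (separate from the R script). It assumes one snapshot every 5 seconds, as described, and 4 KiB base pages, which is only the usual x86-64 default and has not been confirmed for this host. The function names are hypothetical.

```python
# Sketch only: convert vmstat snapshot indexes and page counters to
# human-readable units. Not part of the original R script.

SNAPSHOT_INTERVAL_S = 5   # one snapshot every 5 seconds (from the capture)
PAGE_SIZE_BYTES = 4096    # ASSUMPTION: 4 KiB base pages (x86-64 default)


def snapshot_to_hours(index, interval_s=SNAPSHOT_INTERVAL_S):
    """Elapsed time in hours at a given snapshot index."""
    return index * interval_s / 3600.0


def pages_to_gib(nr_pages, page_size=PAGE_SIZE_BYTES):
    """Convert a page counter such as nr_free_pages to GiB."""
    return nr_pages * page_size / 2**30


# Phase boundaries taken from the list above, as snapshot indexes.
PHASE_STARTS = [0, 2000, 6000, 12000]
phase_hours = [snapshot_to_hours(i) for i in PHASE_STARTS]

if __name__ == "__main__":
    for start, hours in zip(PHASE_STARTS, phase_hours):
        print("phase starting at snapshot %5d begins at %.1f h" % (start, hours))
```

With these assumptions, the last phase (snapshot 12000 onwards) starts roughly 16.7 hours into the capture.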
I guess the original problem is manifested in the last phase. There
might be a secondary issue with the slab usage between 2000 and 6000, but
it doesn't seem immediately connected (?).
I can see compaction activity (but not success) increased a lot in the
last phase, while direct reclaim is steady from 2000 onwards. This would
again suggest high-order allocations. THP doesn't seem to be the cause.
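The "activity but not success" observation can be checked numerically by comparing the compaction counters between two snapshots. The following is a minimal Python sketch, not the method used for the plots. It assumes each snapshot is a copy of /proc/vmstat ("name value" per line) and uses the compact_stall, compact_success and compact_fail counters. The helper names and the sample numbers are hypothetical and are not taken from the posted data.

```python
# Sketch only: compaction success ratio between two /proc/vmstat snapshots.

def parse_vmstat(text):
    """Parse 'name value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def compaction_delta(before, after):
    """Change in direct compaction counters between two snapshots."""
    stalls = after["compact_stall"] - before["compact_stall"]
    success = after["compact_success"] - before["compact_success"]
    fail = after["compact_fail"] - before["compact_fail"]
    # The ratio is undefined when no direct compaction ran in the interval.
    ratio = success / stalls if stalls else None
    return {"stalls": stalls, "success": success,
            "fail": fail, "success_ratio": ratio}


if __name__ == "__main__":
    # Made-up numbers for illustration, NOT from the posted vmstat data.
    before = parse_vmstat("compact_stall 100\ncompact_fail 90\ncompact_success 10\n")
    after = parse_vmstat("compact_stall 300\ncompact_fail 280\ncompact_success 20\n")
    print(compaction_delta(before, after))
```

A stall count that rises steeply while the success ratio stays low would match what the plots show for the last phase.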
Vlastimil