From: Michal Hocko <mhocko@kernel.org>
To: Marinko Catovic <marinko.catovic@gmail.com>
Cc: linux-mm@kvack.org
Subject: Re: Caching/buffers become useless after some time
Date: Thu, 12 Jul 2018 13:34:11 +0200 [thread overview]
Message-ID: <20180712113411.GB328@dhcp22.suse.cz> (raw)
In-Reply-To: <CADF2uSrW=Z=7NeA4qRwStoARGeT1y33QSP48Loc1u8XSdpMJOA@mail.gmail.com>
On Wed 11-07-18 15:18:30, Marinko Catovic wrote:
> hello guys
>
>
> I tried asking in a few IRC channels; people told me to ask here, so I'll give it a try.
>
>
> I have a very weird issue with mm on several hosts.
> The systems are used for shared hosting, so there are lots of users with lots
> of files: maybe 5TB of files per host, several million at least. There is a
> lot of I/O, which is normally handled perfectly fine by the buffers/cache.
>
> The kernel version is the latest stable, 4.17.4. I had 3.x before and did not
> notice any issues until now; the same goes for 4.16, which was in use before.
>
> The hosts have 64G of RAM each and operate with SSD+HDD.
> The HDDs are the issue here, since those 5TB of data are stored on them, so
> that is where the high I/O goes.
> Running applications need about 15GB, so roughly 40GB of RAM are left for
> buffers/caching.
>
> Usually this works perfectly fine: the buffers take about 1-3G of RAM and the
> cache takes the rest, say 35GB as an example.
> But every now and then, maybe every 2 days, both drop to really low values,
> say 100MB of buffers and 3GB of cache, and the rest of the RAM is not in use,
> so there are about 35GB+ of totally free RAM.
>
> The performance of the host then degrades significantly, to the point where
> it becomes unusable, since it behaves as if the buffers/cache were totally
> useless.
> After lots and lots of playing around I noticed that shutting down all
> services that access the HDDs and restarting them does *not* make any
> difference.
>
> But what did make a difference was stopping the services, unmounting the fs,
> mounting it again and starting the services.
> Then the buffers+cache built up to 5GB/35GB as usual after a while and
> everything was perfectly fine again!
>
> I noticed that when umount is called, the caches are dropped. So I gave it a
> try:
>
> sync; echo 2 > /proc/sys/vm/drop_caches
>
> has exactly the same effect. Note that echo 1 > .. does not.
>
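For reference, Documentation/sysctl/vm.txt describes the drop_caches values:
echo 1 frees the page cache, echo 2 frees reclaimable slab objects such as
dentries and inodes, and echo 3 frees both, i.e.

	sync; echo 3 > /proc/sys/vm/drop_caches

would combine the two.
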
> So when that low usage like 100MB/3GB occurs, I have to drop the caches by
> echoing 2 to drop_caches. The 3GB then becomes even lower, which is expected,
> but at least the buffers/cache build up again to ordinary values and the
> usual performance is restored after a few minutes.
> I have never seen this before; it only started after I switched the systems
> to newer ones. The old ones ran kernel 3.x and never showed this behavior.
>
> Do you have *any idea* at all what could be causing this? The issue has been
> bugging me for over a month and seriously disturbs everything I'm doing,
> since a lot of people access that data and all of them start to complain
> whenever the caches become useless and I have to drop them so they can
> rebuild.
>
> Some guys on IRC suggested that this could be a fragmentation problem or
> something related to slab shrinking.

Well, the page cache shouldn't really care about fragmentation because
single pages are used. Btw. what is the filesystem that you are using?
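If you want to rule fragmentation out anyway, a quick look at the free page
orders should be enough, e.g.

	cat /proc/buddyinfo

which shows how many free pages of each order are left in every zone.
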
> The problem is that I cannot reproduce this; I have to wait a while, maybe
> 2 days, to observe it. Until then the buffers/caches are fully in use, and at
> some point they decrease within a few hours to those useless values.
> Sadly this is a production system and I cannot play around that much;
> dropping the caches already causes downtime (repopulating them takes maybe
> 5-10 minutes until the performance is ok again).

This doesn't really ring a bell for me.
> Please tell me whatever info you need me to pastebin, and when (before/after
> which event).
> Any hints are appreciated a lot; this really gives me lots of headaches,
> since I am really busy with other things. Thank you very much!

Could you collect /proc/vmstat every few seconds over that time period?
Maybe it will tell us more.
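Something as simple as the following loop would do (an untested sketch; the
interval and output path are just examples, adjust as you see fit):

	while true; do
		date +%s >> /root/vmstat.log
		cat /proc/vmstat >> /root/vmstat.log
		sleep 10
	done
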
--
Michal Hocko
SUSE Labs