From: Michal Hocko <mhocko@kernel.org>
To: Steven Haigh <netwiz@crc.id.au>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
bugzilla-daemon@bugzilla.kernel.org
Subject: Re: [Bug 196729] New: System becomes unresponsive when swapping - Regression since 4.10.x
Date: Thu, 24 Aug 2017 14:41:39 +0200
Message-ID: <20170824124139.GJ5943@dhcp22.suse.cz>
In-Reply-To: <3069262.adKtTK0b29@wopr.lan.crc.id.au>
On Thu 24-08-17 00:30:40, Steven Haigh wrote:
> On Wednesday, 23 August 2017 11:38:48 PM AEST Michal Hocko wrote:
> > On Tue 22-08-17 15:55:30, Andrew Morton wrote:
> > > (switched to email. Please respond via emailed reply-to-all, not via the
> > > bugzilla web interface).
> >
> > > On Tue, 22 Aug 2017 11:17:08 +0000 bugzilla-daemon@bugzilla.kernel.org
> wrote:
> > [...]
> >
> > > Sadly I haven't been able to capture this information
> > >
> > > > fully yet due to said unresponsiveness.
> >
> > Please try to collect /proc/vmstat in the background and provide the
> > collected data. Something like
> >
> > while true
> > do
> >   cat /proc/vmstat > vmstat.$(date +%s)
> >   sleep 1s
> > done
> >
> > If the system turns out so busy that it won't be able to fork a process
> > or write the output (which you will see by checking timestamps of files
> > and looking for holes) then you can try the attached proggy
> > ./read_vmstat output_file timeout output_size
> >
> > Note you might need to increase the mlock rlimit to lock everything into
> > memory.
>
> Thanks Michal,
>
> I have upgraded PCs since I initially put together this data - however I was
> able to get strange behaviour by pulling out an 8Gb RAM stick in my new system
> - leaving it with only 8Gb of RAM.
>
> All these tests are performed with Fedora 26 and kernel 4.12.8-300.fc26.x86_64
>
> I have attached 3 files with output.
>
> 8Gb-noswap.tar.gz contains the output of /proc/vmstat running on 8Gb of RAM
> with no swap. Under this scenario, I was expecting the OOM reaper to just kill
> the game when memory allocated became too high for the amount of physical RAM.
> Interestingly, you'll notice a massive hang in the output before the game is
> terminated. I didn't see this before.
I have checked a few gaps, e.g. vmstat.1503496391 and vmstat.1503496451,
which are one minute apart. The most notable thing is that there are only
very few pagecache pages

                               [base]       [diff]
nr_active_file                   1641         3345
nr_inactive_file                 1630         4787
So there is not much to reclaim without swap. The more important thing
is that we keep reclaiming and refaulting that memory

workingset_activate           5905591      1616391
workingset_refault           33412538     10302135
pgactivate                   42279686     13219593
pgdeactivate                 48175757     14833350
pgscan_kswapd               379431778    126407849
pgsteal_kswapd               49751559     13322930
so we are effectively thrashing over the very small amount of
reclaimable memory. This is something that we cannot detect right now.
It is even questionable whether the OOM killer would be an appropriate
action. Your system has recovered, and in that case it is always hard to
decide whether a disruptive action would have been more appropriate. One
minute of unresponsiveness is certainly annoying, though. Your system is
obviously under-provisioned for the load you want to run.
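
Holes in the collected data, like the one-minute gap between
vmstat.1503496391 and vmstat.1503496451, can be located from the file
names alone. The following is only a sketch under stated assumptions: the
function name find_gaps and the default 5 second threshold are made up for
illustration, and it expects snapshots named vmstat.<epoch seconds> as
produced by the collection loop quoted above.

```shell
#!/bin/sh
# find_gaps DIR [MAX_GAP_SECONDS]
# Hypothetical helper: lists every pair of consecutive vmstat.<epoch>
# snapshots in DIR that are more than MAX_GAP_SECONDS apart (default 5).
find_gaps() {
    dir=$1
    max_gap=${2:-5}
    # Keep only the numeric timestamp suffix of each snapshot name.
    ls "$dir" | sed -n 's/^vmstat\.\([0-9][0-9]*\)$/\1/p' | sort -n |
    awk -v max="$max_gap" '
        # Report when the distance to the previous snapshot is too large.
        NR > 1 && $1 - prev > max {
            printf "gap of %d s: vmstat.%s -> vmstat.%s\n", $1 - prev, prev, $1
        }
        { prev = $1 }'
}

# Example: find_gaps . 5
```

With one snapshot per second, anything well above one second between
neighbouring files means the collector could not run during that time.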
It is quite interesting to see that we do not really have too many
direct reclaimers during this time period

allocstall_normal                  30            1
allocstall_movable                490           88
pgscan_direct_throttle              0            0
pgsteal_direct                  24434         4069
pgscan_direct                   38678         5868
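
Tables like the ones above can be produced from any two snapshots. This is
a minimal sketch, assuming that [base] is the counter value in the earlier
file and [diff] is the later value minus the earlier one. The function name
vmstat_diff and its argument order are hypothetical, chosen only for
illustration.

```shell
#!/bin/sh
# vmstat_diff BASE_FILE LATER_FILE [COUNTER...]
# Hypothetical helper: prints "name base diff" for each counter present in
# both snapshots, where diff = later - base. With no COUNTER arguments,
# all counters are printed.
vmstat_diff() {
    base=$1
    later=$2
    shift 2
    awk -v want="$*" '
        BEGIN {
            # Build the set of requested counter names (may be empty).
            n = split(want, w, " ")
            for (i = 1; i <= n; i++) sel[w[i]] = 1
        }
        # First file: remember the base value of every counter.
        NR == FNR { b[$1] = $2; next }
        # Second file: print base and delta for the selected counters.
        ($1 in b) && (n == 0 || ($1 in sel)) {
            printf "%-24s %12.0f %12.0f\n", $1, b[$1], $2 - b[$1]
        }' "$base" "$later"
}

# Example:
# vmstat_diff vmstat.1503496391 vmstat.1503496451 pgscan_kswapd pgsteal_kswapd
```

The %.0f format is used instead of %d because some awk implementations
clamp %d to 32 bits, and long-running vmstat counters can exceed that.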
> 8Gb-swap-on-file.tar.gz contains the output of /proc/vmstat still with 8Gb of
> RAM - but creating a file with swap on the PCIe SSD /swapfile with size 8Gb
> via:
> # dd if=/dev/zero of=/swapfile bs=1G count=8
> # mkswap /swapfile
> # swapon /swapfile
>
> Some times (all in UTC+10):
> 23:58:30 - Start loading the saved game
> 23:59:38 - Load ok, all running fine
> 00:00:15 - Load Chrome
> 00:01:00 - Quit the game
>
> The game seemed to run ok with no real issue - and a lot was swapped to the
> swap file. I'm wondering if it was purely the speed of the PCIe SSD that
> caused this appearance - as the creation of the file with dd completed at
> ~1.4GB/sec.
Swap IO tends to be really scattered, and the IO performance is not really
great even on fast storage, AFAIK.

Anyway, your original report sounded like a regression. Were you able to
run the _same_ workload on an older kernel without these issues?
--
Michal Hocko
SUSE Labs