From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg0-f70.google.com (mail-pg0-f70.google.com [74.125.83.70]) by kanga.kvack.org (Postfix) with ESMTP id 270AE2808A4 for ; Thu, 24 Aug 2017 10:20:08 -0400 (EDT) Received: by mail-pg0-f70.google.com with SMTP id 185so3807581pgd.6 for ; Thu, 24 Aug 2017 07:20:08 -0700 (PDT) Received: from mail.crc.id.au (mail.crc.id.au. [2407:e400:b000:200::25]) by mx.google.com with ESMTPS id h132si2913223pfe.209.2017.08.24.07.20.06 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 24 Aug 2017 07:20:06 -0700 (PDT) From: Steven Haigh Subject: Re: [Bug 196729] New: System becomes unresponsive when swapping - Regression since 4.10.x Date: Fri, 25 Aug 2017 00:19:56 +1000 Message-ID: <3167565.Qac206UD3A@wopr.lan.crc.id.au> In-Reply-To: <20170824124139.GJ5943@dhcp22.suse.cz> References: <3069262.adKtTK0b29@wopr.lan.crc.id.au> <20170824124139.GJ5943@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart5469704.DWktjsOJT5"; micalg="pgp-sha256"; protocol="application/pgp-signature" Sender: owner-linux-mm@kvack.org List-ID: To: Michal Hocko Cc: linux-mm@kvack.org, Andrew Morton , bugzilla-daemon@bugzilla.kernel.org --nextPart5469704.DWktjsOJT5 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="UTF-8" On Thursday, 24 August 2017 10:41:39 PM AEST Michal Hocko wrote: > On Thu 24-08-17 00:30:40, Steven Haigh wrote: > > On Wednesday, 23 August 2017 11:38:48 PM AEST Michal Hocko wrote: > > > On Tue 22-08-17 15:55:30, Andrew Morton wrote: > > > > (switched to email. Please respond via emailed reply-to-all, not v= ia > > > > the > > > > bugzilla web interface). > > > >=20 > > > > On Tue, 22 Aug 2017 11:17:08 +0000 bugzilla-daemon@bugzilla.kernel.= org > >=20 > > wrote: > > > [...] > > >=20 > > > > Sadly I haven't been able to capture this information > > > >=20 > > > > > fully yet due to said unresponsiveness. > > >=20 > > > Please try to collect /proc/vmstat in the bacground and provide the > > > collected data. Something like > > >=20 > > > while true > > > do > > >=20 > > > cp /proc/vmstat > vmstat.$(date +%s) > > > sleep 1s > > >=20 > > > done > > >=20 > > > If the system turns out so busy that it won't be able to fork a proce= ss > > > or write the output (which you will see by checking timestamps of fil= es > > > and looking for holes) then you can try the attached proggy > > > ./read_vmstat output_file timeout output_size > > >=20 > > > Note you might need to increase the mlock rlimit to lock everything i= nto > > > memory. > >=20 > > Thanks Michal, > >=20 > > I have upgraded PCs since I initially put together this data - however I > > was able to get strange behaviour by pulling out an 8Gb RAM stick in my > > new system - leaving it with only 8Gb of RAM. > >=20 > > All these tests are performed with Fedora 26 and kernel > > 4.12.8-300.fc26.x86_64 > >=20 > > I have attached 3 files with output. > >=20 > > 8Gb-noswap.tar.gz contains the output of /proc/vmstat running on 8Gb of > > RAM > > with no swap. Under this scenario, I was expecting the OOM reaper to ju= st > > kill the game when memory allocated became too high for the amount of > > physical RAM. Interestingly, you'll notice a massive hang in the output > > before the game is terminated. I didn't see this before. >=20 > I have checked few gaps. E.g. vmstat.1503496391 vmstat.1503496451 which > is one minute. The most notable thing is that there are only very few > pagecache pages > [base] [diff] > nr_active_file 1641 3345 > nr_inactive_file 1630 4787 >=20 > So there is not much to reclaim without swap. The more important thing > is that we keep reclaiming and refaulting that memory >=20 > workingset_activate 5905591 1616391 > workingset_refault 33412538 10302135 > pgactivate 42279686 13219593 > pgdeactivate 48175757 14833350 >=20 > pgscan_kswapd 379431778 126407849 > pgsteal_kswapd 49751559 13322930 >=20 > so we are effectivelly trashing over the very small amount of > reclaimable memory. This is something that we cannot detect right now. > It is even questionable whether the OOM killer would be an appropriate > action. Your system has recovered and then it is always hard to decide > whether a disruptive action is more appropriate. One minute of > unresponsiveness is certainly annoying though. Your system is obviously > under provisioned to load you want to run obviously. >=20 > It is quite interesting to see that we do not really have too many > direct reclaimers during this time period > allocstall_normal 30 1 > allocstall_movable 490 88 > pgscan_direct_throttle 0 0 > pgsteal_direct 24434 4069 > pgscan_direct 38678 5868 Yes, I understand that the system is really not suitable - however I believ= e=20 the test is useful - even from an informational point of view :) > > 8Gb-swap-on-file.tar.gz contains the output of /proc/vmstat still with = 8Gb > > of RAM - but creating a file with swap on the PCIe SSD /swapfile with > > size 8Gb>=20 > > via: > > # dd if=3D/dev/zero of=3D/swapfile bs=3D1G count=3D8 > > # mkswap /swapfile > > # swapon /swapfile > >=20 > > Some times (all in UTC+10): > > 23:58:30 - Start loading the saved game > > 23:59:38 - Load ok, all running fine > > 00:00:15 - Load Chrome > > 00:01:00 - Quit the game > >=20 > > The game seemed to run ok with no real issue - and a lot was swapped to > > the > > swap file. I'm wondering if it was purely the speed of the PCIe SSD that > > caused this appearance - as the creation of the file with dd completed = at > > ~1.4GB/sec. >=20 > Swap IO tends to be really scattered and the IO performance is not really > great even on a fast storage AFAIK. >=20 > Anyway your original report sounded like a regression. Were you able to > run the _same_ workload on an older kernel without these issues? When I try the same tests with swap on an SSD under kernel 4.10.x (I believ= e=20 the latest I tried was 4.10.25?) - then swap using the SSD did not cause an= y=20 issues or periods of system unresponsiveness. The file attached in the original bug report "vmstat-4.10.17-10Gb.log" was= =20 taken on my old system with 10Gb of RAM - and there were no significant pau= ses=20 while swapping. I do find it interesting that the newer '8Gb-swap-on-file.tar.gz' does not= =20 show any issues. I wonder if it would be helpful to attempt the same using = a=20 file on the SSD that was a swap disk in the '8Gb-swap-on-ssd.tar.gz' so we= =20 have a constant device - but with a file on the SSD instead of the entire=20 block device. That would at least expose any issues on the same device in f= ile=20 vs block mode? Or maybe even if there's a difference just having the file o= n a=20 much (much!) faster drive? =2D-=20 Steven Haigh =F0=9F=93=A7 netwiz@crc.id.au =F0=9F=92=BB http://www.crc.id.au =F0=9F=93=9E +61 (3) 9001 6090 =F0=9F=93=B1 0412 935 897 --nextPart5469704.DWktjsOJT5 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part. Content-Transfer-Encoding: 7Bit -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEFHf0gfgNrH6ofcYGQa811Xp9MdwFAlme4IwACgkQQa811Xp9 MdwTTA/9FGVwUpWQ9/IydveksU1IAts03jQQx4YztA2MZVsDwClWjo7W73UXDB2/ 1d633b3GjK0OrSmSYli2iMJq+G41jRnM2gU5mkCXSiQKpymjywD2RiqShMl7+hQ2 svcWkylATJbRLkjpoZ27a2vg8S/GH8IzMvGHBXCsVGU6bgNis9WiuINNw0qf6qJd RYY9L9PztQI/5n96VrQyb31WHsJWJhatTu/0S3LRTKFnLR30OPWrTthXRuAXHSsp Xp8ze1HTstmIeDiwhnelMvFCWTxVVPjGUc0VFv3LLqvLhBVIj9NVNPsdlIYhy+iq qmHp7+meNGt0h+AX+VgLAbyPQ/WcuUaXaYS07H6YAzMqSY2YKwu2PYDA4g7UilF3 pgNfM6qhvz8cUWcwj4e1wK7KyW6NrXDgbw+mpubuLdGCPrAZuBzT4c7aT7TQ9SE0 yyh1bqc4j5jzdXTEwnPQEf2VIl00SFOEBH75dUnfm00pdGt27KLWpsOfcl3yyp1d 7rqh6S3YRxDaZ0aHseLb8NkL/YkMayUiN77SZojoKvQqR18qFgXtbg09wZQGXnuz JzMmaVJr0mgtmP/fmTdW12dFCw8KquSR38fvZqNYOBVyu10amfmwS0LY+HL6iigB oVdeKX80D3aepq/dhMGbrnLt+BDXeYSQ1Bn3bqAFbqZNAEnRHo4= =VE3c -----END PGP SIGNATURE----- --nextPart5469704.DWktjsOJT5-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org