From: Dave Chinner <david@fromorbit.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>,
linux-mm@kvack.org, linux-xfs@vger.kernel.org,
Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH] [Regression, v5.0] mm: boosted kswapd reclaim b0rks system cache balance
Date: Thu, 8 Aug 2019 08:32:41 +1000
Message-ID: <20190807223241.GO7777@dread.disaster.area>
In-Reply-To: <20190807205615.GI2739@techsingularity.net>
On Wed, Aug 07, 2019 at 09:56:15PM +0100, Mel Gorman wrote:
> On Wed, Aug 07, 2019 at 04:03:16PM +0100, Mel Gorman wrote:
> > <SNIP>
> >
> > On that basis, it may justify ripping out the may_shrinkslab logic
> > everywhere. The downside is that some microbenchmarks will notice.
> > Specifically IO benchmarks that fill memory and reread (particularly
> > rereading the metadata via any inode operation) may show reduced
> results. Such benchmarks can be strongly affected by whether the inode
> information is still memory resident, and watermark boosting reduces
> the chances that the data is still resident in memory. Technically
> still a regression, but a tunable one.
> >
> > Hence the following "it builds" patch that has zero supporting data on
> > whether it's a good idea or not.
> >
>
> This is a more complete version of the same patch that summarises the
> problem and includes data from my own testing
....
> An fsmark benchmark configuration similar to what Dave reported was
> constructed; it is codified by the mmtests configuration
> config-io-fsmark-small-file-stream. It was evaluated on a 1-socket machine
> to avoid dealing with NUMA-related issues and the timing of reclaim. The
> storage was a Samsung Evo SSD, and a fresh XFS filesystem was used for
> the test data.
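For comparison, the workload I reported is my usual 16-way zero-length
file create that I use for inode cache stress testing. From memory, the
invocation is roughly this (exact parameters may differ):

  # 16 concurrent threads (one per -d), zero-length files, no syncing
  fs_mark -D 10000 -S0 -n 100000 -s 0 -L 32 \
          -d /mnt/scratch/0 -d /mnt/scratch/1 ... -d /mnt/scratch/15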
Have you run fstrim on that drive recently? I'm running these tests
on a 960 EVO ssd, and when I started looking at shrinkers 3 weeks
ago I had all sorts of whacky performance problems and inconsistent
results. It turned out there were random long IO latencies occurring
(in the hundreds of milliseconds) because the drive was
constantly running garbage collection to free up space. As a result
it was both blocking on GC and thermal throttling under these fsmark
workloads.
I made a new XFS filesystem on it (lazy man's rm -rf *), then ran
fstrim on it to tell the drive all the space is free. Drive temps
dropped 30C immediately, and all of the whacky performance anomalies
went away. I now fstrim the drive in my vm startup scripts before
each test run, and it's giving consistent results again.
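For reference, the pre-test step in my startup scripts looks roughly
like this (device and mount point names are illustrative):

  # mkfs is the lazy man's "rm -rf *"; the fstrim then tells the
  # drive that all the free space really is free
  mkfs.xfs -f /dev/nvme0n1
  mount /dev/nvme0n1 /mnt/scratch
  fstrim -v /mnt/scratch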
> It is likely that the test configuration is not a proper match for Dave's
> test as the results are different in terms of performance. However, my
> configuration reports fsmark performance every 10% of memory worth of
> files and I suspect Dave's configuration reported Files/sec when memory
> was already full. THP was enabled for mine and disabled for Dave's, and
> there are probably a whole load of other methodology differences that
> rarely get recorded properly.
Yup, like I forgot to mention that my test system is using a 4-node
fakenuma setup (i.e. 4 nodes, 4GB RAM and 4 CPUs per node, so
there are 4 separate kswapds doing concurrent reclaim). That
changes reclaim patterns as well.
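(If you want to replicate that layout, I'm just using the kernel's
NUMA emulation on a 16GB/16p VM - something like

  numa=fake=4

on the guest kernel command line splits it into 4 equal nodes.)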
> fsmark
> 5.3.0-rc3 5.3.0-rc3
> vanilla shrinker-v1r1
> Min 1-files/sec 5181.70 ( 0.00%) 3204.20 ( -38.16%)
> 1st-qrtle 1-files/sec 14877.10 ( 0.00%) 6596.90 ( -55.66%)
> 2nd-qrtle 1-files/sec 6521.30 ( 0.00%) 5707.80 ( -12.47%)
> 3rd-qrtle 1-files/sec 5614.30 ( 0.00%) 5363.80 ( -4.46%)
> Max-1 1-files/sec 18463.00 ( 0.00%) 18479.90 ( 0.09%)
> Max-5 1-files/sec 18028.40 ( 0.00%) 17829.00 ( -1.11%)
> Max-10 1-files/sec 17502.70 ( 0.00%) 17080.90 ( -2.41%)
> Max-90 1-files/sec 5438.80 ( 0.00%) 5106.60 ( -6.11%)
> Max-95 1-files/sec 5390.30 ( 0.00%) 5020.40 ( -6.86%)
> Max-99 1-files/sec 5271.20 ( 0.00%) 3376.20 ( -35.95%)
> Max 1-files/sec 18463.00 ( 0.00%) 18479.90 ( 0.09%)
> Hmean 1-files/sec 7459.11 ( 0.00%) 6249.49 ( -16.22%)
> Stddev 1-files/sec 4733.16 ( 0.00%) 4362.10 ( 7.84%)
> CoeffVar 1-files/sec 51.66 ( 0.00%) 57.49 ( -11.29%)
> BHmean-99 1-files/sec 7515.09 ( 0.00%) 6351.81 ( -15.48%)
> BHmean-95 1-files/sec 7625.39 ( 0.00%) 6486.09 ( -14.94%)
> BHmean-90 1-files/sec 7803.19 ( 0.00%) 6588.61 ( -15.57%)
> BHmean-75 1-files/sec 8518.74 ( 0.00%) 6954.25 ( -18.37%)
> BHmean-50 1-files/sec 10953.31 ( 0.00%) 8017.89 ( -26.80%)
> BHmean-25 1-files/sec 16732.38 ( 0.00%) 11739.65 ( -29.84%)
>
> 5.3.0-rc3 5.3.0-rc3
> vanilla shrinker-v1r1
> Duration User 77.29 89.09
> Duration System 1097.13 1332.86
> Duration Elapsed 2014.14 2596.39
I'm not sure we are testing or measuring exactly the same things :)
> This is showing that fsmark runs slower as a result of this patch but
> there are other important observations that justify the patch.
>
> 1. With the vanilla kernel, the number of dirty pages in the system
> is very low for much of the test. With this patch, the number of dirty
> pages is generally kept at 10% of memory, which matches
> vm.dirty_background_ratio and is the normal, historically expected
> behaviour.
>
> 2. With the vanilla kernel, the ratio of Slab/Pagecache is close to
> 0.95 for much of the test i.e. Slab is being left alone and dominating
> memory consumption. With the patch applied, the ratio varies between
> 0.35 and 0.45 with the bulk of the measured ratios roughly half way
> between those values. This is a different balance to what Dave reported
> but it is at least consistent.
Yeah, the balance is typically a bit different for different configs
and storage. The trick is getting the balance to be roughly
consistent across a range of different configs. The fakenuma setup
also has a significant impact on where the balance is found. And I
can't remember if the "fixed" memory usage numbers I quoted came
from a run with my "make XFS inode reclaim nonblocking" patchset or
not.
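FWIW, the memory usage numbers I quote are nothing sophisticated -
just periodic samples of /proc/meminfo while the test runs, along the
lines of:

  # sample page cache, dirty and slab memory every 5 seconds
  while sleep 5; do
      awk '/^(MemFree|Cached|Dirty|Slab):/ {
               printf "%-9s %6d MB\n", $1, $2 / 1024
           }' /proc/meminfo
      echo
  done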
> 3. Slabs are scanned throughout the entire test with the patch applied.
> The vanilla kernel has long periods with no scan activity and then
> relatively massive spikes.
>
> 4. Overall vmstats are closer to normal expectations
>
> 5.3.0-rc3 5.3.0-rc3
> vanilla shrinker-v1r1
> Direct pages scanned 60308.00 5226.00
> Kswapd pages scanned 18316110.00 12295574.00
> Kswapd pages reclaimed 13121037.00 7280152.00
> Direct pages reclaimed 11817.00 5226.00
> Kswapd efficiency % 71.64 59.21
> Kswapd velocity 9093.76 4735.64
> Direct efficiency % 19.59 100.00
> Direct velocity 29.94 2.01
> Page reclaim immediate 247921.00 0.00
> Slabs scanned 16602344.00 29369536.00
> Direct inode steals 1574.00 800.00
> Kswapd inode steals 130033.00 3968788.00
> Kswapd skipped wait 0.00 0.00
That looks a lot better. Patch looks reasonable, though I'm
interested to know what impact it has on the tests you ran for the
original commit that introduced the boosting.
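In the meantime, anyone who hits this badly should be able to turn
the boosting off at runtime - IIRC setting the boost factor sysctl to
zero disables it:

  # disable watermark boosting entirely
  sysctl vm.watermark_boost_factor=0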
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com