From: Mel Gorman <mgorman@techsingularity.net>
To: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@kernel.org>,
linux-mm@kvack.org, linux-xfs@vger.kernel.org,
Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH] [Regression, v5.0] mm: boosted kswapd reclaim b0rks system cache balance
Date: Thu, 8 Aug 2019 00:48:15 +0100
Message-ID: <20190807234815.GJ2739@techsingularity.net>
In-Reply-To: <20190807223241.GO7777@dread.disaster.area>
On Thu, Aug 08, 2019 at 08:32:41AM +1000, Dave Chinner wrote:
> On Wed, Aug 07, 2019 at 09:56:15PM +0100, Mel Gorman wrote:
> > On Wed, Aug 07, 2019 at 04:03:16PM +0100, Mel Gorman wrote:
> > > <SNIP>
> > >
> > > On that basis, it may justify ripping out the may_shrinkslab logic
> > > everywhere. The downside is that some microbenchmarks will notice.
> > > Specifically IO benchmarks that fill memory and reread (particularly
> > > rereading the metadata via any inode operation) may show reduced
> > > results. Such benchmarks can be strongly affected by whether the inode
> > > information is still memory resident and watermark boosting reduces
> > > the chances the data is still resident in memory. Technically still a
> > > regression but a tunable one.
> > >
> > > Hence the following "it builds" patch that has zero supporting data on
> > > whether it's a good idea or not.
> > >
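(As an aside, to make the gating concrete: the toy program below is purely
illustrative. It is not the mm/vmscan.c code and every name and number in
it is made up. It only sketches why skipping the slab shrinkers while the
boost is active lets slab dominate while the LRUs keep being shrunk.)

/*
 * Toy model only -- not kernel code.  Shows the effect of gating the
 * slab shrinkers on "not boosted" during reclaim passes.
 */
#include <stdbool.h>
#include <stdio.h>

struct node_state {
	long pagecache;		/* reclaimable page cache pages */
	long slab;		/* reclaimable slab pages */
};

/* Stand-in for shrinking the LRUs: always runs. */
static void shrink_lrus(struct node_state *n, long batch)
{
	n->pagecache -= batch;
	if (n->pagecache < 0)
		n->pagecache = 0;
}

/* Stand-in for shrink_slab(): only runs when allowed. */
static void shrink_slab_toy(struct node_state *n, long batch)
{
	n->slab -= batch;
	if (n->slab < 0)
		n->slab = 0;
}

static void reclaim_pass(struct node_state *n, bool boosted, bool gate_slab)
{
	shrink_lrus(n, 64);

	/*
	 * Pre-patch behaviour: boosted reclaim skips the shrinkers
	 * entirely.  Removing the gate means slab is always considered.
	 */
	bool may_shrinkslab = gate_slab ? !boosted : true;
	if (may_shrinkslab)
		shrink_slab_toy(n, 64);
}

int main(void)
{
	struct node_state gated   = { .pagecache = 10000, .slab = 10000 };
	struct node_state ungated = { .pagecache = 10000, .slab = 10000 };

	/* Pretend the watermark boost stays active for 100 passes. */
	for (int i = 0; i < 100; i++) {
		reclaim_pass(&gated, true, true);
		reclaim_pass(&ungated, true, false);
	}

	printf("gated:   pagecache=%ld slab=%ld\n", gated.pagecache, gated.slab);
	printf("ungated: pagecache=%ld slab=%ld\n", ungated.pagecache, ungated.slab);
	return 0;
}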
> >
> > This is a more complete version of the same patch that summarises the
> > problem and includes data from my own testing
> ....
> > A fsmark benchmark configuration was constructed similar to
> > what Dave reported and is codified by the mmtest configuration
> > config-io-fsmark-small-file-stream. It was evaluated on a 1-socket machine
> > to avoid dealing with NUMA-related issues and the timing of reclaim. The
> > storage was an SSD Samsung Evo and a fresh XFS filesystem was used for
> > the test data.
>
> Have you run fstrim on that drive recently? I'm running these tests
> on a 960 EVO ssd, and when I started looking at shrinkers 3 weeks
> ago I had all sorts of whacky performance problems and inconsistent
> results. Turned out there were all sorts of random long IO latencies
> occurring (in the hundreds of milliseconds) because the drive was
> constantly running garbage collection to free up space. As a result
> it was both blocking on GC and thermal throttling under these fsmark
> workloads.
>
No, I was under the impression that making a new filesystem typically
trimmed it as well. Maybe that's only true of some filesystems (e.g. ext4),
or maybe I'm just completely wrong.
> I made a new XFS filesystem on it (lazy man's rm -rf *),
Ah, all the IO tests I do make a new filesystem. I know there is the whole
problem of filesystem aging, but I've yet to come across an aging methodology
that two people can agree is sensible and reproducible.
> then ran
> fstrim on it to tell the drive all the space is free. Drive temps
> dropped 30C immediately, and all of the whacky performance anomalies
> went away. I now fstrim the drive in my vm startup scripts before
> each test run, and it's giving consistent results again.
>
I'll replicate that if making a new filesystem is not guaranteed to
trim. It'll muck up historical data but that happens to me every so
often anyway.
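As a side note for anyone reproducing this: fstrim(8) on the mountpoint is
essentially just the FITRIM ioctl, so a pre-test trim amounts to something
like the snippet below (the /mnt/test path is only an example):

/* Minimal equivalent of "fstrim /mnt/test"; the mountpoint is illustrative. */
#include <fcntl.h>
#include <linux/fs.h>		/* FITRIM, struct fstrim_range */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	struct fstrim_range range = {
		.start = 0,
		.len = UINT64_MAX,	/* trim the whole filesystem */
		.minlen = 0,
	};
	int fd = open("/mnt/test", O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(fd, FITRIM, &range) < 0) {
		perror("FITRIM");
		close(fd);
		return 1;
	}
	/* On success the kernel reports how many bytes were discarded. */
	printf("trimmed %llu bytes\n", (unsigned long long)range.len);
	close(fd);
	return 0;
}

In practice, putting a plain "fstrim <mountpoint>" in the pre-test hook does
the same thing.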
> > It is likely that the test configuration is not a proper match for Dave's
> > test as the results are different in terms of performance. However, my
> > configuration reports fsmark performance every 10% of memory worth of
> > files and I suspect Dave's configuration reported Files/sec when memory
> > was already full. THP was enabled for mine and disabled for Dave's, and
> > there were probably a whole load of other methodology differences that
> > rarely get recorded properly.
>
> Yup, like I forgot to mention that my test system is using a 4-node
> fakenuma setup (i.e. 4 nodes, 4GB RAM and 4 CPUs per node, so
> there are 4 separate kswapd's doing concurrent reclaim). That
> changes reclaim patterns as well.
>
Good to know. In this particular case, I don't think I need to exactly
replicate what you have, given that the slab reclaim behaviour is
definitely more consistent and the ratios of slab/pagecache are
predictable.
>
> > fsmark
> > 5.3.0-rc3 5.3.0-rc3
> > vanilla shrinker-v1r1
> > Min 1-files/sec 5181.70 ( 0.00%) 3204.20 ( -38.16%)
> > 1st-qrtle 1-files/sec 14877.10 ( 0.00%) 6596.90 ( -55.66%)
> > 2nd-qrtle 1-files/sec 6521.30 ( 0.00%) 5707.80 ( -12.47%)
> > 3rd-qrtle 1-files/sec 5614.30 ( 0.00%) 5363.80 ( -4.46%)
> > Max-1 1-files/sec 18463.00 ( 0.00%) 18479.90 ( 0.09%)
> > Max-5 1-files/sec 18028.40 ( 0.00%) 17829.00 ( -1.11%)
> > Max-10 1-files/sec 17502.70 ( 0.00%) 17080.90 ( -2.41%)
> > Max-90 1-files/sec 5438.80 ( 0.00%) 5106.60 ( -6.11%)
> > Max-95 1-files/sec 5390.30 ( 0.00%) 5020.40 ( -6.86%)
> > Max-99 1-files/sec 5271.20 ( 0.00%) 3376.20 ( -35.95%)
> > Max 1-files/sec 18463.00 ( 0.00%) 18479.90 ( 0.09%)
> > Hmean 1-files/sec 7459.11 ( 0.00%) 6249.49 ( -16.22%)
> > Stddev 1-files/sec 4733.16 ( 0.00%) 4362.10 ( 7.84%)
> > CoeffVar 1-files/sec 51.66 ( 0.00%) 57.49 ( -11.29%)
> > BHmean-99 1-files/sec 7515.09 ( 0.00%) 6351.81 ( -15.48%)
> > BHmean-95 1-files/sec 7625.39 ( 0.00%) 6486.09 ( -14.94%)
> > BHmean-90 1-files/sec 7803.19 ( 0.00%) 6588.61 ( -15.57%)
> > BHmean-75 1-files/sec 8518.74 ( 0.00%) 6954.25 ( -18.37%)
> > BHmean-50 1-files/sec 10953.31 ( 0.00%) 8017.89 ( -26.80%)
> > BHmean-25 1-files/sec 16732.38 ( 0.00%) 11739.65 ( -29.84%)
> >
> > 5.3.0-rc3 5.3.0-rc3
> > vanilla shrinker-v1r1
> > Duration User 77.29 89.09
> > Duration System 1097.13 1332.86
> > Duration Elapsed 2014.14 2596.39
>
> I'm not sure we are testing or measuring exactly the same things :)
>
Probably not.
> > This is showing that fsmark runs slower as a result of this patch but
> > there are other important observations that justify the patch.
> >
> > 1. With the vanilla kernel, the number of dirty pages in the system
> > is very low for much of the test. With this patch, dirty pages
> > are generally kept at around 10%, which matches vm.dirty_background_ratio
> > and is the normal, historically expected behaviour.
> >
> > 2. With the vanilla kernel, the ratio of Slab/Pagecache is close to
> > 0.95 for much of the test i.e. Slab is being left alone and dominating
> > memory consumption. With the patch applied, the ratio varies between
> > 0.35 and 0.45 with the bulk of the measured ratios roughly half way
> > between those values. This is a different balance to what Dave reported
> > but it was at least consistent.
>
> Yeah, the balance is typically a bit different for different configs
> and storage. The trick is getting the balance to be roughly
> consistent across a range of different configs. The fakenuma setup
> also has a significant impact on where the balance is found. And I
> can't remember if the "fixed" memory usage numbers I quoted came
> from a run with my "make XFS inode reclaim nonblocking" patchset or
> not.
>
Again, I wouldn't sweat too much about it. The generated graphs
definitely showed more consistent behaviour even if the headline
performance was not improved.
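For what it's worth, the slab/pagecache ratio being reported is nothing
sophisticated; conceptually it boils down to something like the snippet
below reading /proc/meminfo (illustrative only, the actual monitoring
scripts may use different counters):

/*
 * Rough stand-in for the slab vs page cache ratio under discussion:
 * reads Slab: and Cached: from /proc/meminfo.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *fp = fopen("/proc/meminfo", "r");
	char line[256];
	long slab_kb = -1, cached_kb = -1;

	if (!fp) {
		perror("fopen");
		return 1;
	}
	while (fgets(line, sizeof(line), fp)) {
		if (!strncmp(line, "Slab:", 5))
			sscanf(line + 5, "%ld", &slab_kb);
		else if (!strncmp(line, "Cached:", 7))
			sscanf(line + 7, "%ld", &cached_kb);
	}
	fclose(fp);

	if (slab_kb < 0 || cached_kb <= 0) {
		fprintf(stderr, "could not parse /proc/meminfo\n");
		return 1;
	}
	printf("slab/pagecache = %.2f (%ld kB / %ld kB)\n",
	       (double)slab_kb / cached_kb, slab_kb, cached_kb);
	return 0;
}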
> > 3. Slabs are scanned throughout the entire test with the patch applied.
> > The vanilla kernel has long periods with no scan activity and then
> > relatively massive spikes.
> >
> > 4. Overall vmstats are closer to normal expectations
> >
> > 5.3.0-rc3 5.3.0-rc3
> > vanilla shrinker-v1r1
> > Direct pages scanned 60308.00 5226.00
> > Kswapd pages scanned 18316110.00 12295574.00
> > Kswapd pages reclaimed 13121037.00 7280152.00
> > Direct pages reclaimed 11817.00 5226.00
> > Kswapd efficiency % 71.64 59.21
> > Kswapd velocity 9093.76 4735.64
> > Direct efficiency % 19.59 100.00
> > Direct velocity 29.94 2.01
> > Page reclaim immediate 247921.00 0.00
> > Slabs scanned 16602344.00 29369536.00
> > Direct inode steals 1574.00 800.00
> > Kswapd inode steals 130033.00 3968788.00
> > Kswapd skipped wait 0.00 0.00
>
> That looks a lot better. Patch looks reasonable, though I'm
> interested to know what impact it has on tests you ran in the
> original commit for the boosting.
>
I'll find out soon enough, but I'm leaning towards the view that kswapd reclaim
should be predictable and that even if some workloads see performance problems
as a result, others will see a gain. It'll be a case
of "no matter what way you jump, someone shouts" but kswapd having spiky
unpredictable behaviour is a recipe for "sometimes my machine is crap
and I've no idea why".
--
Mel Gorman
SUSE Labs