From: Jan Kara <jack@suse.cz>
To: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>, linux-mm@kvack.org, peterz@infradead.org
Subject: Re: [PATCH 0/2 v2] Flexible proportions for BDIs
Date: Mon, 14 May 2012 23:28:03 +0200
Message-ID: <20120514212803.GT5353@quack.suse.cz>
In-Reply-To: <20120513032952.GA8099@localhost>
On Sun 13-05-12 11:29:52, Wu Fengguang wrote:
> On Fri, May 11, 2012 at 10:51:14PM +0800, Fengguang Wu wrote:
> > > > > Look at the gray "bdi setpoint" lines. The
> > > > > VM_COMPLETIONS_PERIOD_LEN=8s kernel is able to achieve roughly the
> > > > > same stable bdi_setpoint as the vanilla kernel, while adapting to
> > > > > the balanced bdi_setpoint much faster (actually the bdi_setpoint is
> > > > > now immediately close to the balanced value when
> > > > > balance_dirty_pages() starts throttling, while the vanilla kernel
> > > > > takes about 20 seconds for the bdi_setpoint to ramp up).
> > > > Which graph is from which kernel? All four graphs have the same name so
> > > > I'm not sure...
> > >
> > > They are for test cases:
> > >
> > > 0.5s period
> > > bay/JBOD-2HDD-thresh=1000M/xfs-1dd-1-3.4.0-rc2-prop+/balance_dirty_pages-pages+.png
> > > 3s period
> > > bay/JBOD-2HDD-thresh=1000M/xfs-1dd-1-3.4.0-rc2-prop3+/balance_dirty_pages-pages+.png
> > > 8s period
> > > bay/JBOD-2HDD-thresh=1000M/xfs-1dd-1-3.4.0-rc2-prop8+/balance_dirty_pages-pages+.png
> > > vanilla
> > > bay/JBOD-2HDD-thresh=1000M/xfs-1dd-1-3.3.0/balance_dirty_pages-pages+.png
> > >
> > > > The faster (almost immediate) initial adaptation to a bdi's writeout fraction
> > > > is mostly an effect of better normalization with my patches. Although it is
> > > > pleasant, it happens only at the moment when there is a small number of
> > > > periods with a non-zero number of events. So, in my opinion, what matters
> > > > more in practice is to compare the transition of the computed fractions when
> > > > the workload changes (i.e. we start writing to one bdi while already writing
> > > > to another bdi, or so).
> > >
> > > OK. I'll test this scheme and report back.
> > >
> > > loop {
> > >     dd to disk 1 for 30s
> > >     dd to disk 2 for 30s
> > > }
> >
> > Here are the new results. For simplicity I run the dd dirtiers
> > continuously, and start another dd reader to knock down the write
> > bandwidth from time to time:
> >
> > loop {
> >     dd from disk 1 for 30s
> >     dd from disk 2 for 30s
> > }
> >
> > The first attached iostat graph shows the resulting read/write
> > bandwidth for one of the two disks.
> >
> > The following graphs are for
> > - 3s period
> > - 8s period
> > - vanilla
> > in order. The test case is (xfs-1dd, mem=2GB, 2 disks JBOD).
>
> Here are more results for another test box with mem=256G running 4
> SSDs. This time I run 8 dd readers to better disturb the writes.
>
> The first 3 graphs are for cases:
>
> lkp-nex04/alternant_read_8/xfs-10dd-2-3.4.0-rc5-prop3+
> lkp-nex04/alternant_read_8/xfs-10dd-2-3.4.0-rc5-prop8+
> lkp-nex04/alternant_read_8/xfs-10dd-2-3.3.0
>
> The last graph shows how the write bandwidth is squeezed by reads over time:
>
> lkp-nex04/alternant_read_8/xfs-10dd-2-3.4.0-rc5-prop8+/iostat-bw.png
>
> The observations for this box are:
>
> - the 3s and 8s periods result in roughly the same adaptation speed
>
> - the patch makes a really *big* difference on systems with a large
>   memory:bandwidth ratio. It's sweet! In comparison, the vanilla
>   kernel adapts to the new write bandwidth much more slowly.
Yes, in this configuration the benefit of the new algorithm can be clearly
seen. Together with the results of the previous test, I'd say the 3s period
is the best candidate.
I was just wondering whether the period shouldn't somehow be set
automatically, because I'm not convinced 3s will be right for everybody...
Maybe something based on how large the fluctuations in the completion rate
are. But that would be tricky given that the load itself changes as well. So
for now we'll have to live with a hardwired period, I guess.
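To make the period trade-off concrete, here is a minimal userspace sketch of
the decaying-proportions idea (an illustration only, not the actual
lib/flex_proportions.c implementation, which uses percpu counters): every
completion bumps a per-BDI counter and a global counter, and at the end of
each period all counters are halved, so a BDI's fraction has a half-life of
one period.

/*
 * Toy model of period-based decaying proportions (illustration only, not
 * the kernel code): account completions per BDI and globally, halve all
 * counters once per period, and read the fraction as a simple ratio.
 */
#include <stdio.h>

#define NBDI 2

struct prop {
	unsigned long events[NBDI];	/* per-BDI completions, aged by halving */
	unsigned long total;		/* global completions, aged the same way */
};

/* Account one writeout completion against BDI 'i'. */
static void prop_event(struct prop *p, int i)
{
	p->events[i]++;
	p->total++;
}

/* Close the current period: halve every counter (exponential decay). */
static void prop_new_period(struct prop *p)
{
	int i;

	for (i = 0; i < NBDI; i++)
		p->events[i] >>= 1;
	p->total >>= 1;
}

/* BDI 'i's share of recent completions, in percent. */
static unsigned long prop_fraction(struct prop *p, int i)
{
	return p->total ? p->events[i] * 100 / p->total : 0;
}

int main(void)
{
	struct prop p = { { 0 }, 0 };
	int t, e;

	/* Long steady state: all writeback goes to BDI 0. */
	for (t = 0; t < 20; t++) {
		for (e = 0; e < 1000; e++)
			prop_event(&p, 0);
		prop_new_period(&p);
	}

	/* The workload switches entirely to BDI 1. */
	for (t = 1; t <= 5; t++) {
		for (e = 0; e < 1000; e++)
			prop_event(&p, 1);
		prop_new_period(&p);
		printf("%d period(s) after switch: bdi0=%2lu%% bdi1=%2lu%%\n",
		       t, prop_fraction(&p, 0), prop_fraction(&p, 1));
	}
	return 0;
}

With per-period halving the newly active bdi reaches roughly 50/75/87% of the
fraction after 1/2/3 periods, so a 3s period settles in about 10 seconds of
wall time regardless of how much dirty memory is outstanding; the vanilla
proportion code ages per completed event instead, which would explain why
adaptation on the 256G box is so much slower there.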
Thanks for the tests, Fengguang! So, is anybody against merging this?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR