linux-mm.kvack.org archive mirror
* [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
@ 2009-06-24 10:38 Richard Kennedy
  2009-06-24 22:27 ` Andrew Morton
  2009-08-07 12:20 ` Peter Zijlstra
  0 siblings, 2 replies; 22+ messages in thread
From: Richard Kennedy @ 2009-06-24 10:38 UTC (permalink / raw)
  To: Andrew Morton, a.p.zijlstra; +Cc: linux-mm, lkml

[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]

When writing to 2 (or more) devices at the same time, stop
balance_dirty_pages moving dirty pages to writeback when it has reached
the bdi threshold. This prevents balance_dirty_pages overshooting its
limits and moving all dirty pages to writeback.     

    
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
---
balance_dirty_pages can overreact and move all of the dirty pages to
writeback unnecessarily.

balance_dirty_pages makes its decision to throttle based on the number
of dirty plus writeback pages that are over the calculated limit, so it
will continue to move pages even when there are plenty of pages in
writeback and less than the threshold still dirty.

This allows it to overshoot its limits and move all the dirty pages to
writeback while waiting for the drives to catch up and empty the
writeback list. 
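
For readers without the source to hand, the loop in question looks
roughly like this (a simplified paraphrase of the 2.6.30-era
balance_dirty_pages(), not the verbatim kernel code):

	for (;;) {
		get_dirty_limits(&background_thresh, &dirty_thresh,
				 &bdi_thresh, bdi);

		/* done once this bdi's dirty + writeback fit its share */
		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
			break;

		/*
		 * The check this patch changes: more pages are queued
		 * for writeback whenever *anything* is dirty, even if
		 * plenty is already in flight; this is what lets it
		 * overshoot.
		 */
		if (bdi_nr_reclaimable)
			writeback_inodes(&wbc);

		congestion_wait(WRITE, HZ/10);
	}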

A simple fio test easily demonstrates this problem.  

fio --name=f1 --directory=/disk1 --size=2G --rw=write \
	--name=f2 --directory=/disk2 --size=1G --rw=write --startdelay=10

The attached graph before.png shows how all pages are moved to writeback
as the second write starts and the throttling kicks in.

after.png is the same test with the patch applied, which clearly shows
that it keeps dirty_background_ratio dirty pages in the buffer.
The values and timings of the graphs are only approximate but are good
enough to show the behaviour.  

This is the simplest fix I could find, but I'm not entirely sure that it
alone will be enough for all cases. But it certainly is an improvement
on my desktop machine writing to 2 disks.

Do we need something more for machines with large arrays where
bdi_threshold * number_of_drives is greater than the dirty_ratio ?

regards
Richard


This patch is against the latest 2.6.30 git; built and tested on x86_64.

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7b0dcea..7687879 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -541,8 +541,11 @@ static void balance_dirty_pages(struct address_space *mapping)
 		 * filesystems (i.e. NFS) in which data may have been
 		 * written to the server's write cache, but has not yet
 		 * been flushed to permanent storage.
+		 * Only move pages to writeback if this bdi is over its
+		 * threshold otherwise wait until the disk writes catch
+		 * up.
 		 */
-		if (bdi_nr_reclaimable) {
+		if (bdi_nr_reclaimable > bdi_thresh) {
 			writeback_inodes(&wbc);
 			pages_written += write_chunk - wbc.nr_to_write;
 			get_dirty_limits(&background_thresh, &dirty_thresh,


[-- Attachment #2: before.png --]
[-- Type: image/png, Size: 2613 bytes --]

[-- Attachment #3: after.png --]
[-- Type: image/png, Size: 2247 bytes --]


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-24 10:38 [RFC][PATCH] mm: stop balance_dirty_pages doing too much work Richard Kennedy
@ 2009-06-24 22:27 ` Andrew Morton
  2009-06-25  5:13   ` Jens Axboe
  2009-06-25  8:00   ` Peter Zijlstra
  2009-08-07 12:20 ` Peter Zijlstra
  1 sibling, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2009-06-24 22:27 UTC (permalink / raw)
  To: Richard Kennedy; +Cc: a.p.zijlstra, linux-mm, linux-kernel, Jens Axboe

On Wed, 24 Jun 2009 11:38:24 +0100
Richard Kennedy <richard@rsk.demon.co.uk> wrote:

> When writing to 2 (or more) devices at the same time, stop
> balance_dirty_pages moving dirty pages to writeback when it has reached
> the bdi threshold. This prevents balance_dirty_pages overshooting its
> limits and moving all dirty pages to writeback.     
> 
>     
> Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> ---
> balance_dirty_pages can overreact and move all of the dirty pages to
> writeback unnecessarily.
> 
> balance_dirty_pages makes its decision to throttle based on the number
> of dirty plus writeback pages that are over the calculated limit, so it
> will continue to move pages even when there are plenty of pages in
> writeback and less than the threshold still dirty.
> 
> This allows it to overshoot its limits and move all the dirty pages to
> writeback while waiting for the drives to catch up and empty the
> writeback list. 
> 
> A simple fio test easily demonstrates this problem.  
> 
> fio --name=f1 --directory=/disk1 --size=2G --rw=write \
> 	--name=f2 --directory=/disk2 --size=1G --rw=write --startdelay=10
> 
> The attached graph before.png shows how all pages are moved to writeback
> as the second write starts and the throttling kicks in.
> 
> after.png is the same test with the patch applied, which clearly shows
> that it keeps dirty_background_ratio dirty pages in the buffer.
> The values and timings of the graphs are only approximate but are good
> enough to show the behaviour.  
> 
> This is the simplest fix I could find, but I'm not entirely sure that it
> alone will be enough for all cases. But it certainly is an improvement
> on my desktop machine writing to 2 disks.
> 
> Do we need something more for machines with large arrays where
> bdi_threshold * number_of_drives is greater than the dirty_ratio ?
> 

um.  Interesting find.  Jens, was any of your performance testing using
multiple devices?  If so, it looks like the results just got invalidated :)

> 
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 7b0dcea..7687879 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -541,8 +541,11 @@ static void balance_dirty_pages(struct address_space *mapping)
>  		 * filesystems (i.e. NFS) in which data may have been
>  		 * written to the server's write cache, but has not yet
>  		 * been flushed to permanent storage.
> +		 * Only move pages to writeback if this bdi is over its
> +		 * threshold otherwise wait until the disk writes catch
> +		 * up.
>  		 */
> -		if (bdi_nr_reclaimable) {
> +		if (bdi_nr_reclaimable > bdi_thresh) {
>  			writeback_inodes(&wbc);
>  			pages_written += write_chunk - wbc.nr_to_write;
>  			get_dirty_limits(&background_thresh, &dirty_thresh,

yup, we need to think about the effect with zillions of disks.  Peter,
could you please take a look?

Also...  get_dirty_limits() is rather hard to grok.  The callers of
get_dirty_limits() treat its three return values as "thresholds", but
they're not named as thresholds within get_dirty_limits() itself, which
is a bit confusing.  And the meaning of each of those return values is
pretty obscure from the code - could we document them please?



* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-24 22:27 ` Andrew Morton
@ 2009-06-25  5:13   ` Jens Axboe
  2009-06-25  8:00   ` Peter Zijlstra
  1 sibling, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2009-06-25  5:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Richard Kennedy, a.p.zijlstra, linux-mm, linux-kernel

On Wed, Jun 24 2009, Andrew Morton wrote:
> On Wed, 24 Jun 2009 11:38:24 +0100
> Richard Kennedy <richard@rsk.demon.co.uk> wrote:
> 
> > When writing to 2 (or more) devices at the same time, stop
> > balance_dirty_pages moving dirty pages to writeback when it has reached
> > the bdi threshold. This prevents balance_dirty_pages overshooting its
> > limits and moving all dirty pages to writeback.     
> > 
> >     
> > Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> > ---
> > balance_dirty_pages can overreact and move all of the dirty pages to
> > writeback unnecessarily.
> > 
> > balance_dirty_pages makes its decision to throttle based on the number
> > of dirty plus writeback pages that are over the calculated limit, so it
> > will continue to move pages even when there are plenty of pages in
> > writeback and less than the threshold still dirty.
> > 
> > This allows it to overshoot its limits and move all the dirty pages to
> > writeback while waiting for the drives to catch up and empty the
> > writeback list. 
> > 
> > A simple fio test easily demonstrates this problem.  
> > 
> > fio --name=f1 --directory=/disk1 --size=2G --rw=write \
> > 	--name=f2 --directory=/disk2 --size=1G --rw=write --startdelay=10
> > 
> > The attached graph before.png shows how all pages are moved to writeback
> > as the second write starts and the throttling kicks in.
> > 
> > after.png is the same test with the patch applied, which clearly shows
> > that it keeps dirty_background_ratio dirty pages in the buffer.
> > The values and timings of the graphs are only approximate but are good
> > enough to show the behaviour.  
> > 
> > This is the simplest fix I could find, but I'm not entirely sure that it
> > alone will be enough for all cases. But it certainly is an improvement
> > on my desktop machine writing to 2 disks.
> > 
> > Do we need something more for machines with large arrays where
> > bdi_threshold * number_of_drives is greater than the dirty_ratio ?
> > 
> 
> um.  Interesting find.  Jens, was any of your performance testing using
> multiple devices?  If so, it looks like the results just got invalidated :)

"invalidated" is a bit too much I think, skewed is more like it. But
most of my testing has been on a single spindle, only the very first
patches used 10 disks as a test base.

> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 7b0dcea..7687879 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -541,8 +541,11 @@ static void balance_dirty_pages(struct address_space *mapping)
> >  		 * filesystems (i.e. NFS) in which data may have been
> >  		 * written to the server's write cache, but has not yet
> >  		 * been flushed to permanent storage.
> > +		 * Only move pages to writeback if this bdi is over its
> > +		 * threshold otherwise wait until the disk writes catch
> > +		 * up.
> >  		 */
> > -		if (bdi_nr_reclaimable) {
> > +		if (bdi_nr_reclaimable > bdi_thresh) {
> >  			writeback_inodes(&wbc);
> >  			pages_written += write_chunk - wbc.nr_to_write;
> >  			get_dirty_limits(&background_thresh, &dirty_thresh,
> 
> yup, we need to think about the effect with zillions of disks.  Peter,
> could you please take a look?
> 
> Also...  get_dirty_limits() is rather hard to grok.  The callers of
> get_dirty_limits() treat its three return values as "thresholds", but
> they're not named as thresholds within get_dirty_limits() itself, which
> is a bit confusing.  And the meaning of each of those return values is
> pretty obscure from the code - could we document them please?

This is indeed a pretty interesting find!

-- 
Jens Axboe


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-24 22:27 ` Andrew Morton
  2009-06-25  5:13   ` Jens Axboe
@ 2009-06-25  8:00   ` Peter Zijlstra
  2009-06-25  9:10     ` Jens Axboe
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2009-06-25  8:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Richard Kennedy, linux-mm, linux-kernel, Jens Axboe

On Wed, 2009-06-24 at 15:27 -0700, Andrew Morton wrote:
> On Wed, 24 Jun 2009 11:38:24 +0100
> Richard Kennedy <richard@rsk.demon.co.uk> wrote:
> 
> > When writing to 2 (or more) devices at the same time, stop
> > balance_dirty_pages moving dirty pages to writeback when it has reached
> > the bdi threshold. This prevents balance_dirty_pages overshooting its
> > limits and moving all dirty pages to writeback.     
> > 
> >     
> > Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> > ---

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>


[ moved explanation below ]

> > 
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 7b0dcea..7687879 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -541,8 +541,11 @@ static void balance_dirty_pages(struct address_space *mapping)
> >  		 * filesystems (i.e. NFS) in which data may have been
> >  		 * written to the server's write cache, but has not yet
> >  		 * been flushed to permanent storage.
> > +		 * Only move pages to writeback if this bdi is over its
> > +		 * threshold otherwise wait until the disk writes catch
> > +		 * up.
> >  		 */
> > -		if (bdi_nr_reclaimable) {
> > +		if (bdi_nr_reclaimable > bdi_thresh) {
> >  			writeback_inodes(&wbc);
> >  			pages_written += write_chunk - wbc.nr_to_write;
> >  			get_dirty_limits(&background_thresh, &dirty_thresh,
> 
> yup, we need to think about the effect with zillions of disks.  Peter,
> could you please take a look?

Looks to have been in that form forever (immediate git history).

When reading the code I read it like:

		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
			break;

		if (nr_reclaimable + nr_writeback <
				(background_thresh + dirty_thresh) / 2)
			break;

		if (bdi_nr_reclaimable) {
			writeback_inodes(&wbc);

Which to me reads:

  - if there's not enough to do, drop out
  - see if background write-out can catch up, drop out
  - is there anything to do, yay! work.

/me goes read the changelog, maybe there's a clue in there :-)

> > balance_dirty_pages can overreact and move all of the dirty pages to
> > writeback unnecessarily.
> > 
> > balance_dirty_pages makes its decision to throttle based on the number
> > of dirty plus writeback pages that are over the calculated limit, so it
> > will continue to move pages even when there are plenty of pages in
> > writeback and less than the threshold still dirty.
> > 
> > This allows it to overshoot its limits and move all the dirty pages to
> > writeback while waiting for the drives to catch up and empty the
> > writeback list. 

Ahhh, indeed, how silly of me not to notice that before!
 
> > This is the simplest fix I could find, but I'm not entirely sure that it
> > alone will be enough for all cases. But it certainly is an improvement
> > on my desktop machine writing to 2 disks.

Seems good to me.

> > Do we need something more for machines with large arrays where
> > bdi_threshold * number_of_drives is greater than the dirty_ratio ?

[ I assumed s/dirty_ratio/dirty_thresh/, since dirty_ratio is a ratio
  and bdi_threshold is an actual value, therefore the inequality above
  doesn't make sense ]

That cannot actually happen (aside from small numerical glitches).

bdi_threshold = P_i * dirty_thresh, where \Sum P_i = 1

The proportion is relative to the recent writeout speed of the device.
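
Summing the per-bdi thresholds over all devices makes that explicit:

	\Sum bdi_threshold_i = \Sum (P_i * dirty_thresh)
	                     = dirty_thresh * \Sum P_i
	                     = dirty_thresh

so the per-device thresholds always add up to the global threshold,
never beyond it.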


On Wed, 2009-06-24 at 15:27 -0700, Andrew Morton wrote:
> Also...  get_dirty_limits() is rather hard to grok.  The callers of
> get_dirty_limits() treat its three return values as "thresholds", but
> they're not named as thresholds within get_dirty_limits() itself, which
> is a bit confusing.  And the meaning of each of those return values is
> pretty obscure from the code - could we document them please?

Does something like this help?

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7b0dcea..dc2cee1 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -426,6 +426,13 @@ unsigned long determine_dirtyable_memory(void)
 	return x + 1;	/* Ensure that we never return 0 */
 }
 
+/*
+ * get_dirty_limits() - compute the various dirty limits
+ *
+ * @pbackground - dirty limit at which we want to start background write-out
+ * @pdirty	- total dirty limit, we should not have more dirty than this
+ * @pbdi_dirty	- the share of @pdirty available to @bdi
+ */
 void
 get_dirty_limits(unsigned long *pbackground, unsigned long *pdirty,
 		 unsigned long *pbdi_dirty, struct backing_dev_info *bdi)
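
For reference, a caller then reads like this (an illustrative sketch
based on the signature above, mirroring how balance_dirty_pages() uses
the three values; not part of the patch):

	unsigned long background_thresh;  /* start background write-out above this */
	unsigned long dirty_thresh;       /* hard ceiling on dirty + writeback pages */
	unsigned long bdi_thresh;         /* this bdi's share of dirty_thresh */

	get_dirty_limits(&background_thresh, &dirty_thresh, &bdi_thresh, bdi);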



* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-25  8:00   ` Peter Zijlstra
@ 2009-06-25  9:10     ` Jens Axboe
  2009-06-25  9:26       ` Jens Axboe
  2009-06-26  9:15       ` Richard Kennedy
  0 siblings, 2 replies; 22+ messages in thread
From: Jens Axboe @ 2009-06-25  9:10 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andrew Morton, Richard Kennedy, linux-mm, linux-kernel

On Thu, Jun 25 2009, Peter Zijlstra wrote:
> On Wed, 2009-06-24 at 15:27 -0700, Andrew Morton wrote:
> > On Wed, 24 Jun 2009 11:38:24 +0100
> > Richard Kennedy <richard@rsk.demon.co.uk> wrote:
> > 
> > > When writing to 2 (or more) devices at the same time, stop
> > > balance_dirty_pages moving dirty pages to writeback when it has reached
> > > the bdi threshold. This prevents balance_dirty_pages overshooting its
> > > limits and moving all dirty pages to writeback.     
> > > 
> > >     
> > > Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> > > ---
> 
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

After doing some integration and update work on the writeback branch, I
threw 2.6.31-rc1, 2.6.31-rc1+patch, 2.6.31-rc1+writeback into the test
mix. The writeback series includes this patch as a prep patch. Results
for the mmap write test case:

Kernel          Throughput      usr     sys     ctx     util
--------------------------------------------------------------
vanilla         184MB/sec       19.51%  50.49%  12995   82.88%
vanilla         184MB/sec       19.60%  50.77%  12846   83.47%
vanilla         182MB/sec       19.25%  51.18%  14692   82.76%
vanilla+patch   169MB/sec       18.08%  43.61%   9507   76.38%
vanilla+patch   170MB/sec       18.37%  43.46%  10275   76.62%
vanilla+patch   165MB/sec       17.59%  42.06%  10165   74.39%
writeback       215MB/sec       22.69%  53.23%   4085   92.32%
writeback       214MB/sec       24.31%  52.90%   4495   92.40%
writeback       208MB/sec       23.14%  52.12%   4067   91.68%

To be perfectly clear:

vanilla         2.6.31-rc1 stock
vanilla+patch   2.6.31-rc1 + bdi_thresh patch
writeback       2.6.31-rc1 + bdi_thresh patch + writeback series

This is just a single spindle w/ext4, nothing fancy. I'll do a 3-series
run with the writeback and this patch backed out, to see if it makes a
difference here. I didn't do that initially, since the results were in
the range that I expected.

-- 
Jens Axboe


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-25  9:10     ` Jens Axboe
@ 2009-06-25  9:26       ` Jens Axboe
  2009-06-25 12:33         ` Al Boldi
  2009-06-26  9:15       ` Richard Kennedy
  1 sibling, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2009-06-25  9:26 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andrew Morton, Richard Kennedy, linux-mm, linux-kernel

On Thu, Jun 25 2009, Jens Axboe wrote:
> On Thu, Jun 25 2009, Peter Zijlstra wrote:
> > On Wed, 2009-06-24 at 15:27 -0700, Andrew Morton wrote:
> > > On Wed, 24 Jun 2009 11:38:24 +0100
> > > Richard Kennedy <richard@rsk.demon.co.uk> wrote:
> > > 
> > > > When writing to 2 (or more) devices at the same time, stop
> > > > balance_dirty_pages moving dirty pages to writeback when it has reached
> > > > the bdi threshold. This prevents balance_dirty_pages overshooting its
> > > > limits and moving all dirty pages to writeback.     
> > > > 
> > > >     
> > > > Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > ---
> > 
> > Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> After doing some integration and update work on the writeback branch, I
> threw 2.6.31-rc1, 2.6.31-rc1+patch, 2.6.31-rc1+writeback into the test
> mix. The writeback series include this patch as a prep patch. Results
> for the mmap write test case:
> 
> Kernel          Throughput      usr     sys     ctx     util
> --------------------------------------------------------------
> vanilla         184MB/sec       19.51%  50.49%  12995   82.88%
> vanilla         184MB/sec       19.60%  50.77%  12846   83.47%
> vanilla         182MB/sec       19.25%  51.18%  14692   82.76%
> vanilla+patch   169MB/sec       18.08%  43.61%   9507   76.38%
> vanilla+patch   170MB/sec       18.37%  43.46%  10275   76.62%
> vanilla+patch   165MB/sec       17.59%  42.06%  10165   74.39%
> writeback       215MB/sec       22.69%  53.23%   4085   92.32%
> writeback       214MB/sec       24.31%  52.90%   4495   92.40%
> writeback       208MB/sec       23.14%  52.12%   4067   91.68%
> 
> To be perfectly clear:
> 
> vanilla         2.6.31-rc1 stock
> vanilla+patch   2.6.31-rc1 + bdi_thresh patch
> writeback       2.6.31-rc1 + bdi_thresh patch + writeback series
> 
> This is just a single spindle w/ext4, nothing fancy. I'll do a 3-series
> run with the writeback and this patch backed out, to see if it makes a
> difference here. I didn't do that initially, since the results were in
> the range that I expected.

Results for writeback without the bdi_thresh patch

Kernel          Throughput      usr     sys     ctx     util
--------------------------------------------------------------
wb-bdi_thresh   211MB/sec       22.71%  53.30%   4050   91.19%
wb-bdi_thresh   212MB/sec       22.78%  53.55%   4809   91.51%
wb-bdi_thresh   212MB/sec       22.99%  54.23%   4715   93.10%

Not a lot of difference there; without more than three runs it's hard to
say what is significant. It could be a small decrease in throughput, if
the 208MB/sec result from above is an outlier (I think it is; ~215MB/sec
is usually the most consistent result).

-- 
Jens Axboe


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-25  9:26       ` Jens Axboe
@ 2009-06-25 12:33         ` Al Boldi
  2009-06-25 12:43           ` Jens Axboe
  0 siblings, 1 reply; 22+ messages in thread
From: Al Boldi @ 2009-06-25 12:33 UTC (permalink / raw)
  To: Jens Axboe, Peter Zijlstra
  Cc: Andrew Morton, Richard Kennedy, linux-mm, linux-kernel

Jens Axboe wrote:
> On Thu, Jun 25 2009, Jens Axboe wrote:
> > On Thu, Jun 25 2009, Peter Zijlstra wrote:
> > > On Wed, 2009-06-24 at 15:27 -0700, Andrew Morton wrote:
> > > > On Wed, 24 Jun 2009 11:38:24 +0100
> > > >
> > > > Richard Kennedy <richard@rsk.demon.co.uk> wrote:
> > > > > When writing to 2 (or more) devices at the same time, stop
> > > > > balance_dirty_pages moving dirty pages to writeback when it has
> > > > > reached the bdi threshold. This prevents balance_dirty_pages
> > > > > overshooting its limits and moving all dirty pages to writeback.
> > > > >
> > > > >
> > > > > Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > > ---
> > >
> > > Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >
> > After doing some integration and update work on the writeback branch, I
> > threw 2.6.31-rc1, 2.6.31-rc1+patch, 2.6.31-rc1+writeback into the test
> > mix. The writeback series include this patch as a prep patch. Results
> > for the mmap write test case:
> >
> > Kernel          Throughput      usr     sys     ctx     util
> > --------------------------------------------------------------
> > vanilla         184MB/sec       19.51%  50.49%  12995   82.88%
> > vanilla         184MB/sec       19.60%  50.77%  12846   83.47%
> > vanilla         182MB/sec       19.25%  51.18%  14692   82.76%
> > vanilla+patch   169MB/sec       18.08%  43.61%   9507   76.38%
> > vanilla+patch   170MB/sec       18.37%  43.46%  10275   76.62%
> > vanilla+patch   165MB/sec       17.59%  42.06%  10165   74.39%
> > writeback       215MB/sec       22.69%  53.23%   4085   92.32%
> > writeback       214MB/sec       24.31%  52.90%   4495   92.40%
> > writeback       208MB/sec       23.14%  52.12%   4067   91.68%
> >
> > To be perfectly clear:
> >
> > vanilla         2.6.31-rc1 stock
> > vanilla+patch   2.6.31-rc1 + bdi_thresh patch
> > writeback       2.6.31-rc1 + bdi_thresh patch + writeback series
> >
> > This is just a single spindle w/ext4, nothing fancy. I'll do a 3-series
> > run with the writeback and this patch backed out, to see if it makes a
> > difference here. I didn't do that initially, since the results were in
> > the range that I expected.
>
> Results for writeback without the bdi_thresh patch
>
> Kernel          Throughput      usr     sys     ctx     util
> --------------------------------------------------------------
> wb-bdi_thresh   211MB/sec       22.71%  53.30%   4050   91.19%
> wb-bdi_thresh   212MB/sec       22.78%  53.55%   4809   91.51%
> wb-bdi_thresh   212MB/sec       22.99%  54.23%   4715   93.10%
>
> Not a lot of difference there, without more than three runs it's hard to
> say what is significant. Could be a small decrease in throughput, if the
> 208MB/sec results from above is an outlier (I think it is, ~215MB/sec is
> usually the most consistent result).

What's the iowait on these runs?


Thanks!

--
Al


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-25 12:33         ` Al Boldi
@ 2009-06-25 12:43           ` Jens Axboe
  2009-06-25 13:46             ` Al Boldi
  0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2009-06-25 12:43 UTC (permalink / raw)
  To: Al Boldi
  Cc: Peter Zijlstra, Andrew Morton, Richard Kennedy, linux-mm, linux-kernel

On Thu, Jun 25 2009, Al Boldi wrote:
> Jens Axboe wrote:
> > On Thu, Jun 25 2009, Jens Axboe wrote:
> > > On Thu, Jun 25 2009, Peter Zijlstra wrote:
> > > > On Wed, 2009-06-24 at 15:27 -0700, Andrew Morton wrote:
> > > > > On Wed, 24 Jun 2009 11:38:24 +0100
> > > > >
> > > > > Richard Kennedy <richard@rsk.demon.co.uk> wrote:
> > > > > > When writing to 2 (or more) devices at the same time, stop
> > > > > > balance_dirty_pages moving dirty pages to writeback when it has
> > > > > > reached the bdi threshold. This prevents balance_dirty_pages
> > > > > > overshooting its limits and moving all dirty pages to writeback.
> > > > > >
> > > > > >
> > > > > > Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > > > ---
> > > >
> > > > Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > >
> > > After doing some integration and update work on the writeback branch, I
> > > threw 2.6.31-rc1, 2.6.31-rc1+patch, 2.6.31-rc1+writeback into the test
> > > mix. The writeback series include this patch as a prep patch. Results
> > > for the mmap write test case:
> > >
> > > Kernel          Throughput      usr     sys     ctx     util
> > > --------------------------------------------------------------
> > > vanilla         184MB/sec       19.51%  50.49%  12995   82.88%
> > > vanilla         184MB/sec       19.60%  50.77%  12846   83.47%
> > > vanilla         182MB/sec       19.25%  51.18%  14692   82.76%
> > > vanilla+patch   169MB/sec       18.08%  43.61%   9507   76.38%
> > > vanilla+patch   170MB/sec       18.37%  43.46%  10275   76.62%
> > > vanilla+patch   165MB/sec       17.59%  42.06%  10165   74.39%
> > > writeback       215MB/sec       22.69%  53.23%   4085   92.32%
> > > writeback       214MB/sec       24.31%  52.90%   4495   92.40%
> > > writeback       208MB/sec       23.14%  52.12%   4067   91.68%
> > >
> > > To be perfectly clear:
> > >
> > > vanilla         2.6.31-rc1 stock
> > > vanilla+patch   2.6.31-rc1 + bdi_thresh patch
> > > writeback       2.6.31-rc1 + bdi_thresh patch + writeback series
> > >
> > > This is just a single spindle w/ext4, nothing fancy. I'll do a 3-series
> > > run with the writeback and this patch backed out, to see if it makes a
> > > difference here. I didn't do that initially, since the results were in
> > > the range that I expected.
> >
> > Results for writeback without the bdi_thresh patch
> >
> > Kernel          Throughput      usr     sys     ctx     util
> > --------------------------------------------------------------
> > wb-bdi_thresh   211MB/sec       22.71%  53.30%   4050   91.19%
> > wb-bdi_thresh   212MB/sec       22.78%  53.55%   4809   91.51%
> > wb-bdi_thresh   212MB/sec       22.99%  54.23%   4715   93.10%
> >
> > Not a lot of difference there, without more than three runs it's hard to
> > say what is significant. Could be a small decrease in throughput, if the
> > 208MB/sec results from above is an outlier (I think it is, ~215MB/sec is
> > usually the most consistent result).
> 
> What's the iowait on these runs?

Not sure, I didn't check. Why do you ask?

-- 
Jens Axboe


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-25 12:43           ` Jens Axboe
@ 2009-06-25 13:46             ` Al Boldi
  2009-06-25 14:44               ` Jens Axboe
  0 siblings, 1 reply; 22+ messages in thread
From: Al Boldi @ 2009-06-25 13:46 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Peter Zijlstra, Andrew Morton, Richard Kennedy, linux-mm, linux-kernel

Jens Axboe wrote:
> On Thu, Jun 25 2009, Al Boldi wrote:
> > Jens Axboe wrote:
> > > On Thu, Jun 25 2009, Jens Axboe wrote:
> > > > On Thu, Jun 25 2009, Peter Zijlstra wrote:
> > > > > On Wed, 2009-06-24 at 15:27 -0700, Andrew Morton wrote:
> > > > > > On Wed, 24 Jun 2009 11:38:24 +0100
> > > > > >
> > > > > > Richard Kennedy <richard@rsk.demon.co.uk> wrote:
> > > > > > > When writing to 2 (or more) devices at the same time, stop
> > > > > > > balance_dirty_pages moving dirty pages to writeback when it has
> > > > > > > reached the bdi threshold. This prevents balance_dirty_pages
> > > > > > > overshooting its limits and moving all dirty pages to
> > > > > > > writeback.
> > > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > > > > ---
> > > > >
> > > > > Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > >
> > > > After doing some integration and update work on the writeback branch,
> > > > I threw 2.6.31-rc1, 2.6.31-rc1+patch, 2.6.31-rc1+writeback into the
> > > > test mix. The writeback series include this patch as a prep patch.
> > > > Results for the mmap write test case:
> > > >
> > > > Kernel          Throughput      usr     sys     ctx     util
> > > > --------------------------------------------------------------
> > > > vanilla         184MB/sec       19.51%  50.49%  12995   82.88%
> > > > vanilla         184MB/sec       19.60%  50.77%  12846   83.47%
> > > > vanilla         182MB/sec       19.25%  51.18%  14692   82.76%
> > > > vanilla+patch   169MB/sec       18.08%  43.61%   9507   76.38%
> > > > vanilla+patch   170MB/sec       18.37%  43.46%  10275   76.62%
> > > > vanilla+patch   165MB/sec       17.59%  42.06%  10165   74.39%
> > > > writeback       215MB/sec       22.69%  53.23%   4085   92.32%
> > > > writeback       214MB/sec       24.31%  52.90%   4495   92.40%
> > > > writeback       208MB/sec       23.14%  52.12%   4067   91.68%
> > > >
> > > > To be perfectly clear:
> > > >
> > > > vanilla         2.6.31-rc1 stock
> > > > vanilla+patch   2.6.31-rc1 + bdi_thresh patch
> > > > writeback       2.6.31-rc1 + bdi_thresh patch + writeback series
> > > >
> > > > This is just a single spindle w/ext4, nothing fancy. I'll do a
> > > > 3-series run with the writeback and this patch backed out, to see if
> > > > it makes a difference here. I didn't do that initially, since the
> > > > results were in the range that I expected.
> > >
> > > Results for writeback without the bdi_thresh patch
> > >
> > > Kernel          Throughput      usr     sys     ctx     util
> > > --------------------------------------------------------------
> > > wb-bdi_thresh   211MB/sec       22.71%  53.30%   4050   91.19%
> > > wb-bdi_thresh   212MB/sec       22.78%  53.55%   4809   91.51%
> > > wb-bdi_thresh   212MB/sec       22.99%  54.23%   4715   93.10%
> > >
> > > Not a lot of difference there, without more than three runs it's hard
> > > to say what is significant. Could be a small decrease in throughput, if
> > > the 208MB/sec results from above is an outlier (I think it is,
> > > ~215MB/sec is usually the most consistent result).
> >
> > What's the iowait on these runs?
>
> Not sure, I didn't check. Why do you ask?

iowait gives you an indication of seek activity.


Thanks!

--
Al




* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-25 13:46             ` Al Boldi
@ 2009-06-25 14:44               ` Jens Axboe
  2009-06-25 17:10                 ` Al Boldi
  0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2009-06-25 14:44 UTC (permalink / raw)
  To: Al Boldi
  Cc: Peter Zijlstra, Andrew Morton, Richard Kennedy, linux-mm, linux-kernel

On Thu, Jun 25 2009, Al Boldi wrote:
> Jens Axboe wrote:
> > On Thu, Jun 25 2009, Al Boldi wrote:
> > > Jens Axboe wrote:
> > > > On Thu, Jun 25 2009, Jens Axboe wrote:
> > > > > On Thu, Jun 25 2009, Peter Zijlstra wrote:
> > > > > > On Wed, 2009-06-24 at 15:27 -0700, Andrew Morton wrote:
> > > > > > > On Wed, 24 Jun 2009 11:38:24 +0100
> > > > > > >
> > > > > > > Richard Kennedy <richard@rsk.demon.co.uk> wrote:
> > > > > > > > When writing to 2 (or more) devices at the same time, stop
> > > > > > > > balance_dirty_pages moving dirty pages to writeback when it has
> > > > > > > > reached the bdi threshold. This prevents balance_dirty_pages
> > > > > > > > overshooting its limits and moving all dirty pages to
> > > > > > > > writeback.
> > > > > > > >
> > > > > > > >
> > > > > > > > Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > > > > > ---
> > > > > >
> > > > > > Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > >
> > > > > After doing some integration and update work on the writeback branch,
> > > > > I threw 2.6.31-rc1, 2.6.31-rc1+patch, 2.6.31-rc1+writeback into the
> > > > > test mix. The writeback series include this patch as a prep patch.
> > > > > Results for the mmap write test case:
> > > > >
> > > > > Kernel          Throughput      usr     sys     ctx     util
> > > > > --------------------------------------------------------------
> > > > > vanilla         184MB/sec       19.51%  50.49%  12995   82.88%
> > > > > vanilla         184MB/sec       19.60%  50.77%  12846   83.47%
> > > > > vanilla         182MB/sec       19.25%  51.18%  14692   82.76%
> > > > > vanilla+patch   169MB/sec       18.08%  43.61%   9507   76.38%
> > > > > vanilla+patch   170MB/sec       18.37%  43.46%  10275   76.62%
> > > > > vanilla+patch   165MB/sec       17.59%  42.06%  10165   74.39%
> > > > > writeback       215MB/sec       22.69%  53.23%   4085   92.32%
> > > > > writeback       214MB/sec       24.31%  52.90%   4495   92.40%
> > > > > writeback       208MB/sec       23.14%  52.12%   4067   91.68%
> > > > >
> > > > > To be perfectly clear:
> > > > >
> > > > > vanilla         2.6.31-rc1 stock
> > > > > vanilla+patch   2.6.31-rc1 + bdi_thresh patch
> > > > > writeback       2.6.31-rc1 + bdi_thresh patch + writeback series
> > > > >
> > > > > This is just a single spindle w/ext4, nothing fancy. I'll do a
> > > > > 3-series run with the writeback and this patch backed out, to see if
> > > > > it makes a difference here. I didn't do that initially, since the
> > > > > results were in the range that I expected.
> > > >
> > > > Results for writeback without the bdi_thresh patch
> > > >
> > > > Kernel          Throughput      usr     sys     ctx     util
> > > > --------------------------------------------------------------
> > > > wb-bdi_thresh   211MB/sec       22.71%  53.30%   4050   91.19%
> > > > wb-bdi_thresh   212MB/sec       22.78%  53.55%   4809   91.51%
> > > > wb-bdi_thresh   212MB/sec       22.99%  54.23%   4715   93.10%
> > > >
> > > > Not a lot of difference there, without more than three runs it's hard
> > > > to say what is significant. Could be a small decrease in throughput, if
> > > > the 208MB/sec results from above is an outlier (I think it is,
> > > > ~215MB/sec is usually the most consistent result).
> > >
> > > What's the iowait on these runs?
> >
> > Not sure, I didn't check. Why do you ask?
> 
> iowait gives you an indication of seek activity.

The test case is random mmap writes to files that have been laid out
sequentially. So it's all seeks. The target drive is an SSD disk though,
so it doesn't matter a whole lot (it's a good SSD).
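
For reference, that kind of workload can be approximated with a short C
program (an illustrative sketch, not Jens's actual test harness; the
file name and sizes are made up, and the file is left sparse here where
a real test would lay it out sequentially first):

	#include <fcntl.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		size_t len = 1UL << 30;		/* 1GB test file */
		long npages = len / 4096;
		long i;
		int fd = open("testfile", O_RDWR | O_CREAT, 0644);
		char *p;

		if (fd < 0 || ftruncate(fd, len) < 0)
			return 1;
		p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;

		/* dirty one page at a random offset per iteration */
		for (i = 0; i < 4 * npages; i++)
			memset(p + (random() % npages) * 4096, 1, 4096);

		munmap(p, len);
		close(fd);
		return 0;
	}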

-- 
Jens Axboe


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-25 14:44               ` Jens Axboe
@ 2009-06-25 17:10                 ` Al Boldi
  2009-06-26  5:02                   ` Jens Axboe
  0 siblings, 1 reply; 22+ messages in thread
From: Al Boldi @ 2009-06-25 17:10 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Peter Zijlstra, Andrew Morton, Richard Kennedy, linux-mm, linux-kernel

Jens Axboe wrote:
> The test case is random mmap writes to files that have been laid out
> sequentially. So it's all seeks. The target drive is an SSD disk though,
> so it doesn't matter a whole lot (it's a good SSD).

Oh, SSD.  What numbers do you get for normal disks?


Thanks!

--
Al


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-25 17:10                 ` Al Boldi
@ 2009-06-26  5:02                   ` Jens Axboe
  2009-06-26 11:37                     ` Al Boldi
  0 siblings, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2009-06-26  5:02 UTC (permalink / raw)
  To: Al Boldi
  Cc: Peter Zijlstra, Andrew Morton, Richard Kennedy, linux-mm, linux-kernel

On Thu, Jun 25 2009, Al Boldi wrote:
> Jens Axboe wrote:
> > The test case is random mmap writes to files that have been laid out
> > sequentially. So it's all seeks. The target drive is an SSD disk though,
> > so it doesn't matter a whole lot (it's a good SSD).
> 
> Oh, SSD.  What numbers do you get for normal disks?

I haven't run this particular test on rotating storage. The type of
drive should not matter a lot; I'm mostly interested in comparing
vanilla and the writeback patches on identical workloads and storage.

-- 
Jens Axboe


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-25  9:10     ` Jens Axboe
  2009-06-25  9:26       ` Jens Axboe
@ 2009-06-26  9:15       ` Richard Kennedy
  2009-06-26  9:20         ` Jens Axboe
  1 sibling, 1 reply; 22+ messages in thread
From: Richard Kennedy @ 2009-06-26  9:15 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Peter Zijlstra, Andrew Morton, linux-mm, linux-kernel

On Thu, 2009-06-25 at 11:10 +0200, Jens Axboe wrote:
> On Thu, Jun 25 2009, Peter Zijlstra wrote:
> > On Wed, 2009-06-24 at 15:27 -0700, Andrew Morton wrote:
> > > On Wed, 24 Jun 2009 11:38:24 +0100
> > > Richard Kennedy <richard@rsk.demon.co.uk> wrote:
> > > 
> > > > When writing to 2 (or more) devices at the same time, stop
> > > > balance_dirty_pages moving dirty pages to writeback when it has reached
> > > > the bdi threshold. This prevents balance_dirty_pages overshooting its
> > > > limits and moving all dirty pages to writeback.     
> > > > 
> > > >     
> > > > Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > ---
> > 
> > Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> After doing some integration and update work on the writeback branch, I
> threw 2.6.31-rc1, 2.6.31-rc1+patch, 2.6.31-rc1+writeback into the test
> mix. The writeback series include this patch as a prep patch. Results
> for the mmap write test case:
> 
> Kernel          Throughput      usr     sys     ctx     util
> --------------------------------------------------------------
> vanilla         184MB/sec       19.51%  50.49%  12995   82.88%
> vanilla         184MB/sec       19.60%  50.77%  12846   83.47%
> vanilla         182MB/sec       19.25%  51.18%  14692   82.76%
> vanilla+patch   169MB/sec       18.08%  43.61%   9507   76.38%
> vanilla+patch   170MB/sec       18.37%  43.46%  10275   76.62%
> vanilla+patch   165MB/sec       17.59%  42.06%  10165   74.39%
> writeback       215MB/sec       22.69%  53.23%   4085   92.32%
> writeback       214MB/sec       24.31%  52.90%   4495   92.40%
> writeback       208MB/sec       23.14%  52.12%   4067   91.68%
> 
> To be perfectly clear:
> 
> vanilla         2.6.31-rc1 stock
> vanilla+patch   2.6.31-rc1 + bdi_thresh patch
> writeback       2.6.31-rc1 + bdi_thresh patch + writeback series
> 
> This is just a single spindle w/ext4, nothing fancy. I'll do a 3-series
> run with the writeback and this patch backed out, to see if it makes a
> difference here. I didn't do that initially, since the results were in
> the range that I expected.

Intriguing numbers. It would tell us a lot if we could find out why
vanilla + patch is slower than vanilla. I'll run some tests using mmap
and see if I can find anything.
What block size are you using ?

I see that the last test of each group is the slowest. I wonder if this
is showing a slowdown over time or just noise? Any chance you could run
more tests in each group?

regards
Richard


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-26  9:15       ` Richard Kennedy
@ 2009-06-26  9:20         ` Jens Axboe
  0 siblings, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2009-06-26  9:20 UTC (permalink / raw)
  To: Richard Kennedy; +Cc: Peter Zijlstra, Andrew Morton, linux-mm, linux-kernel

On Fri, Jun 26 2009, Richard Kennedy wrote:
> On Thu, 2009-06-25 at 11:10 +0200, Jens Axboe wrote:
> > On Thu, Jun 25 2009, Peter Zijlstra wrote:
> > > On Wed, 2009-06-24 at 15:27 -0700, Andrew Morton wrote:
> > > > On Wed, 24 Jun 2009 11:38:24 +0100
> > > > Richard Kennedy <richard@rsk.demon.co.uk> wrote:
> > > > 
> > > > > When writing to 2 (or more) devices at the same time, stop
> > > > > balance_dirty_pages moving dirty pages to writeback when it has reached
> > > > > the bdi threshold. This prevents balance_dirty_pages overshooting its
> > > > > limits and moving all dirty pages to writeback.     
> > > > > 
> > > > >     
> > > > > Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> > > > > ---
> > > 
> > > Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > 
> > After doing some integration and update work on the writeback branch, I
> > threw 2.6.31-rc1, 2.6.31-rc1+patch, 2.6.31-rc1+writeback into the test
> > mix. The writeback series include this patch as a prep patch. Results
> > for the mmap write test case:
> > 
> > Kernel          Throughput      usr     sys     ctx     util
> > --------------------------------------------------------------
> > vanilla         184MB/sec       19.51%  50.49%  12995   82.88%
> > vanilla         184MB/sec       19.60%  50.77%  12846   83.47%
> > vanilla         182MB/sec       19.25%  51.18%  14692   82.76%
> > vanilla+patch   169MB/sec       18.08%  43.61%   9507   76.38%
> > vanilla+patch   170MB/sec       18.37%  43.46%  10275   76.62%
> > vanilla+patch   165MB/sec       17.59%  42.06%  10165   74.39%
> > writeback       215MB/sec       22.69%  53.23%   4085   92.32%
> > writeback       214MB/sec       24.31%  52.90%   4495   92.40%
> > writeback       208MB/sec       23.14%  52.12%   4067   91.68%
> > 
> > To be perfectly clear:
> > 
> > vanilla         2.6.31-rc1 stock
> > vanilla+patch   2.6.31-rc1 + bdi_thresh patch
> > writeback       2.6.31-rc1 + bdi_thresh patch + writeback series
> > 
> > This is just a single spindle w/ext4, nothing fancy. I'll do a 3-series
> > run with the writeback and this patch backed out, to see if it makes a
> > difference here. I didn't do that initially, since the results were in
> > the range that I expected.
> 
> Intriguing numbers. It would tell us a lot if we could find out why
> vanilla + patch is slower than vanilla. I'll run some tests using mmap
> and see if I can find anything.
> What block size are you using ?

It's using 4kb block size.

> I see that the last test of each group is the slowest. I wonder if this
> is showing a slowdown over time or just noise? Any chance you could run
> more tests in each group?

The runs are actually inverted, so the last entry is the first run. It's
a bit confusing. So the first run is usually the odd one out; after that
they are stable.

-- 
Jens Axboe


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-26  5:02                   ` Jens Axboe
@ 2009-06-26 11:37                     ` Al Boldi
  2009-06-26 12:35                       ` Jens Axboe
  0 siblings, 1 reply; 22+ messages in thread
From: Al Boldi @ 2009-06-26 11:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Peter Zijlstra, Andrew Morton, Richard Kennedy, linux-mm, linux-kernel

Jens Axboe wrote:
> On Thu, Jun 25 2009, Al Boldi wrote:
> > Jens Axboe wrote:
> > > The test case is random mmap writes to files that have been laid out
> > > sequentially. So it's all seeks. The target drive is an SSD disk
> > > though, so it doesn't matter a whole lot (it's a good SSD).
> >
> > Oh, SSD.  What numbers do you get for normal disks?
>
> I haven't run this particular test on rotating storage. The type of
> drive should not matter a lot, I'm mostly interested in comparing
> vanilla and the writeback patches on identical workloads and storage.

I think drive type matters a lot.  Access strategy on drives with high seek 
delays differs from those with no seek delays.  So it would probably be of 
interest to see this test run on rotating storage, unless the writeback 
patches are only meant for SSD?


Thanks!

--
Al


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-26 11:37                     ` Al Boldi
@ 2009-06-26 12:35                       ` Jens Axboe
  0 siblings, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2009-06-26 12:35 UTC (permalink / raw)
  To: Al Boldi
  Cc: Peter Zijlstra, Andrew Morton, Richard Kennedy, linux-mm, linux-kernel

On Fri, Jun 26 2009, Al Boldi wrote:
> Jens Axboe wrote:
> > On Thu, Jun 25 2009, Al Boldi wrote:
> > > Jens Axboe wrote:
> > > > The test case is random mmap writes to files that have been laid out
> > > > sequentially. So it's all seeks. The target drive is an SSD disk
> > > > though, so it doesn't matter a whole lot (it's a good SSD).
> > >
> > > Oh, SSD.  What numbers do you get for normal disks?
> >
> > I haven't run this particular test on rotating storage. The type of
> > drive should not matter a lot, I'm mostly interested in comparing
> > vanilla and the writeback patches on identical workloads and storage.
> 
> I think drive type matters a lot.  Access strategy on drives with high seek 
> delays differs from those with no seek delays.  So it would probably be of 
> interest to see this test run on rotating storage, unless the writeback 
> patches are only meant for SSD?

Don't get me wrong, I've tested a lot on rotating drives too. The
changelog for one of the patches even mentions results and vmstat for a
10 disk setup, all rotating disks.

What I'm saying is that this particular test is tailored for SSDs and
writeback. The patchset is not for SSDs in particular; it applies
equally across the board (even for non-block IO, like NFS).

-- 
Jens Axboe


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-06-24 10:38 [RFC][PATCH] mm: stop balance_dirty_pages doing too much work Richard Kennedy
  2009-06-24 22:27 ` Andrew Morton
@ 2009-08-07 12:20 ` Peter Zijlstra
  2009-08-07 14:36   ` Richard Kennedy
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2009-08-07 12:20 UTC (permalink / raw)
  To: Richard Kennedy; +Cc: Andrew Morton, linux-mm, lkml, Chris Mason

On Wed, 2009-06-24 at 11:38 +0100, Richard Kennedy wrote:
>   
> Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> ---
> balance_dirty_pages can overreact and move all of the dirty pages to
> writeback unnecessarily.
> 
> balance_dirty_pages makes its decision to throttle based on the number
> of dirty plus writeback pages that are over the calculated limit, so it
> will continue to move pages even when there are plenty of pages in
> writeback and less than the threshold still dirty.
> 
> This allows it to overshoot its limits and move all the dirty pages to
> writeback while waiting for the drives to catch up and empty the
> writeback list. 
> 
> A simple fio test easily demonstrates this problem.  
> 
> fio --name=f1 --directory=/disk1 --size=2G --rw=write \
> 	--name=f2 --directory=/disk2 --size=1G --rw=write --startdelay=10
> 
> The attached graph before.png shows how all pages are moved to writeback
> as the second write starts and the throttling kicks in.
> 
> after.png is the same test with the patch applied, which clearly shows
> that it keeps dirty_background_ratio dirty pages in the buffer.
> The values and timings of the graphs are only approximate but are good
> enough to show the behaviour.  
> 
> This is the simplest fix I could find, but I'm not entirely sure that it
> alone will be enough for all cases. But it certainly is an improvement
> on my desktop machine writing to 2 disks.
> 
> Do we need something more for machines with large arrays where
> bdi_threshold * number_of_drives is greater than the dirty_ratio ?
> 

> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 7b0dcea..7687879 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -541,8 +541,11 @@ static void balance_dirty_pages(struct address_space *mapping)
>  		 * filesystems (i.e. NFS) in which data may have been
>  		 * written to the server's write cache, but has not yet
>  		 * been flushed to permanent storage.
> +		 * Only move pages to writeback if this bdi is over its
> +		 * threshold otherwise wait until the disk writes catch
> +		 * up.
>  		 */
> -		if (bdi_nr_reclaimable) {
> +		if (bdi_nr_reclaimable > bdi_thresh) {
>  			writeback_inodes(&wbc);
>  			pages_written += write_chunk - wbc.nr_to_write;
>  			get_dirty_limits(&background_thresh, &dirty_thresh,

OK, so Chris ran into this bit yesterday, complaining that he'd only get
very few write requests and couldn't saturate his IO channel.

Now, since writing out everything once there's something to do sucks for
Richard, and only writing out stuff when we're over the limit sucks for
Chris (since we can only be over the limit a little), the best thing
would be to only write out when we're over the background limit. Since
that is the low watermark we use for throttling, it makes sense that we
try to write out when above that.

However, there is no bdi_background_thresh, and I don't think
introducing one just for this is really justified. How about the below?

Chris, how did this work for you? Richard, does this make things suck
for you again?

---
 mm/page-writeback.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 81627eb..92f42d6 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -545,7 +545,7 @@ static void balance_dirty_pages(struct address_space *mapping)
 		 * threshold otherwise wait until the disk writes catch
 		 * up.
 		 */
-		if (bdi_nr_reclaimable > bdi_thresh) {
+		if (bdi_nr_reclaimable > bdi_thresh/2) {
 			writeback_inodes(&wbc);
 			pages_written += write_chunk - wbc.nr_to_write;
 			get_dirty_limits(&background_thresh, &dirty_thresh,



* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-08-07 12:20 ` Peter Zijlstra
@ 2009-08-07 14:36   ` Richard Kennedy
  2009-08-07 14:38     ` Peter Zijlstra
  2009-08-07 15:22     ` Chris Mason
  0 siblings, 2 replies; 22+ messages in thread
From: Richard Kennedy @ 2009-08-07 14:36 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andrew Morton, linux-mm, lkml, Chris Mason

On Fri, 2009-08-07 at 14:20 +0200, Peter Zijlstra wrote:
> On Wed, 2009-06-24 at 11:38 +0100, Richard Kennedy wrote:
...
> OK, so Chris ran into this bit yesterday, complaining that he'd only get
> very few write requests and couldn't saturate his IO channel.
> 
> Now, since writing out everything once there's something to do sucks for
> Richard, but only writing out stuff when we're over the limit sucks for
> Chris (since we can only be over the limit a little), the best thing
> would be to only write out when we're over the background limit. Since
> that is the low watermark we use for throttling it makes sense that we
> try to write out when above that.
> 
> However, since there's a lack of bdi_background_thresh, and I don't
> think introducing one just for this is really justified. How about the
> below?
> 
> Chris how did this work for you? Richard, does this make things suck for
> you again?
> 
> ---
>  mm/page-writeback.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 81627eb..92f42d6 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -545,7 +545,7 @@ static void balance_dirty_pages(struct address_space *mapping)
>  		 * threshold otherwise wait until the disk writes catch
>  		 * up.
>  		 */
> -		if (bdi_nr_reclaimable > bdi_thresh) {
> +		if (bdi_nr_reclaimable > bdi_thresh/2) {
>  			writeback_inodes(&wbc);
>  			pages_written += write_chunk - wbc.nr_to_write;
>  			get_dirty_limits(&background_thresh, &dirty_thresh,
> 
> 
I'll run some tests and let you know :)

But what if someone has changed the vm settings?
Maybe something like 
	(bdi_thresh * dirty_background_ratio / dirty_ratio)
might be better?
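
As a rough sketch of what I mean -- bdi_background_thresh() is just a
made-up name here, not an existing kernel function, while
dirty_background_ratio and vm_dirty_ratio are the sysctl variables in
mm/page-writeback.c:

	/* hypothetical helper: scale the per-bdi threshold by the ratio of
	 * the background limit to the hard limit, so the writeout trigger
	 * follows whatever vm.dirty_* settings the admin has chosen */
	static unsigned long bdi_background_thresh(unsigned long bdi_thresh)
	{
		return bdi_thresh * dirty_background_ratio / vm_dirty_ratio;
	}

	/* then in balance_dirty_pages(): */
	if (bdi_nr_reclaimable > bdi_background_thresh(bdi_thresh))
		writeback_inodes(&wbc);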

Chris, what sort of workload are you having problems with?

regards
Richard



* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-08-07 14:36   ` Richard Kennedy
@ 2009-08-07 14:38     ` Peter Zijlstra
  2009-08-07 15:22     ` Chris Mason
  1 sibling, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2009-08-07 14:38 UTC (permalink / raw)
  To: Richard Kennedy; +Cc: Andrew Morton, linux-mm, lkml, Chris Mason

On Fri, 2009-08-07 at 15:36 +0100, Richard Kennedy wrote:
> On Fri, 2009-08-07 at 14:20 +0200, Peter Zijlstra wrote:
> > On Wed, 2009-06-24 at 11:38 +0100, Richard Kennedy wrote:
> ....
> > OK, so Chris ran into this bit yesterday, complaining that he'd only get
> > very few write requests and couldn't saturate his IO channel.
> > 
> > Now, writing out everything as soon as there's something to do sucks for
> > Richard, while only writing out once we're over the limit sucks for
> > Chris (since we can only be over the limit by a little), so the best
> > thing would be to write out once we're over the background limit. Since
> > that is the low watermark we use for throttling, it makes sense to start
> > writing out above it.
> > 
> > However, there is no bdi_background_thresh, and I don't think
> > introducing one just for this is really justified. How about the below?
> > 
> > Chris, how did this work for you? Richard, does this make things suck
> > for you again?
> > 
> > ---
> >  mm/page-writeback.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 81627eb..92f42d6 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -545,7 +545,7 @@ static void balance_dirty_pages(struct address_space *mapping)
> >  		 * threshold otherwise wait until the disk writes catch
> >  		 * up.
> >  		 */
> > -		if (bdi_nr_reclaimable > bdi_thresh) {
> > +		if (bdi_nr_reclaimable > bdi_thresh/2) {
> >  			writeback_inodes(&wbc);
> >  			pages_written += write_chunk - wbc.nr_to_write;
> >  			get_dirty_limits(&background_thresh, &dirty_thresh,
> > 
> > 
> I'll run some tests and let you know :)
> 
> But what if someone has changed the vm settings?
> Maybe something like 
> 	(bdi_thresh * dirty_background_ratio / dirty_ratio)
> might be better?

Yeah, but that's a mult and a div extra; all that really matters is
being able to generate plenty of IO. I think the 1/2 approximation
should suffice, but we'll see.

Thanks for testing.


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-08-07 14:36   ` Richard Kennedy
  2009-08-07 14:38     ` Peter Zijlstra
@ 2009-08-07 15:22     ` Chris Mason
  2009-08-07 16:09       ` Richard Kennedy
  1 sibling, 1 reply; 22+ messages in thread
From: Chris Mason @ 2009-08-07 15:22 UTC (permalink / raw)
  To: Richard Kennedy; +Cc: Peter Zijlstra, Andrew Morton, linux-mm, lkml

On Fri, Aug 07, 2009 at 03:36:01PM +0100, Richard Kennedy wrote:
> On Fri, 2009-08-07 at 14:20 +0200, Peter Zijlstra wrote:
> > On Wed, 2009-06-24 at 11:38 +0100, Richard Kennedy wrote:
> ...
> > OK, so Chris ran into this bit yesterday, complaining that he'd only get
> > very few write requests and couldn't saturate his IO channel.
> > 
> > Now, writing out everything as soon as there's something to do sucks for
> > Richard, while only writing out once we're over the limit sucks for
> > Chris (since we can only be over the limit by a little), so the best
> > thing would be to write out once we're over the background limit. Since
> > that is the low watermark we use for throttling, it makes sense to start
> > writing out above it.
> > 
> > However, there is no bdi_background_thresh, and I don't think
> > introducing one just for this is really justified. How about the below?
> > 
> > Chris, how did this work for you? Richard, does this make things suck
> > for you again?
> > 
> > ---
> >  mm/page-writeback.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index 81627eb..92f42d6 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -545,7 +545,7 @@ static void balance_dirty_pages(struct address_space *mapping)
> >  		 * threshold otherwise wait until the disk writes catch
> >  		 * up.
> >  		 */
> > -		if (bdi_nr_reclaimable > bdi_thresh) {
> > +		if (bdi_nr_reclaimable > bdi_thresh/2) {

My patch had two extra spaces ;)

> >  			writeback_inodes(&wbc);
> >  			pages_written += write_chunk - wbc.nr_to_write;
> >  			get_dirty_limits(&background_thresh, &dirty_thresh,
> > 
> > 
> I'll run some tests and let you know :)
> 
> But what if someone has changed the vm settings?
> Maybe something like 
> 	(bdi_thresh * dirty_background_ratio / dirty_ratio)
> might be better?
> 
> Chris, what sort of workload are you having problems with?

So, buffered writeback in general has a bunch of interesting features
right now, and to be honest I'm having a hard time untangling all of it.
It doesn't help that each of our filesystems is reacting differently.

Btrfs and XFS both use helper threads to process IO completion.  This
means that we tend to collect more writeback pages than the other
filesystems do.

The end result of this is that O_DIRECT is dramatically faster than
buffered on every streaming write workload I've tried.  I'm still trying
to sort out exactly where buffered IO is going wrong.

-chris


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-08-07 15:22     ` Chris Mason
@ 2009-08-07 16:09       ` Richard Kennedy
  2009-08-07 21:02         ` Chris Mason
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Kennedy @ 2009-08-07 16:09 UTC (permalink / raw)
  To: Chris Mason; +Cc: Peter Zijlstra, Andrew Morton, linux-mm, lkml

On Fri, 2009-08-07 at 11:22 -0400, Chris Mason wrote:
> On Fri, Aug 07, 2009 at 03:36:01PM +0100, Richard Kennedy wrote:
> > On Fri, 2009-08-07 at 14:20 +0200, Peter Zijlstra wrote:
> > > On Wed, 2009-06-24 at 11:38 +0100, Richard Kennedy wrote:
> > ...
> > > OK, so Chris ran into this bit yesterday, complaining that he'd only get
> > > very few write requests and couldn't saturate his IO channel.
> > > 
> > > Now, writing out everything as soon as there's something to do sucks for
> > > Richard, while only writing out once we're over the limit sucks for
> > > Chris (since we can only be over the limit by a little), so the best
> > > thing would be to write out once we're over the background limit. Since
> > > that is the low watermark we use for throttling, it makes sense to start
> > > writing out above it.
> > > 
> > > However, there is no bdi_background_thresh, and I don't think
> > > introducing one just for this is really justified. How about the below?
> > > 
> > > Chris, how did this work for you? Richard, does this make things suck
> > > for you again?
> > > 
> > > ---
> > >  mm/page-writeback.c |    2 +-
> > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > index 81627eb..92f42d6 100644
> > > --- a/mm/page-writeback.c
> > > +++ b/mm/page-writeback.c
> > > @@ -545,7 +545,7 @@ static void balance_dirty_pages(struct address_space *mapping)
> > >  		 * threshold otherwise wait until the disk writes catch
> > >  		 * up.
> > >  		 */
> > > -		if (bdi_nr_reclaimable > bdi_thresh) {
> > > +		if (bdi_nr_reclaimable > bdi_thresh/2) {
> 
> My patch had two extra spaces ;)
> 
> > >  			writeback_inodes(&wbc);
> > >  			pages_written += write_chunk - wbc.nr_to_write;
> > >  			get_dirty_limits(&background_thresh, &dirty_thresh,
> > > 
> > > 
> > I'll run some tests and let you know :)
> > 
> > But what if someone has changed the vm settings?
> > Maybe something like 
> > 	(bdi_thresh * dirty_background_ratio / dirty_ratio)
> > might be better?
> > 
> > Chris, what sort of workload are you having problems with?
> 
> So, buffered writeback in general has a bunch of interesting features
> right now, and to be honest I'm having a hard time untangling all of it.
> It doesn't help that each of our filesystems is reacting differently.
> 
> Btrfs and XFS both use helper threads to process IO completion.  This
> means that we tend to collect more writeback pages than the other
> filesystems do.
> 
> The end result of this is that O_DIRECT is dramatically faster than
> buffered on every streaming write workload I've tried.  I'm still trying
> to sort out exactly where buffered IO is going wrong.
> 
> -chris
> 
Yes, it's all pretty complex.
With a large number of pages in writeback, do you think the total
number of dirty pages goes over the threshold?
I do wonder how long we get stuck in congestion_wait; it may be
interesting to see if reducing its timeout has any effect.

I've been experimenting with rewriting balance_dirty_pages to reduce the
number of calls to global_page_state. Those counters share a cacheline
and are fairly hot. So far it's looking good and improving simple write
performance, particularly when the write size is just over the dirty
threshold.
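
The shape of the change, as a sketch (assuming the 2.6.30-era vmstat
counter names; illustrative only):

	/* read the hot, cacheline-sharing counters once per loop pass and
	 * test the cached values, instead of calling global_page_state()
	 * again for every comparison */
	unsigned long nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
				       global_page_state(NR_UNSTABLE_NFS);
	unsigned long nr_writeback = global_page_state(NR_WRITEBACK);

	if (nr_reclaimable + nr_writeback <= dirty_thresh)
		break;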

regards
Richard


* Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
  2009-08-07 16:09       ` Richard Kennedy
@ 2009-08-07 21:02         ` Chris Mason
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Mason @ 2009-08-07 21:02 UTC (permalink / raw)
  To: Richard Kennedy; +Cc: Peter Zijlstra, Andrew Morton, linux-mm, lkml

On Fri, Aug 07, 2009 at 05:09:21PM +0100, Richard Kennedy wrote:
> On Fri, 2009-08-07 at 11:22 -0400, Chris Mason wrote:
> > On Fri, Aug 07, 2009 at 03:36:01PM +0100, Richard Kennedy wrote:
> > > On Fri, 2009-08-07 at 14:20 +0200, Peter Zijlstra wrote:
> > > > On Wed, 2009-06-24 at 11:38 +0100, Richard Kennedy wrote:
> > > ...
> > > > OK, so Chris ran into this bit yesterday, complaining that he'd only get
> > > > very few write requests and couldn't saturate his IO channel.
> > > > 
> > > > Now, writing out everything as soon as there's something to do sucks for
> > > > Richard, while only writing out once we're over the limit sucks for
> > > > Chris (since we can only be over the limit by a little), so the best
> > > > thing would be to write out once we're over the background limit. Since
> > > > that is the low watermark we use for throttling, it makes sense to start
> > > > writing out above it.
> > > > 
> > > > However, there is no bdi_background_thresh, and I don't think
> > > > introducing one just for this is really justified. How about the below?
> > > > 
> > > > Chris, how did this work for you? Richard, does this make things suck
> > > > for you again?
> > > > 
> > > > ---
> > > >  mm/page-writeback.c |    2 +-
> > > >  1 files changed, 1 insertions(+), 1 deletions(-)
> > > > 
> > > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > > > index 81627eb..92f42d6 100644
> > > > --- a/mm/page-writeback.c
> > > > +++ b/mm/page-writeback.c
> > > > @@ -545,7 +545,7 @@ static void balance_dirty_pages(struct address_space *mapping)
> > > >  		 * threshold otherwise wait until the disk writes catch
> > > >  		 * up.
> > > >  		 */
> > > > -		if (bdi_nr_reclaimable > bdi_thresh) {
> > > > +		if (bdi_nr_reclaimable > bdi_thresh/2) {
> > 
> > My patch had two extra spaces ;)
> > 
> > > >  			writeback_inodes(&wbc);
> > > >  			pages_written += write_chunk - wbc.nr_to_write;
> > > >  			get_dirty_limits(&background_thresh, &dirty_thresh,
> > > > 
> > > > 
> > > I'll run some tests and let you know :)
> > > 
> > > But what if someone has changed the vm settings?
> > > Maybe something like 
> > > 	(bdi_thresh * dirty_background_ratio / dirty_ratio)
> > > might be better?
> > > 
> > > Chris, what sort of workload are you having problems with?
> > 
> > So, buffered writeback in general has a bunch of interesting features
> > right now, and to be honest I'm having a hard time untangling all of it.
> > It doesn't help that each of our filesystems is reacting differently.
> > 
> > Btrfs and XFS both use helper threads to process IO completion.  This
> > means that we tend to collect more writeback pages than the other
> > filesystems do.
> > 
> > The end result of this is that O_DIRECT is dramatically faster than
> > buffered on every streaming write workload I've tried.  I'm still trying
> > to sort out exactly where buffered IO is going wrong.
> > 
> > -chris
> > 
> Yes, it's all pretty complex.
> With a large number of pages in writeback, do you think the total
> number of dirty pages goes over the threshold?
> I do wonder how long we get stuck in congestion_wait; it may be
> interesting to see if reducing its timeout has any effect.

I've done a few more tests here, and running with Peter's patch doesn't
really change the results for either XFS or Btrfs.  The real reason his
patch was helping me at first was that it was forcing the btrfs code
into writepages more often.

I also did a run with congestion_wait(1) instead of HZ/10, and it
didn't make much difference at all.
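
That experiment just changes the timeout argument, assuming the
2.6.30-era congestion_wait(rw, timeout) signature:

	/* stock code: wait up to 100ms (at HZ=1000) for congestion to clear */
	congestion_wait(WRITE, HZ/10);
	/* the experiment: wait a single jiffy instead */
	congestion_wait(WRITE, 1);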

The only way I've been able to get buffered writeback closer to
O_DIRECT writeback is just by increasing the number of pages that get
written (wbc.nr_to_write).

On Btrfs I'm testing a patch that extends nr_to_write to cover the
full extent when delalloc is done (up to a limit of 32MB); in other
words, it covers a larger range of pages that the FS knows to be
contiguous on disk.
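
Roughly like this -- delalloc_start/delalloc_end are placeholder names
for the extent bounds the FS found, and the 32MB cap is the one
mentioned above:

	/* widen nr_to_write so a single writepages pass covers the whole
	 * contiguous delalloc extent, capped at 32MB */
	u64 extent_bytes = min_t(u64, delalloc_end - delalloc_start + 1,
				 32 * 1024 * 1024);
	long extent_pages = extent_bytes >> PAGE_CACHE_SHIFT;

	if (wbc->nr_to_write < extent_pages)
		wbc->nr_to_write = extent_pages;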

-chris


end of thread

Thread overview: 22+ messages
2009-06-24 10:38 [RFC][PATCH] mm: stop balance_dirty_pages doing too much work Richard Kennedy
2009-06-24 22:27 ` Andrew Morton
2009-06-25  5:13   ` Jens Axboe
2009-06-25  8:00   ` Peter Zijlstra
2009-06-25  9:10     ` Jens Axboe
2009-06-25  9:26       ` Jens Axboe
2009-06-25 12:33         ` Al Boldi
2009-06-25 12:43           ` Jens Axboe
2009-06-25 13:46             ` Al Boldi
2009-06-25 14:44               ` Jens Axboe
2009-06-25 17:10                 ` Al Boldi
2009-06-26  5:02                   ` Jens Axboe
2009-06-26 11:37                     ` Al Boldi
2009-06-26 12:35                       ` Jens Axboe
2009-06-26  9:15       ` Richard Kennedy
2009-06-26  9:20         ` Jens Axboe
2009-08-07 12:20 ` Peter Zijlstra
2009-08-07 14:36   ` Richard Kennedy
2009-08-07 14:38     ` Peter Zijlstra
2009-08-07 15:22     ` Chris Mason
2009-08-07 16:09       ` Richard Kennedy
2009-08-07 21:02         ` Chris Mason
