From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Richard Kennedy <richard@rsk.demon.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm <linux-mm@kvack.org>,
lkml <linux-kernel@vger.kernel.org>,
Chris Mason <chris.mason@oracle.com>
Subject: Re: [RFC][PATCH] mm: stop balance_dirty_pages doing too much work
Date: Fri, 07 Aug 2009 14:20:01 +0200 [thread overview]
Message-ID: <1249647601.32113.700.camel@twins> (raw)
In-Reply-To: <1245839904.3210.85.camel@localhost.localdomain>
On Wed, 2009-06-24 at 11:38 +0100, Richard Kennedy wrote:
>
> Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
> ---
> balance_dirty_pages can overreact and move all of the dirty pages to
> writeback unnecessarily.
>
> balance_dirty_pages makes its decision to throttle based on the number
> of dirty plus writeback pages that are over the calculated limit,so it
> will continue to move pages even when there are plenty of pages in
> writeback and less than the threshold still dirty.
>
> This allows it to overshoot its limits and move all the dirty pages to
> writeback while waiting for the drives to catch up and empty the
> writeback list.
>
> A simple fio test easily demonstrates this problem.
>
> fio --name=f1 --directory=/disk1 --size=2G -rw=write
> --name=f2 --directory=/disk2 --size=1G --rw=write --startdelay=10
>
> The attached graph before.png shows how all pages are moved to writeback
> as the second write starts and the throttling kicks in.
>
> after.png is the same test with the patch applied, which clearly shows
> that it keeps dirty_background_ratio dirty pages in the buffer.
> The values and timings of the graphs are only approximate but are good
> enough to show the behaviour.
>
> This is the simplest fix I could find, but I'm not entirely sure that it
> alone will be enough for all cases. But it certainly is an improvement
> on my desktop machine writing to 2 disks.
>
> Do we need something more for machines with large arrays where
> bdi_threshold * number_of_drives is greater than the dirty_ratio ?
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 7b0dcea..7687879 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -541,8 +541,11 @@ static void balance_dirty_pages(struct address_space *mapping)
> * filesystems (i.e. NFS) in which data may have been
> * written to the server's write cache, but has not yet
> * been flushed to permanent storage.
> + * Only move pages to writeback if this bdi is over its
> + * threshold otherwise wait until the disk writes catch
> + * up.
> */
> - if (bdi_nr_reclaimable) {
> + if (bdi_nr_reclaimable > bdi_thresh) {
> writeback_inodes(&wbc);
> pages_written += write_chunk - wbc.nr_to_write;
> get_dirty_limits(&background_thresh, &dirty_thresh,
OK, so Chris ran into this bit yesterday, complaining that he'd only get
very few write requests and couldn't saturate his IO channel.
Now, since writing out everything once there's something to do sucks for
Richard, but only writing out stuff when we're over the limit sucks for
Chris (since we can only be over the limit a little), the best thing
would be to only write out when we're over the background limit. Since
that is the low watermark we use for throttling it makes sense that we
try to write out when above that.
However, since there's a lack of bdi_background_thresh, and I don't
think introducing one just for this is really justified. How about the
below?
Chris how did this work for you? Richard, does this make things suck for
you again?
---
mm/page-writeback.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 81627eb..92f42d6 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -545,7 +545,7 @@ static void balance_dirty_pages(struct address_space *mapping)
* threshold otherwise wait until the disk writes catch
* up.
*/
- if (bdi_nr_reclaimable > bdi_thresh) {
+ if (bdi_nr_reclaimable > bdi_thresh/2) {
writeback_inodes(&wbc);
pages_written += write_chunk - wbc.nr_to_write;
get_dirty_limits(&background_thresh, &dirty_thresh,
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-08-07 12:20 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-06-24 10:38 Richard Kennedy
2009-06-24 22:27 ` Andrew Morton
2009-06-25 5:13 ` Jens Axboe
2009-06-25 8:00 ` Peter Zijlstra
2009-06-25 9:10 ` Jens Axboe
2009-06-25 9:26 ` Jens Axboe
2009-06-25 12:33 ` Al Boldi
2009-06-25 12:43 ` Jens Axboe
2009-06-25 13:46 ` Al Boldi
2009-06-25 14:44 ` Jens Axboe
2009-06-25 17:10 ` Al Boldi
2009-06-26 5:02 ` Jens Axboe
2009-06-26 11:37 ` Al Boldi
2009-06-26 12:35 ` Jens Axboe
2009-06-26 9:15 ` Richard Kennedy
2009-06-26 9:20 ` Jens Axboe
2009-08-07 12:20 ` Peter Zijlstra [this message]
2009-08-07 14:36 ` Richard Kennedy
2009-08-07 14:38 ` Peter Zijlstra
2009-08-07 15:22 ` Chris Mason
2009-08-07 16:09 ` Richard Kennedy
2009-08-07 21:02 ` Chris Mason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1249647601.32113.700.camel@twins \
--to=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=chris.mason@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=richard@rsk.demon.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox