[PATCH] bump up nr_to_write in xfs_vm

* [PATCH] bump up nr_to_write in xfs_vm_writepage
@ 2009-07-02 21:29 Eric Sandeen
  2009-07-07  9:07 ` Olaf Weber
  2009-07-07 15:17 ` Chris Mason
  0 siblings, 2 replies; 14+ messages in thread
From: Eric Sandeen @ 2009-07-02 21:29 UTC (permalink / raw)
  To: xfs mailing list; +Cc: linux-mm, Christoph Hellwig, MASON,CHRISTOPHER

Talking w/ someone who had a raid6 of 15 drives on an areca
controller, he wondered why he could only get 300MB/s or so
out of a streaming buffered write to xfs like so:

dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
10737418240 bytes (11 GB) copied, 34.294 s, 313 MB/s

when the same write directly to the device was going closer
to 700MB/s...

With the following change things get moving again for xfs:

dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
10737418240 bytes (11 GB) copied, 16.2938 s, 659 MB/s
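
As a sanity check on the two dd runs above: the reported rates are
simply bytes / seconds, in the decimal megabytes (10^6 bytes) that dd
prints. A minimal sketch in C follows; the helper names mb_per_sec and
dd_total_bytes are made up for illustration and are not part of dd or
of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: total bytes written by "dd bs=<block_size> count=<count>". */
uint64_t dd_total_bytes(uint64_t block_size, uint64_t count)
{
	return block_size * count;
}

/* Hypothetical helper: throughput in decimal MB/s (1 MB = 10^6 bytes),
 * the unit dd uses in its summary line. */
double mb_per_sec(uint64_t bytes, double seconds)
{
	assert(seconds > 0.0);
	return (double)bytes / seconds / 1e6;
}
```

With the figures above, dd_total_bytes(128 * 1024, 81920) is
10737418240, and mb_per_sec() gives roughly 313 MB/s for 34.294 s and
roughly 659 MB/s for 16.2938 s, i.e. about a 2.1x speedup.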

Chris had sent out something similar at Christoph's suggestion,
and Christoph reminded me of it, and I tested a variant of
it, and it seems to help shockingly well.

Feels like a bandaid though; thoughts?  Other tests to do?

Thanks,
-Eric

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Cc: Chris Mason <chris.mason@oracle.com>
---

Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
@@ -1268,6 +1268,13 @@ xfs_vm_writepage(
 	if (!page_has_buffers(page))
 		create_empty_buffers(page, 1 << inode->i_blkbits, 0);
 
+
+	/*
+	 *  VM calculation for nr_to_write seems off.  Bump it way
+	 *  up, this gets simple streaming writes zippy again.
+	 */
+	wbc->nr_to_write *= 4;
+
 	/*
 	 * Convert delayed allocate, unwritten or unmapped space
 	 * to real space and flush out to disk.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-02 21:29 [PATCH] bump up nr_to_write in xfs_vm_writepage Eric Sandeen
@ 2009-07-07  9:07 ` Olaf Weber
  2009-07-07 10:19   ` Christoph Hellwig
  2009-07-07 15:17 ` Chris Mason
  1 sibling, 1 reply; 14+ messages in thread
From: Olaf Weber @ 2009-07-07  9:07 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: xfs mailing list, Christoph Hellwig, linux-mm, MASON, CHRISTOPHER

Eric Sandeen writes:

> Talking w/ someone who had a raid6 of 15 drives on an areca
> controller, he wondered why he could only get 300MB/s or so
> out of a streaming buffered write to xfs like so:

> dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
> 10737418240 bytes (11 GB) copied, 34.294 s, 313 MB/s

> when the same write directly to the device was going closer
> to 700MB/s...

> With the following change things get moving again for xfs:

> dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
> 10737418240 bytes (11 GB) copied, 16.2938 s, 659 MB/s

> Chris had sent out something similar at Christoph's suggestion,
> and Christoph reminded me of it, and I tested a variant of
> it, and it seems to help shockingly well.

> Feels like a bandaid though; thoughts?  Other tests to do?

If the nr_to_write calculation really yields a value that is too
small, shouldn't it be fixed elsewhere?

Otherwise it might make sense to make the fudge factor tunable.

> +
> +	/*
> +	 *  VM calculation for nr_to_write seems off.  Bump it way
> +	 *  up, this gets simple streaming writes zippy again.
> +	 */
> +	wbc->nr_to_write *= 4;
> +

-- 
Olaf Weber                 SGI               Phone:  +31(0)30-6696752
                           Veldzigt 2b       Fax:    +31(0)30-6696799
Technical Lead             3454 PW de Meern  Vnet:   955-7151
Storage Software           The Netherlands   Email:  olaf@sgi.com


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07  9:07 ` Olaf Weber
@ 2009-07-07 10:19   ` Christoph Hellwig
  2009-07-07 10:33     ` KOSAKI Motohiro
  2009-07-07 11:37     ` Olaf Weber
  0 siblings, 2 replies; 14+ messages in thread
From: Christoph Hellwig @ 2009-07-07 10:19 UTC (permalink / raw)
  To: Olaf Weber
  Cc: Eric Sandeen, xfs mailing list, Christoph Hellwig, linux-mm,
	MASON, CHRISTOPHER

On Tue, Jul 07, 2009 at 11:07:30AM +0200, Olaf Weber wrote:
> If the nr_to_write calculation really yields a value that is too
> small, shouldn't it be fixed elsewhere?

In theory it should.  But given the amazing feedback of the VM people
on this I'd rather make sure we do get the full HW bandwidth on large
arrays instead of sucking badly and not just wait forever.


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 10:19   ` Christoph Hellwig
@ 2009-07-07 10:33     ` KOSAKI Motohiro
  2009-07-07 10:44       ` Christoph Hellwig
  2009-07-07 11:37     ` Olaf Weber
  1 sibling, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-07-07 10:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kosaki.motohiro, Olaf Weber, Eric Sandeen, xfs mailing list,
	linux-mm, MASON, CHRISTOPHER

> On Tue, Jul 07, 2009 at 11:07:30AM +0200, Olaf Weber wrote:
> > If the nr_to_write calculation really yields a value that is too
> > small, shouldn't it be fixed elsewhere?
> 
> In theory it should.  But given the amazing feedback of the VM people
> on this I'd rather make sure we do get the full HW bandwidth on large
> arrays instead of sucking badly and not just wait forever.

At least, I agree with Olaf. If you got someone's NAK in a past thread,
could you please tell me its URL?





* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 10:33     ` KOSAKI Motohiro
@ 2009-07-07 10:44       ` Christoph Hellwig
  2009-07-09  2:04         ` KOSAKI Motohiro
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2009-07-07 10:44 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Hellwig, Eric Sandeen, xfs mailing list, linux-mm,
	Olaf Weber, MASON, CHRISTOPHER

On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> At least, I agree with Olaf. if you got someone's NAK in past thread,
> Could you please tell me its url?

The previous thread was simply dead-ended and nothing happened.


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 10:44       ` Christoph Hellwig
@ 2009-07-09  2:04         ` KOSAKI Motohiro
  2009-07-09 13:01           ` Chris Mason
  0 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-07-09  2:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kosaki.motohiro, Eric Sandeen, xfs mailing list, linux-mm,
	Olaf Weber, MASON, CHRISTOPHER

> On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> > At least, I agree with Olaf. if you got someone's NAK in past thread,
> > Could you please tell me its url?
> 
> The previous thread was simply dead-ended and nothing happened.
> 

Can you remember the subject of that thread? Sorry, I don't remember it.




* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-09  2:04         ` KOSAKI Motohiro
@ 2009-07-09 13:01           ` Chris Mason
  2009-07-10  7:12             ` KOSAKI Motohiro
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Mason @ 2009-07-09 13:01 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Hellwig, Eric Sandeen, xfs mailing list, linux-mm, Olaf Weber

On Thu, Jul 09, 2009 at 11:04:32AM +0900, KOSAKI Motohiro wrote:
> > On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> > > At least, I agree with Olaf. if you got someone's NAK in past thread,
> > > Could you please tell me its url?
> > 
> > The previous thread was simply dead-ended and nothing happened.
> > 
> 
> Can you remember this thread subject? sorry, I haven't remember it.

This is the original thread, it did lead to a few different patches
going in, but the nr_to_write change wasn't one of them.

http://kerneltrap.org/mailarchive/linux-kernel/2008/10/1/3472704/thread

-chris


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-09 13:01           ` Chris Mason
@ 2009-07-10  7:12             ` KOSAKI Motohiro
  2009-07-24  5:20               ` Felix Blyakher
  0 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-07-10  7:12 UTC (permalink / raw)
  To: Chris Mason
  Cc: kosaki.motohiro, Christoph Hellwig, Eric Sandeen,
	xfs mailing list, linux-mm, Olaf Weber

> On Thu, Jul 09, 2009 at 11:04:32AM +0900, KOSAKI Motohiro wrote:
> > > On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> > > > At least, I agree with Olaf. if you got someone's NAK in past thread,
> > > > Could you please tell me its url?
> > > 
> > > The previous thread was simply dead-ended and nothing happened.
> > > 
> > 
> > Can you remember this thread subject? sorry, I haven't remember it.
> 
> This is the original thread, it did lead to a few different patches
> going in, but the nr_to_write change wasn't one of them.
> 
> http://kerneltrap.org/mailarchive/linux-kernel/2008/10/1/3472704/thread

Thanks, good pointer. That thread has several interesting discussions.

1. Creating ext4_write_cache_pages() or modifying write_cache_pages()

I think this is Christoph's homework. He said:

> I agree.  But I'm still not quite sure if that requirement is unique to
> ext4 anyway.  Give me some time to dive into the writeback code again,
> haven't been there for quite a while.

If he says modifying write_cache_pages() is necessary, I'd like to review it.

2. Is the current mapping->writeback_index updating improper?

I'm not sure which solution is better, but I think your first proposal is
acceptable enough.

3. Is the current wbc->nr_to_write value improper?

The current writeback_set_ratelimit() doesn't permit ratelimit_pages to
exceed 4 MB, but that limit is too low nowadays.
(That's my understanding. Right?)

=======================================================
void writeback_set_ratelimit(void)
{
        ratelimit_pages = vm_total_pages / (num_online_cpus() * 32);
        if (ratelimit_pages < 16)
                ratelimit_pages = 16;
        if (ratelimit_pages * PAGE_CACHE_SIZE > 4096 * 1024)
                ratelimit_pages = (4096 * 1024) / PAGE_CACHE_SIZE;
}
=======================================================

Yes, 4 MB is a pretty magical constant. We have three choices:
  A. Simply remove the magical 4 MB constant (a bit dangerous)
  B. Decide the high border from IO capability
  C. Introduce a new /proc knob (as Olaf proposed)

My personal preference is B or C.
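
The effect of the clamp quoted above can be reproduced outside the
kernel. The following is only a user-space sketch: the function name
model_ratelimit_pages and its parameters are hypothetical stand-ins
for the kernel's vm_total_pages, num_online_cpus() and PAGE_CACHE_SIZE,
and it is not the kernel function itself.

```c
#include <assert.h>

/* User-space model of the writeback_set_ratelimit() arithmetic.
 * total_pages, ncpus and page_size replace the kernel globals. */
unsigned long model_ratelimit_pages(unsigned long total_pages,
				    unsigned long ncpus,
				    unsigned long page_size)
{
	unsigned long pages;

	assert(ncpus > 0 && page_size > 0);

	pages = total_pages / (ncpus * 32);
	if (pages < 16)
		pages = 16;				/* floor: 16 pages */
	if (pages * page_size > 4096UL * 1024)
		pages = (4096UL * 1024) / page_size;	/* cap: 4 MB worth of pages */
	return pages;
}
```

For example, assuming 4 KiB pages, a machine with 16 GiB of RAM
(4194304 pages) and 8 CPUs would compute 16384 pages before the cap
and end up at 1024 pages, so in this model the cap is what decides the
value on any reasonably large system.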


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-10  7:12             ` KOSAKI Motohiro
@ 2009-07-24  5:20               ` Felix Blyakher
  2009-07-24  5:33                 ` KOSAKI Motohiro
  2009-07-24 12:05                 ` Chris Mason
  0 siblings, 2 replies; 14+ messages in thread
From: Felix Blyakher @ 2009-07-24  5:20 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Chris Mason, Eric Sandeen, xfs mailing list, Christoph Hellwig,
	linux-mm, Olaf Weber


On Jul 10, 2009, at 2:12 AM, KOSAKI Motohiro wrote:

>> On Thu, Jul 09, 2009 at 11:04:32AM +0900, KOSAKI Motohiro wrote:
>>>> On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
>>>>> At least, I agree with Olaf. if you got someone's NAK in past  
>>>>> thread,
>>>>> Could you please tell me its url?
>>>>
>>>> The previous thread was simply dead-ended and nothing happened.
>>>>
>>>
>>> Can you remember this thread subject? sorry, I haven't remember it.
>>
>> This is the original thread, it did lead to a few different patches
>> going in, but the nr_to_write change wasn't one of them.
>>
>> http://kerneltrap.org/mailarchive/linux-kernel/2008/10/1/3472704/thread
>
> Thanks good pointer. This thread have multiple interesting discussion.
>
> 1. making ext4_write_cache_pages() or modifying write_cache_pages()
>
> I think this is Christoph's homework. he said
>
>> I agree.  But I'm still not quite sure if that requirement is  
>> unique to
>> ext4 anyway.  Give me some time to dive into the writeback code  
>> again,
>> haven't been there for quite a while.
>
> if he says modifying write_cache_pages() is necessary, I'd like to  
> review it.
>
>
> 2. Current mapping->writeback_index updating is not proper?
>
> I'm not sure which solution is better. but I think your first  
> proposal is
> enough acceptable.
>
>
> 3. Current wbc->nr_to_write value is not proper?
>
> Current writeback_set_ratelimit() doesn't permit that  
> ratelimit_pages exceed
> 4M byte. but it is too low restriction for nowadays.
> (that's my understand. right?)
>
> =======================================================
> void writeback_set_ratelimit(void)
> {
>        ratelimit_pages = vm_total_pages / (num_online_cpus() * 32);
>        if (ratelimit_pages < 16)
>                ratelimit_pages = 16;
>        if (ratelimit_pages * PAGE_CACHE_SIZE > 4096 * 1024)
>                ratelimit_pages = (4096 * 1024) / PAGE_CACHE_SIZE;
> }
> =======================================================
>
> Yes, 4M bytes are pretty magical constant. We have three choice
>  A. Remove magical 4M constant simple (a bit danger)

That will be outside of xfs, and it seems like there is not much interest
from the mm people.

>  B. Decide high border from IO capability

It's not clear to me how to calculate that high border, but again
it's outside of the xfs scope, and we don't have much control here.

>  C. Introduce new /proc knob (as Olaf proposed)

We need at least to play with different numbers, and adding the
knob (an xfs tunable) would be one way to do it. Also, different
configurations may need different nr_to_write values.

Either way it seems hackish, but with the knob at least there is
some control over it.

Felix


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-24  5:20               ` Felix Blyakher
@ 2009-07-24  5:33                 ` KOSAKI Motohiro
  2009-07-24 12:05                 ` Chris Mason
  1 sibling, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-07-24  5:33 UTC (permalink / raw)
  To: Felix Blyakher
  Cc: kosaki.motohiro, Chris Mason, Eric Sandeen, xfs mailing list,
	Christoph Hellwig, linux-mm, Olaf Weber

> 
> On Jul 10, 2009, at 2:12 AM, KOSAKI Motohiro wrote:
> 
> >> On Thu, Jul 09, 2009 at 11:04:32AM +0900, KOSAKI Motohiro wrote:
> >>>> On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> >>>>> At least, I agree with Olaf. if you got someone's NAK in past  
> >>>>> thread,
> >>>>> Could you please tell me its url?
> >>>>
> >>>> The previous thread was simply dead-ended and nothing happened.
> >>>>
> >>>
> >>> Can you remember this thread subject? sorry, I haven't remember it.
> >>
> >> This is the original thread, it did lead to a few different patches
> >> going in, but the nr_to_write change wasn't one of them.
> >>
> >> http://kerneltrap.org/mailarchive/linux-kernel/2008/10/1/3472704/thread
> >
> > Thanks good pointer. This thread have multiple interesting discussion.
> >
> > 1. making ext4_write_cache_pages() or modifying write_cache_pages()
> >
> > I think this is Christoph's homework. he said
> >
> >> I agree.  But I'm still not quite sure if that requirement is  
> >> unique to
> >> ext4 anyway.  Give me some time to dive into the writeback code  
> >> again,
> >> haven't been there for quite a while.
> >
> > if he says modifying write_cache_pages() is necessary, I'd like to  
> > review it.
> >
> >
> > 2. Current mapping->writeback_index updating is not proper?
> >
> > I'm not sure which solution is better. but I think your first  
> > proposal is
> > enough acceptable.
> >
> >
> > 3. Current wbc->nr_to_write value is not proper?
> >
> > Current writeback_set_ratelimit() doesn't permit that  
> > ratelimit_pages exceed
> > 4M byte. but it is too low restriction for nowadays.
> > (that's my understand. right?)
> >
> > =======================================================
> > void writeback_set_ratelimit(void)
> > {
> >        ratelimit_pages = vm_total_pages / (num_online_cpus() * 32);
> >        if (ratelimit_pages < 16)
> >                ratelimit_pages = 16;
> >        if (ratelimit_pages * PAGE_CACHE_SIZE > 4096 * 1024)
> >                ratelimit_pages = (4096 * 1024) / PAGE_CACHE_SIZE;
> > }
> > =======================================================
> >
> > Yes, 4M bytes are pretty magical constant. We have three choice
> >  A. Remove magical 4M constant simple (a bit danger)
> 
> That's will be outside the xfs, and seems like there is no much interest
> from mm people.

That's OK. You can join the mm people :)



> >  B. Decide high border from IO capability
> 
> It's not clear to me how to calculate that high border, but again
> it's outside of the xfs scope, and we don't have much control here.
> 
> >  C. Introduce new /proc knob (as Olaf proposed)
> 
> We need at least to play with different numbers, and putting the
> knob (xfs tunable) would be one way to do it. Also, different
> configurations may need different nr_to_write value.
> 
> In either way it seems hackish, but with the knob at least there is
> some control of it.



* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-24  5:20               ` Felix Blyakher
  2009-07-24  5:33                 ` KOSAKI Motohiro
@ 2009-07-24 12:05                 ` Chris Mason
  1 sibling, 0 replies; 14+ messages in thread
From: Chris Mason @ 2009-07-24 12:05 UTC (permalink / raw)
  To: Felix Blyakher
  Cc: KOSAKI Motohiro, Eric Sandeen, xfs mailing list,
	Christoph Hellwig, linux-mm, Olaf Weber

On Fri, Jul 24, 2009 at 12:20:32AM -0500, Felix Blyakher wrote:
>
> On Jul 10, 2009, at 2:12 AM, KOSAKI Motohiro wrote:
>> 3. Current wbc->nr_to_write value is not proper?
>>
>> Current writeback_set_ratelimit() doesn't permit that ratelimit_pages 
>> exceed
>> 4M byte. but it is too low restriction for nowadays.
>> (that's my understand. right?)
>>
>> =======================================================
>> void writeback_set_ratelimit(void)
>> {
>>        ratelimit_pages = vm_total_pages / (num_online_cpus() * 32);
>>        if (ratelimit_pages < 16)
>>                ratelimit_pages = 16;
>>        if (ratelimit_pages * PAGE_CACHE_SIZE > 4096 * 1024)
>>                ratelimit_pages = (4096 * 1024) / PAGE_CACHE_SIZE;
>> }
>> =======================================================
>>
>> Yes, 4M bytes are pretty magical constant. We have three choice
>>  A. Remove magical 4M constant simple (a bit danger)
>
> That's will be outside the xfs, and seems like there is no much interest
> from mm people.
>
>>  B. Decide high border from IO capability

It is worth pointing out that Jens Axboe is planning on more feedback
controlled knobs as part of pdflush rework.

-chris


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 10:19   ` Christoph Hellwig
  2009-07-07 10:33     ` KOSAKI Motohiro
@ 2009-07-07 11:37     ` Olaf Weber
  2009-07-07 14:46       ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Olaf Weber @ 2009-07-07 11:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Sandeen, linux-mm, MASON, CHRISTOPHER, xfs mailing list

Christoph Hellwig writes:
> On Tue, Jul 07, 2009 at 11:07:30AM +0200, Olaf Weber wrote:

>> If the nr_to_write calculation really yields a value that is too
>> small, shouldn't it be fixed elsewhere?

> In theory it should.  But given the amazing feedback of the VM people
> on this I'd rather make sure we do get the full HW bandwidth on large
> arrays instead of sucking badly and not just wait forever.

So how do you feel about making the fudge factor tunable?  I don't
have a good sense myself of what the value should be, whether the
hard-coded 4 is good enough in general.

-- 
Olaf Weber                 SGI               Phone:  +31(0)30-6696752
                           Veldzigt 2b       Fax:    +31(0)30-6696799
Technical Lead             3454 PW de Meern  Vnet:   955-7151
Storage Software           The Netherlands   Email:  olaf@sgi.com


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 11:37     ` Olaf Weber
@ 2009-07-07 14:46       ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2009-07-07 14:46 UTC (permalink / raw)
  To: Olaf Weber
  Cc: Christoph Hellwig, Eric Sandeen, linux-mm, MASON, CHRISTOPHER,
	xfs mailing list

On Tue, Jul 07, 2009 at 01:37:05PM +0200, Olaf Weber wrote:
> > In theory it should.  But given the amazing feedback of the VM people
> > on this I'd rather make sure we do get the full HW bandwidth on large
> > arrays instead of sucking badly and not just wait forever.
> 
> So how do you feel about making the fudge factor tunable?  I don't
> have a good sense myself of what the value should be, whether the
> hard-coded 4 is good enough in general.

A tunable means exposing an ABI, which I'd rather not do for a hack like
this.  If you don't like the number feel free to experiment around with
it, SGI should have enough large systems that can be used to test this
out.


* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-02 21:29 [PATCH] bump up nr_to_write in xfs_vm_writepage Eric Sandeen
  2009-07-07  9:07 ` Olaf Weber
@ 2009-07-07 15:17 ` Chris Mason
  1 sibling, 0 replies; 14+ messages in thread
From: Chris Mason @ 2009-07-07 15:17 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs mailing list, linux-mm, Christoph Hellwig, jens.axboe

On Thu, Jul 02, 2009 at 04:29:41PM -0500, Eric Sandeen wrote:
> Talking w/ someone who had a raid6 of 15 drives on an areca
> controller, he wondered why he could only get 300MB/s or so
> out of a streaming buffered write to xfs like so:
> 
> dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
> 10737418240 bytes (11 GB) copied, 34.294 s, 313 MB/s

I did some quick tests and found some unhappy things ;)  On my 5 drive
sata array (configured via LVM in a stripeset), dd with O_DIRECT to the
block device can stream writes at a healthy 550MB/s.

On 2.6.30, XFS does O_DIRECT at the exact same 550MB/s, and buffered
writes at 370MB/s.  Btrfs does a little better on buffered and a little
worse on O_DIRECT.  Ext4 splits the middle and does 400MB/s on both
buffered and O_DIRECT.

2.6.31-rc2 gave similar results.  One thing I noticed was that pdflush
and friends aren't using the right flag in congestion_wait after it was
updated to do congestion based on sync/async instead of read/write.  I'm
always happy when I get to blame bugs on Jens, but fixing the congestion
flag usage actually made the runs slower (he still promises to send a
patch for the congestion).

A little while ago, Jan Kara sent seekwatcher changes that let it graph
per-process info about IO submission, so I cooked up a graph of the IO
done by pdflush, dd, and others during an XFS buffered streaming write.

http://oss.oracle.com/~mason/seekwatcher/xfs-dd-2.6.30.png

The dark blue dots are dd doing writes and the light green dots are
pdflush.  The graph shows that pdflush spends almost the entire run
sitting around doing nothing, and sysrq-w shows all the pdflush threads
waiting around in congestion_wait.

Just to make sure the graphing wasn't hiding work done by pdflush, I
filtered out all the dd IO:

http://oss.oracle.com/~mason/seekwatcher/xfs-dd-2.6.30-filtered.png

With all of this in mind, I think the reason why the nr_to_write change
is helping is because dd is doing all the IO during balance_dirty_pages,
and the higher nr_to_write number is making sure that more IO goes out
at a time.

Once dd starts doing IO in balance_dirty_pages, our queues get
congested.  From that moment on, the bdi_congested checks in the
writeback path make pdflush sit down.  I doubt the queue ever really
leaves congestion because we get over the dirty high water mark and dd
is jumping in and sending IO down the pipe without waiting for
congestion to clear.

sysrq-w supports this.  dd is always in get_request_wait and pdflush is
always in congestion_wait.

This bad interaction between pdflush and congestion was one of the
motivations for Jens' new writeback work, so I was really hoping to git
pull and post a fantastic new benchmark result.  With Jens' code the
graph ends up completely inverted, with roughly the same performance.

Instead of dd doing all the work, the flusher thread is doing all the
work (hooray!) and dd is almost always in congestion_wait (boo).  I
think the cause is a little different, it seems that with Jens' code, dd
finds the flusher thread has the inode locked, and so
balance_dirty_pages doesn't find any work to do.  It waits on
congestion_wait().

If I replace the balance_dirty_pages() congestion_wait() with
schedule_timeout(1) in Jens' writeback branch, xfs buffered writes go
from 370MB/s to 520MB/s.  There are still some big peaks and valleys,
but it at least shows where we need to think harder about congestion
flags, IO waiting and other issues.

All of this is a long way of saying that until Jens' new code goes in,
(with additional tuning) the nr_to_write change makes sense to me.  I
don't see a 2.6.31 suitable way to tune things without his work.

-chris



Thread overview: 14+ messages
2009-07-02 21:29 [PATCH] bump up nr_to_write in xfs_vm_writepage Eric Sandeen
2009-07-07  9:07 ` Olaf Weber
2009-07-07 10:19   ` Christoph Hellwig
2009-07-07 10:33     ` KOSAKI Motohiro
2009-07-07 10:44       ` Christoph Hellwig
2009-07-09  2:04         ` KOSAKI Motohiro
2009-07-09 13:01           ` Chris Mason
2009-07-10  7:12             ` KOSAKI Motohiro
2009-07-24  5:20               ` Felix Blyakher
2009-07-24  5:33                 ` KOSAKI Motohiro
2009-07-24 12:05                 ` Chris Mason
2009-07-07 11:37     ` Olaf Weber
2009-07-07 14:46       ` Christoph Hellwig
2009-07-07 15:17 ` Chris Mason
