* [PATCH] bump up nr_to_write in xfs_vm_writepage
@ 2009-07-02 21:29 Eric Sandeen
  2009-07-07  9:07 ` Olaf Weber
  2009-07-07 15:17 ` Chris Mason
  0 siblings, 2 replies; 14+ messages in thread
From: Eric Sandeen @ 2009-07-02 21:29 UTC
  To: xfs mailing list; +Cc: linux-mm, Christoph Hellwig, MASON,CHRISTOPHER

Talking w/ someone who had a raid6 of 15 drives on an areca
controller, he wondered why he could only get 300MB/s or so
out of a streaming buffered write to xfs like so:

dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
10737418240 bytes (11 GB) copied, 34.294 s, 313 MB/s

when the same write directly to the device was going closer
to 700MB/s...

With the following change things get moving again for xfs:

dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
10737418240 bytes (11 GB) copied, 16.2938 s, 659 MB/s

Chris had sent out something similar at Christoph's suggestion,
and Christoph reminded me of it, and I tested a variant of
it, and it seems to help shockingly well.

Feels like a bandaid though; thoughts?  Other tests to do?

Thanks,
-Eric

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Cc: Chris Mason <chris.mason@oracle.com>
---

Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
@@ -1268,6 +1268,13 @@ xfs_vm_writepage(
 	if (!page_has_buffers(page))
 		create_empty_buffers(page, 1 << inode->i_blkbits, 0);
 
+
+	/*
+	 * VM calculation for nr_to_write seems off.  Bump it way
+	 * up, this gets simple streaming writes zippy again.
+	 */
+	wbc->nr_to_write *= 4;
+
 	/*
 	 * Convert delayed allocate, unwritten or unmapped space
 	 * to real space and flush out to disk.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
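The dd figures in the message above can be checked by hand. A minimal sketch in C, assuming that dd reports decimal megabytes (10^6 bytes) and that bs=128k means 128 * 1024 bytes; the helper names are illustrative and do not come from the thread:

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes dd writes for a given block size and block count. */
static uint64_t dd_total_bytes(uint64_t block_size, uint64_t count)
{
	return block_size * count;
}

/* Throughput in decimal MB/s (assumption: dd's "MB" is 10^6 bytes). */
static double dd_mb_per_sec(uint64_t bytes, double seconds)
{
	assert(seconds > 0.0);
	return (double)bytes / seconds / 1e6;
}
```

With bs=128k and count=81920, dd_total_bytes(128 * 1024, 81920) gives the 10737418240 bytes that dd reports. Dividing by 34.294 s and by 16.2938 s gives roughly 313 MB/s and 659 MB/s, so the patched run is about 2.1 times faster.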
* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-02 21:29 [PATCH] bump up nr_to_write in xfs_vm_writepage Eric Sandeen
@ 2009-07-07  9:07 ` Olaf Weber
  2009-07-07 10:19   ` Christoph Hellwig
  2009-07-07 15:17 ` Chris Mason
  1 sibling, 1 reply; 14+ messages in thread
From: Olaf Weber @ 2009-07-07 9:07 UTC
  To: Eric Sandeen
  Cc: xfs mailing list, Christoph Hellwig, linux-mm, MASON, CHRISTOPHER

Eric Sandeen writes:

> Talking w/ someone who had a raid6 of 15 drives on an areca
> controller, he wondered why he could only get 300MB/s or so
> out of a streaming buffered write to xfs like so:

> dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
> 10737418240 bytes (11 GB) copied, 34.294 s, 313 MB/s

> when the same write directly to the device was going closer
> to 700MB/s...

> With the following change things get moving again for xfs:

> dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
> 10737418240 bytes (11 GB) copied, 16.2938 s, 659 MB/s

> Chris had sent out something similar at Christoph's suggestion,
> and Christoph reminded me of it, and I tested a variant of
> it, and it seems to help shockingly well.

> Feels like a bandaid though; thoughts?  Other tests to do?

If the nr_to_write calculation really yields a value that is too
small, shouldn't it be fixed elsewhere?

Otherwise it might make sense to make the fudge factor tunable.

> +
> +	/*
> +	 * VM calculation for nr_to_write seems off.  Bump it way
> +	 * up, this gets simple streaming writes zippy again.
> +	 */
> +	wbc->nr_to_write *= 4;
> +

-- 
Olaf Weber                 SGI               Phone:  +31(0)30-6696752
                           Veldzigt 2b       Fax:    +31(0)30-6696799
Technical Lead             3454 PW de Meern  Vnet:   955-7151
Storage Software           The Netherlands   Email:  olaf@sgi.com

* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07  9:07 ` Olaf Weber
@ 2009-07-07 10:19   ` Christoph Hellwig
  2009-07-07 10:33     ` KOSAKI Motohiro
  2009-07-07 11:37     ` Olaf Weber
  0 siblings, 2 replies; 14+ messages in thread
From: Christoph Hellwig @ 2009-07-07 10:19 UTC
  To: Olaf Weber
  Cc: Eric Sandeen, xfs mailing list, Christoph Hellwig, linux-mm, MASON, CHRISTOPHER

On Tue, Jul 07, 2009 at 11:07:30AM +0200, Olaf Weber wrote:
> If the nr_to_write calculation really yields a value that is too
> small, shouldn't it be fixed elsewhere?

In theory it should.  But given the amazing feedback of the VM people
on this I'd rather make sure we do get the full HW bandwidth on large
arrays, instead of sucking badly, and not just wait forever.

* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 10:19 ` Christoph Hellwig
@ 2009-07-07 10:33   ` KOSAKI Motohiro
  2009-07-07 10:44     ` Christoph Hellwig
  2009-07-07 11:37   ` Olaf Weber
  1 sibling, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-07-07 10:33 UTC
  To: Christoph Hellwig
  Cc: kosaki.motohiro, Olaf Weber, Eric Sandeen, xfs mailing list, linux-mm, MASON, CHRISTOPHER

> On Tue, Jul 07, 2009 at 11:07:30AM +0200, Olaf Weber wrote:
> > If the nr_to_write calculation really yields a value that is too
> > small, shouldn't it be fixed elsewhere?
>
> In theory it should.  But given the amazing feedback of the VM people
> on this I'd rather make sure we do get the full HW bandwidth on large
> arrays, instead of sucking badly, and not just wait forever.

At least, I agree with Olaf. If you got someone's NAK in a past thread,
could you please tell me its URL?

* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 10:33 ` KOSAKI Motohiro
@ 2009-07-07 10:44   ` Christoph Hellwig
  2009-07-09  2:04     ` KOSAKI Motohiro
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2009-07-07 10:44 UTC
  To: KOSAKI Motohiro
  Cc: Christoph Hellwig, Eric Sandeen, xfs mailing list, linux-mm, Olaf Weber, MASON, CHRISTOPHER

On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> At least, I agree with Olaf. If you got someone's NAK in a past thread,
> could you please tell me its URL?

The previous thread was simply dead-ended and nothing happened.

* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 10:44 ` Christoph Hellwig
@ 2009-07-09  2:04   ` KOSAKI Motohiro
  2009-07-09 13:01     ` Chris Mason
  0 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-07-09 2:04 UTC
  To: Christoph Hellwig
  Cc: kosaki.motohiro, Eric Sandeen, xfs mailing list, linux-mm, Olaf Weber, MASON, CHRISTOPHER

> On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> > At least, I agree with Olaf. If you got someone's NAK in a past thread,
> > could you please tell me its URL?
>
> The previous thread was simply dead-ended and nothing happened.
>

Can you remember the subject of that thread? Sorry, I don't remember it.

* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-09  2:04 ` KOSAKI Motohiro
@ 2009-07-09 13:01   ` Chris Mason
  2009-07-10  7:12     ` KOSAKI Motohiro
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Mason @ 2009-07-09 13:01 UTC
  To: KOSAKI Motohiro
  Cc: Christoph Hellwig, Eric Sandeen, xfs mailing list, linux-mm, Olaf Weber

On Thu, Jul 09, 2009 at 11:04:32AM +0900, KOSAKI Motohiro wrote:
> > On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> > > At least, I agree with Olaf. If you got someone's NAK in a past thread,
> > > could you please tell me its URL?
> >
> > The previous thread was simply dead-ended and nothing happened.
> >
>
> Can you remember the subject of that thread? Sorry, I don't remember it.

This is the original thread, it did lead to a few different patches
going in, but the nr_to_write change wasn't one of them.

http://kerneltrap.org/mailarchive/linux-kernel/2008/10/1/3472704/thread

-chris

* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-09 13:01 ` Chris Mason
@ 2009-07-10  7:12   ` KOSAKI Motohiro
  2009-07-24  5:20     ` Felix Blyakher
  0 siblings, 1 reply; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-07-10 7:12 UTC
  To: Chris Mason
  Cc: kosaki.motohiro, Christoph Hellwig, Eric Sandeen, xfs mailing list, linux-mm, Olaf Weber

> On Thu, Jul 09, 2009 at 11:04:32AM +0900, KOSAKI Motohiro wrote:
> > > On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> > > > At least, I agree with Olaf. If you got someone's NAK in a past thread,
> > > > could you please tell me its URL?
> > >
> > > The previous thread was simply dead-ended and nothing happened.
> > >
> >
> > Can you remember the subject of that thread? Sorry, I don't remember it.
>
> This is the original thread, it did lead to a few different patches
> going in, but the nr_to_write change wasn't one of them.
>
> http://kerneltrap.org/mailarchive/linux-kernel/2008/10/1/3472704/thread

Thanks, good pointer. That thread has multiple interesting discussions.

1. Making ext4_write_cache_pages() or modifying write_cache_pages()

I think this is Christoph's homework. He said:

> I agree.  But I'm still not quite sure if that requirement is unique to
> ext4 anyway.  Give me some time to dive into the writeback code again,
> haven't been there for quite a while.

If he says modifying write_cache_pages() is necessary, I'd like to review it.

2. Is the current mapping->writeback_index updating not proper?

I'm not sure which solution is better, but I think your first proposal is
acceptable enough.

3. Is the current wbc->nr_to_write value not proper?

The current writeback_set_ratelimit() doesn't permit ratelimit_pages to
exceed 4M bytes, but that is too low a restriction nowadays.
(That's my understanding. Right?)

=======================================================
void writeback_set_ratelimit(void)
{
	ratelimit_pages = vm_total_pages / (num_online_cpus() * 32);
	if (ratelimit_pages < 16)
		ratelimit_pages = 16;
	if (ratelimit_pages * PAGE_CACHE_SIZE > 4096 * 1024)
		ratelimit_pages = (4096 * 1024) / PAGE_CACHE_SIZE;
}
=======================================================

Yes, 4M bytes is a pretty magical constant. We have three choices:
  A. Simply remove the magical 4M constant (a bit dangerous)
  B. Decide the upper bound from the IO capability
  C. Introduce a new /proc knob (as Olaf proposed)

Personally, I prefer B and C.

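To see what the quoted clamp does in practice, its arithmetic can be replayed in userspace. A minimal sketch, not kernel code: total_pages, online_cpus and page_size are plain parameters standing in for the kernel's vm_total_pages, num_online_cpus() and PAGE_CACHE_SIZE, and the function name is made up for illustration.

```c
#include <stdint.h>

/*
 * Userspace replay of the clamping arithmetic in the quoted
 * writeback_set_ratelimit().  The inputs are passed in explicitly
 * instead of being read from kernel state.  online_cpus and
 * page_size must be nonzero.
 */
static uint64_t model_ratelimit_pages(uint64_t total_pages,
				      uint64_t online_cpus,
				      uint64_t page_size)
{
	uint64_t pages = total_pages / (online_cpus * 32);

	if (pages < 16)
		pages = 16;				/* lower bound: 16 pages */
	if (pages * page_size > 4096 * 1024)
		pages = (4096 * 1024) / page_size;	/* upper bound: 4 MiB */
	return pages;
}
```

For a machine with 16 GiB of RAM (4194304 pages of 4 KiB) and 8 CPUs, the unclamped value would be 16384 pages (64 MiB), so the result is clamped to 1024 pages, which is 4 MiB. Under the 4 KiB page assumption, any machine with more than roughly 128 MiB of memory per CPU hits the cap.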
* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-10  7:12 ` KOSAKI Motohiro
@ 2009-07-24  5:20   ` Felix Blyakher
  2009-07-24  5:33     ` KOSAKI Motohiro
  2009-07-24 12:05     ` Chris Mason
  0 siblings, 2 replies; 14+ messages in thread
From: Felix Blyakher @ 2009-07-24 5:20 UTC
  To: KOSAKI Motohiro
  Cc: Chris Mason, Eric Sandeen, xfs mailing list, Christoph Hellwig, linux-mm, Olaf Weber

On Jul 10, 2009, at 2:12 AM, KOSAKI Motohiro wrote:

>> On Thu, Jul 09, 2009 at 11:04:32AM +0900, KOSAKI Motohiro wrote:
>>>> On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
>>>>> At least, I agree with Olaf. If you got someone's NAK in a past
>>>>> thread, could you please tell me its URL?
>>>>
>>>> The previous thread was simply dead-ended and nothing happened.
>>>>
>>>
>>> Can you remember the subject of that thread? Sorry, I don't remember it.
>>
>> This is the original thread, it did lead to a few different patches
>> going in, but the nr_to_write change wasn't one of them.
>>
>> http://kerneltrap.org/mailarchive/linux-kernel/2008/10/1/3472704/thread
>
> Thanks, good pointer. That thread has multiple interesting discussions.
>
> 1. Making ext4_write_cache_pages() or modifying write_cache_pages()
>
> I think this is Christoph's homework. He said:
>
>> I agree.  But I'm still not quite sure if that requirement is
>> unique to ext4 anyway.  Give me some time to dive into the
>> writeback code again, haven't been there for quite a while.
>
> If he says modifying write_cache_pages() is necessary, I'd like to
> review it.
>
>
> 2. Is the current mapping->writeback_index updating not proper?
>
> I'm not sure which solution is better, but I think your first
> proposal is acceptable enough.
>
>
> 3. Is the current wbc->nr_to_write value not proper?
>
> The current writeback_set_ratelimit() doesn't permit ratelimit_pages
> to exceed 4M bytes, but that is too low a restriction nowadays.
> (That's my understanding. Right?)
>
> =======================================================
> void writeback_set_ratelimit(void)
> {
>	ratelimit_pages = vm_total_pages / (num_online_cpus() * 32);
>	if (ratelimit_pages < 16)
>		ratelimit_pages = 16;
>	if (ratelimit_pages * PAGE_CACHE_SIZE > 4096 * 1024)
>		ratelimit_pages = (4096 * 1024) / PAGE_CACHE_SIZE;
> }
> =======================================================
>
> Yes, 4M bytes is a pretty magical constant. We have three choices:
>  A. Simply remove the magical 4M constant (a bit dangerous)

That would be outside of xfs, and it seems there is not much interest
from the mm people.

>  B. Decide the upper bound from the IO capability

It's not clear to me how to calculate that upper bound, but again
it's outside of the xfs scope, and we don't have much control here.

>  C. Introduce a new /proc knob (as Olaf proposed)

We need to at least play with different numbers, and adding a
knob (an xfs tunable) would be one way to do it. Also, different
configurations may need different nr_to_write values.

Either way it seems hackish, but with the knob there is at least
some control over it.

Felix

* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-24  5:20 ` Felix Blyakher
@ 2009-07-24  5:33   ` KOSAKI Motohiro
  2009-07-24 12:05   ` Chris Mason
  1 sibling, 0 replies; 14+ messages in thread
From: KOSAKI Motohiro @ 2009-07-24 5:33 UTC
  To: Felix Blyakher
  Cc: kosaki.motohiro, Chris Mason, Eric Sandeen, xfs mailing list, Christoph Hellwig, linux-mm, Olaf Weber

> On Jul 10, 2009, at 2:12 AM, KOSAKI Motohiro wrote:
>
> >> On Thu, Jul 09, 2009 at 11:04:32AM +0900, KOSAKI Motohiro wrote:
> >>>> On Tue, Jul 07, 2009 at 07:33:04PM +0900, KOSAKI Motohiro wrote:
> >>>>> At least, I agree with Olaf. If you got someone's NAK in a past
> >>>>> thread, could you please tell me its URL?
> >>>>
> >>>> The previous thread was simply dead-ended and nothing happened.
> >>>>
> >>>
> >>> Can you remember the subject of that thread? Sorry, I don't remember it.
> >>
> >> This is the original thread, it did lead to a few different patches
> >> going in, but the nr_to_write change wasn't one of them.
> >>
> >> http://kerneltrap.org/mailarchive/linux-kernel/2008/10/1/3472704/thread
> >
> > Thanks, good pointer. That thread has multiple interesting discussions.
> >
> > 1. Making ext4_write_cache_pages() or modifying write_cache_pages()
> >
> > I think this is Christoph's homework. He said:
> >
> >> I agree.  But I'm still not quite sure if that requirement is
> >> unique to ext4 anyway.  Give me some time to dive into the
> >> writeback code again, haven't been there for quite a while.
> >
> > If he says modifying write_cache_pages() is necessary, I'd like to
> > review it.
> >
> >
> > 2. Is the current mapping->writeback_index updating not proper?
> >
> > I'm not sure which solution is better, but I think your first
> > proposal is acceptable enough.
> >
> >
> > 3. Is the current wbc->nr_to_write value not proper?
> >
> > The current writeback_set_ratelimit() doesn't permit ratelimit_pages
> > to exceed 4M bytes, but that is too low a restriction nowadays.
> > (That's my understanding. Right?)
> >
> > =======================================================
> > void writeback_set_ratelimit(void)
> > {
> >	ratelimit_pages = vm_total_pages / (num_online_cpus() * 32);
> >	if (ratelimit_pages < 16)
> >		ratelimit_pages = 16;
> >	if (ratelimit_pages * PAGE_CACHE_SIZE > 4096 * 1024)
> >		ratelimit_pages = (4096 * 1024) / PAGE_CACHE_SIZE;
> > }
> > =======================================================
> >
> > Yes, 4M bytes is a pretty magical constant. We have three choices:
> >  A. Simply remove the magical 4M constant (a bit dangerous)
>
> That would be outside of xfs, and it seems there is not much interest
> from the mm people.

That's OK. You can join the mm people :)

> >  B. Decide the upper bound from the IO capability
>
> It's not clear to me how to calculate that upper bound, but again
> it's outside of the xfs scope, and we don't have much control here.
>
> >  C. Introduce a new /proc knob (as Olaf proposed)
>
> We need to at least play with different numbers, and adding a
> knob (an xfs tunable) would be one way to do it. Also, different
> configurations may need different nr_to_write values.
>
> Either way it seems hackish, but with the knob there is at least
> some control over it.

* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-24  5:20 ` Felix Blyakher
  2009-07-24  5:33   ` KOSAKI Motohiro
@ 2009-07-24 12:05   ` Chris Mason
  1 sibling, 0 replies; 14+ messages in thread
From: Chris Mason @ 2009-07-24 12:05 UTC
  To: Felix Blyakher
  Cc: KOSAKI Motohiro, Eric Sandeen, xfs mailing list, Christoph Hellwig, linux-mm, Olaf Weber

On Fri, Jul 24, 2009 at 12:20:32AM -0500, Felix Blyakher wrote:
>
> On Jul 10, 2009, at 2:12 AM, KOSAKI Motohiro wrote:
>> 3. Is the current wbc->nr_to_write value not proper?
>>
>> The current writeback_set_ratelimit() doesn't permit ratelimit_pages
>> to exceed 4M bytes, but that is too low a restriction nowadays.
>> (That's my understanding. Right?)
>>
>> =======================================================
>> void writeback_set_ratelimit(void)
>> {
>>	ratelimit_pages = vm_total_pages / (num_online_cpus() * 32);
>>	if (ratelimit_pages < 16)
>>		ratelimit_pages = 16;
>>	if (ratelimit_pages * PAGE_CACHE_SIZE > 4096 * 1024)
>>		ratelimit_pages = (4096 * 1024) / PAGE_CACHE_SIZE;
>> }
>> =======================================================
>>
>> Yes, 4M bytes is a pretty magical constant. We have three choices:
>>  A. Simply remove the magical 4M constant (a bit dangerous)
>
> That would be outside of xfs, and it seems there is not much interest
> from the mm people.
>
>>  B. Decide the upper bound from the IO capability

It is worth pointing out that Jens Axboe is planning on more
feedback-controlled knobs as part of the pdflush rework.

-chris

* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 10:19 ` Christoph Hellwig
  2009-07-07 10:33   ` KOSAKI Motohiro
@ 2009-07-07 11:37   ` Olaf Weber
  2009-07-07 14:46     ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Olaf Weber @ 2009-07-07 11:37 UTC
  To: Christoph Hellwig
  Cc: Eric Sandeen, linux-mm, MASON, CHRISTOPHER, xfs mailing list

Christoph Hellwig writes:
> On Tue, Jul 07, 2009 at 11:07:30AM +0200, Olaf Weber wrote:

>> If the nr_to_write calculation really yields a value that is too
>> small, shouldn't it be fixed elsewhere?

> In theory it should.  But given the amazing feedback of the VM people
> on this I'd rather make sure we do get the full HW bandwidth on large
> arrays, instead of sucking badly, and not just wait forever.

So how do you feel about making the fudge factor tunable?  I don't
have a good sense myself of what the value should be, or whether the
hard-coded 4 is good enough in general.

-- 
Olaf Weber                 SGI               Phone:  +31(0)30-6696752
                           Veldzigt 2b       Fax:    +31(0)30-6696799
Technical Lead             3454 PW de Meern  Vnet:   955-7151
Storage Software           The Netherlands   Email:  olaf@sgi.com

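The tunable fudge factor asked about here could be little more than a multiplier applied to nr_to_write. A minimal userspace sketch under loud assumptions: struct wbc_model is a stand-in with a single field (the kernel's struct writeback_control has many more), and the name bump_nr_to_write and the 1..64 clamp are hypothetical, not anything proposed in the thread.

```c
#include <limits.h>

/* Stand-in for the one field of struct writeback_control used here. */
struct wbc_model {
	long nr_to_write;	/* pages the caller is allowed to write */
};

/*
 * Scale nr_to_write by a tunable factor.  The patch in this thread
 * hard-codes the factor 4; here it is a parameter.  The 1..64 range
 * is an arbitrary illustrative limit.
 */
static void bump_nr_to_write(struct wbc_model *wbc, long factor)
{
	if (factor < 1)
		factor = 1;
	if (factor > 64)
		factor = 64;
	if (wbc->nr_to_write <= 0)
		return;				/* nothing to scale */
	if (wbc->nr_to_write > LONG_MAX / factor)
		wbc->nr_to_write = LONG_MAX;	/* saturate, don't overflow */
	else
		wbc->nr_to_write *= factor;
}
```

With a factor of 4 this reproduces the arithmetic of the posted patch: an nr_to_write of 1024 becomes 4096. Whether such a factor should be exposed to users at all is a separate question, and one the thread does not settle.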
* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-07 11:37 ` Olaf Weber
@ 2009-07-07 14:46   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2009-07-07 14:46 UTC
  To: Olaf Weber
  Cc: Christoph Hellwig, Eric Sandeen, linux-mm, MASON, CHRISTOPHER, xfs mailing list

On Tue, Jul 07, 2009 at 01:37:05PM +0200, Olaf Weber wrote:
> > In theory it should.  But given the amazing feedback of the VM people
> > on this I'd rather make sure we do get the full HW bandwidth on large
> > arrays, instead of sucking badly, and not just wait forever.
>
> So how do you feel about making the fudge factor tunable?  I don't
> have a good sense myself of what the value should be, or whether the
> hard-coded 4 is good enough in general.

A tunable means exposing an ABI, which I'd rather not do for a hack like
this.  If you don't like the number, feel free to experiment around with
it; SGI should have enough large systems that can be used to test this
out.

* Re: [PATCH] bump up nr_to_write in xfs_vm_writepage
  2009-07-02 21:29 [PATCH] bump up nr_to_write in xfs_vm_writepage Eric Sandeen
  2009-07-07  9:07 ` Olaf Weber
@ 2009-07-07 15:17 ` Chris Mason
  1 sibling, 0 replies; 14+ messages in thread
From: Chris Mason @ 2009-07-07 15:17 UTC
  To: Eric Sandeen; +Cc: xfs mailing list, linux-mm, Christoph Hellwig, jens.axboe

On Thu, Jul 02, 2009 at 04:29:41PM -0500, Eric Sandeen wrote:
> Talking w/ someone who had a raid6 of 15 drives on an areca
> controller, he wondered why he could only get 300MB/s or so
> out of a streaming buffered write to xfs like so:
>
> dd if=/dev/zero of=/mnt/storage/10gbfile bs=128k count=81920
> 10737418240 bytes (11 GB) copied, 34.294 s, 313 MB/s

I did some quick tests and found some unhappy things ;)  On my 5 drive
sata array (configured via LVM in a stripeset), dd with O_DIRECT to the
block device can stream writes at a healthy 550MB/s.

On 2.6.30, XFS does O_DIRECT at the exact same 550MB/s, and buffered
writes at 370MB/s.  Btrfs does a little better on buffered and a little
worse on O_DIRECT.  Ext4 splits the middle and does 400MB/s on both
buffered and O_DIRECT.

2.6.31-rc2 gave similar results.  One thing I noticed was that pdflush
and friends aren't using the right flag in congestion_wait after it was
updated to do congestion based on sync/async instead of read/write.

I'm always happy when I get to blame bugs on Jens, but fixing the
congestion flag usage actually made the runs slower (he still promises
to send a patch for the congestion).

A little while ago, Jan Kara sent seekwatcher changes that let it graph
per-process info about IO submission, so I cooked up a graph of the IO
done by pdflush, dd, and others during an XFS buffered streaming write.

http://oss.oracle.com/~mason/seekwatcher/xfs-dd-2.6.30.png

The dark blue dots are dd doing writes and the light green dots are
pdflush.  The graph shows that pdflush spends almost the entire run
sitting around doing nothing, and sysrq-w shows all the pdflush threads
waiting around in congestion_wait.

Just to make sure the graphing wasn't hiding work done by pdflush, I
filtered out all the dd IO:

http://oss.oracle.com/~mason/seekwatcher/xfs-dd-2.6.30-filtered.png

With all of this in mind, I think the reason the nr_to_write change
helps is that dd is doing all the IO during balance_dirty_pages, and
the higher nr_to_write number is making sure that more IO goes out at
a time.

Once dd starts doing IO in balance_dirty_pages, our queues get
congested.  From that moment on, the bdi_congested checks in the
writeback path make pdflush sit down.  I doubt the queue ever really
leaves congestion because we get over the dirty high water mark and dd
is jumping in and sending IO down the pipe without waiting for
congestion to clear.

sysrq-w supports this.  dd is always in get_request_wait and pdflush is
always in congestion_wait.

This bad interaction between pdflush and congestion was one of the
motivations for Jens' new writeback work, so I was really hoping to git
pull and post a fantastic new benchmark result.  With Jens' code the
graph ends up completely inverted, with roughly the same performance.

Instead of dd doing all the work, the flusher thread is doing all the
work (hooray!) and dd is almost always in congestion_wait (boo).  I
think the cause is a little different: it seems that with Jens' code,
dd finds the flusher thread has the inode locked, and so
balance_dirty_pages doesn't find any work to do.  It waits on
congestion_wait().

If I replace the balance_dirty_pages() congestion_wait() with
schedule_timeout(1) in Jens' writeback branch, xfs buffered writes go
from 370MB/s to 520MB/s.  There are still some big peaks and valleys,
but it at least shows where we need to think harder about congestion
flags, IO waiting and other issues.

All of this is a long way of saying that until Jens' new code goes in
(with additional tuning), the nr_to_write change makes sense to me.  I
don't see a 2.6.31 suitable way to tune things without his work.

-chris
