* per-bdi-throttling: synchronous writepage doesn't work correctly @ 2007-11-01 16:49 Miklos Szeredi 2007-11-01 16:51 ` Peter Zijlstra 0 siblings, 1 reply; 10+ messages in thread From: Miklos Szeredi @ 2007-11-01 16:49 UTC (permalink / raw) To: a.p.zijlstra; +Cc: jdike, linux-mm, linux-kernel Hi, It looks like bdi_thresh will always be zero if filesystem does synchronous writepage, resulting in very poor write performance. Hostfs (UML) is one such example, but there might be others. The only solution I can think of is to add a set_page_writeback(); end_page_writeback() pair (or some reduced variant, that only does the proportions magic). But that means auditing quite a few filesystems... Miklos -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: per-bdi-throttling: synchronous writepage doesn't work correctly 2007-11-01 16:49 per-bdi-throttling: synchronous writepage doesn't work correctly Miklos Szeredi @ 2007-11-01 16:51 ` Peter Zijlstra 2007-11-01 17:00 ` Miklos Szeredi 0 siblings, 1 reply; 10+ messages in thread From: Peter Zijlstra @ 2007-11-01 16:51 UTC (permalink / raw) To: Miklos Szeredi; +Cc: jdike, linux-mm, linux-kernel [-- Attachment #1: Type: text/plain, Size: 651 bytes --] On Thu, 2007-11-01 at 17:49 +0100, Miklos Szeredi wrote: > Hi, > > It looks like bdi_thresh will always be zero if filesystem does > synchronous writepage, resulting in very poor write performance. > > Hostfs (UML) is one such example, but there might be others. > > The only solution I can think of is to add a set_page_writeback(); > end_page_writeback() pair (or some reduced variant, that only does > the proportions magic). But that means auditing quite a few > filesystems... Ouch... I take it there is no other function that is shared between all these writeout paths which we could stick a bdi_writeout_inc(bdi) in? [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: per-bdi-throttling: synchronous writepage doesn't work correctly 2007-11-01 16:51 ` Peter Zijlstra @ 2007-11-01 17:00 ` Miklos Szeredi 2007-11-01 17:09 ` Peter Zijlstra 0 siblings, 1 reply; 10+ messages in thread From: Miklos Szeredi @ 2007-11-01 17:00 UTC (permalink / raw) To: a.p.zijlstra; +Cc: jdike, linux-mm, linux-kernel > > Hi, > > > > It looks like bdi_thresh will always be zero if filesystem does > > synchronous writepage, resulting in very poor write performance. > > > > Hostfs (UML) is one such example, but there might be others. > > > > The only solution I can think of is to add a set_page_writeback(); > > end_page_writeback() pair (or some reduced variant, that only does > > the proportions magic). But that means auditing quite a few > > filesystems... > > Ouch... > > I take it there is no other function that is shared between all these > writeout paths which we could stick a bdi_writeout_inc(bdi) in? No, and you can't detect it from the callers either I think. Miklos -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: per-bdi-throttling: synchronous writepage doesn't work correctly 2007-11-01 17:00 ` Miklos Szeredi @ 2007-11-01 17:09 ` Peter Zijlstra 2007-11-01 17:16 ` Peter Zijlstra 2007-11-01 17:28 ` Miklos Szeredi 0 siblings, 2 replies; 10+ messages in thread From: Peter Zijlstra @ 2007-11-01 17:09 UTC (permalink / raw) To: Miklos Szeredi; +Cc: jdike, linux-mm, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1007 bytes --] On Thu, 2007-11-01 at 18:00 +0100, Miklos Szeredi wrote: > > > Hi, > > > > > > It looks like bdi_thresh will always be zero if filesystem does > > > synchronous writepage, resulting in very poor write performance. > > > > > > Hostfs (UML) is one such example, but there might be others. > > > > > > The only solution I can think of is to add a set_page_writeback(); > > > end_page_writeback() pair (or some reduced variant, that only does > > > the proportions magic). But that means auditing quite a few > > > filesystems... > > > > Ouch... > > > > I take it there is no other function that is shared between all these > > writeout paths which we could stick a bdi_writeout_inc(bdi) in? > > No, and you can't detect it from the callers either I think. The page not having PG_writeback set on return is a hint, but not fool proof, it could be the device is just blazing fast. I guess there is nothing to it but for me to grep writepage and manually look at all hits... [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: per-bdi-throttling: synchronous writepage doesn't work correctly 2007-11-01 17:09 ` Peter Zijlstra @ 2007-11-01 17:16 ` Peter Zijlstra 2007-11-01 18:35 ` Peter Zijlstra 2007-11-01 17:28 ` Miklos Szeredi 1 sibling, 1 reply; 10+ messages in thread From: Peter Zijlstra @ 2007-11-01 17:16 UTC (permalink / raw) To: Miklos Szeredi Cc: jdike, linux-mm, linux-kernel, Christoph Hellwig, Andrew Morton, Al Viro On Thu, 2007-11-01 at 18:09 +0100, Peter Zijlstra wrote: > On Thu, 2007-11-01 at 18:00 +0100, Miklos Szeredi wrote: > > > > Hi, > > > > > > > > It looks like bdi_thresh will always be zero if filesystem does > > > > synchronous writepage, resulting in very poor write performance. > > > > > > > > Hostfs (UML) is one such example, but there might be others. > > > > > > > > The only solution I can think of is to add a set_page_writeback(); > > > > end_page_writeback() pair (or some reduced variant, that only does > > > > the proportions magic). But that means auditing quite a few > > > > filesystems... > > > > > > Ouch... > > > > > > I take it there is no other function that is shared between all these > > > writeout paths which we could stick a bdi_writeout_inc(bdi) in? > > > > No, and you can't detect it from the callers either I think. > > The page not having PG_writeback set on return is a hint, but not fool > proof, it could be the device is just blazing fast. > > I guess there is nothing to it but for me to grep writepage and manually > look at all hits... writepage: called by the VM to write a dirty page to backing store. This may happen for data integrity reasons (i.e. 'sync'), or to free up memory (flush). The difference can be seen in wbc->sync_mode. The PG_Dirty flag has been cleared and PageLocked is true. writepage should start writeout, should set PG_Writeback, and should make sure the page is unlocked, either synchronously or asynchronously when the write operation completes. If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to try too hard if there are problems, and may choose to write out other pages from the mapping if that is easier (e.g. due to internal dependencies). If it chooses not to start writeout, it should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep calling ->writepage on that page. See the file "Locking" for more details. The "should set PG_Writeback" bit threw me off I guess. Anyway, do we want me to just stick in bdi_writeout_inc(page->mapping->backing_dev_info) everywhere, or do we want to dress this up in a nice API? If so, any suggestions? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: per-bdi-throttling: synchronous writepage doesn't work correctly 2007-11-01 17:16 ` Peter Zijlstra @ 2007-11-01 18:35 ` Peter Zijlstra 2007-11-01 19:19 ` Miklos Szeredi 0 siblings, 1 reply; 10+ messages in thread From: Peter Zijlstra @ 2007-11-01 18:35 UTC (permalink / raw) To: Miklos Szeredi Cc: jdike, linux-mm, linux-kernel, Christoph Hellwig, Andrew Morton, Al Viro On Thu, 2007-11-01 at 18:16 +0100, Peter Zijlstra wrote: > On Thu, 2007-11-01 at 18:09 +0100, Peter Zijlstra wrote: > > On Thu, 2007-11-01 at 18:00 +0100, Miklos Szeredi wrote: > > > > > Hi, > > > > > > > > > > It looks like bdi_thresh will always be zero if filesystem does > > > > > synchronous writepage, resulting in very poor write performance. > > > > > > > > > > Hostfs (UML) is one such example, but there might be others. > > > > > > > > > > The only solution I can think of is to add a set_page_writeback(); > > > > > end_page_writeback() pair (or some reduced variant, that only does > > > > > the proportions magic). But that means auditing quite a few > > > > > filesystems... > > > > > > > > Ouch... > > > > > > > > I take it there is no other function that is shared between all these > > > > writeout paths which we could stick a bdi_writeout_inc(bdi) in? > > > > > > No, and you can't detect it from the callers either I think. > > > > The page not having PG_writeback set on return is a hint, but not fool > > proof, it could be the device is just blazing fast. > > > > I guess there is nothing to it but for me to grep writepage and manually > > look at all hits... > > writepage: called by the VM to write a dirty page to backing store. > This may happen for data integrity reasons (i.e. 'sync'), or > to free up memory (flush). The difference can be seen in > wbc->sync_mode. > The PG_Dirty flag has been cleared and PageLocked is true. > writepage should start writeout, should set PG_Writeback, > and should make sure the page is unlocked, either synchronously > or asynchronously when the write operation completes. > > If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to > try too hard if there are problems, and may choose to write out > other pages from the mapping if that is easier (e.g. due to > internal dependencies). If it chooses not to start writeout, it > should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep > calling ->writepage on that page. > > See the file "Locking" for more details. > > > The "should set PG_Writeback" bit threw me off I guess. Hmm, set_page_writeback() is also the one clearing the radix tree dirty tag. So if that is not called, we get in a bit of a mess, no? Which makes me think hostfs is buggy. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: per-bdi-throttling: synchronous writepage doesn't work correctly 2007-11-01 18:35 ` Peter Zijlstra @ 2007-11-01 19:19 ` Miklos Szeredi 2007-11-01 19:55 ` Peter Zijlstra 0 siblings, 1 reply; 10+ messages in thread From: Miklos Szeredi @ 2007-11-01 19:19 UTC (permalink / raw) To: peterz; +Cc: jdike, linux-mm, linux-kernel, hch, akpm, viro > > > > See the file "Locking" for more details. > > > > > > The "should set PG_Writeback" bit threw me off I guess. > > Hmm, set_page_writeback() is also the one clearing the radix tree dirty > tag. So if that is not called, we get in a bit of a mess, no? > > Which makes me think hostfs is buggy. Yes, looks like that sort of usage is not valid. But not clearing the dirty tag won't cause any malfunction, it'll just waste some CPU when looking for dirty pages to write back. This is probably why this wasn't noticed earlier. Miklos -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: per-bdi-throttling: synchronous writepage doesn't work correctly 2007-11-01 19:19 ` Miklos Szeredi @ 2007-11-01 19:55 ` Peter Zijlstra 0 siblings, 0 replies; 10+ messages in thread From: Peter Zijlstra @ 2007-11-01 19:55 UTC (permalink / raw) To: Miklos Szeredi; +Cc: jdike, linux-mm, linux-kernel, hch, akpm, viro On Thu, 2007-11-01 at 20:19 +0100, Miklos Szeredi wrote: > > > > > > See the file "Locking" for more details. > > > > > > > > > The "should set PG_Writeback" bit threw me off I guess. > > > > Hmm, set_page_writeback() is also the one clearing the radix tree dirty > > tag. So if that is not called, we get in a bit of a mess, no? > > > > Which makes me think hostfs is buggy. > > Yes, looks like that sort of usage is not valid. But not clearing the > dirty tag won't cause any malfunction, it'll just waste some CPU when > looking for dirty pages to write back. This is probably why this > wasn't noticed earlier. Documentation/filesystems/Locking is also quite clear on the need to call set_page_writeback() and end_page_writeback(). minimal fix for hostfs Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c index 8966b05..b6c1e12 100644 --- a/fs/hostfs/hostfs_kern.c +++ b/fs/hostfs/hostfs_kern.c @@ -415,6 +415,7 @@ int hostfs_writepage(struct page *page, struct writeback_control *wbc) int end_index = inode->i_size >> PAGE_CACHE_SHIFT; int err; + set_page_writeback(page); if (page->index >= end_index) count = inode->i_size & (PAGE_CACHE_SIZE-1); @@ -438,6 +439,7 @@ int hostfs_writepage(struct page *page, struct writeback_control *wbc) kunmap(page); unlock_page(page); + end_page_writeback(page); return err; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: per-bdi-throttling: synchronous writepage doesn't work correctly 2007-11-01 17:09 ` Peter Zijlstra 2007-11-01 17:16 ` Peter Zijlstra @ 2007-11-01 17:28 ` Miklos Szeredi 2007-11-01 17:39 ` Peter Zijlstra 1 sibling, 1 reply; 10+ messages in thread From: Miklos Szeredi @ 2007-11-01 17:28 UTC (permalink / raw) To: a.p.zijlstra; +Cc: jdike, linux-mm, linux-kernel > The page not having PG_writeback set on return is a hint, but not fool > proof, it could be the device is just blazing fast. Hmm, does it actually has to be foolproof though? What will happen if bdi_writeout_inc() is called twice for the page? The device will get twice the number of pages it deserves? That's not all that bad, especially since that is a really really fast device. Miklos -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: per-bdi-throttling: synchronous writepage doesn't work correctly 2007-11-01 17:28 ` Miklos Szeredi @ 2007-11-01 17:39 ` Peter Zijlstra 0 siblings, 0 replies; 10+ messages in thread From: Peter Zijlstra @ 2007-11-01 17:39 UTC (permalink / raw) To: Miklos Szeredi; +Cc: jdike, linux-mm, linux-kernel [-- Attachment #1: Type: text/plain, Size: 566 bytes --] On Thu, 2007-11-01 at 18:28 +0100, Miklos Szeredi wrote: > > The page not having PG_writeback set on return is a hint, but not fool > > proof, it could be the device is just blazing fast. > > Hmm, does it actually has to be foolproof though? What will happen if > bdi_writeout_inc() is called twice for the page? The device will get > twice the number of pages it deserves? That's not all that bad, > especially since that is a really really fast device. Basically, yes. But then again, that would require auditing all ->writepage() callsites. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2007-11-01 19:55 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-11-01 16:49 per-bdi-throttling: synchronous writepage doesn't work correctly Miklos Szeredi 2007-11-01 16:51 ` Peter Zijlstra 2007-11-01 17:00 ` Miklos Szeredi 2007-11-01 17:09 ` Peter Zijlstra 2007-11-01 17:16 ` Peter Zijlstra 2007-11-01 18:35 ` Peter Zijlstra 2007-11-01 19:19 ` Miklos Szeredi 2007-11-01 19:55 ` Peter Zijlstra 2007-11-01 17:28 ` Miklos Szeredi 2007-11-01 17:39 ` Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox