On Fri, Jul 31, 2009 at 03:55:44AM +0800, Martin Bligh wrote:
> > Note that this is a simple fix that may have suboptimal write performance.
> > Here is an old reasoning:
> >
> >     http://lkml.org/lkml/2009/3/28/235
>
> The other thing I've been experimenting with is to disable the per-page
> check in write_cache_pages(), ie:
>
>         if (wbc->nonblocking && bdi_write_congested(bdi)) {
>                 wb_stats_inc(WB_STATS_WCP_SECTION_CONG);
>                 wbc->encountered_congestion = 1;
>                 /* done = 1; */
>         }
>
> This treats the congestion limits as soft, but encourages us to write
> back in larger, more efficient chunks. If that's not going to scare
> people unduly, I can submit that as well.

This risks hitting the hard limit (nr_requests) and blocking everyone,
including the ones with higher priority (ie. kswapd).

On the other hand, the simple fix in the previous mails won't
necessarily act too sub-optimally; the suboptimal write performance is
only a potential problem. There is a window of
(1/16)*(nr_requests)*request_size (= 128*256KB/16 = 2MB) between the
congestion-on and congestion-off states. So at best we can inject a big
2MB chunk into the async write queue once it becomes uncongested.

I have a writeback debug patch that can help find out how that works
out in your real-world workloads (by monitoring nr_to_write).

You can also try doubling the ratio (1/16) in
blk_queue_congestion_threshold(), to see whether an increased
congestion-on-off window helps.

Thanks,
Fengguang