To help understand the behavior change, I wrote the writeback_queue_io
trace event, and found very different patterns between

- vanilla kernel
- this patchset plus the sync livelock fixes

Basically, the vanilla kernel pulls a random number of inodes from
b_dirty each time, while the patched kernel tends to pull a fixed
number of inodes (enqueue=1031) from b_dirty. The new behavior is very
interesting...

The attached test script runs 1 dd and 1 tar concurrently on XFS;
their output can be found at the start of the trace files. The elapsed
time is 289s for the vanilla kernel and 270s for the patched kernel.

Thanks,
Fengguang
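
For reference, here is a minimal sketch of what such a TRACE_EVENT
definition might look like, assuming it fires from queue_io() and
records how many inodes were moved from b_dirty to b_io. The field
names, the older_than_this cutoff parameter, and the exact call site
are assumptions for illustration; the actual event in the patchset may
differ.

	/*
	 * Sketch only -- not the exact patch. Would live in
	 * include/trace/events/writeback.h.
	 */
	#undef TRACE_SYSTEM
	#define TRACE_SYSTEM writeback

	#if !defined(_TRACE_WRITEBACK_H) || defined(TRACE_HEADER_MULTI_READ)
	#define _TRACE_WRITEBACK_H

	#include <linux/tracepoint.h>
	#include <linux/backing-dev.h>
	#include <linux/device.h>

	TRACE_EVENT(writeback_queue_io,

		/*
		 * wb: per-bdi writeback state
		 * older_than_this: only inodes dirtied before this jiffies
		 *                  value are moved (assumed parameter)
		 * moved: number of inodes moved from b_dirty to b_io
		 */
		TP_PROTO(struct bdi_writeback *wb,
			 unsigned long older_than_this,
			 int moved),

		TP_ARGS(wb, older_than_this, moved),

		TP_STRUCT__entry(
			__array(char,		name, 32)
			__field(unsigned long,	older)
			__field(int,		moved)
		),

		TP_fast_assign(
			strncpy(__entry->name, dev_name(wb->bdi->dev), 32);
			__entry->older = older_than_this;
			__entry->moved = moved;
		),

		/* "enqueue=" is what shows up as enqueue=1031 in the traces */
		TP_printk("bdi %s: older=%lu enqueue=%d",
			  __entry->name, __entry->older, __entry->moved)
	);

	#endif /* _TRACE_WRITEBACK_H */

	/* This part must be outside the header guard */
	#include <trace/define_trace.h>

With such a definition, queue_io() would call
trace_writeback_queue_io(wb, older_than_this, moved) right after
moving inodes, so each invocation logs one enqueue batch -- which is
what makes the fixed enqueue=1031 pattern on the patched kernel stand
out against the random batch sizes on the vanilla kernel.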