Hi Dave,

As you suggested, I added tests for ext4 and btrfs; the results are the same.

Then I tried running perf record for 10 seconds starting from 200s. (The test runs for 410s.) I see several warning messages and hope they do not impact the accuracy too much:

[  252.608069] perf samples too long (2532 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[  252.608863] perf samples too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 25000
[  252.609422] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1.389 msecs

Anyway, the noticeable perf changes are:

1d3d4437eae1bb2  9b17c62382dd2e7507984b989
---------------  -------------------------
     12.15 ~10%    +209.8%      37.63 ~ 2%  brickland2/debug2/vm-scalability/300s-btrfs-lru-file-readtwice
     12.88 ~16%    +189.4%      37.27 ~ 0%  brickland2/debug2/vm-scalability/300s-ext4-lru-file-readtwice
     15.24 ~ 9%    +146.0%      37.50 ~ 1%  brickland2/debug2/vm-scalability/300s-xfs-lru-file-readtwice
     40.27         +179.1%     112.40       TOTAL perf-profile.cpu-cycles._raw_spin_lock.grab_super_passive.super_cache_count.shrink_slab.do_try_to_free_pages

1d3d4437eae1bb2  9b17c62382dd2e7507984b989
---------------  -------------------------
     11.91 ~12%    +218.2%      37.89 ~ 2%  brickland2/debug2/vm-scalability/300s-btrfs-lru-file-readtwice
     12.47 ~16%    +200.3%      37.44 ~ 0%  brickland2/debug2/vm-scalability/300s-ext4-lru-file-readtwice
     15.36 ~11%    +145.4%      37.68 ~ 1%  brickland2/debug2/vm-scalability/300s-xfs-lru-file-readtwice
     39.73         +184.5%     113.01       TOTAL perf-profile.cpu-cycles._raw_spin_lock.put_super.drop_super.super_cache_count.shrink_slab

perf report for 9b17c62382dd2e7507984b989:

# Overhead          Command       Shared Object                                          Symbol
# ........  ...............  ..................  ..............................................
#
    77.74%               dd  [kernel.kallsyms]   [k] _raw_spin_lock
                          |
                          --- _raw_spin_lock
                             |
                             |--47.65%-- grab_super_passive
                             |          super_cache_count
                             |          shrink_slab
                             |          do_try_to_free_pages
                             |          try_to_free_pages
                             |          __alloc_pages_nodemask
                             |          alloc_pages_current
                             |          __page_cache_alloc
                             |          __do_page_cache_readahead
                             |          ra_submit
                             |          ondemand_readahead
                             |          |
                             |          |--92.13%-- page_cache_async_readahead
                             |          |          generic_file_aio_read
                             |          |          xfs_file_aio_read
                             |          |          do_sync_read
                             |          |          vfs_read
                             |          |          SyS_read
                             |          |          system_call_fastpath
                             |          |          read
                             |          |
                             |           --7.87%-- page_cache_sync_readahead
                             |                     generic_file_aio_read
                             |                     xfs_file_aio_read
                             |                     do_sync_read
                             |                     vfs_read
                             |                     SyS_read
                             |                     system_call_fastpath
                             |                     read
                             |--47.48%-- put_super
                             |          drop_super
                             |          super_cache_count
                             |          shrink_slab
                             |          do_try_to_free_pages
                             |          try_to_free_pages
                             |          __alloc_pages_nodemask
                             |          alloc_pages_current
                             |          __page_cache_alloc
                             |          __do_page_cache_readahead
                             |          ra_submit
                             |          ondemand_readahead
                             |          |
                             |          |--92.04%-- page_cache_async_readahead
                             |          |          generic_file_aio_read
                             |          |          xfs_file_aio_read
                             |          |          do_sync_read
                             |          |          vfs_read
                             |          |          SyS_read
                             |          |          system_call_fastpath
                             |          |          read
                             |          |
                             |           --7.96%-- page_cache_sync_readahead
                             |                     generic_file_aio_read
                             |                     xfs_file_aio_read
                             |                     do_sync_read
                             |                     vfs_read
                             |                     SyS_read
                             |                     system_call_fastpath
                             |                     read
                              --4.87%-- [...]

The full changeset is attached.

Thanks,
Fengguang