linux-mm.kvack.org archive mirror
* [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec -5.0% regression
@ 2024-09-30  2:21 kernel test robot
  2024-09-30  8:21 ` Lorenzo Stoakes
  0 siblings, 1 reply; 13+ messages in thread
From: kernel test robot @ 2024-09-30  2:21 UTC
  To: Lorenzo Stoakes
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Mark Brown,
	Liam R. Howlett, Vlastimil Babka, Bert Karwatzki, Jeff Xu,
	Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
	Paul E. McKenney, Paul Moore, Sidhartha Kumar,
	Suren Baghdasaryan, linux-mm, ying.huang, feng.tang, fengwei.yin,
	oliver.sang



Hello,

kernel test robot noticed a -5.0% regression of aim9.brk_test.ops_per_sec on:


commit: cacded5e42b9609b07b22d80c10f0076d439f7d1 ("mm: avoid using vma_merge() for new VMAs")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: aim9
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
parameters:

	testtime: 300s
	test: brk_test
	cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202409301043.629bea78-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240930/202409301043.629bea78-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/brk_test/aim9/300s

commit: 
  fc21959f74 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42 ("mm: avoid using vma_merge() for new VMAs")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   1322908            -5.0%    1256536        aim9.brk_test.ops_per_sec
    201.54            +2.9%     207.44        aim9.time.system_time
     97.58            -6.0%      91.75        aim9.time.user_time
      0.04 ± 82%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.10 ± 60%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.04 ± 82%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.10 ± 60%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
  8.33e+08            +3.9%  8.654e+08        perf-stat.i.branch-instructions
      1.15            -0.1        1.09        perf-stat.i.branch-miss-rate%
  12964626            -1.9%   12711922        perf-stat.i.branch-misses
      1.11            -7.4%       1.03        perf-stat.i.cpi
 3.943e+09            +6.0%   4.18e+09        perf-stat.i.instructions
      0.91            +7.9%       0.98        perf-stat.i.ipc
      0.29 ±  2%      -9.1%       0.27 ±  4%  perf-stat.overall.MPKI
      1.56            -0.1        1.47        perf-stat.overall.branch-miss-rate%
      1.08            -6.8%       1.01        perf-stat.overall.cpi
      0.92            +7.2%       0.99        perf-stat.overall.ipc
 8.303e+08            +3.9%  8.627e+08        perf-stat.ps.branch-instructions
  12931205            -2.0%   12678170        perf-stat.ps.branch-misses
  3.93e+09            +6.0%  4.167e+09        perf-stat.ps.instructions
 1.184e+12            +6.1%  1.256e+12        perf-stat.total.instructions
      7.16 ±  2%      -0.4        6.76 ±  4%  perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.brk
      5.72 ±  2%      -0.4        5.35 ±  3%  perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
      6.13 ±  2%      -0.3        5.84 ±  3%  perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.83 ± 11%      -0.1        0.71 ±  5%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.6        0.58 ±  5%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
     16.73 ±  2%      +0.6       17.34        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.7        0.66 ±  6%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
     24.21            +0.7       24.90        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
     23.33            +0.7       24.05 ±  2%  perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.8        0.82 ±  4%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +0.9        0.87 ±  5%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.1        1.07 ±  9%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.1        1.10 ±  6%  perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +2.3        2.26 ±  5%  perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +7.6        7.56 ±  3%  perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +8.6        8.62 ±  4%  perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.74 ±  2%      -0.4        7.30 ±  4%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      5.81 ±  2%      -0.4        5.43 ±  3%  perf-profile.children.cycles-pp.perf_event_mmap_event
      6.18 ±  2%      -0.3        5.88 ±  3%  perf-profile.children.cycles-pp.perf_event_mmap
      3.93            -0.2        3.73 ±  3%  perf-profile.children.cycles-pp.perf_iterate_sb
      0.22 ± 29%      -0.1        0.08 ± 17%  perf-profile.children.cycles-pp.may_expand_vm
      0.96 ±  3%      -0.1        0.83 ±  4%  perf-profile.children.cycles-pp.vma_complete
      0.61 ± 14%      -0.1        0.52 ±  7%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.15 ±  7%      -0.1        0.08 ± 20%  perf-profile.children.cycles-pp.brk_test
      0.08 ± 11%      +0.0        0.12 ± 14%  perf-profile.children.cycles-pp.mas_prev_setup
      0.17 ± 12%      +0.1        0.27 ± 10%  perf-profile.children.cycles-pp.mas_wr_store_entry
      0.00            +0.2        0.15 ± 11%  perf-profile.children.cycles-pp.mas_next_range
      0.19 ±  8%      +0.2        0.38 ± 10%  perf-profile.children.cycles-pp.mas_next_slot
      0.34 ± 17%      +0.3        0.64 ±  6%  perf-profile.children.cycles-pp.mas_prev_slot
     23.40            +0.7       24.12 ±  2%  perf-profile.children.cycles-pp.__do_sys_brk
      0.00            +7.6        7.59 ±  3%  perf-profile.children.cycles-pp.vma_expand
      0.00            +8.7        8.66 ±  4%  perf-profile.children.cycles-pp.vma_merge_new_range
      1.61 ± 10%      -0.9        0.69 ±  8%  perf-profile.self.cycles-pp.do_brk_flags
      7.64 ±  2%      -0.4        7.20 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.22 ± 30%      -0.1        0.08 ± 17%  perf-profile.self.cycles-pp.may_expand_vm
      0.57 ± 15%      -0.1        0.46 ±  6%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.15 ±  7%      -0.1        0.08 ± 20%  perf-profile.self.cycles-pp.brk_test
      0.20 ±  5%      -0.0        0.18 ±  4%  perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
      0.07 ± 18%      +0.0        0.10 ± 18%  perf-profile.self.cycles-pp.mas_prev_setup
      0.00            +0.1        0.09 ± 12%  perf-profile.self.cycles-pp.mas_next_range
      0.36 ±  8%      +0.1        0.45 ±  6%  perf-profile.self.cycles-pp.perf_event_mmap
      0.15 ± 13%      +0.1        0.25 ± 14%  perf-profile.self.cycles-pp.mas_wr_store_entry
      0.17 ± 11%      +0.2        0.37 ± 11%  perf-profile.self.cycles-pp.mas_next_slot
      0.34 ± 17%      +0.3        0.64 ±  6%  perf-profile.self.cycles-pp.mas_prev_slot
      0.00            +0.3        0.33 ±  5%  perf-profile.self.cycles-pp.vma_merge_new_range
      0.00            +0.8        0.81 ±  9%  perf-profile.self.cycles-pp.vma_expand
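
As a rough cross-check of the table above, instructions retired per brk_test
operation can be estimated by dividing perf-stat.ps.instructions by
aim9.brk_test.ops_per_sec. This is only a sketch: it assumes the perf-stat
counters are dominated by the benchmark itself, which the report does not
state, and the variable names are illustrative.

```python
# Figures copied from the perf-stat.ps.instructions and
# aim9.brk_test.ops_per_sec rows of the comparison table.
instr_per_sec = {"fc21959f74": 3.93e9, "cacded5e42": 4.167e9}
ops_per_sec = {"fc21959f74": 1322908, "cacded5e42": 1256536}

# Estimated instructions per brk_test operation for each commit.
# Assumption: the counters mostly reflect the benchmark, not other activity.
instr_per_op = {c: instr_per_sec[c] / ops_per_sec[c] for c in instr_per_sec}

# Relative change going from the parent commit to the reported commit.
delta_pct = 100.0 * (instr_per_op["cacded5e42"] / instr_per_op["fc21959f74"] - 1.0)

for commit, value in instr_per_op.items():
    print(f"{commit}: ~{value:.0f} instructions per op")
print(f"change: {delta_pct:+.1f}%")
```

Under that assumption, the per-operation instruction count goes up by more
than the ops_per_sec figure goes down, which would fit the new
vma_merge_new_range()/vma_expand() entries in the profile.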




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




* Re: [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec -5.0% regression
  2024-09-30  2:21 [linus:master] [mm] cacded5e42: aim9.brk_test.ops_per_sec -5.0% regression kernel test robot
@ 2024-09-30  8:21 ` Lorenzo Stoakes
  2024-10-08  8:31   ` Oliver Sang
  0 siblings, 1 reply; 13+ messages in thread
From: Lorenzo Stoakes @ 2024-09-30  8:21 UTC
  To: kernel test robot
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Mark Brown,
	Liam R. Howlett, Vlastimil Babka, Bert Karwatzki, Jeff Xu,
	Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
	Paul E. McKenney, Paul Moore, Sidhartha Kumar,
	Suren Baghdasaryan, linux-mm, ying.huang, feng.tang, fengwei.yin

On Mon, Sep 30, 2024 at 10:21:27AM GMT, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a -5.0% regression of aim9.brk_test.ops_per_sec on:
>
>
> commit: cacded5e42b9609b07b22d80c10f0076d439f7d1 ("mm: avoid using vma_merge() for new VMAs")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> testcase: aim9
> test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory

Hm, quite an old microarchitecture, no?

Would it be possible to try this on a range of uarches, especially more
recent ones, with some repeated runs to rule out statistical noise? Much
appreciated!

> parameters:
>
> 	testtime: 300s
> 	test: brk_test
> 	cpufreq_governor: performance
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202409301043.629bea78-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240930/202409301043.629bea78-oliver.sang@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/brk_test/aim9/300s
>
> commit:
>   fc21959f74 ("mm: abstract vma_expand() to use vma_merge_struct")
>   cacded5e42 ("mm: avoid using vma_merge() for new VMAs")

Yup this results in a different code path for brk(), but local testing
indicated no regression (a prior revision of the series had encountered
one, so I carefully assessed this, found the bug, and noted no clear
regression after this - but a lot of variance in the numbers).

>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    1322908            -5.0%    1256536        aim9.brk_test.ops_per_sec

Unfortunately there's no stddev figure here, and 5% feels borderline on
noise. As above, it'd be great to get multiple runs going to rule that
out. Thanks!

Overall, previously we special-cased brk() to avoid regression, but the
special-casing is horribly duplicative and bug-prone so, while we can
revert to doing that again, I'd really, really like to avoid it if we
possibly can :)



* Re: [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec -5.0% regression
  2024-09-30  8:21 ` Lorenzo Stoakes
@ 2024-10-08  8:31   ` Oliver Sang
  2024-10-08  8:44     ` Lorenzo Stoakes
  0 siblings, 1 reply; 13+ messages in thread
From: Oliver Sang @ 2024-10-08  8:31 UTC
  To: Lorenzo Stoakes
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Mark Brown,
	Liam R. Howlett, Vlastimil Babka, Bert Karwatzki, Jeff Xu,
	Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
	Paul E. McKenney, Paul Moore, Sidhartha Kumar,
	Suren Baghdasaryan, linux-mm, ying.huang, feng.tang, fengwei.yin,
	oliver.sang

Hi Lorenzo,

Sorry for the late reply; we were on holiday last week.

On Mon, Sep 30, 2024 at 09:21:52AM +0100, Lorenzo Stoakes wrote:
> On Mon, Sep 30, 2024 at 10:21:27AM GMT, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed a -5.0% regression of aim9.brk_test.ops_per_sec on:
> >
> >
> > commit: cacded5e42b9609b07b22d80c10f0076d439f7d1 ("mm: avoid using vma_merge() for new VMAs")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > testcase: aim9
> > test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> 
> Hm, quite an old microarchitecture, no?
> 
> Would it be possible to try this on a range of uarches, especially more
> recent ones, with some repeated runs to rule out statistical noise? Much
> appreciated!

We ran this test on the platforms below and observed a similar regression.
One thing I want to mention: for performance tests, we run each commit at
least 6 times. For this aim9 test the data is quite stable, so there is no
%stddev value in our table; we don't show this value if it's <2%.

(1)

model: Granite Rapids
nr_node: 1
nr_cpu: 240
memory: 192G

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-gnr-1ap1/brk_test/aim9/300s

fc21959f74bc1138 cacded5e42b9609b07b22d80c10
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   3220697            -6.0%    3028867        aim9.brk_test.ops_per_sec


(2)

model: Emerald Rapids
nr_node: 4
nr_cpu: 256
memory: 256G
brand: INTEL(R) XEON(R) PLATINUM 8592+

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-emr-2sp1/brk_test/aim9/300s

fc21959f74bc1138 cacded5e42b9609b07b22d80c10
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   3669298            -6.5%    3430070        aim9.brk_test.ops_per_sec


(3)

model: Sapphire Rapids
nr_node: 2
nr_cpu: 224
memory: 512G
brand: Intel(R) Xeon(R) Platinum 8480CTDX

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/brk_test/aim9/300s

fc21959f74bc1138 cacded5e42b9609b07b22d80c10
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   3540976            -6.4%    3314159        aim9.brk_test.ops_per_sec


(4)

model: Ice Lake
nr_node: 2
nr_cpu: 64
memory: 256G
brand: Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/brk_test/aim9/300s

fc21959f74bc1138 cacded5e42b9609b07b22d80c10
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   2667734            -5.6%    2518021        aim9.brk_test.ops_per_sec
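
The %change column can be recomputed from the two ops_per_sec figures in each
table. A minimal sketch follows; the dictionary layout and platform labels
are just for illustration, and the Ivy Bridge row is taken from the original
report.

```python
# (parent fc21959f74, reported cacded5e42) aim9.brk_test.ops_per_sec
# per platform, copied from the tables in this thread.
results = {
    "Ivy Bridge-EP":   (1322908, 1256536),
    "Granite Rapids":  (3220697, 3028867),
    "Emerald Rapids":  (3669298, 3430070),
    "Sapphire Rapids": (3540976, 3314159),
    "Ice Lake":        (2667734, 2518021),
}

def pct_change(base, patched):
    """Relative change from base to patched, in percent."""
    return 100.0 * (patched - base) / base

changes = {name: pct_change(*pair) for name, pair in results.items()}

for name, change in changes.items():
    print(f"{name:16s} {change:+.1f}%")
```

All five machines land in the same 5.0% to 6.5% range, so the drop does not
look specific to the older microarchitecture.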


> 
> > parameters:
> >
> > 	testtime: 300s
> > 	test: brk_test
> > 	cpufreq_governor: performance
> >
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202409301043.629bea78-oliver.sang@intel.com
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20240930/202409301043.629bea78-oliver.sang@intel.com
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> >   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/brk_test/aim9/300s
> >
> > commit:
> >   fc21959f74 ("mm: abstract vma_expand() to use vma_merge_struct")
> >   cacded5e42 ("mm: avoid using vma_merge() for new VMAs")
> 
> Yup this results in a different code path for brk(), but local testing
> indicated no regression (a prior revision of the series had encountered
> one, so I carefully assessed this, found the bug, and noted no clear
> regression after this - but a lot of variance in the numbers).
> 
> >
> > fc21959f74bc1138 cacded5e42b9609b07b22d80c10
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >    1322908            -5.0%    1256536        aim9.brk_test.ops_per_sec
> 
> Unfortunately there's no stddev figure here, and 5% feels borderline on
> noise. As above, it'd be great to get multiple runs going to rule that
> out. Thanks!

As mentioned above, there is no %stddev here because it's <2%.

Here is the raw data, FYI.

for cacded5e42b9609b07b22d80c10

  "aim9.brk_test.ops_per_sec": [
    1268030.0,
    1277110.76,
    1226452.45,
    1275850.0,
    1249628.35,
    1242148.6
  ],


for fc21959f74bc1138

  "aim9.brk_test.ops_per_sec": [
    1351624.95,
    1316322.79,
    1330363.33,
    1289563.33,
    1314100.0,
    1335475.48
  ],


> 
> >
> >
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
> >
> 
> Overall, previously we special-cased brk() to avoid regression, but the
> special-casing is horribly duplicative and bug-prone so, while we can
> revert to doing that again, I'd really, really like to avoid it if we
> possibly can :)



* Re: [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec -5.0% regression
  2024-10-08  8:31   ` Oliver Sang
@ 2024-10-08  8:44     ` Lorenzo Stoakes
  2024-10-09  6:44       ` Oliver Sang
  0 siblings, 1 reply; 13+ messages in thread
From: Lorenzo Stoakes @ 2024-10-08  8:44 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Mark Brown,
	Liam R. Howlett, Vlastimil Babka, Bert Karwatzki, Jeff Xu,
	Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
	Paul E. McKenney, Paul Moore, Sidhartha Kumar,
	Suren Baghdasaryan, linux-mm, ying.huang, feng.tang, fengwei.yin

On Tue, Oct 08, 2024 at 04:31:59PM +0800, Oliver Sang wrote:
> hi, Lorenzo Stoakes,
>
> sorry for the late reply, we were on holiday last week.
>
> On Mon, Sep 30, 2024 at 09:21:52AM +0100, Lorenzo Stoakes wrote:
> > On Mon, Sep 30, 2024 at 10:21:27AM GMT, kernel test robot wrote:
> > >
> > >
> > > Hello,
> > >
> > > kernel test robot noticed a -5.0% regression of aim9.brk_test.ops_per_sec on:
> > >
> > >
> > > commit: cacded5e42b9609b07b22d80c10f0076d439f7d1 ("mm: avoid using vma_merge() for new VMAs")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > testcase: aim9
> > > test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> >
> > Hm, quite an old microarchitecture no?
> >
> > Would it be possible to try this on a range of uarch's, especially more
> > recent ones, with some repeated runs to rule out statistical noise? Much
> > appreciated!
>
> we ran this test on the platforms below, and observed a similar regression.
> one thing I want to mention is that for performance tests, we run each commit
> at least 6 times. for this aim9 test, the data is quite stable, so there is no
> %stddev value in our table. we don't show this value if it's <2%

Thanks, though going forward I'd suggest adding the number even if it's
<2%, or at least highlighting that it was omitted; I found its absence
quite misleading.

Also, might I suggest reporting the most recent uarch first? This seeming
to be Ivy Bridge only delayed my responding to it (not to sound
ungrateful for the report, which is very useful, but it'd be great if you
guys could test in -next, as this was there for weeks with no apparent
issues).

I will look into this now, if I provide patches would you be able to test
them using the same boxes? It'd be much appreciated!

Thanks, Lorenzo

>
> (1)
>
> model: Granite Rapids
> nr_node: 1
> nr_cpu: 240
> memory: 192G
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-gnr-1ap1/brk_test/aim9/300s
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    3220697            -6.0%    3028867        aim9.brk_test.ops_per_sec
>
>
> (2)
>
> model: Emerald Rapids
> nr_node: 4
> nr_cpu: 256
> memory: 256G
> brand: INTEL(R) XEON(R) PLATINUM 8592+
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-emr-2sp1/brk_test/aim9/300s
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    3669298            -6.5%    3430070        aim9.brk_test.ops_per_sec
>
>
> (3)
>
> model: Sapphire Rapids
> nr_node: 2
> nr_cpu: 224
> memory: 512G
> brand: Intel(R) Xeon(R) Platinum 8480CTDX
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/brk_test/aim9/300s
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    3540976            -6.4%    3314159        aim9.brk_test.ops_per_sec
>
>
> (4)
>
> model: Ice Lake
> nr_node: 2
> nr_cpu: 64
> memory: 256G
> brand: Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/brk_test/aim9/300s
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    2667734            -5.6%    2518021        aim9.brk_test.ops_per_sec
>
>
> >
> > > parameters:
> > >
> > > 	testtime: 300s
> > > 	test: brk_test
> > > 	cpufreq_governor: performance
> > >
> > >
> > >
> > >
> > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > the same patch/commit), kindly add following tags
> > > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > > | Closes: https://lore.kernel.org/oe-lkp/202409301043.629bea78-oliver.sang@intel.com
> > >
> > >
> > > Details are as below:
> > > -------------------------------------------------------------------------------------------------->
> > >
> > >
> > > The kernel config and materials to reproduce are available at:
> > > https://download.01.org/0day-ci/archive/20240930/202409301043.629bea78-oliver.sang@intel.com
> > >
> > > =========================================================================================
> > > compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> > >   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/brk_test/aim9/300s
> > >
> > > commit:
> > >   fc21959f74 ("mm: abstract vma_expand() to use vma_merge_struct")
> > >   cacded5e42 ("mm: avoid using vma_merge() for new VMAs")
> >
> > Yup this results in a different code path for brk(), but local testing
> > indicated no regression (a prior revision of the series had encountered
> > one, so I carefully assessed this, found the bug, and noted no clear
> > regression after this - but a lot of variance in the numbers).
> >
> > >
> > > fc21959f74bc1138 cacded5e42b9609b07b22d80c10
> > > ---------------- ---------------------------
> > >          %stddev     %change         %stddev
> > >              \          |                \
> > >    1322908            -5.0%    1256536        aim9.brk_test.ops_per_sec
> >
> > Unfortunate there's no stddev figure here, and 5% feels borderline on noise
> > - as above it'd be great to get some multiple runs going to rule out
> > noise. Thanks!
>
> as mentioned above, the reason there is no %stddev here is that it's <2%
>
> I just list the raw data below, FYI.
>
> for cacded5e42b9609b07b22d80c10
>
>   "aim9.brk_test.ops_per_sec": [
>     1268030.0,
>     1277110.76,
>     1226452.45,
>     1275850.0,
>     1249628.35,
>     1242148.6
>   ],
>
>
> for fc21959f74bc1138
>
>   "aim9.brk_test.ops_per_sec": [
>     1351624.95,
>     1316322.79,
>     1330363.33,
>     1289563.33,
>     1314100.0,
>     1335475.48
>   ],
>
>
> >
> > >     201.54            +2.9%     207.44        aim9.time.system_time
> > >      97.58            -6.0%      91.75        aim9.time.user_time
> > >       0.04 ± 82%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
> > >       0.10 ± 60%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
> > >       0.04 ± 82%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
> > >       0.10 ± 60%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
> > >   8.33e+08            +3.9%  8.654e+08        perf-stat.i.branch-instructions
> > >       1.15            -0.1        1.09        perf-stat.i.branch-miss-rate%
> > >   12964626            -1.9%   12711922        perf-stat.i.branch-misses
> > >       1.11            -7.4%       1.03        perf-stat.i.cpi
> > >  3.943e+09            +6.0%   4.18e+09        perf-stat.i.instructions
> > >       0.91            +7.9%       0.98        perf-stat.i.ipc
> > >       0.29 ±  2%      -9.1%       0.27 ±  4%  perf-stat.overall.MPKI
> > >       1.56            -0.1        1.47        perf-stat.overall.branch-miss-rate%
> > >       1.08            -6.8%       1.01        perf-stat.overall.cpi
> > >       0.92            +7.2%       0.99        perf-stat.overall.ipc
> > >  8.303e+08            +3.9%  8.627e+08        perf-stat.ps.branch-instructions
> > >   12931205            -2.0%   12678170        perf-stat.ps.branch-misses
> > >   3.93e+09            +6.0%  4.167e+09        perf-stat.ps.instructions
> > >  1.184e+12            +6.1%  1.256e+12        perf-stat.total.instructions
> > >       7.16 ±  2%      -0.4        6.76 ±  4%  perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.brk
> > >       5.72 ±  2%      -0.4        5.35 ±  3%  perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
> > >       6.13 ±  2%      -0.3        5.84 ±  3%  perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >       0.83 ± 11%      -0.1        0.71 ±  5%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >       0.00            +0.6        0.58 ±  5%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
> > >      16.73 ±  2%      +0.6       17.34        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
> > >       0.00            +0.7        0.66 ±  6%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
> > >      24.21            +0.7       24.90        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
> > >      23.33            +0.7       24.05 ±  2%  perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
> > >       0.00            +0.8        0.82 ±  4%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
> > >       0.00            +0.9        0.87 ±  5%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
> > >       0.00            +1.1        1.07 ±  9%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
> > >       0.00            +1.1        1.10 ±  6%  perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
> > >       0.00            +2.3        2.26 ±  5%  perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
> > >       0.00            +7.6        7.56 ±  3%  perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
> > >       0.00            +8.6        8.62 ±  4%  perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > >       7.74 ±  2%      -0.4        7.30 ±  4%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> > >       5.81 ±  2%      -0.4        5.43 ±  3%  perf-profile.children.cycles-pp.perf_event_mmap_event
> > >       6.18 ±  2%      -0.3        5.88 ±  3%  perf-profile.children.cycles-pp.perf_event_mmap
> > >       3.93            -0.2        3.73 ±  3%  perf-profile.children.cycles-pp.perf_iterate_sb
> > >       0.22 ± 29%      -0.1        0.08 ± 17%  perf-profile.children.cycles-pp.may_expand_vm
> > >       0.96 ±  3%      -0.1        0.83 ±  4%  perf-profile.children.cycles-pp.vma_complete
> > >       0.61 ± 14%      -0.1        0.52 ±  7%  perf-profile.children.cycles-pp.percpu_counter_add_batch
> > >       0.15 ±  7%      -0.1        0.08 ± 20%  perf-profile.children.cycles-pp.brk_test
> > >       0.08 ± 11%      +0.0        0.12 ± 14%  perf-profile.children.cycles-pp.mas_prev_setup
> > >       0.17 ± 12%      +0.1        0.27 ± 10%  perf-profile.children.cycles-pp.mas_wr_store_entry
> > >       0.00            +0.2        0.15 ± 11%  perf-profile.children.cycles-pp.mas_next_range
> > >       0.19 ±  8%      +0.2        0.38 ± 10%  perf-profile.children.cycles-pp.mas_next_slot
> > >       0.34 ± 17%      +0.3        0.64 ±  6%  perf-profile.children.cycles-pp.mas_prev_slot
> > >      23.40            +0.7       24.12 ±  2%  perf-profile.children.cycles-pp.__do_sys_brk
> > >       0.00            +7.6        7.59 ±  3%  perf-profile.children.cycles-pp.vma_expand
> > >       0.00            +8.7        8.66 ±  4%  perf-profile.children.cycles-pp.vma_merge_new_range
> > >       1.61 ± 10%      -0.9        0.69 ±  8%  perf-profile.self.cycles-pp.do_brk_flags
> > >       7.64 ±  2%      -0.4        7.20 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> > >       0.22 ± 30%      -0.1        0.08 ± 17%  perf-profile.self.cycles-pp.may_expand_vm
> > >       0.57 ± 15%      -0.1        0.46 ±  6%  perf-profile.self.cycles-pp.percpu_counter_add_batch
> > >       0.15 ±  7%      -0.1        0.08 ± 20%  perf-profile.self.cycles-pp.brk_test
> > >       0.20 ±  5%      -0.0        0.18 ±  4%  perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
> > >       0.07 ± 18%      +0.0        0.10 ± 18%  perf-profile.self.cycles-pp.mas_prev_setup
> > >       0.00            +0.1        0.09 ± 12%  perf-profile.self.cycles-pp.mas_next_range
> > >       0.36 ±  8%      +0.1        0.45 ±  6%  perf-profile.self.cycles-pp.perf_event_mmap
> > >       0.15 ± 13%      +0.1        0.25 ± 14%  perf-profile.self.cycles-pp.mas_wr_store_entry
> > >       0.17 ± 11%      +0.2        0.37 ± 11%  perf-profile.self.cycles-pp.mas_next_slot
> > >       0.34 ± 17%      +0.3        0.64 ±  6%  perf-profile.self.cycles-pp.mas_prev_slot
> > >       0.00            +0.3        0.33 ±  5%  perf-profile.self.cycles-pp.vma_merge_new_range
> > >       0.00            +0.8        0.81 ±  9%  perf-profile.self.cycles-pp.vma_expand
> > >
> > >
> > >
> > >
> > > Disclaimer:
> > > Results have been estimated based on internal Intel analysis and are provided
> > > for informational purposes only. Any difference in system hardware or software
> > > design or configuration may affect actual performance.
> > >
> > >
> > > --
> > > 0-DAY CI Kernel Test Service
> > > https://github.com/intel/lkp-tests/wiki
> > >
> >
> > Overall, previously we special-cased brk() to avoid regression, but the
> > special-casing is horribly duplicative and bug-prone so, while we can
> > revert to doing that again, I'd really, really like to avoid it if we
> > possibly can :)
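
The per-run figures quoted above can be checked by hand. The following is a minimal sketch, not part of the LKP tooling: the helper names (mean, pct_change, rel_stddev_below) are hypothetical, and the use of the sample (n - 1) variance is an assumption, since the thread does not say how the robot computes %stddev. The spread test compares squared quantities so that no sqrt() call is needed.

```c
#include <stddef.h>

/* Arithmetic mean of n samples (n must be > 0). */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Percentage change from base to patched, negative for a drop. */
static double pct_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}

/*
 * Returns 1 if the relative standard deviation of the samples is below
 * limit_pct percent of their mean, 0 otherwise. Uses the sample variance
 * (n - 1 denominator, an assumption) and compares variance against the
 * squared limit, which avoids sqrt(). Requires n > 1.
 */
static int rel_stddev_below(const double *v, size_t n, double limit_pct)
{
	double m = mean(v, n);
	double ss = 0.0;

	for (size_t i = 0; i < n; i++)
		ss += (v[i] - m) * (v[i] - m);

	double var = ss / (double)(n - 1);
	double limit = limit_pct / 100.0 * m;

	return var < limit * limit;
}

/* aim9.brk_test.ops_per_sec per run, copied from the raw data above. */
static const double base_runs[] = {	/* fc21959f74bc1138 */
	1351624.95, 1316322.79, 1330363.33,
	1289563.33, 1314100.0, 1335475.48,
};
static const double patched_runs[] = {	/* cacded5e42b9609b07b22d80c10 */
	1268030.0, 1277110.76, 1226452.45,
	1275850.0, 1249628.35, 1242148.6,
};
```

Calling pct_change(mean(base_runs, 6), mean(patched_runs, 6)) reproduces the means and the roughly -5.0% change shown in the Ivy Bridge table, and rel_stddev_below(..., 2.0) holds for both sets, which is consistent with the explanation that %stddev was left out because it was under 2%.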



* Re: [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec -5.0% regression
  2024-10-08  8:44     ` Lorenzo Stoakes
@ 2024-10-09  6:44       ` Oliver Sang
  2024-10-09  9:52         ` Lorenzo Stoakes
  2024-10-09 21:24         ` Lorenzo Stoakes
  0 siblings, 2 replies; 13+ messages in thread
From: Oliver Sang @ 2024-10-09  6:44 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Mark Brown,
	Liam R. Howlett, Vlastimil Babka, Bert Karwatzki, Jeff Xu,
	Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
	Paul E. McKenney, Paul Moore, Sidhartha Kumar,
	Suren Baghdasaryan, linux-mm, ying.huang, feng.tang, fengwei.yin,
	oliver.sang

hi, Lorenzo,

On Tue, Oct 08, 2024 at 09:44:24AM +0100, Lorenzo Stoakes wrote:
> On Tue, Oct 08, 2024 at 04:31:59PM +0800, Oliver Sang wrote:
> > hi, Lorenzo Stoakes,
> >
> > sorry for the late reply, we were on holiday last week.
> >
> > On Mon, Sep 30, 2024 at 09:21:52AM +0100, Lorenzo Stoakes wrote:
> > > On Mon, Sep 30, 2024 at 10:21:27AM GMT, kernel test robot wrote:
> > > >
> > > >
> > > > Hello,
> > > >
> > > > kernel test robot noticed a -5.0% regression of aim9.brk_test.ops_per_sec on:
> > > >
> > > >
> > > > commit: cacded5e42b9609b07b22d80c10f0076d439f7d1 ("mm: avoid using vma_merge() for new VMAs")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > >
> > > > testcase: aim9
> > > > test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> > >
> > > Hm, quite an old microarchitecture no?
> > >
> > > Would it be possible to try this on a range of uarch's, especially more
> > > recent ones, with some repeated runs to rule out statistical noise? Much
> > > appreciated!
> >
> > we ran this test on the platforms below, and observed a similar regression.
> > one thing I want to mention is that for performance tests, we run each commit
> > at least 6 times. for this aim9 test, the data is quite stable, so there is no
> > %stddev value in our table. we don't show this value if it's <2%
> 
> Thanks, though going forward I'd suggest adding the number even if it's
> <2%, or at least highlighting that it was omitted; I found its absence
> quite misleading.
> 
> Also, might I suggest reporting the most recent uarch first? This seeming
> to be Ivy Bridge only delayed my responding to it

we have 80+ testsuites but a relatively small machine pool (due to resource
constraints), so the recent uarch machines are used mostly for the more popular
testsuites or those where, in our experience, regressions are easy to catch.

unfortunately, aim9 is currently only allotted to Ivy Bridge for regular tests.
the data on other platforms I shared with you in the last mail are from manual
runs. sorry if this causes any inconvenience.

> (not to sound
> ungrateful for the report, which is very useful, but it'd be great if you
> guys could test in -next, as this was there for weeks with no apparent
> issues).

we don't test a single tree; instead, we merge a lot of trees together into a
so-called hourly kernel and test that. mainline is stable and is our merge
base for lots of hourly kernels, so it has a good chance of being tested and
bisected successfully. -next can also be the merge base sometimes, but since
it's rebased frequently, it's hard for us to finish testing and bisecting in
time, and sometimes we cannot even use it as a merge base due to various
issues. it's really a pity that we miss issues on -next ...

> 
> I will look into this now, if I provide patches would you be able to test
> them using the same boxes? It'd be much appreciated!

sure! that's our pleasure!

> 
> Thanks, Lorenzo
> 



* Re: [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec -5.0% regression
  2024-10-09  6:44       ` Oliver Sang
@ 2024-10-09  9:52         ` Lorenzo Stoakes
  2024-10-09 21:24         ` Lorenzo Stoakes
  1 sibling, 0 replies; 13+ messages in thread
From: Lorenzo Stoakes @ 2024-10-09  9:52 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Mark Brown,
	Liam R. Howlett, Vlastimil Babka, Bert Karwatzki, Jeff Xu,
	Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
	Paul E. McKenney, Paul Moore, Sidhartha Kumar,
	Suren Baghdasaryan, linux-mm, ying.huang, feng.tang, fengwei.yin

On Wed, Oct 09, 2024 at 02:44:30PM +0800, Oliver Sang wrote:
> hi, Lorenzo,
>
> On Tue, Oct 08, 2024 at 09:44:24AM +0100, Lorenzo Stoakes wrote:
> > On Tue, Oct 08, 2024 at 04:31:59PM +0800, Oliver Sang wrote:
> > > hi, Lorenzo Stoakes,
> > >
> > > sorry for the late reply, we were on holiday last week.
> > >
> > > On Mon, Sep 30, 2024 at 09:21:52AM +0100, Lorenzo Stoakes wrote:
> > > > On Mon, Sep 30, 2024 at 10:21:27AM GMT, kernel test robot wrote:
> > > > >
> > > > >
> > > > > Hello,
> > > > >
> > > > > kernel test robot noticed a -5.0% regression of aim9.brk_test.ops_per_sec on:
> > > > >
> > > > >
> > > > > commit: cacded5e42b9609b07b22d80c10f0076d439f7d1 ("mm: avoid using vma_merge() for new VMAs")
> > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > > >
> > > > > testcase: aim9
> > > > > test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
> > > >
> > > > Hm, quite an old microarchitecture no?
> > > >
> > > > Would it be possible to try this on a range of uarch's, especially more
> > > > recent ones, with some repeated runs to rule out statistical noise? Much
> > > > appreciated!
> > >
> > > we ran this test on the platforms below, and observed a similar regression.
> > > one thing I want to mention is that for performance tests, we run each commit
> > > at least 6 times. for this aim9 test, the data is quite stable, so there is no
> > > %stddev value in our table. we don't show this value if it's <2%
> >
> > Thanks, though going forward I'd suggest adding the number even if it's
> > <2%, or at least highlighting that it was omitted; I found its absence
> > quite misleading.
> >
> > Also, might I suggest reporting the most recent uarch first? This seeming
> > to be Ivy Bridge only delayed my responding to it
>
> we have 80+ testsuites but a relatively small machine pool (due to resource
> constraints), so the recent uarch machines are used mostly for the more popular
> testsuites or those where, in our experience, regressions are easy to catch.
>
> unfortunately, aim9 is currently only allotted to Ivy Bridge for regular tests.
> the data on other platforms I shared with you in the last mail are from manual
> runs. sorry if this causes any inconvenience.

Understood, sorry I realise you are providing this service for free and
again to reiterate - I'm hugely grateful and glad you helped spot this
problem which I will now address! :)

>
> > (not to sound
> > ungrateful for the report, which is very useful, but it'd be great if you
> > guys could test in -next, as this was there for weeks with no apparent
> > issues).
>
> we don't test a single tree; instead, we merge a lot of trees together into a
> so-called hourly kernel and test that. mainline is stable and is our merge
> base for lots of hourly kernels, so it has a good chance of being tested and
> bisected successfully. -next can also be the merge base sometimes, but since
> it's rebased frequently, it's hard for us to finish testing and bisecting in
> time, and sometimes we cannot even use it as a merge base due to various
> issues. it's really a pity that we miss issues on -next ...

Sure and I guess from my perspective it can be easy to underestimate the
combinatorial explosion of that.

It'd obviously be a nice-to-have for you to be able to take into account
-next but absolutely get it! :)

>
> >
> > I will look into this now, if I provide patches would you be able to test
> > them using the same boxes? It'd be much appreciated!
>
> sure! that's our pleasure!

Perfect, thanks very much!

>
> >
> > Thanks, Lorenzo
> >



* Re: [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec -5.0% regression
  2024-10-09  6:44       ` Oliver Sang
  2024-10-09  9:52         ` Lorenzo Stoakes
@ 2024-10-09 21:24         ` Lorenzo Stoakes
  2024-10-11  2:46           ` Oliver Sang
  1 sibling, 1 reply; 13+ messages in thread
From: Lorenzo Stoakes @ 2024-10-09 21:24 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Mark Brown,
	Liam R. Howlett, Vlastimil Babka, Bert Karwatzki, Jeff Xu,
	Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
	Paul E. McKenney, Paul Moore, Sidhartha Kumar,
	Suren Baghdasaryan, linux-mm, ying.huang, feng.tang, fengwei.yin

On Wed, Oct 09, 2024 at 02:44:30PM +0800, Oliver Sang wrote:
[snip]
> >
> > I will look into this now, if I provide patches would you be able to test
> > them using the same boxes? It'd be much appreciated!
>
> sure! that's our pleasure!
>

Hi Oliver,

Thanks so much for this, could you give the below a try? I've not tried to
seriously test it locally yet, so it'd be good to set your test machines on
it.

If this doesn't help it suggests call stack/branching might be a thing here
in which case I have other approaches I can take before we have to
duplicate this code.

This patch is against the mm-unstable branch in Andrew's tree [0] but
hopefully should apply fine to Linus's too.

[0]:https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/

Thanks again!

Best, Lorenzo


----8<----
From 7eb4aa421b357668bc44405c58b0444abf44334a Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Wed, 9 Oct 2024 21:57:03 +0100
Subject: [PATCH] mm: explicitly enable an expand-only merge mode for brk()

Try to do less work on brk() to improve perf.
---
 mm/mmap.c |  1 +
 mm/vma.c  | 25 ++++++++++++++++---------
 mm/vma.h  | 11 +++++++++++
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 02f7b45c3076..c2c68ef45a3b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1740,6 +1740,7 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	if (vma && vma->vm_end == addr) {
 		VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));

+		vmg.mode = VMA_MERGE_MODE_EXPAND_ONLY;
 		vmg.prev = vma;
 		vma_iter_next_range(vmi);

diff --git a/mm/vma.c b/mm/vma.c
index 749c4881fd60..f525a0750c41 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -561,6 +561,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 	unsigned long end = vmg->end;
 	pgoff_t pgoff = vmg->pgoff;
 	pgoff_t pglen = PHYS_PFN(end - start);
+	bool expand_only = vmg_mode_expand_only(vmg);
 	bool can_merge_left, can_merge_right;

 	mmap_assert_write_locked(vmg->mm);
@@ -575,7 +576,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 		return NULL;

 	can_merge_left = can_vma_merge_left(vmg);
-	can_merge_right = can_vma_merge_right(vmg, can_merge_left);
+	can_merge_right = !expand_only && can_vma_merge_right(vmg, can_merge_left);

 	/* If we can merge with the next VMA, adjust vmg accordingly. */
 	if (can_merge_right) {
@@ -603,13 +604,18 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 		return vmg->vma;
 	}

-	/* If expansion failed, reset state. Allows us to retry merge later. */
-	vmg->vma = NULL;
-	vmg->start = start;
-	vmg->end = end;
-	vmg->pgoff = pgoff;
-	if (vmg->vma == prev)
-		vma_iter_set(vmg->vmi, start);
+	/*
+	 * Unless in expand only case and expansion failed, reset state.
+	 * Allows us to retry merge later.
+	 */
+	if (!expand_only) {
+		vmg->vma = NULL;
+		vmg->start = start;
+		vmg->end = end;
+		vmg->pgoff = pgoff;
+		if (vmg->vma == prev)
+			vma_iter_set(vmg->vmi, start);
+	}

 	return NULL;
 }
@@ -641,7 +647,8 @@ int vma_expand(struct vma_merge_struct *vmg)
 	mmap_assert_write_locked(vmg->mm);

 	vma_start_write(vma);
-	if (next && (vma != next) && (vmg->end == next->vm_end)) {
+	if (!vmg_mode_expand_only(vmg) && next &&
+	    (vma != next) && (vmg->end == next->vm_end)) {
 		int ret;

 		remove_next = true;
diff --git a/mm/vma.h b/mm/vma.h
index 82354fe5edd0..14224b36a979 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -52,6 +52,11 @@ struct vma_munmap_struct {
 	unsigned long data_vm;
 };

+enum vma_merge_mode {
+	VMA_MERGE_MODE_NORMAL,
+	VMA_MERGE_MODE_EXPAND_ONLY,
+};
+
 enum vma_merge_state {
 	VMA_MERGE_START,
 	VMA_MERGE_ERROR_NOMEM,
@@ -75,9 +80,15 @@ struct vma_merge_struct {
 	struct mempolicy *policy;
 	struct vm_userfaultfd_ctx uffd_ctx;
 	struct anon_vma_name *anon_name;
+	enum vma_merge_mode mode;
 	enum vma_merge_state state;
 };

+static inline bool vmg_mode_expand_only(struct vma_merge_struct *vmg)
+{
+	return vmg->mode == VMA_MERGE_MODE_EXPAND_ONLY;
+}
+
 static inline bool vmg_nomem(struct vma_merge_struct *vmg)
 {
 	return vmg->state == VMA_MERGE_ERROR_NOMEM;
--
2.46.2
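
To illustrate what the expand-only mode in the patch above is meant to save, here is a small userspace sketch. It is a toy model under stated assumptions, not the kernel code: struct range, struct merge_req, merge_new_range() and the right_checks counter are all hypothetical stand-ins, and the only thing modelled is that in expand-only mode the right-hand neighbour is never examined, so the new range can only be absorbed by growing the previous one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
enum merge_mode { MERGE_MODE_NORMAL, MERGE_MODE_EXPAND_ONLY };

struct range {
	unsigned long start, end;	/* half-open interval [start, end) */
};

struct merge_req {
	struct range *prev;	/* mapping before the new range, or NULL */
	struct range *next;	/* mapping after the new range, or NULL */
	unsigned long start, end;	/* the new range to place */
	enum merge_mode mode;
	int right_checks;	/* how often the right neighbour was examined */
};

static bool can_merge_left(const struct merge_req *r)
{
	return r->prev && r->prev->end == r->start;
}

static bool can_merge_right(struct merge_req *r)
{
	r->right_checks++;
	return r->next && r->next->start == r->end;
}

/*
 * Returns the range that absorbed [start, end), or NULL if no merge was
 * possible. In expand-only mode the right-hand check is skipped entirely
 * via short-circuit evaluation, mirroring the
 * "!expand_only && can_vma_merge_right(...)" test in the patch.
 */
static struct range *merge_new_range(struct merge_req *r)
{
	bool expand_only = r->mode == MERGE_MODE_EXPAND_ONLY;
	bool left = can_merge_left(r);
	bool right = !expand_only && can_merge_right(r);

	if (left && right) {
		/* prev grows over the new range and next; a real caller
		 * would also have to remove next, which this toy omits. */
		r->prev->end = r->next->end;
		return r->prev;
	}
	if (left) {
		r->prev->end = r->end;
		return r->prev;
	}
	if (right) {
		r->next->start = r->start;
		return r->next;
	}
	return NULL;
}
```

With a request in MERGE_MODE_EXPAND_ONLY whose prev ends exactly at the new start, the call grows prev and leaves right_checks at zero, even if next happens to be adjacent. Whether skipping that work is enough to recover the lost throughput is exactly what the test run is meant to establish.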



* Re: [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec -5.0% regression
  2024-10-09 21:24         ` Lorenzo Stoakes
@ 2024-10-11  2:46           ` Oliver Sang
  2024-10-11  7:26             ` Lorenzo Stoakes
  0 siblings, 1 reply; 13+ messages in thread
From: Oliver Sang @ 2024-10-11  2:46 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Mark Brown,
	Liam R. Howlett, Vlastimil Babka, Bert Karwatzki, Jeff Xu,
	Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
	Paul E. McKenney, Paul Moore, Sidhartha Kumar,
	Suren Baghdasaryan, linux-mm, ying.huang, feng.tang, fengwei.yin,
	oliver.sang

[-- Attachment #1: Type: text/plain, Size: 7171 bytes --]

hi, Lorenzo,

On Wed, Oct 09, 2024 at 10:24:58PM +0100, Lorenzo Stoakes wrote:
> On Wed, Oct 09, 2024 at 02:44:30PM +0800, Oliver Sang wrote:
> [snip]
> > >
> > > I will look into this now, if I provide patches would you be able to test
> > > them using the same boxes? It'd be much appreciated!
> >
> > sure! that's our pleasure!
> >
> 
> Hi Oliver,
> 
> Thanks so much for this, could you give the below a try? I've not tried to
> seriously test it locally yet, so it'd be good to set your test machines on
> it.
> 
> If this doesn't help it suggests call stack/branching might be a thing here
> in which case I have other approaches I can take before we have to
> duplicate this code.
> 
> This patch is against the mm-unstable branch in Andrew's tree [0] but
> hopefully should apply fine to Linus's too.
> 
> [0]:https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/
> 
> Thanks again!

you are welcome!

I found that the patch applies directly on top of cacded5e42, so I tested it there.
This is our normal practice, since we want to avoid impact from other commits.

However, if your patch relies on newer patches in mm-unstable or mainline,
please let me know and I can reapply and retest.

I mention the patch base because, applied on top of cacded5e42, your patch does
not seem to have an obvious performance impact; a similar regression remains.

For brevity, I list just two examples here. All tests and full data are attached
as fc21959f74bc11-cacded5e42b960-2e71337ac26478.

(1)

model: Sapphire Rapids
nr_node: 2
nr_cpu: 224
memory: 512G
brand: Intel(R) Xeon(R) Platinum 8480CTDX

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  2e71337ac26478 ("mm: explicitly enable an expand-only merge mode for brk()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 2e71337ac2647889d3d9d76a5ce
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   3540976            -6.4%    3314159            -6.7%    3302864        aim9.brk_test.ops_per_sec


(2) the same Ivy Bridge-EP machine as in our original report
(test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory)

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  2e71337ac26478 ("mm: explicitly enable an expand-only merge mode for brk()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 2e71337ac2647889d3d9d76a5ce
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   1322908            -5.0%    1256536            -4.1%    1268145        aim9.brk_test.ops_per_sec

> 
> Best, Lorenzo
> 
> 
> ----8<----
> From 7eb4aa421b357668bc44405c58b0444abf44334a Mon Sep 17 00:00:00 2001
> From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Date: Wed, 9 Oct 2024 21:57:03 +0100
> Subject: [PATCH] mm: explicitly enable an expand-only merge mode for brk()
> 
> Try to do less work on brk() to improve perf.
> ---
>  mm/mmap.c |  1 +
>  mm/vma.c  | 25 ++++++++++++++++---------
>  mm/vma.h  | 11 +++++++++++
>  3 files changed, 28 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 02f7b45c3076..c2c68ef45a3b 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1740,6 +1740,7 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  	if (vma && vma->vm_end == addr) {
>  		VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
> 
> +		vmg.mode = VMA_MERGE_MODE_EXPAND_ONLY;
>  		vmg.prev = vma;
>  		vma_iter_next_range(vmi);
> 
> diff --git a/mm/vma.c b/mm/vma.c
> index 749c4881fd60..f525a0750c41 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -561,6 +561,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
>  	unsigned long end = vmg->end;
>  	pgoff_t pgoff = vmg->pgoff;
>  	pgoff_t pglen = PHYS_PFN(end - start);
> +	bool expand_only = vmg_mode_expand_only(vmg);
>  	bool can_merge_left, can_merge_right;
> 
>  	mmap_assert_write_locked(vmg->mm);
> @@ -575,7 +576,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
>  		return NULL;
> 
>  	can_merge_left = can_vma_merge_left(vmg);
> -	can_merge_right = can_vma_merge_right(vmg, can_merge_left);
> +	can_merge_right = !expand_only && can_vma_merge_right(vmg, can_merge_left);
> 
>  	/* If we can merge with the next VMA, adjust vmg accordingly. */
>  	if (can_merge_right) {
> @@ -603,13 +604,18 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
>  		return vmg->vma;
>  	}
> 
> -	/* If expansion failed, reset state. Allows us to retry merge later. */
> -	vmg->vma = NULL;
> -	vmg->start = start;
> -	vmg->end = end;
> -	vmg->pgoff = pgoff;
> -	if (vmg->vma == prev)
> -		vma_iter_set(vmg->vmi, start);
> +	/*
> +	 * Unless in expand only case and expansion failed, reset state.
> +	 * Allows us to retry merge later.
> +	 */
> +	if (!expand_only) {
> +		vmg->vma = NULL;
> +		vmg->start = start;
> +		vmg->end = end;
> +		vmg->pgoff = pgoff;
> +		if (vmg->vma == prev)
> +			vma_iter_set(vmg->vmi, start);
> +	}
> 
>  	return NULL;
>  }
> @@ -641,7 +647,8 @@ int vma_expand(struct vma_merge_struct *vmg)
>  	mmap_assert_write_locked(vmg->mm);
> 
>  	vma_start_write(vma);
> -	if (next && (vma != next) && (vmg->end == next->vm_end)) {
> +	if (!vmg_mode_expand_only(vmg) && next &&
> +	    (vma != next) && (vmg->end == next->vm_end)) {
>  		int ret;
> 
>  		remove_next = true;
> diff --git a/mm/vma.h b/mm/vma.h
> index 82354fe5edd0..14224b36a979 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -52,6 +52,11 @@ struct vma_munmap_struct {
>  	unsigned long data_vm;
>  };
> 
> +enum vma_merge_mode {
> +	VMA_MERGE_MODE_NORMAL,
> +	VMA_MERGE_MODE_EXPAND_ONLY,
> +};
> +
>  enum vma_merge_state {
>  	VMA_MERGE_START,
>  	VMA_MERGE_ERROR_NOMEM,
> @@ -75,9 +80,15 @@ struct vma_merge_struct {
>  	struct mempolicy *policy;
>  	struct vm_userfaultfd_ctx uffd_ctx;
>  	struct anon_vma_name *anon_name;
> +	enum vma_merge_mode mode;
>  	enum vma_merge_state state;
>  };
> 
> +static inline bool vmg_mode_expand_only(struct vma_merge_struct *vmg)
> +{
> +	return vmg->mode == VMA_MERGE_MODE_EXPAND_ONLY;
> +}
> +
>  static inline bool vmg_nomem(struct vma_merge_struct *vmg)
>  {
>  	return vmg->state == VMA_MERGE_ERROR_NOMEM;
> --
> 2.46.2

[-- Attachment #2: fc21959f74bc11-cacded5e42b960-2e71337ac26478 --]
[-- Type: text/plain, Size: 100830 bytes --]

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-gnr-1ap1/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  2e71337ac26478 ("mm: explicitly enable an expand-only merge mode for brk()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 2e71337ac2647889d3d9d76a5ce
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   3220697            -6.0%    3028867            -6.4%    3014713        aim9.brk_test.ops_per_sec
     24.58            -3.9%      23.63            -5.5%      23.24        time.user_time
    119459            -3.2%     115601            -2.9%     115971        proc-vmstat.nr_active_anon
    120943            -3.2%     117079            -2.9%     117450        proc-vmstat.nr_shmem
    119459            -3.2%     115601            -2.9%     115971        proc-vmstat.nr_zone_active_anon
      0.02 ±120%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      3.27 ±  5%   +5112.4%     170.40 ±218%   +5144.5%     171.45 ±216%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.20 ±188%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.01 ± 70%    +100.0%       0.01 ± 84%   +3512.9%       0.19 ±199%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.93 ± 16%      -4.1%       0.89 ± 14%     -25.0%       0.70 ± 11%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.02 ±120%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.20 ±188%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.01 ± 70%    +100.0%       0.01 ± 84%   +3512.9%       0.19 ±199%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      0.02 ±  2%      -4.1%       0.02 ±  2%      -6.3%       0.02 ±  4%  perf-stat.i.MPKI
 1.767e+09            +4.2%  1.841e+09            +3.7%  1.833e+09        perf-stat.i.branch-instructions
      0.45            -6.2%       0.42            -5.9%       0.42        perf-stat.i.cpi
 8.347e+09            +6.6%    8.9e+09            +6.2%  8.863e+09        perf-stat.i.instructions
      2.27            +6.6%       2.42            +6.0%       2.41        perf-stat.i.ipc
      0.03 ±  4%      -2.0%       0.03 ±  3%      -7.8%       0.03 ±  4%  perf-stat.overall.MPKI
      0.44            -5.9%       0.42            -5.4%       0.42        perf-stat.overall.cpi
      2.25            +6.2%       2.39            +5.7%       2.38        perf-stat.overall.ipc
 1.761e+09            +4.2%  1.834e+09            +3.7%  1.827e+09        perf-stat.ps.branch-instructions
 8.319e+09            +6.6%   8.87e+09            +6.2%  8.834e+09        perf-stat.ps.instructions
 2.519e+12            +6.4%   2.68e+12            +5.8%  2.665e+12        perf-stat.total.instructions
      7.07            -7.1        0.00            -7.1        0.00        perf-profile.calltrace.cycles-pp.mas_store_prealloc.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.30            -6.3        0.00            -6.3        0.00        perf-profile.calltrace.cycles-pp.mas_preallocate.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     18.35            -1.0       17.36            -1.4       16.92        perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     16.40            -0.9       15.47            -1.3       15.05        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
     10.17            -0.8        9.36            -1.2        8.93        perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags
     11.92            -0.8       11.12            -1.3       10.64        perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
      5.07 ±  3%      -0.2        4.84 ±  2%      -0.2        4.88 ±  3%  perf-profile.calltrace.cycles-pp.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.40 ±  3%      -0.2        5.18 ±  2%      -0.1        5.28 ±  3%  perf-profile.calltrace.cycles-pp.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      3.66 ±  2%      -0.2        3.50 ±  2%      -0.1        3.52 ±  3%  perf-profile.calltrace.cycles-pp.thp_get_unmapped_area_vmflags.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64
      0.60 ±  5%      -0.1        0.46 ± 45%      -0.2        0.36 ± 70%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.brk
      1.66 ±  2%      -0.1        1.56 ±  3%      -0.1        1.60 ±  2%  perf-profile.calltrace.cycles-pp.up_write.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.68 ±  3%      -0.1        0.60 ±  5%      -0.1        0.60 ±  5%  perf-profile.calltrace.cycles-pp.kfree.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
      5.91 ±  2%      -0.1        5.85            -0.4        5.49        perf-profile.calltrace.cycles-pp.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.97 ±  4%      -0.1        0.91 ±  4%      -0.1        0.91 ±  3%  perf-profile.calltrace.cycles-pp.mas_next_slot.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.23 ±  2%      -0.0        4.21            -0.4        3.82        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.37 ± 70%      +0.3        0.67 ±  4%      +0.2        0.57 ± 44%  perf-profile.calltrace.cycles-pp.strlen.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
      0.00            +0.5        0.47 ± 44%      +0.6        0.58 ±  5%  perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.49 ± 44%      +0.5        1.02 ±  5%      +0.5        1.02 ±  8%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     83.74            +0.5       84.28            +0.6       84.32        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.6        0.60 ±  6%      +0.6        0.58 ±  7%  perf-profile.calltrace.cycles-pp.mas_next_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.7        0.65 ±  7%      +0.7        0.66 ±  4%  perf-profile.calltrace.cycles-pp.mas_wr_slot_store.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +0.7        0.68 ±  4%      +0.7        0.67 ±  8%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.7        0.68 ±  2%      +0.8        0.80 ±  4%  perf-profile.calltrace.cycles-pp.mas_next_slot.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     80.24            +0.7       80.95            +0.7       80.98        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.7        0.74 ±  2%      +0.8        0.76 ±  2%  perf-profile.calltrace.cycles-pp.vma_adjust_trans_huge.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +0.8        0.75 ±  4%      +0.8        0.81 ±  5%  perf-profile.calltrace.cycles-pp.can_vma_merge_after.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.8        0.81 ±  3%      +0.7        0.69 ±  7%  perf-profile.calltrace.cycles-pp.mas_prev.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.8        0.84 ±  5%      +0.8        0.84 ±  6%  perf-profile.calltrace.cycles-pp.__anon_vma_interval_tree_remove.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.3        1.30 ±  5%      +1.3        1.32 ±  2%  perf-profile.calltrace.cycles-pp.mas_prev_slot.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +1.4        1.35 ±  4%      +1.3        1.32 ±  4%  perf-profile.calltrace.cycles-pp.up_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.6        1.60 ±  4%      +1.6        1.56 ±  4%  perf-profile.calltrace.cycles-pp.up_write.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.8        1.76 ±  2%      +1.9        1.86 ±  2%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
      0.00            +1.8        1.78 ±  2%      +1.6        1.64 ±  4%  perf-profile.calltrace.cycles-pp.init_multi_vma_prep.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +2.0        2.03            +2.0        2.04 ±  2%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +2.1        2.06 ±  3%      +2.1        2.06 ±  4%  perf-profile.calltrace.cycles-pp.down_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +2.3        2.29 ±  3%      +2.4        2.37 ±  2%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
     53.64            +2.6       56.21            +2.6       56.28        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +3.1        3.14 ±  2%      +3.1        3.10 ±  5%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +3.2        3.25            +3.6        3.64 ±  3%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +3.8        3.84            +3.9        3.86 ±  2%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +5.3        5.31 ±  2%      +5.7        5.67 ±  2%  perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +6.1        6.07            +6.4        6.41        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00           +27.7       27.74           +28.3       28.33        perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00           +32.4       32.43           +33.0       33.02        perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     18.49            -1.0       17.47            -1.5       17.01        perf-profile.children.cycles-pp.perf_event_mmap
      6.54            -1.0        5.54 ±  2%      -0.7        5.90 ±  2%  perf-profile.children.cycles-pp.mas_preallocate
      7.40            -1.0        6.40 ±  2%      -0.6        6.76        perf-profile.children.cycles-pp.mas_store_prealloc
      5.68            -1.0        4.72            -1.0        4.66 ±  3%  perf-profile.children.cycles-pp.up_write
     16.88            -0.9       15.93            -1.4       15.51        perf-profile.children.cycles-pp.perf_event_mmap_event
     10.35            -0.8        9.53            -1.2        9.10        perf-profile.children.cycles-pp.perf_event_mmap_output
     12.16            -0.8       11.35            -1.3       10.86        perf-profile.children.cycles-pp.perf_iterate_sb
      4.02 ±  2%      -0.7        3.32            -0.3        3.72 ±  3%  perf-profile.children.cycles-pp.mas_wr_store_type
      2.97            -0.6        2.37 ±  3%      -0.5        2.45 ±  2%  perf-profile.children.cycles-pp.mas_update_gap
      1.36 ±  8%      -0.6        0.80 ±  4%      -0.5        0.86 ±  4%  perf-profile.children.cycles-pp.can_vma_merge_after
      2.26 ±  2%      -0.5        1.80 ±  2%      -0.4        1.89 ±  2%  perf-profile.children.cycles-pp.mas_leaf_max_gap
      3.71 ±  2%      -0.3        3.44            -0.3        3.42 ±  4%  perf-profile.children.cycles-pp.vma_complete
      5.62 ±  3%      -0.2        5.40 ±  2%      -0.1        5.51 ±  3%  perf-profile.children.cycles-pp.check_brk_limits
      3.83 ±  2%      -0.2        3.65 ±  2%      -0.2        3.67 ±  3%  perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
      0.66 ±  7%      -0.1        0.55 ±  9%      -0.1        0.54 ±  6%  perf-profile.children.cycles-pp.may_expand_vm
      1.98 ±  3%      -0.1        1.86 ±  2%      -0.3        1.71 ±  4%  perf-profile.children.cycles-pp.init_multi_vma_prep
      0.78 ±  3%      -0.1        0.69 ±  4%      -0.1        0.69 ±  5%  perf-profile.children.cycles-pp.kfree
      0.15 ± 12%      -0.1        0.08 ± 13%      -0.1        0.07 ± 23%  perf-profile.children.cycles-pp.arch_vma_name
      6.23 ±  2%      -0.1        6.17            -0.4        5.78        perf-profile.children.cycles-pp.mas_find
      0.60 ±  6%      -0.1        0.54 ±  8%      -0.1        0.51 ±  7%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      4.32 ±  2%      -0.0        4.30            -0.4        3.92        perf-profile.children.cycles-pp.mas_walk
      0.20 ±  8%      -0.0        0.17 ±  7%      -0.0        0.15 ± 14%  perf-profile.children.cycles-pp.__x64_sys_brk
      0.26 ±  5%      -0.0        0.24 ±  9%      -0.0        0.22 ±  9%  perf-profile.children.cycles-pp.__rb_insert_augmented
      0.23 ±  7%      +0.0        0.24 ± 18%      +0.0        0.27 ±  6%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.24 ±  7%      +0.0        0.24 ± 18%      +0.0        0.27 ±  5%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.58 ±  7%      +0.1        0.66 ±  7%      +0.1        0.67 ±  2%  perf-profile.children.cycles-pp.mas_wr_slot_store
      0.19 ± 10%      +0.1        0.31 ± 10%      +0.1        0.32 ± 16%  perf-profile.children.cycles-pp.rb_next
      0.50 ±  4%      +0.1        0.62 ±  7%      +0.2        0.66 ±  7%  perf-profile.children.cycles-pp.mas_wr_store_entry
      0.40 ±  6%      +0.1        0.53 ±  6%      +0.1        0.54 ±  6%  perf-profile.children.cycles-pp.strnlen
      0.58 ± 13%      +0.2        0.75 ±  4%      +0.1        0.72 ± 13%  perf-profile.children.cycles-pp.strlen
      0.96 ±  6%      +0.2        1.14 ±  3%      +0.2        1.16 ±  3%  perf-profile.children.cycles-pp.rcu_all_qs
      0.68 ±  3%      +0.3        0.98 ±  5%      +0.3        1.01 ±  6%  perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
      1.77 ±  4%      +0.3        2.09            +0.3        2.08 ±  4%  perf-profile.children.cycles-pp.__cond_resched
      0.00            +0.4        0.36 ±  9%      +0.4        0.36 ±  8%  perf-profile.children.cycles-pp.mas_next_setup
      0.36 ±  8%      +0.4        0.76 ±  3%      +0.4        0.76 ±  9%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.48 ±  7%      +0.4        0.90 ±  6%      +0.4        0.86 ±  4%  perf-profile.children.cycles-pp.mas_prev_setup
     84.69            +0.5       85.19            +0.6       85.27        perf-profile.children.cycles-pp.do_syscall_64
      0.67 ±  9%      +0.6        1.24 ±  4%      +0.6        1.27 ±  7%  perf-profile.children.cycles-pp.__vm_enough_memory
      3.81            +0.6        4.39            +0.6        4.40 ±  3%  perf-profile.children.cycles-pp.down_write
     80.98            +0.7       81.64            +0.7       81.71        perf-profile.children.cycles-pp.__do_sys_brk
      1.05 ±  4%      +0.7        1.72 ±  3%      +0.8        1.82 ±  3%  perf-profile.children.cycles-pp.mas_next_slot
      0.00            +0.7        0.70 ±  6%      +0.7        0.69 ±  5%  perf-profile.children.cycles-pp.mas_next_range
      1.11 ±  4%      +1.0        2.10 ±  3%      +0.8        1.92 ±  3%  perf-profile.children.cycles-pp.mas_prev
      2.82 ±  3%      +1.2        4.07            +1.3        4.09 ±  2%  perf-profile.children.cycles-pp.vma_prepare
      1.54 ±  4%      +1.3        2.88 ±  3%      +1.3        2.84 ±  3%  perf-profile.children.cycles-pp.mas_prev_slot
     54.97            +1.6       56.61            +1.8       56.79        perf-profile.children.cycles-pp.do_brk_flags
      0.00           +28.6       28.64           +29.2       29.16        perf-profile.children.cycles-pp.vma_expand
      0.00           +32.9       32.91           +33.4       33.37        perf-profile.children.cycles-pp.vma_merge_new_range
      5.90 ±  2%      -3.5        2.37 ±  4%      -3.4        2.55        perf-profile.self.cycles-pp.do_brk_flags
      5.36 ±  2%      -1.0        4.38            -1.0        4.36 ±  3%  perf-profile.self.cycles-pp.up_write
     10.18            -0.8        9.36            -1.2        8.94        perf-profile.self.cycles-pp.perf_event_mmap_output
      3.86 ±  2%      -0.7        3.18            -0.3        3.57 ±  3%  perf-profile.self.cycles-pp.mas_wr_store_type
      1.28 ±  7%      -0.5        0.74 ±  4%      -0.5        0.78 ±  4%  perf-profile.self.cycles-pp.can_vma_merge_after
      3.02 ±  2%      -0.5        2.52 ±  4%      -0.3        2.75 ±  3%  perf-profile.self.cycles-pp.mas_store_prealloc
      2.19 ±  2%      -0.4        1.78 ±  2%      -0.3        1.87 ±  2%  perf-profile.self.cycles-pp.mas_leaf_max_gap
      5.03            -0.4        4.67            -0.1        4.92 ±  2%  perf-profile.self.cycles-pp.__do_sys_brk
      2.60 ±  4%      -0.3        2.27 ±  5%      -0.3        2.25 ±  2%  perf-profile.self.cycles-pp.mas_preallocate
      1.89 ±  4%      -0.3        1.59 ±  4%      -0.3        1.62 ±  5%  perf-profile.self.cycles-pp.perf_event_mmap_event
      1.71 ±  4%      -0.2        1.53 ±  3%      -0.2        1.50 ±  5%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.74 ±  3%      -0.2        0.57 ±  7%      -0.2        0.56 ±  6%  perf-profile.self.cycles-pp.mas_update_gap
      1.89 ±  4%      -0.2        1.73 ±  2%      -0.3        1.62 ±  4%  perf-profile.self.cycles-pp.init_multi_vma_prep
      1.58 ±  4%      -0.1        1.47 ±  3%      -0.2        1.42 ±  4%  perf-profile.self.cycles-pp.perf_event_mmap
      1.27 ±  2%      -0.1        1.16 ±  2%      -0.1        1.19 ±  9%  perf-profile.self.cycles-pp.vma_complete
      1.20 ±  2%      -0.1        1.12 ±  4%      -0.1        1.08 ±  2%  perf-profile.self.cycles-pp.do_syscall_64
      0.69 ±  2%      -0.1        0.61 ±  4%      -0.1        0.60 ±  5%  perf-profile.self.cycles-pp.kfree
      0.60 ±  6%      -0.1        0.54 ±  8%      -0.1        0.51 ±  7%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      1.60 ±  2%      -0.1        1.56 ±  6%      -0.1        1.55        perf-profile.self.cycles-pp.down_write_killable
      4.24 ±  2%      -0.0        4.20            -0.4        3.84        perf-profile.self.cycles-pp.mas_walk
      0.50 ±  7%      -0.0        0.47 ±  9%      -0.1        0.45 ±  5%  perf-profile.self.cycles-pp.may_expand_vm
      0.42 ±  4%      +0.1        0.49 ±  7%      +0.1        0.53 ±  6%  perf-profile.self.cycles-pp.mas_wr_store_entry
      0.15 ± 10%      +0.1        0.24 ± 11%      +0.1        0.23 ± 19%  perf-profile.self.cycles-pp.rb_next
      0.58 ±  8%      +0.1        0.68 ±  5%      +0.1        0.71 ±  6%  perf-profile.self.cycles-pp.rcu_all_qs
      0.37 ±  5%      +0.1        0.50 ±  7%      +0.1        0.50 ±  4%  perf-profile.self.cycles-pp.strnlen
      0.54 ± 13%      +0.2        0.68 ±  4%      +0.1        0.64 ± 14%  perf-profile.self.cycles-pp.strlen
      1.01 ±  6%      +0.2        1.17 ±  2%      +0.1        1.11 ±  4%  perf-profile.self.cycles-pp.__cond_resched
      0.66 ±  6%      +0.2        0.83 ±  2%      +0.2        0.84 ±  3%  perf-profile.self.cycles-pp.vma_prepare
      0.46 ±  6%      +0.2        0.67 ±  7%      +0.2        0.70 ±  5%  perf-profile.self.cycles-pp.__anon_vma_interval_tree_remove
      0.34 ± 12%      +0.2        0.54 ±  3%      +0.2        0.58 ±  9%  perf-profile.self.cycles-pp.__vm_enough_memory
      0.00            +0.3        0.29 ± 10%      +0.3        0.28 ± 11%  perf-profile.self.cycles-pp.mas_next_setup
      0.32 ± 11%      +0.3        0.62 ±  7%      +0.3        0.58 ±  8%  perf-profile.self.cycles-pp.mas_prev_setup
      0.23 ±  7%      +0.3        0.55 ±  6%      +0.3        0.55 ± 10%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.00            +0.4        0.35 ±  7%      +0.3        0.30 ±  7%  perf-profile.self.cycles-pp.mas_next_range
      2.65 ±  3%      +0.4        3.00 ±  2%      +0.4        3.05 ±  2%  perf-profile.self.cycles-pp.down_write
      0.64 ±  5%      +0.6        1.21 ±  3%      +0.5        1.12 ±  5%  perf-profile.self.cycles-pp.mas_prev
      0.89 ±  5%      +0.7        1.54 ±  3%      +0.8        1.64 ±  4%  perf-profile.self.cycles-pp.mas_next_slot
      1.46 ±  4%      +1.3        2.72 ±  3%      +1.2        2.70 ±  3%  perf-profile.self.cycles-pp.mas_prev_slot
      0.00            +1.3        1.33 ±  2%      +1.2        1.16 ±  4%  perf-profile.self.cycles-pp.vma_merge_new_range
      0.00            +3.5        3.54 ±  3%      +3.6        3.55 ±  2%  perf-profile.self.cycles-pp.vma_expand

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-emr-2sp1/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  2e71337ac26478 ("mm: explicitly enable an expand-only merge mode for brk()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 2e71337ac2647889d3d9d76a5ce
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   3669298            -6.5%    3430070            -6.8%    3420919        aim9.brk_test.ops_per_sec
     23.53            -4.9%      22.38            -6.4%      22.03        time.user_time
    491107 ±  5%      -7.2%     455906 ±  6%      -5.5%     464283 ±  6%  meminfo.Active
    491011 ±  5%      -7.2%     455810 ±  6%      -5.5%     464155 ±  6%  meminfo.Active(anon)
    505666 ±  5%      -7.0%     470410 ±  5%      -5.3%     478879 ±  6%  meminfo.Shmem
    122753 ±  5%      -7.1%     113979 ±  6%      -5.5%     116019 ±  6%  proc-vmstat.nr_active_anon
    899298            -1.0%     890515            -0.7%     892592        proc-vmstat.nr_file_pages
    126417 ±  5%      -6.9%     117634 ±  5%      -5.3%     119701 ±  6%  proc-vmstat.nr_shmem
    122753 ±  5%      -7.1%     113979 ±  6%      -5.5%     116019 ±  6%  proc-vmstat.nr_zone_active_anon
    595.50 ± 22%     +53.6%     914.50 ± 12%     +17.1%     697.33 ± 20%  proc-vmstat.numa_hint_faults_local
     17958            -4.3%      17188 ±  2%      -4.3%      17180 ±  2%  proc-vmstat.pgactivate
   1817569 ± 69%     -43.1%    1035076 ±127%     +63.5%    2972153 ±  3%  numa-meminfo.node0.FilePages
     16515 ± 73%     -29.4%      11657 ±108%     +79.5%      29650        numa-meminfo.node0.Mapped
   1811617 ± 69%     -43.2%    1029482 ±128%     +63.8%    2967951 ±  3%  numa-meminfo.node0.Unevictable
     40474 ± 40%     -61.8%      15444 ± 40%     -50.4%      20065 ± 59%  numa-meminfo.node1.KReclaimable
     40474 ± 40%     -61.8%      15444 ± 40%     -50.4%      20065 ± 59%  numa-meminfo.node1.SReclaimable
    484115 ±  6%      -7.3%     448760 ±  6%     -10.1%     435387 ± 11%  numa-meminfo.node3.Active
    484083 ±  6%      -7.3%     448760 ±  6%     -10.1%     435387 ± 11%  numa-meminfo.node3.Active(anon)
    485577 ±  6%      -7.3%     450224 ±  6%     -10.0%     436799 ± 11%  numa-meminfo.node3.Shmem
    454393 ± 69%     -43.1%     258770 ±127%     +63.5%     743038 ±  3%  numa-vmstat.node0.nr_file_pages
      4178 ± 73%     -28.5%       2988 ±107%     +81.6%       7590 ±  2%  numa-vmstat.node0.nr_mapped
    452904 ± 69%     -43.2%     257370 ±128%     +63.8%     741987 ±  3%  numa-vmstat.node0.nr_unevictable
    452904 ± 69%     -43.2%     257370 ±128%     +63.8%     741987 ±  3%  numa-vmstat.node0.nr_zone_unevictable
     10118 ± 40%     -61.8%       3861 ± 40%     -50.4%       5016 ± 59%  numa-vmstat.node1.nr_slab_reclaimable
    121015 ±  6%      -7.3%     112196 ±  6%     -10.1%     108836 ± 11%  numa-vmstat.node3.nr_active_anon
    121371 ±  6%      -7.3%     112537 ±  6%     -10.1%     109168 ± 11%  numa-vmstat.node3.nr_shmem
    121015 ±  6%      -7.3%     112196 ±  6%     -10.1%     108836 ± 11%  numa-vmstat.node3.nr_zone_active_anon
      0.01 ± 52%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.01 ± 15%      +7.0%       0.01 ± 16%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.06 ± 69%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.01 ± 17%      -3.8%       0.01 ± 21%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
    400.06            +0.0%     400.07          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
     10.00            +0.0%      10.00          -100.0%       0.00        perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
    999.53            -0.0%     999.38          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ± 52%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
    400.05            +0.0%     400.06          -100.0%       0.00        perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.06 ± 69%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
    999.52            -0.0%     999.37          -100.0%       0.00        perf-sched.wait_time.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
 2.071e+09            +2.8%  2.128e+09            +2.8%   2.13e+09        perf-stat.i.branch-instructions
      0.48            -4.2%       0.46            -4.9%       0.45        perf-stat.i.cpi
 4.717e+09            -0.7%  4.686e+09            +0.1%  4.723e+09        perf-stat.i.cpu-cycles
 9.794e+09            +5.1%   1.03e+10            +5.2%   1.03e+10        perf-stat.i.instructions
      2.15            +5.8%       2.28            +5.5%       2.27        perf-stat.i.ipc
      0.34 ±  3%      -0.0        0.33            -0.0        0.34        perf-stat.overall.branch-miss-rate%
      0.48            -5.5%       0.46            -4.8%       0.46        perf-stat.overall.cpi
      2.08            +5.8%       2.20            +5.1%       2.18        perf-stat.overall.ipc
 2.063e+09            +2.8%   2.12e+09            +2.8%  2.122e+09        perf-stat.ps.branch-instructions
 4.703e+09            -0.7%  4.672e+09            +0.1%  4.709e+09        perf-stat.ps.cpu-cycles
 9.758e+09            +5.1%  1.026e+10            +5.2%  1.026e+10        perf-stat.ps.instructions
 2.944e+12            +5.5%  3.106e+12            +5.0%  3.092e+12        perf-stat.total.instructions
      6.54 ±  2%      -6.5        0.00            -6.5        0.00        perf-profile.calltrace.cycles-pp.mas_store_prealloc.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.22            -6.2        0.00            -6.2        0.00        perf-profile.calltrace.cycles-pp.mas_preallocate.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     17.43            -0.7       16.76            -1.0       16.39        perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      9.69 ±  2%      -0.6        9.07            -1.0        8.73        perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags
     11.30 ±  2%      -0.6       10.71            -0.9       10.35        perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
     15.57            -0.5       15.05            -0.8       14.73        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
      2.76            -0.1        2.62 ±  3%      -0.2        2.60 ±  3%  perf-profile.calltrace.cycles-pp.userfaultfd_unmap_complete.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.84 ±  4%      -0.1        0.74 ±  8%      -0.1        0.70 ±  5%  perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.75 ±  7%      -0.1        0.68 ± 10%      -0.1        0.64 ± 10%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
      0.75 ±  7%      -0.1        0.68 ± 10%      -0.1        0.64 ± 10%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
      0.75 ±  7%      -0.1        0.68 ± 10%      -0.1        0.64 ± 10%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
      1.12 ±  5%      +0.2        1.29 ±  3%      +0.2        1.29 ±  2%  perf-profile.calltrace.cycles-pp.sized_strscpy.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
      0.65 ±  6%      +0.4        1.07 ±  5%      +0.4        1.07 ±  4%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.5        0.48 ± 45%      +0.5        0.54 ±  6%  perf-profile.calltrace.cycles-pp.mas_next_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.5        0.54 ±  4%      +0.4        0.36 ± 70%  perf-profile.calltrace.cycles-pp.mas_wr_slot_store.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +0.5        0.55 ±  4%      +0.5        0.48 ± 45%  perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +0.7        0.66 ±  4%      +0.8        0.80 ±  4%  perf-profile.calltrace.cycles-pp.mas_next_slot.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.7        0.68 ±  9%      +0.7        0.71 ±  5%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.7        0.68 ±  4%      +0.8        0.76 ±  6%  perf-profile.calltrace.cycles-pp.can_vma_merge_after.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.8        0.76 ±  2%      +0.8        0.77 ±  7%  perf-profile.calltrace.cycles-pp.vma_adjust_trans_huge.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
     81.90            +0.8       82.67            +0.7       82.64        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.8        0.80 ±  3%      +0.7        0.65 ±  8%  perf-profile.calltrace.cycles-pp.mas_prev.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
     80.94            +0.8       81.76            +0.8       81.74        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.8        0.82 ±  3%      +0.9        0.87 ±  6%  perf-profile.calltrace.cycles-pp.__anon_vma_interval_tree_remove.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
     77.52            +1.0       78.50            +0.9       78.40        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +1.3        1.26 ±  3%      +1.3        1.33 ±  5%  perf-profile.calltrace.cycles-pp.mas_prev_slot.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +1.3        1.35 ±  3%      +1.3        1.30 ±  4%  perf-profile.calltrace.cycles-pp.up_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.6        1.56 ±  2%      +1.6        1.56 ±  5%  perf-profile.calltrace.cycles-pp.up_write.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.7        1.72 ±  3%      +1.7        1.66 ±  3%  perf-profile.calltrace.cycles-pp.init_multi_vma_prep.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.9        1.87 ±  4%      +1.9        1.94 ±  4%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
      0.00            +2.1        2.07 ±  2%      +2.1        2.15 ±  4%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +2.1        2.14 ±  2%      +2.1        2.12 ±  3%  perf-profile.calltrace.cycles-pp.down_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +2.4        2.37 ±  2%      +2.4        2.41 ±  3%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
     51.80            +2.9       54.66            +2.7       54.52        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +3.0        3.02 ±  2%      +3.2        3.17        perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +3.1        3.06            +3.1        3.08 ±  2%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +3.9        3.86            +4.0        3.96 ±  2%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +5.0        5.01            +5.2        5.18 ±  2%  perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +5.9        5.88            +6.1        6.10        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00           +27.1       27.13           +27.4       27.39        perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00           +31.6       31.63           +32.0       32.01        perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.46            -1.2        5.24            -1.1        5.40 ±  2%  perf-profile.children.cycles-pp.mas_preallocate
      5.54            -0.9        4.64            -0.9        4.63 ±  3%  perf-profile.children.cycles-pp.up_write
      3.99            -0.9        3.10 ±  2%      -0.8        3.24        perf-profile.children.cycles-pp.mas_wr_store_type
     17.57            -0.7       16.87            -1.1       16.50        perf-profile.children.cycles-pp.perf_event_mmap
      9.85 ±  2%      -0.6        9.22            -1.0        8.90        perf-profile.children.cycles-pp.perf_event_mmap_output
      6.82 ±  2%      -0.6        6.22            -0.4        6.43        perf-profile.children.cycles-pp.mas_store_prealloc
      1.33 ±  5%      -0.6        0.75 ±  4%      -0.5        0.82 ±  5%  perf-profile.children.cycles-pp.can_vma_merge_after
     11.53 ±  2%      -0.6       10.96            -1.0       10.58        perf-profile.children.cycles-pp.perf_iterate_sb
     16.03            -0.5       15.50            -0.8       15.18        perf-profile.children.cycles-pp.perf_event_mmap_event
      2.65 ±  3%      -0.2        2.40 ±  3%      -0.2        2.44 ±  3%  perf-profile.children.cycles-pp.mas_update_gap
      2.18 ±  2%      -0.2        1.94 ±  3%      -0.2        1.99 ±  3%  perf-profile.children.cycles-pp.mas_leaf_max_gap
      2.85            -0.2        2.70 ±  3%      -0.2        2.68 ±  3%  perf-profile.children.cycles-pp.userfaultfd_unmap_complete
      3.52            -0.1        3.38 ±  2%      -0.1        3.38 ±  2%  perf-profile.children.cycles-pp.vma_complete
      0.62 ±  7%      -0.1        0.48 ±  9%      -0.1        0.50 ±  7%  perf-profile.children.cycles-pp.may_expand_vm
      1.92 ±  2%      -0.1        1.79 ±  3%      -0.2        1.73 ±  3%  perf-profile.children.cycles-pp.init_multi_vma_prep
      1.05 ±  3%      -0.1        0.95 ±  6%      -0.1        0.91 ±  6%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
      0.75 ±  7%      -0.1        0.68 ± 10%      -0.1        0.64 ± 10%  perf-profile.children.cycles-pp.kthread
      0.40 ±  6%      -0.1        0.35 ±  9%      -0.0        0.38 ±  8%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.35 ±  2%      -0.0        0.33 ±  7%      -0.0        0.31 ±  6%  perf-profile.children.cycles-pp.brk_test
      0.11 ± 20%      +0.0        0.14 ±  8%      +0.0        0.13 ±  6%  perf-profile.children.cycles-pp.anon_vma_interval_tree_remove
      0.52 ±  3%      +0.0        0.56 ±  4%      +0.0        0.54 ± 10%  perf-profile.children.cycles-pp.mas_wr_slot_store
      0.20 ± 11%      +0.1        0.28 ±  7%      +0.1        0.31 ±  5%  perf-profile.children.cycles-pp.rb_next
      0.49 ±  3%      +0.1        0.61 ±  4%      +0.2        0.64 ±  7%  perf-profile.children.cycles-pp.mas_wr_store_entry
      0.98 ±  4%      +0.1        1.11 ±  3%      +0.1        1.11 ±  4%  perf-profile.children.cycles-pp.rcu_all_qs
      0.39 ±  7%      +0.2        0.55 ±  7%      +0.2        0.60 ±  6%  perf-profile.children.cycles-pp.strnlen
      1.18 ±  5%      +0.2        1.37 ±  3%      +0.2        1.36 ±  2%  perf-profile.children.cycles-pp.sized_strscpy
      1.78 ±  3%      +0.3        2.04 ±  2%      +0.2        2.02 ±  3%  perf-profile.children.cycles-pp.__cond_resched
      0.00            +0.3        0.33 ±  4%      +0.3        0.31 ± 10%  perf-profile.children.cycles-pp.mas_next_setup
      0.41 ±  9%      +0.4        0.76 ±  7%      +0.4        0.80 ±  6%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.58 ±  4%      +0.4        0.96 ±  2%      +0.4        1.01 ±  6%  perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
      0.44 ± 17%      +0.4        0.85 ±  7%      +0.4        0.80 ±  5%  perf-profile.children.cycles-pp.mas_prev_setup
      4.11 ±  2%      +0.4        4.52            +0.5        4.58        perf-profile.children.cycles-pp.down_write
      0.74 ±  6%      +0.6        1.29 ±  5%      +0.6        1.32 ±  3%  perf-profile.children.cycles-pp.__vm_enough_memory
      0.00            +0.7        0.67 ±  6%      +0.6        0.63 ±  7%  perf-profile.children.cycles-pp.mas_next_range
      0.95 ±  5%      +0.7        1.64 ±  2%      +0.9        1.84 ±  3%  perf-profile.children.cycles-pp.mas_next_slot
     78.23            +0.9       79.17            +0.8       79.07        perf-profile.children.cycles-pp.__do_sys_brk
      1.02 ± 14%      +1.0        1.99 ±  4%      +0.8        1.86 ±  6%  perf-profile.children.cycles-pp.mas_prev
      2.89 ±  3%      +1.2        4.10            +1.3        4.19 ±  2%  perf-profile.children.cycles-pp.vma_prepare
      1.38 ± 12%      +1.3        2.73 ±  4%      +1.3        2.73 ±  3%  perf-profile.children.cycles-pp.mas_prev_slot
     53.08            +1.9       55.03            +1.9       54.96        perf-profile.children.cycles-pp.do_brk_flags
      0.00           +28.0       27.95           +28.2       28.20        perf-profile.children.cycles-pp.vma_expand
      0.00           +32.1       32.10           +32.3       32.32        perf-profile.children.cycles-pp.vma_merge_new_range
      5.69            -3.4        2.34 ±  3%      -3.3        2.36 ±  3%  perf-profile.self.cycles-pp.do_brk_flags
      5.22            -0.9        4.33 ±  2%      -0.9        4.33 ±  3%  perf-profile.self.cycles-pp.up_write
      3.82            -0.9        2.95 ±  3%      -0.7        3.09        perf-profile.self.cycles-pp.mas_wr_store_type
      9.68 ±  2%      -0.6        9.05            -1.0        8.73        perf-profile.self.cycles-pp.perf_event_mmap_output
      1.28 ±  5%      -0.6        0.69 ±  6%      -0.5        0.75 ±  5%  perf-profile.self.cycles-pp.can_vma_merge_after
      2.88 ±  3%      -0.4        2.44 ±  2%      -0.2        2.65 ±  2%  perf-profile.self.cycles-pp.mas_store_prealloc
      2.55            -0.3        2.22 ±  2%      -0.3        2.22 ±  4%  perf-profile.self.cycles-pp.mas_preallocate
      4.98 ±  2%      -0.3        4.70            -0.2        4.76        perf-profile.self.cycles-pp.__do_sys_brk
      2.15 ±  3%      -0.2        1.93 ±  3%      -0.2        1.98 ±  3%  perf-profile.self.cycles-pp.mas_leaf_max_gap
      1.82            -0.2        1.60 ±  4%      -0.2        1.62 ±  3%  perf-profile.self.cycles-pp.perf_event_mmap_event
      1.51 ±  4%      -0.2        1.31 ±  4%      -0.3        1.26        perf-profile.self.cycles-pp.perf_event_mmap
      1.85 ±  2%      -0.2        1.66 ±  3%      -0.2        1.61 ±  3%  perf-profile.self.cycles-pp.init_multi_vma_prep
      2.77            -0.1        2.63 ±  3%      -0.2        2.60 ±  3%  perf-profile.self.cycles-pp.userfaultfd_unmap_complete
      0.75 ±  5%      -0.1        0.67 ±  4%      -0.1        0.65 ±  5%  perf-profile.self.cycles-pp.security_vm_enough_memory_mm
      0.88 ±  2%      -0.0        0.85 ±  3%      -0.1        0.82 ±  4%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.28 ±  5%      -0.0        0.26 ±  8%      -0.0        0.24 ±  8%  perf-profile.self.cycles-pp.brk_test
      0.03 ± 70%      +0.0        0.07 ± 14%      +0.0        0.07 ±  8%  perf-profile.self.cycles-pp.anon_vma_interval_tree_remove
      0.15 ± 12%      +0.1        0.20 ±  5%      +0.1        0.24 ±  6%  perf-profile.self.cycles-pp.rb_next
      0.40 ±  5%      +0.1        0.48 ±  5%      +0.1        0.50 ± 10%  perf-profile.self.cycles-pp.mas_wr_store_entry
      0.34 ±  6%      +0.2        0.50 ±  8%      +0.2        0.54 ±  4%  perf-profile.self.cycles-pp.strnlen
      0.66 ±  4%      +0.2        0.84 ±  5%      +0.1        0.80 ±  4%  perf-profile.self.cycles-pp.vma_prepare
      1.12 ±  5%      +0.2        1.30 ±  3%      +0.2        1.28 ±  2%  perf-profile.self.cycles-pp.sized_strscpy
      3.00 ±  2%      +0.2        3.19 ±  2%      +0.3        3.27 ±  2%  perf-profile.self.cycles-pp.down_write
      0.92 ±  4%      +0.2        1.13 ±  2%      +0.2        1.08 ±  3%  perf-profile.self.cycles-pp.__cond_resched
      0.00            +0.3        0.26 ±  7%      +0.3        0.25 ± 12%  perf-profile.self.cycles-pp.mas_next_setup
      0.28 ±  8%      +0.3        0.54 ±  8%      +0.3        0.56 ±  6%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.29 ± 12%      +0.3        0.58 ±  9%      +0.3        0.60 ±  4%  perf-profile.self.cycles-pp.__vm_enough_memory
      0.29 ± 24%      +0.3        0.58 ±  7%      +0.2        0.54 ±  7%  perf-profile.self.cycles-pp.mas_prev_setup
      0.40 ±  4%      +0.3        0.70 ±  2%      +0.3        0.72 ±  8%  perf-profile.self.cycles-pp.__anon_vma_interval_tree_remove
      0.00            +0.4        0.36 ±  6%      +0.3        0.28 ± 12%  perf-profile.self.cycles-pp.mas_next_range
      0.58 ± 14%      +0.5        1.12 ±  5%      +0.5        1.10 ±  6%  perf-profile.self.cycles-pp.mas_prev
      0.81 ±  4%      +0.7        1.48 ±  3%      +0.8        1.64 ±  4%  perf-profile.self.cycles-pp.mas_next_slot
      1.32 ± 11%      +1.3        2.59 ±  3%      +1.3        2.60 ±  3%  perf-profile.self.cycles-pp.mas_prev_slot
      0.00            +1.3        1.30 ±  4%      +1.1        1.14 ±  6%  perf-profile.self.cycles-pp.vma_merge_new_range
      0.00            +3.4        3.39            +3.4        3.36        perf-profile.self.cycles-pp.vma_expand

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  2e71337ac26478 ("mm: explicitly enable an expand-only merge mode for brk()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 2e71337ac2647889d3d9d76a5ce
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   3540976            -6.4%    3314159            -6.7%    3302864        aim9.brk_test.ops_per_sec
     23.65            -5.8%      22.28            -6.5%      22.10        time.user_time
    568.02 ± 10%      -0.2%     567.09 ± 12%     +18.0%     670.52 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.min
    568.02 ± 10%      -0.2%     567.09 ± 12%     +18.0%     670.52 ±  8%  sched_debug.cfs_rq:/.min_vruntime.min
    111409 ±  2%      -5.1%     105748 ±  3%      -4.9%     105984 ±  2%  proc-vmstat.nr_active_anon
    114711 ±  2%      -5.0%     109006 ±  3%      -4.7%     109341 ±  2%  proc-vmstat.nr_shmem
    111409 ±  2%      -5.1%     105748 ±  3%      -4.9%     105984 ±  2%  proc-vmstat.nr_zone_active_anon
     17422 ±  2%      -5.3%      16494            -1.9%      17084 ±  2%  proc-vmstat.pgactivate
 1.999e+09            +3.2%  2.064e+09            +2.9%  2.057e+09        perf-stat.i.branch-instructions
      0.47            -5.1%       0.44            -4.4%       0.45        perf-stat.i.cpi
 9.452e+09            +5.6%  9.983e+09            +5.3%  9.951e+09        perf-stat.i.instructions
      2.19            +5.8%       2.31            +5.2%       2.30        perf-stat.i.ipc
      0.33 ±  3%      -0.0        0.31            -0.0        0.32        perf-stat.overall.branch-miss-rate%
      0.47            -5.1%       0.45            -4.4%       0.45        perf-stat.overall.cpi
      2.12            +5.4%       2.23            +4.7%       2.21        perf-stat.overall.ipc
 1.991e+09            +3.2%  2.056e+09            +2.9%   2.05e+09        perf-stat.ps.branch-instructions
 9.417e+09            +5.6%  9.946e+09            +5.3%  9.915e+09        perf-stat.ps.instructions
 2.841e+12            +5.7%  3.002e+12            +5.2%  2.988e+12        perf-stat.total.instructions
      0.01 ± 42%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.02 ± 37%     -68.5%       0.01 ± 44%     -57.7%       0.01 ± 63%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.02 ± 38%     +31.7%       0.02 ± 21%     +53.8%       0.03 ± 10%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.04 ± 66%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.05 ± 47%     -75.3%       0.01 ± 83%     -69.2%       0.01 ± 74%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.01 ±  9%     +33.8%       0.02 ± 18%     +17.5%       0.02 ± 10%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.44 ± 49%     -29.7%       0.31 ± 35%    +145.0%       1.09 ± 41%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.08 ± 57%     -69.7%       0.02 ±146%      +6.7%       0.09 ± 66%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.10 ± 37%     +45.2%       0.15 ±  8%     +20.5%       0.12 ± 13%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      7209 ±  3%      -7.8%       6648 ±  2%      -3.6%       6948 ±  3%  perf-sched.total_wait_and_delay.count.ms
      1533 ±  6%     -10.2%       1377            -2.6%       1493 ±  7%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ± 42%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.02 ± 37%     -68.5%       0.01 ± 44%     -57.7%       0.01 ± 63%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.04 ± 66%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.05 ± 47%     -75.3%       0.01 ± 83%     -69.2%       0.01 ± 74%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.61            -6.6        0.00            -6.6        0.00        perf-profile.calltrace.cycles-pp.mas_store_prealloc.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.20            -6.2        0.00            -6.2        0.00        perf-profile.calltrace.cycles-pp.mas_preallocate.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     17.96            -1.1       16.87            -1.5       16.43        perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     16.08            -1.0       15.10            -1.3       14.78        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
      9.85            -0.8        9.02            -1.2        8.66        perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags
     11.56            -0.8       10.73            -1.2       10.33 ±  2%  perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
      5.32            -0.2        5.10 ±  5%      -0.3        5.03        perf-profile.calltrace.cycles-pp.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      3.57 ±  2%      -0.1        3.43 ±  3%      -0.2        3.40 ±  3%  perf-profile.calltrace.cycles-pp.thp_get_unmapped_area_vmflags.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64
      4.87            -0.1        4.74 ±  4%      -0.2        4.66 ±  2%  perf-profile.calltrace.cycles-pp.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.76 ±  2%      -0.1        2.66 ±  3%      -0.2        2.58 ±  3%  perf-profile.calltrace.cycles-pp.userfaultfd_unmap_complete.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      1.11 ± 15%      -0.1        1.04 ±  4%      -0.1        0.96 ±  5%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
      0.64 ±  4%      +0.4        1.06 ±  2%      +0.4        1.06 ±  5%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.6        0.56 ±  5%      +0.6        0.56 ±  4%  perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +0.6        0.57 ±  6%      +0.5        0.47 ± 46%  perf-profile.calltrace.cycles-pp.mas_wr_slot_store.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +0.6        0.58 ±  7%      +0.6        0.58 ±  3%  perf-profile.calltrace.cycles-pp.mas_next_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.7        0.69 ±  4%      +0.7        0.71 ±  5%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.7        0.70 ±  6%      +0.8        0.78 ±  6%  perf-profile.calltrace.cycles-pp.mas_next_slot.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.7        0.73 ±  8%      +0.7        0.74 ±  8%  perf-profile.calltrace.cycles-pp.vma_adjust_trans_huge.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +0.7        0.74 ±  5%      +0.7        0.74 ±  7%  perf-profile.calltrace.cycles-pp.can_vma_merge_after.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.8        0.84 ±  2%      +0.7        0.70 ±  4%  perf-profile.calltrace.cycles-pp.mas_prev.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.9        0.88 ±  5%      +0.9        0.86 ±  3%  perf-profile.calltrace.cycles-pp.__anon_vma_interval_tree_remove.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
     78.92            +0.9       79.81            +0.5       79.46        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +1.3        1.28 ±  2%      +1.3        1.30 ±  5%  perf-profile.calltrace.cycles-pp.mas_prev_slot.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +1.4        1.42 ±  3%      +1.4        1.36 ±  3%  perf-profile.calltrace.cycles-pp.up_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.6        1.59 ±  4%      +1.6        1.64 ±  3%  perf-profile.calltrace.cycles-pp.up_write.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.8        1.80 ±  4%      +1.7        1.73 ±  2%  perf-profile.calltrace.cycles-pp.init_multi_vma_prep.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.9        1.89 ±  4%      +1.9        1.90 ±  5%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
      0.00            +2.1        2.06 ±  3%      +2.2        2.17 ±  2%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +2.1        2.12 ±  2%      +2.1        2.14 ±  2%  perf-profile.calltrace.cycles-pp.down_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +2.4        2.43 ±  4%      +2.4        2.40 ±  4%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
     52.76            +2.6       55.40            +2.7       55.48        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +3.0        2.98            +3.4        3.38 ±  2%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +3.1        3.11 ±  3%      +3.2        3.18 ±  5%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +3.9        3.90 ±  2%      +4.0        4.00 ±  2%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +5.0        4.96            +5.4        5.39        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +6.0        6.04 ±  2%      +6.2        6.15        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00           +27.5       27.47           +28.1       28.06        perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00           +32.1       32.09           +32.7       32.73        perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.44            -1.2        5.20            -0.8        5.64        perf-profile.children.cycles-pp.mas_preallocate
     18.11            -1.1       16.99            -1.6       16.55        perf-profile.children.cycles-pp.perf_event_mmap
      4.01 ±  2%      -1.0        3.06            -0.6        3.45 ±  2%  perf-profile.children.cycles-pp.mas_wr_store_type
     16.54            -0.9       15.60            -1.3       15.27        perf-profile.children.cycles-pp.perf_event_mmap_event
     10.02            -0.8        9.18            -1.2        8.80        perf-profile.children.cycles-pp.perf_event_mmap_output
      5.61            -0.8        4.77            -0.8        4.80 ±  2%  perf-profile.children.cycles-pp.up_write
     11.80            -0.8       10.97            -1.2       10.57 ±  2%  perf-profile.children.cycles-pp.perf_iterate_sb
      1.39            -0.6        0.81 ±  3%      -0.6        0.81 ±  6%  perf-profile.children.cycles-pp.can_vma_merge_after
      6.89            -0.5        6.38            -0.4        6.52        perf-profile.children.cycles-pp.mas_store_prealloc
      3.67 ±  2%      -0.3        3.41 ±  3%      -0.2        3.48 ±  5%  perf-profile.children.cycles-pp.vma_complete
      5.55            -0.2        5.32 ±  4%      -0.3        5.23        perf-profile.children.cycles-pp.check_brk_limits
      2.20 ±  4%      -0.2        1.97 ±  3%      -0.2        1.96 ±  4%  perf-profile.children.cycles-pp.mas_leaf_max_gap
      2.68 ±  3%      -0.2        2.47 ±  3%      -0.2        2.44 ±  4%  perf-profile.children.cycles-pp.mas_update_gap
      5.11            -0.2        4.91 ±  4%      -0.3        4.84 ±  2%  perf-profile.children.cycles-pp.__get_unmapped_area
      2.51 ±  3%      -0.1        2.36 ±  3%      -0.2        2.32 ±  2%  perf-profile.children.cycles-pp.down_write_killable
      0.61 ±  5%      -0.1        0.49 ±  7%      -0.1        0.50 ±  5%  perf-profile.children.cycles-pp.may_expand_vm
      3.67            -0.1        3.55 ±  3%      -0.2        3.51 ±  3%  perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
      1.25 ±  4%      -0.1        1.14 ±  5%      -0.1        1.19 ±  4%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      2.85 ±  2%      -0.1        2.74 ±  2%      -0.2        2.66 ±  3%  perf-profile.children.cycles-pp.userfaultfd_unmap_complete
      0.14 ± 11%      -0.1        0.08 ± 12%      -0.1        0.06 ± 14%  perf-profile.children.cycles-pp.arch_vma_name
      0.42            -0.1        0.36 ±  4%      -0.0        0.40 ±  3%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.36 ±  6%      -0.1        0.31 ±  4%      -0.0        0.34 ± 11%  perf-profile.children.cycles-pp.brk_test
      2.39            -0.1        2.34 ±  5%      -0.1        2.29        perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown_vmflags
      0.25 ±  5%      -0.0        0.21 ±  9%      -0.0        0.23 ± 13%  perf-profile.children.cycles-pp.__rb_insert_augmented
      1.92 ±  2%      -0.0        1.89 ±  4%      -0.1        1.80 ±  2%  perf-profile.children.cycles-pp.init_multi_vma_prep
      0.31 ± 10%      -0.0        0.29 ± 10%      -0.0        0.26 ±  9%  perf-profile.children.cycles-pp.sched_setaffinity
      0.14 ±  3%      -0.0        0.14 ±  7%      -0.0        0.11 ±  8%  perf-profile.children.cycles-pp.intel_idle
      0.08 ± 10%      +0.0        0.12 ± 16%      +0.0        0.10 ±  9%  perf-profile.children.cycles-pp.mmap_region
      0.09 ±  8%      +0.0        0.12 ± 15%      +0.0        0.11 ±  8%  perf-profile.children.cycles-pp.do_mmap
      0.10 ± 14%      +0.0        0.15 ± 11%      +0.0        0.14 ± 13%  perf-profile.children.cycles-pp.anon_vma_interval_tree_remove
      1.01 ±  5%      +0.1        1.08 ±  5%      +0.1        1.12        perf-profile.children.cycles-pp.rcu_all_qs
      0.19 ±  5%      +0.1        0.30 ±  5%      +0.1        0.28 ±  8%  perf-profile.children.cycles-pp.rb_next
      1.27 ±  4%      +0.1        1.40 ±  2%      +0.1        1.37 ±  5%  perf-profile.children.cycles-pp.sized_strscpy
      0.42 ±  6%      +0.1        0.57 ±  6%      +0.2        0.63 ±  9%  perf-profile.children.cycles-pp.strnlen
      0.48 ±  4%      +0.2        0.64 ±  5%      +0.2        0.65 ±  5%  perf-profile.children.cycles-pp.mas_wr_store_entry
      1.80 ±  4%      +0.2        2.02 ±  2%      +0.2        2.00        perf-profile.children.cycles-pp.__cond_resched
      0.00            +0.3        0.31 ± 12%      +0.3        0.32 ±  3%  perf-profile.children.cycles-pp.mas_next_setup
      4.16 ±  3%      +0.3        4.48            +0.5        4.64 ±  3%  perf-profile.children.cycles-pp.down_write
      0.40 ±  6%      +0.4        0.78 ±  4%      +0.4        0.79 ±  3%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.48 ±  4%      +0.4        0.92 ±  5%      +0.4        0.90 ±  3%  perf-profile.children.cycles-pp.mas_prev_setup
      0.56 ±  6%      +0.5        1.02 ±  4%      +0.5        1.01 ±  4%  perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
      0.79 ±  4%      +0.5        1.30 ±  3%      +0.5        1.30 ±  4%  perf-profile.children.cycles-pp.__vm_enough_memory
      1.02 ±  2%      +0.7        1.72 ±  3%      +0.8        1.79 ±  4%  perf-profile.children.cycles-pp.mas_next_slot
      0.00            +0.7        0.70 ±  6%      +0.7        0.68 ±  4%  perf-profile.children.cycles-pp.mas_next_range
     79.62            +0.9       80.50            +0.5       80.13        perf-profile.children.cycles-pp.__do_sys_brk
      1.10 ±  3%      +1.0        2.10 ±  3%      +0.8        1.94 ±  3%  perf-profile.children.cycles-pp.mas_prev
      2.86 ±  3%      +1.3        4.12 ±  3%      +1.4        4.25        perf-profile.children.cycles-pp.vma_prepare
      1.45 ±  4%      +1.3        2.79 ±  3%      +1.3        2.77 ±  3%  perf-profile.children.cycles-pp.mas_prev_slot
     54.06            +1.8       55.82            +1.9       55.95        perf-profile.children.cycles-pp.do_brk_flags
      0.00           +28.3       28.30           +28.9       28.88        perf-profile.children.cycles-pp.vma_expand
      0.00           +32.6       32.58           +33.1       33.05        perf-profile.children.cycles-pp.vma_merge_new_range
      5.90 ±  2%      -3.4        2.47 ±  3%      -3.4        2.47 ±  2%  perf-profile.self.cycles-pp.do_brk_flags
      3.84 ±  2%      -0.9        2.90            -0.6        3.28 ±  3%  perf-profile.self.cycles-pp.mas_wr_store_type
      9.85            -0.8        9.02            -1.2        8.63        perf-profile.self.cycles-pp.perf_event_mmap_output
      5.26            -0.8        4.47 ±  2%      -0.8        4.49 ±  2%  perf-profile.self.cycles-pp.up_write
      1.34 ±  2%      -0.6        0.75 ±  3%      -0.6        0.74 ±  5%  perf-profile.self.cycles-pp.can_vma_merge_after
      2.86            -0.4        2.47 ±  5%      -0.2        2.70 ±  2%  perf-profile.self.cycles-pp.mas_store_prealloc
      2.50 ±  2%      -0.3        2.22 ±  2%      -0.2        2.27 ±  2%  perf-profile.self.cycles-pp.mas_preallocate
      5.02 ±  2%      -0.2        4.79            -0.3        4.69        perf-profile.self.cycles-pp.__do_sys_brk
      2.19 ±  4%      -0.2        1.96 ±  3%      -0.2        1.95 ±  4%  perf-profile.self.cycles-pp.mas_leaf_max_gap
      1.87 ±  3%      -0.2        1.66 ±  4%      -0.2        1.71 ±  6%  perf-profile.self.cycles-pp.perf_event_mmap_event
      1.52 ±  3%      -0.2        1.33 ±  2%      -0.3        1.24 ±  2%  perf-profile.self.cycles-pp.perf_event_mmap
      1.82 ±  3%      -0.1        1.68 ±  4%      -0.2        1.66 ±  3%  perf-profile.self.cycles-pp.down_write_killable
      1.84 ±  2%      -0.1        1.74 ±  4%      -0.2        1.67 ±  2%  perf-profile.self.cycles-pp.init_multi_vma_prep
      2.77 ±  2%      -0.1        2.67 ±  2%      -0.2        2.58 ±  3%  perf-profile.self.cycles-pp.userfaultfd_unmap_complete
      1.18 ±  2%      -0.1        1.09            -0.1        1.11 ±  6%  perf-profile.self.cycles-pp.do_syscall_64
      0.92 ±  3%      -0.1        0.84 ±  6%      -0.1        0.82 ±  5%  perf-profile.self.cycles-pp.__get_unmapped_area
      2.30            -0.0        2.26 ±  5%      -0.1        2.21        perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown_vmflags
      0.33 ±  4%      -0.0        0.30 ±  5%      -0.0        0.33 ±  6%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      1.08 ±  2%      -0.0        1.05 ±  4%      -0.1        1.00 ±  2%  perf-profile.self.cycles-pp.mas_find
      0.14 ±  3%      -0.0        0.14 ±  7%      -0.0        0.11 ±  8%  perf-profile.self.cycles-pp.intel_idle
      0.03 ± 70%      +0.1        0.08 ± 14%      +0.0        0.07 ± 15%  perf-profile.self.cycles-pp.anon_vma_interval_tree_remove
      0.13 ±  7%      +0.1        0.22 ± 10%      +0.1        0.22 ± 11%  perf-profile.self.cycles-pp.rb_next
      0.40 ±  7%      +0.1        0.50 ±  9%      +0.1        0.51 ±  6%  perf-profile.self.cycles-pp.mas_wr_store_entry
      1.21 ±  4%      +0.1        1.32 ±  2%      +0.1        1.30 ±  5%  perf-profile.self.cycles-pp.sized_strscpy
      0.38 ±  7%      +0.1        0.50 ±  7%      +0.2        0.57 ±  9%  perf-profile.self.cycles-pp.strnlen
      0.95 ±  7%      +0.1        1.10 ±  4%      +0.1        1.08        perf-profile.self.cycles-pp.__cond_resched
      0.63 ±  5%      +0.2        0.82 ±  9%      +0.2        0.83 ±  2%  perf-profile.self.cycles-pp.vma_prepare
      2.98 ±  2%      +0.2        3.20            +0.3        3.32 ±  4%  perf-profile.self.cycles-pp.down_write
      0.37 ±  6%      +0.2        0.60 ±  6%      +0.2        0.58 ±  4%  perf-profile.self.cycles-pp.__vm_enough_memory
      0.00            +0.2        0.24 ± 11%      +0.2        0.24 ±  5%  perf-profile.self.cycles-pp.mas_next_setup
      0.24 ±  6%      +0.3        0.54 ±  6%      +0.3        0.56 ±  4%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.32 ±  4%      +0.3        0.64 ±  5%      +0.3        0.62 ±  3%  perf-profile.self.cycles-pp.mas_prev_setup
      0.38 ± 10%      +0.3        0.72 ±  6%      +0.3        0.73 ±  6%  perf-profile.self.cycles-pp.__anon_vma_interval_tree_remove
      0.00            +0.4        0.41 ±  5%      +0.3        0.33 ±  9%  perf-profile.self.cycles-pp.mas_next_range
      0.63 ±  5%      +0.5        1.17 ±  3%      +0.5        1.10 ±  5%  perf-profile.self.cycles-pp.mas_prev
      0.87 ±  3%      +0.6        1.49 ±  3%      +0.7        1.61 ±  6%  perf-profile.self.cycles-pp.mas_next_slot
      1.37 ±  4%      +1.3        2.64 ±  3%      +1.3        2.62 ±  4%  perf-profile.self.cycles-pp.mas_prev_slot
      0.00            +1.3        1.30            +1.2        1.19 ±  4%  perf-profile.self.cycles-pp.vma_merge_new_range
      0.00            +3.4        3.45 ±  2%      +3.4        3.43 ±  2%  perf-profile.self.cycles-pp.vma_expand

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  2e71337ac26478 ("mm: explicitly enable an expand-only merge mode for brk()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 2e71337ac2647889d3d9d76a5ce
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   2667734            -5.6%    2518021            -6.2%    2503505        aim9.brk_test.ops_per_sec
    196.00            +0.0%     196.00         +1038.8%       2231 ± 89%  meminfo.Inactive(file)
     23.94            -8.7%      21.86 ±  2%      -6.0%      22.51        time.user_time
      0.01 ± 47%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.06 ± 34%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.01 ± 47%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.06 ± 34%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
     49.00            +0.0%      49.00         +1039.3%     558.24 ± 89%  proc-vmstat.nr_inactive_file
     49.00            +0.0%      49.00         +1039.3%     558.24 ± 89%  proc-vmstat.nr_zone_inactive_file
    948658            +2.3%     970280            +3.2%     978780        proc-vmstat.pgalloc_normal
    792310            -1.5%     780779            -1.7%     779104        proc-vmstat.pgfault
    814343            +2.4%     833987            +3.0%     839063        proc-vmstat.pgfree
 1.721e+09            +3.0%  1.773e+09            +2.6%  1.765e+09        perf-stat.i.branch-instructions
      0.54            -5.4%       0.52            -4.8%       0.52        perf-stat.i.cpi
 7.553e+09            +6.0%  8.003e+09            +5.5%  7.968e+09        perf-stat.i.instructions
      1.86            +6.1%       1.97            +5.3%       1.96        perf-stat.i.ipc
      2399            -1.1%       2372            -1.3%       2367        perf-stat.i.minor-faults
      2399            -1.1%       2372            -1.3%       2367        perf-stat.i.page-faults
      0.36 ±  2%      -0.0        0.35            +0.0        0.36        perf-stat.overall.branch-miss-rate%
      0.55            -5.3%       0.52            -4.6%       0.52        perf-stat.overall.cpi
      1.82            +5.6%       1.92            +4.8%       1.91        perf-stat.overall.ipc
 1.715e+09            +3.0%  1.767e+09            +2.6%   1.76e+09        perf-stat.ps.branch-instructions
 7.529e+09            +5.9%  7.977e+09            +5.5%  7.942e+09        perf-stat.ps.instructions
      2391            -1.1%       2364            -1.3%       2359        perf-stat.ps.minor-faults
      2391            -1.1%       2364            -1.3%       2359        perf-stat.ps.page-faults
 2.275e+12            +5.8%  2.408e+12            +5.3%  2.395e+12        perf-stat.total.instructions
      6.58 ±  2%      -6.6        0.00            -6.6        0.00        perf-profile.calltrace.cycles-pp.mas_store_prealloc.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.76 ±  2%      -5.8        0.00            -5.8        0.00        perf-profile.calltrace.cycles-pp.mas_preallocate.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     18.35            -1.3       17.10            -1.0       17.32        perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     15.92            -1.1       14.78 ±  2%      -0.9       15.03        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
     11.03            -0.7       10.33            -0.7       10.36        perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
      4.22 ±  3%      -0.4        3.79 ±  2%      -0.5        3.69        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      8.48            -0.4        8.08 ±  2%      -0.5        7.95        perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags
      5.32            -0.3        4.98            -0.2        5.10 ±  3%  perf-profile.calltrace.cycles-pp.clear_bhb_loop.brk
      5.38 ±  3%      -0.3        5.06            -0.5        4.89        perf-profile.calltrace.cycles-pp.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      1.16 ±  7%      -0.3        0.86 ±  5%      -0.3        0.90 ±  5%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.88 ± 14%      -0.2        0.64 ±  8%      -0.2        0.64 ±  4%  perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.56            -0.2        5.38            -0.2        5.34        perf-profile.calltrace.cycles-pp.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.74 ±  6%      -0.2        0.57 ±  6%      -0.1        0.62 ±  7%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64
      5.09            -0.2        4.92            -0.2        4.90        perf-profile.calltrace.cycles-pp.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.73            -0.2        3.56 ±  2%      -0.2        3.58 ±  2%  perf-profile.calltrace.cycles-pp.thp_get_unmapped_area_vmflags.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64
      1.25 ±  2%      -0.2        1.08 ±  9%      -0.1        1.13 ±  4%  perf-profile.calltrace.cycles-pp.sized_strscpy.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
      1.98 ±  2%      -0.1        1.84 ±  3%      -0.1        1.88 ±  2%  perf-profile.calltrace.cycles-pp.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.55 ±  2%      -0.1        0.42 ± 44%      -0.1        0.46 ± 44%  perf-profile.calltrace.cycles-pp.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.78 ±  3%      -0.1        0.72 ±  4%      -0.1        0.72 ±  3%  perf-profile.calltrace.cycles-pp.mas_prev.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.6        0.56 ±  5%      +0.1        0.09 ±223%  perf-profile.calltrace.cycles-pp.anon_vma_interval_tree_insert.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +0.6        0.64            +0.6        0.60 ±  7%  perf-profile.calltrace.cycles-pp.vma_adjust_trans_huge.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +0.7        0.69 ±  8%      +0.7        0.70 ±  8%  perf-profile.calltrace.cycles-pp.__anon_vma_interval_tree_remove.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +0.8        0.78 ±  4%      +0.9        0.88 ±  5%  perf-profile.calltrace.cycles-pp.mas_next_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.8        0.80 ±  2%      +0.7        0.72 ±  3%  perf-profile.calltrace.cycles-pp.mas_prev.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
     82.26            +0.8       83.07            +0.7       82.95        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.8        0.82 ±  4%      +0.8        0.80 ±  5%  perf-profile.calltrace.cycles-pp.can_vma_merge_after.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
     81.38            +0.8       82.20            +0.7       82.07        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.8        0.84 ±  4%      +0.8        0.80 ±  7%  perf-profile.calltrace.cycles-pp.mas_wr_slot_store.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
     77.94            +1.0       78.92            +0.8       78.74        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +1.2        1.24 ±  3%      +1.2        1.18 ±  4%  perf-profile.calltrace.cycles-pp.mas_prev_slot.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +1.3        1.32 ±  2%      +1.4        1.37 ±  3%  perf-profile.calltrace.cycles-pp.mas_next_slot.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.4        1.38 ±  2%      +1.4        1.42 ±  4%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.4        1.40 ±  4%      +1.4        1.38        perf-profile.calltrace.cycles-pp.up_write.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.6        1.64            +1.7        1.66 ±  3%  perf-profile.calltrace.cycles-pp.up_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.8        1.82 ±  5%      +1.7        1.73 ±  3%  perf-profile.calltrace.cycles-pp.down_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.9        1.89 ±  3%      +1.9        1.89 ±  2%  perf-profile.calltrace.cycles-pp.init_multi_vma_prep.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.9        1.92 ±  3%      +1.9        1.91        perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
      0.00            +2.3        2.31            +2.3        2.31 ±  3%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +2.7        2.68 ±  2%      +2.6        2.62        perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +2.9        2.92 ±  4%      +2.8        2.81        perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +2.9        2.93 ±  2%      +2.9        2.94 ±  3%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
     53.19            +3.0       56.14            +2.6       55.83        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +4.4        4.42            +4.4        4.40        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +7.1        7.09            +7.0        6.95        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00           +26.8       26.83           +26.3       26.31        perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00           +31.4       31.41           +30.8       30.83        perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.93            -1.3        4.62            -1.3        4.60        perf-profile.children.cycles-pp.mas_preallocate
     18.48            -1.3       17.23            -1.0       17.46        perf-profile.children.cycles-pp.perf_event_mmap
     16.41            -1.2       15.23            -0.9       15.47        perf-profile.children.cycles-pp.perf_event_mmap_event
      3.46            -1.1        2.37            -1.1        2.38 ±  3%  perf-profile.children.cycles-pp.mas_wr_store_type
     11.24            -0.7       10.52 ±  2%      -0.7       10.56        perf-profile.children.cycles-pp.perf_iterate_sb
      4.29 ±  3%      -0.4        3.86 ±  2%      -0.5        3.76        perf-profile.children.cycles-pp.mas_walk
      8.61            -0.4        8.21 ±  2%      -0.5        8.07        perf-profile.children.cycles-pp.perf_event_mmap_output
      0.83 ±  7%      -0.4        0.47 ±  8%      -0.4        0.45 ± 10%  perf-profile.children.cycles-pp.may_expand_vm
      3.82            -0.3        3.48 ±  3%      -0.4        3.41        perf-profile.children.cycles-pp.down_write
      5.39            -0.3        5.06            -0.2        5.18 ±  3%  perf-profile.children.cycles-pp.clear_bhb_loop
      1.36 ±  5%      -0.3        1.03 ±  5%      -0.3        1.08 ±  4%  perf-profile.children.cycles-pp.__vm_enough_memory
      5.64 ±  3%      -0.3        5.32            -0.5        5.13        perf-profile.children.cycles-pp.mas_find
      1.18 ±  5%      -0.3        0.88 ±  4%      -0.3        0.87 ±  3%  perf-profile.children.cycles-pp.can_vma_merge_after
      0.57 ± 22%      -0.2        0.33 ± 12%      -0.2        0.36 ±  7%  perf-profile.children.cycles-pp.cap_vm_enough_memory
      1.06 ± 11%      -0.2        0.83 ±  9%      -0.2        0.82 ±  5%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
      5.30            -0.2        5.11            -0.2        5.10        perf-profile.children.cycles-pp.__get_unmapped_area
      5.75            -0.2        5.56            -0.2        5.52        perf-profile.children.cycles-pp.check_brk_limits
      0.82 ±  4%      -0.2        0.64 ±  4%      -0.1        0.70 ±  7%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      3.86            -0.2        3.69 ±  2%      -0.1        3.71 ±  2%  perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
      1.32 ±  2%      -0.2        1.16 ±  8%      -0.1        1.20 ±  3%  perf-profile.children.cycles-pp.sized_strscpy
      2.10 ±  2%      -0.1        1.96 ±  2%      -0.1        2.00 ±  3%  perf-profile.children.cycles-pp.down_write_killable
      1.86 ±  3%      -0.1        1.74 ±  2%      -0.1        1.80 ±  4%  perf-profile.children.cycles-pp.__cond_resched
      0.57 ±  7%      -0.1        0.45 ± 13%      -0.1        0.46 ±  2%  perf-profile.children.cycles-pp.strlen
      2.78            -0.1        2.66 ±  2%      -0.1        2.64 ±  2%  perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown_vmflags
      0.10 ± 13%      +0.0        0.11 ± 16%      +0.0        0.12 ± 11%  perf-profile.children.cycles-pp.vfs_read
      0.10 ± 11%      +0.0        0.12 ± 13%      +0.0        0.14 ±  8%  perf-profile.children.cycles-pp.read
      0.70 ±  2%      +0.1        0.81 ±  6%      +0.1        0.82 ±  7%  perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
      0.68 ±  9%      +0.2        0.90 ±  3%      +0.2        0.86 ±  6%  perf-profile.children.cycles-pp.mas_wr_slot_store
      0.34 ±  5%      +0.3        0.68 ±  6%      +0.3        0.68 ±  4%  perf-profile.children.cycles-pp.mas_prev_setup
      0.00            +0.4        0.43 ±  2%      +0.4        0.41 ±  7%  perf-profile.children.cycles-pp.mas_next_setup
      6.91 ±  2%      +0.5        7.37            +0.4        7.26        perf-profile.children.cycles-pp.mas_store_prealloc
      0.92 ±  3%      +0.8        1.76 ±  3%      +0.8        1.69 ±  2%  perf-profile.children.cycles-pp.mas_prev
     83.20            +0.9       84.05            +0.7       83.90        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     82.36            +0.9       83.23            +0.7       83.08        perf-profile.children.cycles-pp.do_syscall_64
     78.68            +0.9       79.62            +0.8       79.46        perf-profile.children.cycles-pp.__do_sys_brk
      0.00            +0.9        0.94 ±  3%      +1.1        1.05 ±  5%  perf-profile.children.cycles-pp.mas_next_range
      1.35 ±  5%      +1.2        2.59 ±  3%      +1.2        2.52        perf-profile.children.cycles-pp.mas_prev_slot
      0.84 ±  3%      +1.4        2.26 ±  3%      +1.4        2.27 ±  2%  perf-profile.children.cycles-pp.mas_next_slot
     54.26            +2.3       56.54            +2.0       56.30        perf-profile.children.cycles-pp.do_brk_flags
      0.00           +27.5       27.55           +27.0       27.00        perf-profile.children.cycles-pp.vma_expand
      0.00           +31.9       31.86           +31.1       31.13        perf-profile.children.cycles-pp.vma_merge_new_range
      6.50            -3.3        3.19 ±  2%      -3.1        3.35        perf-profile.self.cycles-pp.do_brk_flags
      3.35            -1.1        2.25            -1.1        2.25 ±  2%  perf-profile.self.cycles-pp.mas_wr_store_type
      5.31 ±  2%      -0.4        4.88            -0.2        5.14 ±  2%  perf-profile.self.cycles-pp.__do_sys_brk
      4.22 ±  3%      -0.4        3.80 ±  2%      -0.5        3.70        perf-profile.self.cycles-pp.mas_walk
      8.47            -0.4        8.07 ±  2%      -0.5        7.94        perf-profile.self.cycles-pp.perf_event_mmap_output
      0.71 ±  8%      -0.3        0.38 ±  8%      -0.3        0.36 ± 12%  perf-profile.self.cycles-pp.may_expand_vm
      5.32            -0.3        5.00            -0.2        5.11 ±  3%  perf-profile.self.cycles-pp.clear_bhb_loop
      1.12 ±  5%      -0.3        0.82 ±  4%      -0.3        0.80 ±  5%  perf-profile.self.cycles-pp.can_vma_merge_after
      2.62            -0.2        2.38 ±  5%      -0.3        2.32 ±  3%  perf-profile.self.cycles-pp.down_write
      0.44 ± 28%      -0.2        0.20 ± 13%      -0.2        0.24 ±  7%  perf-profile.self.cycles-pp.cap_vm_enough_memory
      2.50 ±  3%      -0.2        2.31 ±  2%      -0.2        2.28        perf-profile.self.cycles-pp.mas_preallocate
      0.61 ±  9%      -0.2        0.42 ±  7%      -0.2        0.42 ±  6%  perf-profile.self.cycles-pp.__vm_enough_memory
      1.26 ±  2%      -0.2        1.09 ±  9%      -0.1        1.14 ±  3%  perf-profile.self.cycles-pp.sized_strscpy
      0.57 ±  5%      -0.1        0.44 ±  4%      -0.1        0.47 ±  8%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      2.25 ±  3%      -0.1        2.13 ±  3%      -0.0        2.25 ±  2%  perf-profile.self.cycles-pp.perf_event_mmap_event
      2.70            -0.1        2.60 ±  2%      -0.1        2.58 ±  2%  perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown_vmflags
      0.50 ±  8%      -0.1        0.40 ± 12%      -0.1        0.41 ±  2%  perf-profile.self.cycles-pp.strlen
      0.57 ±  4%      -0.1        0.52 ±  4%      -0.0        0.55 ±  4%  perf-profile.self.cycles-pp.strnlen
      1.40 ±  3%      -0.0        1.34 ±  2%      -0.1        1.33 ±  2%  perf-profile.self.cycles-pp.down_write_killable
      0.01 ±223%      +0.1        0.06 ± 15%      +0.0        0.05 ± 48%  perf-profile.self.cycles-pp.anon_vma_interval_tree_remove
      0.51 ±  4%      +0.1        0.60 ±  5%      +0.1        0.60 ±  7%  perf-profile.self.cycles-pp.__anon_vma_interval_tree_remove
      2.87 ±  2%      +0.2        3.10 ±  2%      +0.2        3.09 ±  2%  perf-profile.self.cycles-pp.mas_store_prealloc
      0.61 ±  8%      +0.2        0.84 ±  4%      +0.2        0.81 ±  7%  perf-profile.self.cycles-pp.mas_wr_slot_store
      0.26 ±  7%      +0.3        0.56 ±  5%      +0.3        0.56 ±  7%  perf-profile.self.cycles-pp.mas_prev_setup
      0.00            +0.3        0.33 ±  3%      +0.3        0.31 ± 10%  perf-profile.self.cycles-pp.mas_next_setup
      0.53 ±  6%      +0.4        0.98 ±  3%      +0.4        0.94 ±  7%  perf-profile.self.cycles-pp.mas_prev
      0.00            +0.6        0.56 ±  5%      +0.6        0.62 ±  5%  perf-profile.self.cycles-pp.mas_next_range
      1.29 ±  4%      +1.2        2.46 ±  3%      +1.1        2.39        perf-profile.self.cycles-pp.mas_prev_slot
      0.72 ±  4%      +1.4        2.07 ±  3%      +1.4        2.09        perf-profile.self.cycles-pp.mas_next_slot
      0.00            +1.4        1.40 ±  4%      +1.2        1.24 ±  5%  perf-profile.self.cycles-pp.vma_merge_new_range
      0.00            +3.6        3.55 ±  6%      +3.4        3.42 ±  2%  perf-profile.self.cycles-pp.vma_expand

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  2e71337ac26478 ("mm: explicitly enable an expand-only merge mode for brk()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 2e71337ac2647889d3d9d76a5ce
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    201.54            +2.9%     207.44            +2.5%     206.52        time.system_time
     97.58            -6.0%      91.75            -5.0%      92.66        time.user_time
   1322908            -5.0%    1256536            -4.1%    1268145        aim9.brk_test.ops_per_sec
    201.54            +2.9%     207.44            +2.5%     206.52        aim9.time.system_time
     97.58            -6.0%      91.75            -5.0%      92.66        aim9.time.user_time
      0.04 ± 82%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.10 ± 60%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
     90.66 ± 71%    +411.1%     463.37 ±113%    +160.5%     236.20 ± 12%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.blk_execute_rq
    127.98 ± 86%    +586.2%     878.13 ±150%    +192.6%     374.47 ± 56%  perf-sched.wait_and_delay.max.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.blk_execute_rq
      0.04 ± 82%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
    111.98 ± 31%    +323.3%     474.03 ±108%    +110.6%     235.86 ± 12%  perf-sched.wait_time.avg.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.blk_execute_rq
      0.10 ± 60%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
    149.30 ± 58%    +495.3%     888.79 ±147%    +150.4%     373.80 ± 57%  perf-sched.wait_time.max.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.blk_execute_rq
      0.30 ±  2%      -9.0%       0.27 ±  4%     -11.5%       0.27 ±  7%  perf-stat.i.MPKI
  8.33e+08            +3.9%  8.654e+08            +4.5%  8.708e+08        perf-stat.i.branch-instructions
      1.15            -0.1        1.09            -0.1        1.08        perf-stat.i.branch-miss-rate%
  12964626            -1.9%   12711922            -2.6%   12624576        perf-stat.i.branch-misses
      1.11            -7.4%       1.03            -7.9%       1.03        perf-stat.i.cpi
 3.943e+09            +6.0%   4.18e+09            +6.7%  4.206e+09        perf-stat.i.instructions
      0.91            +7.9%       0.98            +8.5%       0.99        perf-stat.i.ipc
      0.29 ±  2%      -9.1%       0.27 ±  4%     -10.8%       0.26 ±  7%  perf-stat.overall.MPKI
      1.56            -0.1        1.47            -0.1        1.45        perf-stat.overall.branch-miss-rate%
      1.08            -6.8%       1.01            -7.2%       1.01        perf-stat.overall.cpi
      0.92            +7.2%       0.99            +7.8%       0.99        perf-stat.overall.ipc
 8.303e+08            +3.9%  8.627e+08            +4.5%  8.681e+08        perf-stat.ps.branch-instructions
  12931205            -2.0%   12678170            -2.6%   12593410        perf-stat.ps.branch-misses
  3.93e+09            +6.0%  4.167e+09            +6.7%  4.193e+09        perf-stat.ps.instructions
 1.184e+12            +6.1%  1.256e+12            +6.7%  1.263e+12        perf-stat.total.instructions
      7.16 ±  2%      -0.4        6.76 ±  4%      -0.3        6.83 ±  5%  perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.brk
      5.72 ±  2%      -0.4        5.35 ±  3%      -0.2        5.53 ±  4%  perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
      6.13 ±  2%      -0.3        5.84 ±  3%      -0.2        5.97 ±  4%  perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.83 ± 11%      -0.1        0.71 ±  5%      -0.1        0.76 ±  5%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.6        0.58 ±  5%      +0.6        0.57 ±  8%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
     16.73 ±  2%      +0.6       17.34            +0.5       17.27 ±  4%  perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.7        0.66 ±  6%      +0.6        0.61 ± 45%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
     24.21            +0.7       24.90            +0.5       24.71 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
     23.33            +0.7       24.05 ±  2%      +0.5       23.87 ±  3%  perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.8        0.82 ±  4%      +0.9        0.92 ± 11%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +0.9        0.87 ±  5%      +0.9        0.86 ±  6%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.1        1.07 ±  9%      +1.0        1.01 ± 14%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.1        1.10 ±  6%      +1.2        1.15 ± 10%  perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +2.3        2.26 ±  5%      +2.2        2.19 ±  5%  perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +7.6        7.56 ±  3%      +7.5        7.48 ±  4%  perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +8.6        8.62 ±  4%      +8.4        8.40 ±  4%  perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.74 ±  2%      -0.4        7.30 ±  4%      -0.4        7.38 ±  5%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      5.81 ±  2%      -0.4        5.43 ±  3%      -0.2        5.60 ±  4%  perf-profile.children.cycles-pp.perf_event_mmap_event
      6.18 ±  2%      -0.3        5.88 ±  3%      -0.2        6.00 ±  4%  perf-profile.children.cycles-pp.perf_event_mmap
      3.93            -0.2        3.73 ±  3%      -0.1        3.81 ±  4%  perf-profile.children.cycles-pp.perf_iterate_sb
      0.22 ± 29%      -0.1        0.08 ± 17%      -0.1        0.09 ± 42%  perf-profile.children.cycles-pp.may_expand_vm
      0.96 ±  3%      -0.1        0.83 ±  4%      -0.0        0.93 ± 11%  perf-profile.children.cycles-pp.vma_complete
      0.61 ± 14%      -0.1        0.52 ±  7%      -0.0        0.57 ±  9%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.15 ±  7%      -0.1        0.08 ± 20%      -0.1        0.08 ± 25%  perf-profile.children.cycles-pp.brk_test
      0.10 ± 11%      +0.0        0.10 ± 28%      +0.0        0.12 ± 10%  perf-profile.children.cycles-pp.run_posix_cpu_timers
      0.08 ± 11%      +0.0        0.12 ± 14%      +0.0        0.12 ± 12%  perf-profile.children.cycles-pp.mas_prev_setup
      0.00            +0.0        0.05 ± 46%      +0.1        0.08 ± 16%  perf-profile.children.cycles-pp.mas_next_setup
      0.24 ± 19%      +0.1        0.31 ±  9%      +0.1        0.32 ±  9%  perf-profile.children.cycles-pp.mas_prev
      0.17 ± 12%      +0.1        0.27 ± 10%      +0.0        0.19 ± 16%  perf-profile.children.cycles-pp.mas_wr_store_entry
      0.00            +0.2        0.15 ± 11%      +0.2        0.17 ±  8%  perf-profile.children.cycles-pp.mas_next_range
      0.19 ±  8%      +0.2        0.38 ± 10%      +0.2        0.41 ±  8%  perf-profile.children.cycles-pp.mas_next_slot
      0.34 ± 17%      +0.3        0.64 ±  6%      +0.3        0.61 ±  6%  perf-profile.children.cycles-pp.mas_prev_slot
     23.40            +0.7       24.12 ±  2%      +0.5       23.94 ±  3%  perf-profile.children.cycles-pp.__do_sys_brk
      0.00            +7.6        7.59 ±  3%      +7.5        7.49 ±  4%  perf-profile.children.cycles-pp.vma_expand
      0.00            +8.7        8.66 ±  4%      +8.5        8.46 ±  4%  perf-profile.children.cycles-pp.vma_merge_new_range
      1.61 ± 10%      -0.9        0.69 ±  8%      -0.8        0.83 ± 14%  perf-profile.self.cycles-pp.do_brk_flags
      7.64 ±  2%      -0.4        7.20 ±  4%      -0.4        7.28 ±  5%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.22 ± 30%      -0.1        0.08 ± 17%      -0.1        0.09 ± 42%  perf-profile.self.cycles-pp.may_expand_vm
      0.57 ± 15%      -0.1        0.46 ±  6%      -0.0        0.53 ± 10%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.77 ±  7%      -0.1        0.69 ±  5%      -0.1        0.69 ±  5%  perf-profile.self.cycles-pp.perf_event_mmap_event
      0.15 ±  7%      -0.1        0.08 ± 20%      -0.1        0.08 ± 24%  perf-profile.self.cycles-pp.brk_test
      0.20 ±  5%      -0.0        0.18 ±  4%      +0.0        0.20 ±  9%  perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
      0.10 ± 11%      +0.0        0.10 ± 28%      +0.0        0.12 ± 10%  perf-profile.self.cycles-pp.run_posix_cpu_timers
      0.07 ± 18%      +0.0        0.10 ± 18%      +0.0        0.11 ± 11%  perf-profile.self.cycles-pp.mas_prev_setup
      0.00            +0.1        0.09 ± 12%      +0.1        0.11 ±  9%  perf-profile.self.cycles-pp.mas_next_range
      0.36 ±  8%      +0.1        0.45 ±  6%      +0.0        0.40 ± 11%  perf-profile.self.cycles-pp.perf_event_mmap
      0.15 ± 13%      +0.1        0.25 ± 14%      +0.0        0.17 ± 16%  perf-profile.self.cycles-pp.mas_wr_store_entry
      0.17 ± 11%      +0.2        0.37 ± 11%      +0.2        0.40 ±  9%  perf-profile.self.cycles-pp.mas_next_slot
      0.34 ± 17%      +0.3        0.64 ±  6%      +0.3        0.61 ±  6%  perf-profile.self.cycles-pp.mas_prev_slot
      0.00            +0.3        0.33 ±  5%      +0.3        0.30 ±  7%  perf-profile.self.cycles-pp.vma_merge_new_range
      0.00            +0.8        0.81 ±  9%      +0.7        0.74 ±  9%  perf-profile.self.cycles-pp.vma_expand
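
The `%change` column for throughput rows appears to be the relative difference of each commit's mean against the base commit fc21959f74bc11. Rows whose change has no `%` suffix (the perf-profile and miss-rate rows) look like absolute deltas instead. A minimal C sketch of the relative case, assuming that reading of the table; the helper names `pct_change` and `report` are hypothetical and not part of the LKP tooling:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Relative change of `value` against `base`, in percent.
 * Hypothetical helper: the robot's own computation is not shown in this
 * thread, this is only the obvious reading of the %change column.
 */
double pct_change(double base, double value)
{
	assert(base != 0.0);
	return (value - base) / base * 100.0;
}

/* Print one comparison line for a commit, e.g. its ops_per_sec delta. */
void report(const char *commit, double base, double value)
{
	printf("%-16s %+.1f%%\n", commit, pct_change(base, value));
}
```

With the Ivy Bridge-EP figures above, `pct_change(1322908, 1256536)` comes out at about -5.0 and `pct_change(1322908, 1268145)` at about -4.1, matching the reported aim9.brk_test.ops_per_sec changes.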



* Re: [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec -5.0% regression
  2024-10-11  2:46           ` Oliver Sang
@ 2024-10-11  7:26             ` Lorenzo Stoakes
  2024-10-15 19:56               ` Lorenzo Stoakes
  0 siblings, 1 reply; 13+ messages in thread
From: Lorenzo Stoakes @ 2024-10-11  7:26 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Mark Brown,
	Liam R. Howlett, Vlastimil Babka, Bert Karwatzki, Jeff Xu,
	Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
	Paul E. McKenney, Paul Moore, Sidhartha Kumar,
	Suren Baghdasaryan, linux-mm, ying.huang, feng.tang, fengwei.yin

On Fri, Oct 11, 2024 at 10:46:13AM +0800, Oliver Sang wrote:
> hi, Lorenzo,
>
> On Wed, Oct 09, 2024 at 10:24:58PM +0100, Lorenzo Stoakes wrote:
> > On Wed, Oct 09, 2024 at 02:44:30PM +0800, Oliver Sang wrote:
> > [snip]
> > > >
> > > > I will look into this now, if I provide patches would you be able to test
> > > > them using the same boxes? It'd be much appreciated!
> > >
> > > sure! that's our pleasure!
> > >
> >
> > Hi Oliver,
> >
> > Thanks so much for this, could you give the below a try? I've not tried to
> > seriously test it locally yet, so it'd be good to set your test machines on
> > it.
> >
> > If this doesn't help it suggests call stack/branching might be a thing here
> > in which case I have other approaches I can take before we have to
> > duplicate this code.
> >
> > This patch is against the mm-unstable branch in Andrew's tree [0] but
> > hopefully should apply fine to Linus's too.
> >
> > [0]:https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/
> >
> > Thanks again!
>
> you are welcome!
>
> I found the patch could be applied directly on cacded5e42, so I did that.
> This is our normal practice, since we want to avoid impact from other commits.
>
> But if your patch relies on some new patches in mm-unstable or mainline,
> please let me know. I could reapply and retest.
>
> I mention the patch base because, with the patch applied on top of cacded5e42,
> it does not seem to have an obvious performance impact; a similar regression remains.
>
> For brevity, I just list 2 examples here. All tests and full data are attached
> as fc21959f74bc11-cacded5e42b960-2e71337ac26478

Thanks for testing, this suffices to rule this one out... I will try to get a
functional and reliable performance environment locally so I can properly
address this, and then we can try something else.

Thanks!
Lorenzo

>
> (1)
>
> model: Sapphire Rapids
> nr_node: 2
> nr_cpu: 224
> memory: 512G
> brand: Intel(R) Xeon(R) Platinum 8480CTDX
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/brk_test/aim9/300s
>
> commit:
>   fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
>   cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
>   2e71337ac26478 ("mm: explicitly enable an expand-only merge mode for brk()")
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10 2e71337ac2647889d3d9d76a5ce
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>    3540976            -6.4%    3314159            -6.7%    3302864        aim9.brk_test.ops_per_sec
>
>
> (2) which uses the same Ivy Bridge-EP machine as in our original report
> (test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory)
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/brk_test/aim9/300s
>
> commit:
>   fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
>   cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
>   2e71337ac26478 ("mm: explicitly enable an expand-only merge mode for brk()")
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10 2e71337ac2647889d3d9d76a5ce
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>    1322908            -5.0%    1256536            -4.1%    1268145        aim9.brk_test.ops_per_sec
>
> >
> > Best, Lorenzo
> >
> >
> > ----8<----
> > From 7eb4aa421b357668bc44405c58b0444abf44334a Mon Sep 17 00:00:00 2001
> > From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Date: Wed, 9 Oct 2024 21:57:03 +0100
> > Subject: [PATCH] mm: explicitly enable an expand-only merge mode for brk()
> >
> > Try to do less work on brk() to improve perf.
> > ---
> >  mm/mmap.c |  1 +
> >  mm/vma.c  | 25 ++++++++++++++++---------
> >  mm/vma.h  | 11 +++++++++++
> >  3 files changed, 28 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 02f7b45c3076..c2c68ef45a3b 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1740,6 +1740,7 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> >  	if (vma && vma->vm_end == addr) {
> >  		VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
> >
> > +		vmg.mode = VMA_MERGE_MODE_EXPAND_ONLY;
> >  		vmg.prev = vma;
> >  		vma_iter_next_range(vmi);
> >
> > diff --git a/mm/vma.c b/mm/vma.c
> > index 749c4881fd60..f525a0750c41 100644
> > --- a/mm/vma.c
> > +++ b/mm/vma.c
> > @@ -561,6 +561,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
> >  	unsigned long end = vmg->end;
> >  	pgoff_t pgoff = vmg->pgoff;
> >  	pgoff_t pglen = PHYS_PFN(end - start);
> > +	bool expand_only = vmg_mode_expand_only(vmg);
> >  	bool can_merge_left, can_merge_right;
> >
> >  	mmap_assert_write_locked(vmg->mm);
> > @@ -575,7 +576,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
> >  		return NULL;
> >
> >  	can_merge_left = can_vma_merge_left(vmg);
> > -	can_merge_right = can_vma_merge_right(vmg, can_merge_left);
> > +	can_merge_right = !expand_only && can_vma_merge_right(vmg, can_merge_left);
> >
> >  	/* If we can merge with the next VMA, adjust vmg accordingly. */
> >  	if (can_merge_right) {
> > @@ -603,13 +604,18 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
> >  		return vmg->vma;
> >  	}
> >
> > -	/* If expansion failed, reset state. Allows us to retry merge later. */
> > -	vmg->vma = NULL;
> > -	vmg->start = start;
> > -	vmg->end = end;
> > -	vmg->pgoff = pgoff;
> > -	if (vmg->vma == prev)
> > -		vma_iter_set(vmg->vmi, start);
> > +	/*
> > +	 * Unless in expand only case and expansion failed, reset state.
> > +	 * Allows us to retry merge later.
> > +	 */
> > +	if (!expand_only) {
> > +		vmg->vma = NULL;
> > +		vmg->start = start;
> > +		vmg->end = end;
> > +		vmg->pgoff = pgoff;
> > +		if (vmg->vma == prev)
> > +			vma_iter_set(vmg->vmi, start);
> > +	}
> >
> >  	return NULL;
> >  }
> > @@ -641,7 +647,8 @@ int vma_expand(struct vma_merge_struct *vmg)
> >  	mmap_assert_write_locked(vmg->mm);
> >
> >  	vma_start_write(vma);
> > -	if (next && (vma != next) && (vmg->end == next->vm_end)) {
> > +	if (!vmg_mode_expand_only(vmg) && next &&
> > +	    (vma != next) && (vmg->end == next->vm_end)) {
> >  		int ret;
> >
> >  		remove_next = true;
> > diff --git a/mm/vma.h b/mm/vma.h
> > index 82354fe5edd0..14224b36a979 100644
> > --- a/mm/vma.h
> > +++ b/mm/vma.h
> > @@ -52,6 +52,11 @@ struct vma_munmap_struct {
> >  	unsigned long data_vm;
> >  };
> >
> > +enum vma_merge_mode {
> > +	VMA_MERGE_MODE_NORMAL,
> > +	VMA_MERGE_MODE_EXPAND_ONLY,
> > +};
> > +
> >  enum vma_merge_state {
> >  	VMA_MERGE_START,
> >  	VMA_MERGE_ERROR_NOMEM,
> > @@ -75,9 +80,15 @@ struct vma_merge_struct {
> >  	struct mempolicy *policy;
> >  	struct vm_userfaultfd_ctx uffd_ctx;
> >  	struct anon_vma_name *anon_name;
> > +	enum vma_merge_mode mode;
> >  	enum vma_merge_state state;
> >  };
> >
> > +static inline bool vmg_mode_expand_only(struct vma_merge_struct *vmg)
> > +{
> > +	return vmg->mode == VMA_MERGE_MODE_EXPAND_ONLY;
> > +}
> > +
> >  static inline bool vmg_nomem(struct vma_merge_struct *vmg)
> >  {
> >  	return vmg->state == VMA_MERGE_ERROR_NOMEM;
> > --
> > 2.46.2
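
The core idea of the patch above is that an expand-only request never evaluates the right-hand merge check at all, because `&&` short-circuits. A minimal userspace sketch of that gating follows. The enum mirrors the one the patch adds to mm/vma.h; everything else (`struct merge_request`, `struct merge_decision`, `check_right`, `decide_merge`) is a hypothetical stand-in and not kernel code:

```c
#include <stdbool.h>

/* Mirrors the enum the patch adds to mm/vma.h. */
enum vma_merge_mode {
	VMA_MERGE_MODE_NORMAL,
	VMA_MERGE_MODE_EXPAND_ONLY,
};

/* Hypothetical stand-in for struct vma_merge_struct: only what the sketch needs. */
struct merge_request {
	enum vma_merge_mode mode;
	bool left_compatible;	/* stands in for can_vma_merge_left() */
	bool right_compatible;	/* stands in for can_vma_merge_right() */
	int right_checks;	/* how often the right-hand check actually ran */
};

struct merge_decision {
	bool merge_left;
	bool merge_right;
};

static bool mode_expand_only(const struct merge_request *req)
{
	return req->mode == VMA_MERGE_MODE_EXPAND_ONLY;
}

/* Counts its own invocations so the short-circuit is observable. */
static bool check_right(struct merge_request *req)
{
	req->right_checks++;
	return req->right_compatible;
}

struct merge_decision decide_merge(struct merge_request *req)
{
	struct merge_decision d;

	d.merge_left = req->left_compatible;
	/* In expand-only mode check_right() is never called. */
	d.merge_right = !mode_expand_only(req) && check_right(req);
	return d;
}
```

This only models the branch that is skipped. As the results in this thread show, skipping it did not recover the brk_test throughput, which suggests the cost lies elsewhere in the vma_merge_new_range()/vma_expand() path.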

> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-gnr-1ap1/brk_test/aim9/300s
>
> commit:
>   fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
>   cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
>   2e71337ac26478 ("mm: explicitly enable an expand-only merge mode for brk()")
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10 2e71337ac2647889d3d9d76a5ce
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>    3220697            -6.0%    3028867            -6.4%    3014713        aim9.brk_test.ops_per_sec
>      24.58            -3.9%      23.63            -5.5%      23.24        time.user_time
>     119459            -3.2%     115601            -2.9%     115971        proc-vmstat.nr_active_anon
>     120943            -3.2%     117079            -2.9%     117450        proc-vmstat.nr_shmem
>     119459            -3.2%     115601            -2.9%     115971        proc-vmstat.nr_zone_active_anon
>       0.02 ±120%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       3.27 ±  5%   +5112.4%     170.40 ±218%   +5144.5%     171.45 ±216%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
>       0.20 ±188%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.01 ± 70%    +100.0%       0.01 ± 84%   +3512.9%       0.19 ±199%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       0.93 ± 16%      -4.1%       0.89 ± 14%     -25.0%       0.70 ± 11%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>       0.02 ±120%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.20 ±188%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.01 ± 70%    +100.0%       0.01 ± 84%   +3512.9%       0.19 ±199%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
>       0.02 ±  2%      -4.1%       0.02 ±  2%      -6.3%       0.02 ±  4%  perf-stat.i.MPKI
>  1.767e+09            +4.2%  1.841e+09            +3.7%  1.833e+09        perf-stat.i.branch-instructions
>       0.45            -6.2%       0.42            -5.9%       0.42        perf-stat.i.cpi
>  8.347e+09            +6.6%    8.9e+09            +6.2%  8.863e+09        perf-stat.i.instructions
>       2.27            +6.6%       2.42            +6.0%       2.41        perf-stat.i.ipc
>       0.03 ±  4%      -2.0%       0.03 ±  3%      -7.8%       0.03 ±  4%  perf-stat.overall.MPKI
>       0.44            -5.9%       0.42            -5.4%       0.42        perf-stat.overall.cpi
>       2.25            +6.2%       2.39            +5.7%       2.38        perf-stat.overall.ipc
>  1.761e+09            +4.2%  1.834e+09            +3.7%  1.827e+09        perf-stat.ps.branch-instructions
>  8.319e+09            +6.6%   8.87e+09            +6.2%  8.834e+09        perf-stat.ps.instructions
>  2.519e+12            +6.4%   2.68e+12            +5.8%  2.665e+12        perf-stat.total.instructions
>       7.07            -7.1        0.00            -7.1        0.00        perf-profile.calltrace.cycles-pp.mas_store_prealloc.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.30            -6.3        0.00            -6.3        0.00        perf-profile.calltrace.cycles-pp.mas_preallocate.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      18.35            -1.0       17.36            -1.4       16.92        perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      16.40            -0.9       15.47            -1.3       15.05        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
>      10.17            -0.8        9.36            -1.2        8.93        perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags
>      11.92            -0.8       11.12            -1.3       10.64        perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
>       5.07 ±  3%      -0.2        4.84 ±  2%      -0.2        4.88 ±  3%  perf-profile.calltrace.cycles-pp.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       5.40 ±  3%      -0.2        5.18 ±  2%      -0.1        5.28 ±  3%  perf-profile.calltrace.cycles-pp.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       3.66 ±  2%      -0.2        3.50 ±  2%      -0.1        3.52 ±  3%  perf-profile.calltrace.cycles-pp.thp_get_unmapped_area_vmflags.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64
>       0.60 ±  5%      -0.1        0.46 ± 45%      -0.2        0.36 ± 70%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.brk
>       1.66 ±  2%      -0.1        1.56 ±  3%      -0.1        1.60 ±  2%  perf-profile.calltrace.cycles-pp.up_write.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.68 ±  3%      -0.1        0.60 ±  5%      -0.1        0.60 ±  5%  perf-profile.calltrace.cycles-pp.kfree.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
>       5.91 ±  2%      -0.1        5.85            -0.4        5.49        perf-profile.calltrace.cycles-pp.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.97 ±  4%      -0.1        0.91 ±  4%      -0.1        0.91 ±  3%  perf-profile.calltrace.cycles-pp.mas_next_slot.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       4.23 ±  2%      -0.0        4.21            -0.4        3.82        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.37 ± 70%      +0.3        0.67 ±  4%      +0.2        0.57 ± 44%  perf-profile.calltrace.cycles-pp.strlen.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
>       0.00            +0.5        0.47 ± 44%      +0.6        0.58 ±  5%  perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
>       0.49 ± 44%      +0.5        1.02 ±  5%      +0.5        1.02 ±  8%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      83.74            +0.5       84.28            +0.6       84.32        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +0.6        0.60 ±  6%      +0.6        0.58 ±  7%  perf-profile.calltrace.cycles-pp.mas_next_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +0.7        0.65 ±  7%      +0.7        0.66 ±  4%  perf-profile.calltrace.cycles-pp.mas_wr_slot_store.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +0.7        0.68 ±  4%      +0.7        0.67 ±  8%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00            +0.7        0.68 ±  2%      +0.8        0.80 ±  4%  perf-profile.calltrace.cycles-pp.mas_next_slot.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      80.24            +0.7       80.95            +0.7       80.98        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +0.7        0.74 ±  2%      +0.8        0.76 ±  2%  perf-profile.calltrace.cycles-pp.vma_adjust_trans_huge.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +0.8        0.75 ±  4%      +0.8        0.81 ±  5%  perf-profile.calltrace.cycles-pp.can_vma_merge_after.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00            +0.8        0.81 ±  3%      +0.7        0.69 ±  7%  perf-profile.calltrace.cycles-pp.mas_prev.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00            +0.8        0.84 ±  5%      +0.8        0.84 ±  6%  perf-profile.calltrace.cycles-pp.__anon_vma_interval_tree_remove.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +1.3        1.30 ±  5%      +1.3        1.32 ±  2%  perf-profile.calltrace.cycles-pp.mas_prev_slot.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00            +1.4        1.35 ±  4%      +1.3        1.32 ±  4%  perf-profile.calltrace.cycles-pp.up_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +1.6        1.60 ±  4%      +1.6        1.56 ±  4%  perf-profile.calltrace.cycles-pp.up_write.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +1.8        1.76 ±  2%      +1.9        1.86 ±  2%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
>       0.00            +1.8        1.78 ±  2%      +1.6        1.64 ±  4%  perf-profile.calltrace.cycles-pp.init_multi_vma_prep.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +2.0        2.03            +2.0        2.04 ±  2%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +2.1        2.06 ±  3%      +2.1        2.06 ±  4%  perf-profile.calltrace.cycles-pp.down_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +2.3        2.29 ±  3%      +2.4        2.37 ±  2%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
>      53.64            +2.6       56.21            +2.6       56.28        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +3.1        3.14 ±  2%      +3.1        3.10 ±  5%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +3.2        3.25            +3.6        3.64 ±  3%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +3.8        3.84            +3.9        3.86 ±  2%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +5.3        5.31 ±  2%      +5.7        5.67 ±  2%  perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +6.1        6.07            +6.4        6.41        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00           +27.7       27.74           +28.3       28.33        perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00           +32.4       32.43           +33.0       33.02        perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      18.49            -1.0       17.47            -1.5       17.01        perf-profile.children.cycles-pp.perf_event_mmap
>       6.54            -1.0        5.54 ±  2%      -0.7        5.90 ±  2%  perf-profile.children.cycles-pp.mas_preallocate
>       7.40            -1.0        6.40 ±  2%      -0.6        6.76        perf-profile.children.cycles-pp.mas_store_prealloc
>       5.68            -1.0        4.72            -1.0        4.66 ±  3%  perf-profile.children.cycles-pp.up_write
>      16.88            -0.9       15.93            -1.4       15.51        perf-profile.children.cycles-pp.perf_event_mmap_event
>      10.35            -0.8        9.53            -1.2        9.10        perf-profile.children.cycles-pp.perf_event_mmap_output
>      12.16            -0.8       11.35            -1.3       10.86        perf-profile.children.cycles-pp.perf_iterate_sb
>       4.02 ±  2%      -0.7        3.32            -0.3        3.72 ±  3%  perf-profile.children.cycles-pp.mas_wr_store_type
>       2.97            -0.6        2.37 ±  3%      -0.5        2.45 ±  2%  perf-profile.children.cycles-pp.mas_update_gap
>       1.36 ±  8%      -0.6        0.80 ±  4%      -0.5        0.86 ±  4%  perf-profile.children.cycles-pp.can_vma_merge_after
>       2.26 ±  2%      -0.5        1.80 ±  2%      -0.4        1.89 ±  2%  perf-profile.children.cycles-pp.mas_leaf_max_gap
>       3.71 ±  2%      -0.3        3.44            -0.3        3.42 ±  4%  perf-profile.children.cycles-pp.vma_complete
>       5.62 ±  3%      -0.2        5.40 ±  2%      -0.1        5.51 ±  3%  perf-profile.children.cycles-pp.check_brk_limits
>       3.83 ±  2%      -0.2        3.65 ±  2%      -0.2        3.67 ±  3%  perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
>       0.66 ±  7%      -0.1        0.55 ±  9%      -0.1        0.54 ±  6%  perf-profile.children.cycles-pp.may_expand_vm
>       1.98 ±  3%      -0.1        1.86 ±  2%      -0.3        1.71 ±  4%  perf-profile.children.cycles-pp.init_multi_vma_prep
>       0.78 ±  3%      -0.1        0.69 ±  4%      -0.1        0.69 ±  5%  perf-profile.children.cycles-pp.kfree
>       0.15 ± 12%      -0.1        0.08 ± 13%      -0.1        0.07 ± 23%  perf-profile.children.cycles-pp.arch_vma_name
>       6.23 ±  2%      -0.1        6.17            -0.4        5.78        perf-profile.children.cycles-pp.mas_find
>       0.60 ±  6%      -0.1        0.54 ±  8%      -0.1        0.51 ±  7%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
>       4.32 ±  2%      -0.0        4.30            -0.4        3.92        perf-profile.children.cycles-pp.mas_walk
>       0.20 ±  8%      -0.0        0.17 ±  7%      -0.0        0.15 ± 14%  perf-profile.children.cycles-pp.__x64_sys_brk
>       0.26 ±  5%      -0.0        0.24 ±  9%      -0.0        0.22 ±  9%  perf-profile.children.cycles-pp.__rb_insert_augmented
>       0.23 ±  7%      +0.0        0.24 ± 18%      +0.0        0.27 ±  6%  perf-profile.children.cycles-pp.hrtimer_interrupt
>       0.24 ±  7%      +0.0        0.24 ± 18%      +0.0        0.27 ±  5%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
>       0.58 ±  7%      +0.1        0.66 ±  7%      +0.1        0.67 ±  2%  perf-profile.children.cycles-pp.mas_wr_slot_store
>       0.19 ± 10%      +0.1        0.31 ± 10%      +0.1        0.32 ± 16%  perf-profile.children.cycles-pp.rb_next
>       0.50 ±  4%      +0.1        0.62 ±  7%      +0.2        0.66 ±  7%  perf-profile.children.cycles-pp.mas_wr_store_entry
>       0.40 ±  6%      +0.1        0.53 ±  6%      +0.1        0.54 ±  6%  perf-profile.children.cycles-pp.strnlen
>       0.58 ± 13%      +0.2        0.75 ±  4%      +0.1        0.72 ± 13%  perf-profile.children.cycles-pp.strlen
>       0.96 ±  6%      +0.2        1.14 ±  3%      +0.2        1.16 ±  3%  perf-profile.children.cycles-pp.rcu_all_qs
>       0.68 ±  3%      +0.3        0.98 ±  5%      +0.3        1.01 ±  6%  perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
>       1.77 ±  4%      +0.3        2.09            +0.3        2.08 ±  4%  perf-profile.children.cycles-pp.__cond_resched
>       0.00            +0.4        0.36 ±  9%      +0.4        0.36 ±  8%  perf-profile.children.cycles-pp.mas_next_setup
>       0.36 ±  8%      +0.4        0.76 ±  3%      +0.4        0.76 ±  9%  perf-profile.children.cycles-pp.percpu_counter_add_batch
>       0.48 ±  7%      +0.4        0.90 ±  6%      +0.4        0.86 ±  4%  perf-profile.children.cycles-pp.mas_prev_setup
>      84.69            +0.5       85.19            +0.6       85.27        perf-profile.children.cycles-pp.do_syscall_64
>       0.67 ±  9%      +0.6        1.24 ±  4%      +0.6        1.27 ±  7%  perf-profile.children.cycles-pp.__vm_enough_memory
>       3.81            +0.6        4.39            +0.6        4.40 ±  3%  perf-profile.children.cycles-pp.down_write
>      80.98            +0.7       81.64            +0.7       81.71        perf-profile.children.cycles-pp.__do_sys_brk
>       1.05 ±  4%      +0.7        1.72 ±  3%      +0.8        1.82 ±  3%  perf-profile.children.cycles-pp.mas_next_slot
>       0.00            +0.7        0.70 ±  6%      +0.7        0.69 ±  5%  perf-profile.children.cycles-pp.mas_next_range
>       1.11 ±  4%      +1.0        2.10 ±  3%      +0.8        1.92 ±  3%  perf-profile.children.cycles-pp.mas_prev
>       2.82 ±  3%      +1.2        4.07            +1.3        4.09 ±  2%  perf-profile.children.cycles-pp.vma_prepare
>       1.54 ±  4%      +1.3        2.88 ±  3%      +1.3        2.84 ±  3%  perf-profile.children.cycles-pp.mas_prev_slot
>      54.97            +1.6       56.61            +1.8       56.79        perf-profile.children.cycles-pp.do_brk_flags
>       0.00           +28.6       28.64           +29.2       29.16        perf-profile.children.cycles-pp.vma_expand
>       0.00           +32.9       32.91           +33.4       33.37        perf-profile.children.cycles-pp.vma_merge_new_range
>       5.90 ±  2%      -3.5        2.37 ±  4%      -3.4        2.55        perf-profile.self.cycles-pp.do_brk_flags
>       5.36 ±  2%      -1.0        4.38            -1.0        4.36 ±  3%  perf-profile.self.cycles-pp.up_write
>      10.18            -0.8        9.36            -1.2        8.94        perf-profile.self.cycles-pp.perf_event_mmap_output
>       3.86 ±  2%      -0.7        3.18            -0.3        3.57 ±  3%  perf-profile.self.cycles-pp.mas_wr_store_type
>       1.28 ±  7%      -0.5        0.74 ±  4%      -0.5        0.78 ±  4%  perf-profile.self.cycles-pp.can_vma_merge_after
>       3.02 ±  2%      -0.5        2.52 ±  4%      -0.3        2.75 ±  3%  perf-profile.self.cycles-pp.mas_store_prealloc
>       2.19 ±  2%      -0.4        1.78 ±  2%      -0.3        1.87 ±  2%  perf-profile.self.cycles-pp.mas_leaf_max_gap
>       5.03            -0.4        4.67            -0.1        4.92 ±  2%  perf-profile.self.cycles-pp.__do_sys_brk
>       2.60 ±  4%      -0.3        2.27 ±  5%      -0.3        2.25 ±  2%  perf-profile.self.cycles-pp.mas_preallocate
>       1.89 ±  4%      -0.3        1.59 ±  4%      -0.3        1.62 ±  5%  perf-profile.self.cycles-pp.perf_event_mmap_event
>       1.71 ±  4%      -0.2        1.53 ±  3%      -0.2        1.50 ±  5%  perf-profile.self.cycles-pp.entry_SYSCALL_64
>       0.74 ±  3%      -0.2        0.57 ±  7%      -0.2        0.56 ±  6%  perf-profile.self.cycles-pp.mas_update_gap
>       1.89 ±  4%      -0.2        1.73 ±  2%      -0.3        1.62 ±  4%  perf-profile.self.cycles-pp.init_multi_vma_prep
>       1.58 ±  4%      -0.1        1.47 ±  3%      -0.2        1.42 ±  4%  perf-profile.self.cycles-pp.perf_event_mmap
>       1.27 ±  2%      -0.1        1.16 ±  2%      -0.1        1.19 ±  9%  perf-profile.self.cycles-pp.vma_complete
>       1.20 ±  2%      -0.1        1.12 ±  4%      -0.1        1.08 ±  2%  perf-profile.self.cycles-pp.do_syscall_64
>       0.69 ±  2%      -0.1        0.61 ±  4%      -0.1        0.60 ±  5%  perf-profile.self.cycles-pp.kfree
>       0.60 ±  6%      -0.1        0.54 ±  8%      -0.1        0.51 ±  7%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
>       1.60 ±  2%      -0.1        1.56 ±  6%      -0.1        1.55        perf-profile.self.cycles-pp.down_write_killable
>       4.24 ±  2%      -0.0        4.20            -0.4        3.84        perf-profile.self.cycles-pp.mas_walk
>       0.50 ±  7%      -0.0        0.47 ±  9%      -0.1        0.45 ±  5%  perf-profile.self.cycles-pp.may_expand_vm
>       0.42 ±  4%      +0.1        0.49 ±  7%      +0.1        0.53 ±  6%  perf-profile.self.cycles-pp.mas_wr_store_entry
>       0.15 ± 10%      +0.1        0.24 ± 11%      +0.1        0.23 ± 19%  perf-profile.self.cycles-pp.rb_next
>       0.58 ±  8%      +0.1        0.68 ±  5%      +0.1        0.71 ±  6%  perf-profile.self.cycles-pp.rcu_all_qs
>       0.37 ±  5%      +0.1        0.50 ±  7%      +0.1        0.50 ±  4%  perf-profile.self.cycles-pp.strnlen
>       0.54 ± 13%      +0.2        0.68 ±  4%      +0.1        0.64 ± 14%  perf-profile.self.cycles-pp.strlen
>       1.01 ±  6%      +0.2        1.17 ±  2%      +0.1        1.11 ±  4%  perf-profile.self.cycles-pp.__cond_resched
>       0.66 ±  6%      +0.2        0.83 ±  2%      +0.2        0.84 ±  3%  perf-profile.self.cycles-pp.vma_prepare
>       0.46 ±  6%      +0.2        0.67 ±  7%      +0.2        0.70 ±  5%  perf-profile.self.cycles-pp.__anon_vma_interval_tree_remove
>       0.34 ± 12%      +0.2        0.54 ±  3%      +0.2        0.58 ±  9%  perf-profile.self.cycles-pp.__vm_enough_memory
>       0.00            +0.3        0.29 ± 10%      +0.3        0.28 ± 11%  perf-profile.self.cycles-pp.mas_next_setup
>       0.32 ± 11%      +0.3        0.62 ±  7%      +0.3        0.58 ±  8%  perf-profile.self.cycles-pp.mas_prev_setup
>       0.23 ±  7%      +0.3        0.55 ±  6%      +0.3        0.55 ± 10%  perf-profile.self.cycles-pp.percpu_counter_add_batch
>       0.00            +0.4        0.35 ±  7%      +0.3        0.30 ±  7%  perf-profile.self.cycles-pp.mas_next_range
>       2.65 ±  3%      +0.4        3.00 ±  2%      +0.4        3.05 ±  2%  perf-profile.self.cycles-pp.down_write
>       0.64 ±  5%      +0.6        1.21 ±  3%      +0.5        1.12 ±  5%  perf-profile.self.cycles-pp.mas_prev
>       0.89 ±  5%      +0.7        1.54 ±  3%      +0.8        1.64 ±  4%  perf-profile.self.cycles-pp.mas_next_slot
>       1.46 ±  4%      +1.3        2.72 ±  3%      +1.2        2.70 ±  3%  perf-profile.self.cycles-pp.mas_prev_slot
>       0.00            +1.3        1.33 ±  2%      +1.2        1.16 ±  4%  perf-profile.self.cycles-pp.vma_merge_new_range
>       0.00            +3.5        3.54 ±  3%      +3.6        3.55 ±  2%  perf-profile.self.cycles-pp.vma_expand
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-emr-2sp1/brk_test/aim9/300s
>
> commit:
>   fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
>   cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
>   2e71337ac26478 ("mm: explicitly enable an expand-only merge mode for brk()")
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10 2e71337ac2647889d3d9d76a5ce
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>    3669298            -6.5%    3430070            -6.8%    3420919        aim9.brk_test.ops_per_sec
>      23.53            -4.9%      22.38            -6.4%      22.03        time.user_time
>     491107 ±  5%      -7.2%     455906 ±  6%      -5.5%     464283 ±  6%  meminfo.Active
>     491011 ±  5%      -7.2%     455810 ±  6%      -5.5%     464155 ±  6%  meminfo.Active(anon)
>     505666 ±  5%      -7.0%     470410 ±  5%      -5.3%     478879 ±  6%  meminfo.Shmem
>     122753 ±  5%      -7.1%     113979 ±  6%      -5.5%     116019 ±  6%  proc-vmstat.nr_active_anon
>     899298            -1.0%     890515            -0.7%     892592        proc-vmstat.nr_file_pages
>     126417 ±  5%      -6.9%     117634 ±  5%      -5.3%     119701 ±  6%  proc-vmstat.nr_shmem
>     122753 ±  5%      -7.1%     113979 ±  6%      -5.5%     116019 ±  6%  proc-vmstat.nr_zone_active_anon
>     595.50 ± 22%     +53.6%     914.50 ± 12%     +17.1%     697.33 ± 20%  proc-vmstat.numa_hint_faults_local
>      17958            -4.3%      17188 ±  2%      -4.3%      17180 ±  2%  proc-vmstat.pgactivate
>    1817569 ± 69%     -43.1%    1035076 ±127%     +63.5%    2972153 ±  3%  numa-meminfo.node0.FilePages
>      16515 ± 73%     -29.4%      11657 ±108%     +79.5%      29650        numa-meminfo.node0.Mapped
>    1811617 ± 69%     -43.2%    1029482 ±128%     +63.8%    2967951 ±  3%  numa-meminfo.node0.Unevictable
>      40474 ± 40%     -61.8%      15444 ± 40%     -50.4%      20065 ± 59%  numa-meminfo.node1.KReclaimable
>      40474 ± 40%     -61.8%      15444 ± 40%     -50.4%      20065 ± 59%  numa-meminfo.node1.SReclaimable
>     484115 ±  6%      -7.3%     448760 ±  6%     -10.1%     435387 ± 11%  numa-meminfo.node3.Active
>     484083 ±  6%      -7.3%     448760 ±  6%     -10.1%     435387 ± 11%  numa-meminfo.node3.Active(anon)
>     485577 ±  6%      -7.3%     450224 ±  6%     -10.0%     436799 ± 11%  numa-meminfo.node3.Shmem
>     454393 ± 69%     -43.1%     258770 ±127%     +63.5%     743038 ±  3%  numa-vmstat.node0.nr_file_pages
>       4178 ± 73%     -28.5%       2988 ±107%     +81.6%       7590 ±  2%  numa-vmstat.node0.nr_mapped
>     452904 ± 69%     -43.2%     257370 ±128%     +63.8%     741987 ±  3%  numa-vmstat.node0.nr_unevictable
>     452904 ± 69%     -43.2%     257370 ±128%     +63.8%     741987 ±  3%  numa-vmstat.node0.nr_zone_unevictable
>      10118 ± 40%     -61.8%       3861 ± 40%     -50.4%       5016 ± 59%  numa-vmstat.node1.nr_slab_reclaimable
>     121015 ±  6%      -7.3%     112196 ±  6%     -10.1%     108836 ± 11%  numa-vmstat.node3.nr_active_anon
>     121371 ±  6%      -7.3%     112537 ±  6%     -10.1%     109168 ± 11%  numa-vmstat.node3.nr_shmem
>     121015 ±  6%      -7.3%     112196 ±  6%     -10.1%     108836 ± 11%  numa-vmstat.node3.nr_zone_active_anon
>       0.01 ± 52%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.01 ± 15%      +7.0%       0.01 ± 16%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>       0.06 ± 69%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.01 ± 17%      -3.8%       0.01 ± 21%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>     400.06            +0.0%     400.07          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>      10.00            +0.0%      10.00          -100.0%       0.00        perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>     999.53            -0.0%     999.38          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>       0.01 ± 52%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>     400.05            +0.0%     400.06          -100.0%       0.00        perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>       0.06 ± 69%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>     999.52            -0.0%     999.37          -100.0%       0.00        perf-sched.wait_time.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
>  2.071e+09            +2.8%  2.128e+09            +2.8%   2.13e+09        perf-stat.i.branch-instructions
>       0.48            -4.2%       0.46            -4.9%       0.45        perf-stat.i.cpi
>  4.717e+09            -0.7%  4.686e+09            +0.1%  4.723e+09        perf-stat.i.cpu-cycles
>  9.794e+09            +5.1%   1.03e+10            +5.2%   1.03e+10        perf-stat.i.instructions
>       2.15            +5.8%       2.28            +5.5%       2.27        perf-stat.i.ipc
>       0.34 ±  3%      -0.0        0.33            -0.0        0.34        perf-stat.overall.branch-miss-rate%
>       0.48            -5.5%       0.46            -4.8%       0.46        perf-stat.overall.cpi
>       2.08            +5.8%       2.20            +5.1%       2.18        perf-stat.overall.ipc
>  2.063e+09            +2.8%   2.12e+09            +2.8%  2.122e+09        perf-stat.ps.branch-instructions
>  4.703e+09            -0.7%  4.672e+09            +0.1%  4.709e+09        perf-stat.ps.cpu-cycles
>  9.758e+09            +5.1%  1.026e+10            +5.2%  1.026e+10        perf-stat.ps.instructions
>  2.944e+12            +5.5%  3.106e+12            +5.0%  3.092e+12        perf-stat.total.instructions
>       6.54 ±  2%      -6.5        0.00            -6.5        0.00        perf-profile.calltrace.cycles-pp.mas_store_prealloc.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.22            -6.2        0.00            -6.2        0.00        perf-profile.calltrace.cycles-pp.mas_preallocate.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      17.43            -0.7       16.76            -1.0       16.39        perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       9.69 ±  2%      -0.6        9.07            -1.0        8.73        perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags
>      11.30 ±  2%      -0.6       10.71            -0.9       10.35        perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
>      15.57            -0.5       15.05            -0.8       14.73        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
>       2.76            -0.1        2.62 ±  3%      -0.2        2.60 ±  3%  perf-profile.calltrace.cycles-pp.userfaultfd_unmap_complete.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.84 ±  4%      -0.1        0.74 ±  8%      -0.1        0.70 ±  5%  perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.75 ±  7%      -0.1        0.68 ± 10%      -0.1        0.64 ± 10%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
>       0.75 ±  7%      -0.1        0.68 ± 10%      -0.1        0.64 ± 10%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
>       0.75 ±  7%      -0.1        0.68 ± 10%      -0.1        0.64 ± 10%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
>       1.12 ±  5%      +0.2        1.29 ±  3%      +0.2        1.29 ±  2%  perf-profile.calltrace.cycles-pp.sized_strscpy.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
>       0.65 ±  6%      +0.4        1.07 ±  5%      +0.4        1.07 ±  4%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +0.5        0.48 ± 45%      +0.5        0.54 ±  6%  perf-profile.calltrace.cycles-pp.mas_next_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +0.5        0.54 ±  4%      +0.4        0.36 ± 70%  perf-profile.calltrace.cycles-pp.mas_wr_slot_store.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +0.5        0.55 ±  4%      +0.5        0.48 ± 45%  perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +0.7        0.66 ±  4%      +0.8        0.80 ±  4%  perf-profile.calltrace.cycles-pp.mas_next_slot.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +0.7        0.68 ±  9%      +0.7        0.71 ±  5%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00            +0.7        0.68 ±  4%      +0.8        0.76 ±  6%  perf-profile.calltrace.cycles-pp.can_vma_merge_after.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00            +0.8        0.76 ±  2%      +0.8        0.77 ±  7%  perf-profile.calltrace.cycles-pp.vma_adjust_trans_huge.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>      81.90            +0.8       82.67            +0.7       82.64        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +0.8        0.80 ±  3%      +0.7        0.65 ±  8%  perf-profile.calltrace.cycles-pp.mas_prev.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>      80.94            +0.8       81.76            +0.8       81.74        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +0.8        0.82 ±  3%      +0.9        0.87 ±  6%  perf-profile.calltrace.cycles-pp.__anon_vma_interval_tree_remove.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
>      77.52            +1.0       78.50            +0.9       78.40        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +1.3        1.26 ±  3%      +1.3        1.33 ±  5%  perf-profile.calltrace.cycles-pp.mas_prev_slot.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00            +1.3        1.35 ±  3%      +1.3        1.30 ±  4%  perf-profile.calltrace.cycles-pp.up_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +1.6        1.56 ±  2%      +1.6        1.56 ±  5%  perf-profile.calltrace.cycles-pp.up_write.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +1.7        1.72 ±  3%      +1.7        1.66 ±  3%  perf-profile.calltrace.cycles-pp.init_multi_vma_prep.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +1.9        1.87 ±  4%      +1.9        1.94 ±  4%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
>       0.00            +2.1        2.07 ±  2%      +2.1        2.15 ±  4%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +2.1        2.14 ±  2%      +2.1        2.12 ±  3%  perf-profile.calltrace.cycles-pp.down_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +2.4        2.37 ±  2%      +2.4        2.41 ±  3%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
>      51.80            +2.9       54.66            +2.7       54.52        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +3.0        3.02 ±  2%      +3.2        3.17        perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +3.1        3.06            +3.1        3.08 ±  2%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +3.9        3.86            +4.0        3.96 ±  2%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +5.0        5.01            +5.2        5.18 ±  2%  perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +5.9        5.88            +6.1        6.10        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00           +27.1       27.13           +27.4       27.39        perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00           +31.6       31.63           +32.0       32.01        perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.46            -1.2        5.24            -1.1        5.40 ±  2%  perf-profile.children.cycles-pp.mas_preallocate
>       5.54            -0.9        4.64            -0.9        4.63 ±  3%  perf-profile.children.cycles-pp.up_write
>       3.99            -0.9        3.10 ±  2%      -0.8        3.24        perf-profile.children.cycles-pp.mas_wr_store_type
>      17.57            -0.7       16.87            -1.1       16.50        perf-profile.children.cycles-pp.perf_event_mmap
>       9.85 ±  2%      -0.6        9.22            -1.0        8.90        perf-profile.children.cycles-pp.perf_event_mmap_output
>       6.82 ±  2%      -0.6        6.22            -0.4        6.43        perf-profile.children.cycles-pp.mas_store_prealloc
>       1.33 ±  5%      -0.6        0.75 ±  4%      -0.5        0.82 ±  5%  perf-profile.children.cycles-pp.can_vma_merge_after
>      11.53 ±  2%      -0.6       10.96            -1.0       10.58        perf-profile.children.cycles-pp.perf_iterate_sb
>      16.03            -0.5       15.50            -0.8       15.18        perf-profile.children.cycles-pp.perf_event_mmap_event
>       2.65 ±  3%      -0.2        2.40 ±  3%      -0.2        2.44 ±  3%  perf-profile.children.cycles-pp.mas_update_gap
>       2.18 ±  2%      -0.2        1.94 ±  3%      -0.2        1.99 ±  3%  perf-profile.children.cycles-pp.mas_leaf_max_gap
>       2.85            -0.2        2.70 ±  3%      -0.2        2.68 ±  3%  perf-profile.children.cycles-pp.userfaultfd_unmap_complete
>       3.52            -0.1        3.38 ±  2%      -0.1        3.38 ±  2%  perf-profile.children.cycles-pp.vma_complete
>       0.62 ±  7%      -0.1        0.48 ±  9%      -0.1        0.50 ±  7%  perf-profile.children.cycles-pp.may_expand_vm
>       1.92 ±  2%      -0.1        1.79 ±  3%      -0.2        1.73 ±  3%  perf-profile.children.cycles-pp.init_multi_vma_prep
>       1.05 ±  3%      -0.1        0.95 ±  6%      -0.1        0.91 ±  6%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
>       0.75 ±  7%      -0.1        0.68 ± 10%      -0.1        0.64 ± 10%  perf-profile.children.cycles-pp.kthread
>       0.40 ±  6%      -0.1        0.35 ±  9%      -0.0        0.38 ±  8%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
>       0.35 ±  2%      -0.0        0.33 ±  7%      -0.0        0.31 ±  6%  perf-profile.children.cycles-pp.brk_test
>       0.11 ± 20%      +0.0        0.14 ±  8%      +0.0        0.13 ±  6%  perf-profile.children.cycles-pp.anon_vma_interval_tree_remove
>       0.52 ±  3%      +0.0        0.56 ±  4%      +0.0        0.54 ± 10%  perf-profile.children.cycles-pp.mas_wr_slot_store
>       0.20 ± 11%      +0.1        0.28 ±  7%      +0.1        0.31 ±  5%  perf-profile.children.cycles-pp.rb_next
>       0.49 ±  3%      +0.1        0.61 ±  4%      +0.2        0.64 ±  7%  perf-profile.children.cycles-pp.mas_wr_store_entry
>       0.98 ±  4%      +0.1        1.11 ±  3%      +0.1        1.11 ±  4%  perf-profile.children.cycles-pp.rcu_all_qs
>       0.39 ±  7%      +0.2        0.55 ±  7%      +0.2        0.60 ±  6%  perf-profile.children.cycles-pp.strnlen
>       1.18 ±  5%      +0.2        1.37 ±  3%      +0.2        1.36 ±  2%  perf-profile.children.cycles-pp.sized_strscpy
>       1.78 ±  3%      +0.3        2.04 ±  2%      +0.2        2.02 ±  3%  perf-profile.children.cycles-pp.__cond_resched
>       0.00            +0.3        0.33 ±  4%      +0.3        0.31 ± 10%  perf-profile.children.cycles-pp.mas_next_setup
>       0.41 ±  9%      +0.4        0.76 ±  7%      +0.4        0.80 ±  6%  perf-profile.children.cycles-pp.percpu_counter_add_batch
>       0.58 ±  4%      +0.4        0.96 ±  2%      +0.4        1.01 ±  6%  perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
>       0.44 ± 17%      +0.4        0.85 ±  7%      +0.4        0.80 ±  5%  perf-profile.children.cycles-pp.mas_prev_setup
>       4.11 ±  2%      +0.4        4.52            +0.5        4.58        perf-profile.children.cycles-pp.down_write
>       0.74 ±  6%      +0.6        1.29 ±  5%      +0.6        1.32 ±  3%  perf-profile.children.cycles-pp.__vm_enough_memory
>       0.00            +0.7        0.67 ±  6%      +0.6        0.63 ±  7%  perf-profile.children.cycles-pp.mas_next_range
>       0.95 ±  5%      +0.7        1.64 ±  2%      +0.9        1.84 ±  3%  perf-profile.children.cycles-pp.mas_next_slot
>      78.23            +0.9       79.17            +0.8       79.07        perf-profile.children.cycles-pp.__do_sys_brk
>       1.02 ± 14%      +1.0        1.99 ±  4%      +0.8        1.86 ±  6%  perf-profile.children.cycles-pp.mas_prev
>       2.89 ±  3%      +1.2        4.10            +1.3        4.19 ±  2%  perf-profile.children.cycles-pp.vma_prepare
>       1.38 ± 12%      +1.3        2.73 ±  4%      +1.3        2.73 ±  3%  perf-profile.children.cycles-pp.mas_prev_slot
>      53.08            +1.9       55.03            +1.9       54.96        perf-profile.children.cycles-pp.do_brk_flags
>       0.00           +28.0       27.95           +28.2       28.20        perf-profile.children.cycles-pp.vma_expand
>       0.00           +32.1       32.10           +32.3       32.32        perf-profile.children.cycles-pp.vma_merge_new_range
>       5.69            -3.4        2.34 ±  3%      -3.3        2.36 ±  3%  perf-profile.self.cycles-pp.do_brk_flags
>       5.22            -0.9        4.33 ±  2%      -0.9        4.33 ±  3%  perf-profile.self.cycles-pp.up_write
>       3.82            -0.9        2.95 ±  3%      -0.7        3.09        perf-profile.self.cycles-pp.mas_wr_store_type
>       9.68 ±  2%      -0.6        9.05            -1.0        8.73        perf-profile.self.cycles-pp.perf_event_mmap_output
>       1.28 ±  5%      -0.6        0.69 ±  6%      -0.5        0.75 ±  5%  perf-profile.self.cycles-pp.can_vma_merge_after
>       2.88 ±  3%      -0.4        2.44 ±  2%      -0.2        2.65 ±  2%  perf-profile.self.cycles-pp.mas_store_prealloc
>       2.55            -0.3        2.22 ±  2%      -0.3        2.22 ±  4%  perf-profile.self.cycles-pp.mas_preallocate
>       4.98 ±  2%      -0.3        4.70            -0.2        4.76        perf-profile.self.cycles-pp.__do_sys_brk
>       2.15 ±  3%      -0.2        1.93 ±  3%      -0.2        1.98 ±  3%  perf-profile.self.cycles-pp.mas_leaf_max_gap
>       1.82            -0.2        1.60 ±  4%      -0.2        1.62 ±  3%  perf-profile.self.cycles-pp.perf_event_mmap_event
>       1.51 ±  4%      -0.2        1.31 ±  4%      -0.3        1.26        perf-profile.self.cycles-pp.perf_event_mmap
>       1.85 ±  2%      -0.2        1.66 ±  3%      -0.2        1.61 ±  3%  perf-profile.self.cycles-pp.init_multi_vma_prep
>       2.77            -0.1        2.63 ±  3%      -0.2        2.60 ±  3%  perf-profile.self.cycles-pp.userfaultfd_unmap_complete
>       0.75 ±  5%      -0.1        0.67 ±  4%      -0.1        0.65 ±  5%  perf-profile.self.cycles-pp.security_vm_enough_memory_mm
>       0.88 ±  2%      -0.0        0.85 ±  3%      -0.1        0.82 ±  4%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>       0.28 ±  5%      -0.0        0.26 ±  8%      -0.0        0.24 ±  8%  perf-profile.self.cycles-pp.brk_test
>       0.03 ± 70%      +0.0        0.07 ± 14%      +0.0        0.07 ±  8%  perf-profile.self.cycles-pp.anon_vma_interval_tree_remove
>       0.15 ± 12%      +0.1        0.20 ±  5%      +0.1        0.24 ±  6%  perf-profile.self.cycles-pp.rb_next
>       0.40 ±  5%      +0.1        0.48 ±  5%      +0.1        0.50 ± 10%  perf-profile.self.cycles-pp.mas_wr_store_entry
>       0.34 ±  6%      +0.2        0.50 ±  8%      +0.2        0.54 ±  4%  perf-profile.self.cycles-pp.strnlen
>       0.66 ±  4%      +0.2        0.84 ±  5%      +0.1        0.80 ±  4%  perf-profile.self.cycles-pp.vma_prepare
>       1.12 ±  5%      +0.2        1.30 ±  3%      +0.2        1.28 ±  2%  perf-profile.self.cycles-pp.sized_strscpy
>       3.00 ±  2%      +0.2        3.19 ±  2%      +0.3        3.27 ±  2%  perf-profile.self.cycles-pp.down_write
>       0.92 ±  4%      +0.2        1.13 ±  2%      +0.2        1.08 ±  3%  perf-profile.self.cycles-pp.__cond_resched
>       0.00            +0.3        0.26 ±  7%      +0.3        0.25 ± 12%  perf-profile.self.cycles-pp.mas_next_setup
>       0.28 ±  8%      +0.3        0.54 ±  8%      +0.3        0.56 ±  6%  perf-profile.self.cycles-pp.percpu_counter_add_batch
>       0.29 ± 12%      +0.3        0.58 ±  9%      +0.3        0.60 ±  4%  perf-profile.self.cycles-pp.__vm_enough_memory
>       0.29 ± 24%      +0.3        0.58 ±  7%      +0.2        0.54 ±  7%  perf-profile.self.cycles-pp.mas_prev_setup
>       0.40 ±  4%      +0.3        0.70 ±  2%      +0.3        0.72 ±  8%  perf-profile.self.cycles-pp.__anon_vma_interval_tree_remove
>       0.00            +0.4        0.36 ±  6%      +0.3        0.28 ± 12%  perf-profile.self.cycles-pp.mas_next_range
>       0.58 ± 14%      +0.5        1.12 ±  5%      +0.5        1.10 ±  6%  perf-profile.self.cycles-pp.mas_prev
>       0.81 ±  4%      +0.7        1.48 ±  3%      +0.8        1.64 ±  4%  perf-profile.self.cycles-pp.mas_next_slot
>       1.32 ± 11%      +1.3        2.59 ±  3%      +1.3        2.60 ±  3%  perf-profile.self.cycles-pp.mas_prev_slot
>       0.00            +1.3        1.30 ±  4%      +1.1        1.14 ±  6%  perf-profile.self.cycles-pp.vma_merge_new_range
>       0.00            +3.4        3.39            +3.4        3.36        perf-profile.self.cycles-pp.vma_expand
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/brk_test/aim9/300s
>
> commit:
>   fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
>   cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
>   2e71337ac26478 ("mm: explicitly enable an expand-only merge mode for brk()")
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10 2e71337ac2647889d3d9d76a5ce
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>    3540976            -6.4%    3314159            -6.7%    3302864        aim9.brk_test.ops_per_sec
>      23.65            -5.8%      22.28            -6.5%      22.10        time.user_time
>     568.02 ± 10%      -0.2%     567.09 ± 12%     +18.0%     670.52 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.min
>     568.02 ± 10%      -0.2%     567.09 ± 12%     +18.0%     670.52 ±  8%  sched_debug.cfs_rq:/.min_vruntime.min
>     111409 ±  2%      -5.1%     105748 ±  3%      -4.9%     105984 ±  2%  proc-vmstat.nr_active_anon
>     114711 ±  2%      -5.0%     109006 ±  3%      -4.7%     109341 ±  2%  proc-vmstat.nr_shmem
>     111409 ±  2%      -5.1%     105748 ±  3%      -4.9%     105984 ±  2%  proc-vmstat.nr_zone_active_anon
>      17422 ±  2%      -5.3%      16494            -1.9%      17084 ±  2%  proc-vmstat.pgactivate
>  1.999e+09            +3.2%  2.064e+09            +2.9%  2.057e+09        perf-stat.i.branch-instructions
>       0.47            -5.1%       0.44            -4.4%       0.45        perf-stat.i.cpi
>  9.452e+09            +5.6%  9.983e+09            +5.3%  9.951e+09        perf-stat.i.instructions
>       2.19            +5.8%       2.31            +5.2%       2.30        perf-stat.i.ipc
>       0.33 ±  3%      -0.0        0.31            -0.0        0.32        perf-stat.overall.branch-miss-rate%
>       0.47            -5.1%       0.45            -4.4%       0.45        perf-stat.overall.cpi
>       2.12            +5.4%       2.23            +4.7%       2.21        perf-stat.overall.ipc
>  1.991e+09            +3.2%  2.056e+09            +2.9%   2.05e+09        perf-stat.ps.branch-instructions
>  9.417e+09            +5.6%  9.946e+09            +5.3%  9.915e+09        perf-stat.ps.instructions
>  2.841e+12            +5.7%  3.002e+12            +5.2%  2.988e+12        perf-stat.total.instructions
>       0.01 ± 42%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.02 ± 37%     -68.5%       0.01 ± 44%     -57.7%       0.01 ± 63%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.02 ± 38%     +31.7%       0.02 ± 21%     +53.8%       0.03 ± 10%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       0.04 ± 66%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.05 ± 47%     -75.3%       0.01 ± 83%     -69.2%       0.01 ± 74%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.01 ±  9%     +33.8%       0.02 ± 18%     +17.5%       0.02 ± 10%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       0.44 ± 49%     -29.7%       0.31 ± 35%    +145.0%       1.09 ± 41%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>       0.08 ± 57%     -69.7%       0.02 ±146%      +6.7%       0.09 ± 66%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>       0.10 ± 37%     +45.2%       0.15 ±  8%     +20.5%       0.12 ± 13%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       7209 ±  3%      -7.8%       6648 ±  2%      -3.6%       6948 ±  3%  perf-sched.total_wait_and_delay.count.ms
>       1533 ±  6%     -10.2%       1377            -2.6%       1493 ±  7%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       0.01 ± 42%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.02 ± 37%     -68.5%       0.01 ± 44%     -57.7%       0.01 ± 63%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.04 ± 66%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.05 ± 47%     -75.3%       0.01 ± 83%     -69.2%       0.01 ± 74%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.61            -6.6        0.00            -6.6        0.00        perf-profile.calltrace.cycles-pp.mas_store_prealloc.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.20            -6.2        0.00            -6.2        0.00        perf-profile.calltrace.cycles-pp.mas_preallocate.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      17.96            -1.1       16.87            -1.5       16.43        perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      16.08            -1.0       15.10            -1.3       14.78        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
>       9.85            -0.8        9.02            -1.2        8.66        perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags
>      11.56            -0.8       10.73            -1.2       10.33 ±  2%  perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
>       5.32            -0.2        5.10 ±  5%      -0.3        5.03        perf-profile.calltrace.cycles-pp.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       3.57 ±  2%      -0.1        3.43 ±  3%      -0.2        3.40 ±  3%  perf-profile.calltrace.cycles-pp.thp_get_unmapped_area_vmflags.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64
>       4.87            -0.1        4.74 ±  4%      -0.2        4.66 ±  2%  perf-profile.calltrace.cycles-pp.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       2.76 ±  2%      -0.1        2.66 ±  3%      -0.2        2.58 ±  3%  perf-profile.calltrace.cycles-pp.userfaultfd_unmap_complete.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       1.11 ± 15%      -0.1        1.04 ±  4%      -0.1        0.96 ±  5%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
>       0.64 ±  4%      +0.4        1.06 ±  2%      +0.4        1.06 ±  5%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +0.6        0.56 ±  5%      +0.6        0.56 ±  4%  perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +0.6        0.57 ±  6%      +0.5        0.47 ± 46%  perf-profile.calltrace.cycles-pp.mas_wr_slot_store.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +0.6        0.58 ±  7%      +0.6        0.58 ±  3%  perf-profile.calltrace.cycles-pp.mas_next_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +0.7        0.69 ±  4%      +0.7        0.71 ±  5%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00            +0.7        0.70 ±  6%      +0.8        0.78 ±  6%  perf-profile.calltrace.cycles-pp.mas_next_slot.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +0.7        0.73 ±  8%      +0.7        0.74 ±  8%  perf-profile.calltrace.cycles-pp.vma_adjust_trans_huge.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +0.7        0.74 ±  5%      +0.7        0.74 ±  7%  perf-profile.calltrace.cycles-pp.can_vma_merge_after.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00            +0.8        0.84 ±  2%      +0.7        0.70 ±  4%  perf-profile.calltrace.cycles-pp.mas_prev.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00            +0.9        0.88 ±  5%      +0.9        0.86 ±  3%  perf-profile.calltrace.cycles-pp.__anon_vma_interval_tree_remove.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
>      78.92            +0.9       79.81            +0.5       79.46        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +1.3        1.28 ±  2%      +1.3        1.30 ±  5%  perf-profile.calltrace.cycles-pp.mas_prev_slot.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00            +1.4        1.42 ±  3%      +1.4        1.36 ±  3%  perf-profile.calltrace.cycles-pp.up_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +1.6        1.59 ±  4%      +1.6        1.64 ±  3%  perf-profile.calltrace.cycles-pp.up_write.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +1.8        1.80 ±  4%      +1.7        1.73 ±  2%  perf-profile.calltrace.cycles-pp.init_multi_vma_prep.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +1.9        1.89 ±  4%      +1.9        1.90 ±  5%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
>       0.00            +2.1        2.06 ±  3%      +2.2        2.17 ±  2%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +2.1        2.12 ±  2%      +2.1        2.14 ±  2%  perf-profile.calltrace.cycles-pp.down_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +2.4        2.43 ±  4%      +2.4        2.40 ±  4%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
>      52.76            +2.6       55.40            +2.7       55.48        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +3.0        2.98            +3.4        3.38 ±  2%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +3.1        3.11 ±  3%      +3.2        3.18 ±  5%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +3.9        3.90 ±  2%      +4.0        4.00 ±  2%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +5.0        4.96            +5.4        5.39        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +6.0        6.04 ±  2%      +6.2        6.15        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00           +27.5       27.47           +28.1       28.06        perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00           +32.1       32.09           +32.7       32.73        perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.44            -1.2        5.20            -0.8        5.64        perf-profile.children.cycles-pp.mas_preallocate
>      18.11            -1.1       16.99            -1.6       16.55        perf-profile.children.cycles-pp.perf_event_mmap
>       4.01 ±  2%      -1.0        3.06            -0.6        3.45 ±  2%  perf-profile.children.cycles-pp.mas_wr_store_type
>      16.54            -0.9       15.60            -1.3       15.27        perf-profile.children.cycles-pp.perf_event_mmap_event
>      10.02            -0.8        9.18            -1.2        8.80        perf-profile.children.cycles-pp.perf_event_mmap_output
>       5.61            -0.8        4.77            -0.8        4.80 ±  2%  perf-profile.children.cycles-pp.up_write
>      11.80            -0.8       10.97            -1.2       10.57 ±  2%  perf-profile.children.cycles-pp.perf_iterate_sb
>       1.39            -0.6        0.81 ±  3%      -0.6        0.81 ±  6%  perf-profile.children.cycles-pp.can_vma_merge_after
>       6.89            -0.5        6.38            -0.4        6.52        perf-profile.children.cycles-pp.mas_store_prealloc
>       3.67 ±  2%      -0.3        3.41 ±  3%      -0.2        3.48 ±  5%  perf-profile.children.cycles-pp.vma_complete
>       5.55            -0.2        5.32 ±  4%      -0.3        5.23        perf-profile.children.cycles-pp.check_brk_limits
>       2.20 ±  4%      -0.2        1.97 ±  3%      -0.2        1.96 ±  4%  perf-profile.children.cycles-pp.mas_leaf_max_gap
>       2.68 ±  3%      -0.2        2.47 ±  3%      -0.2        2.44 ±  4%  perf-profile.children.cycles-pp.mas_update_gap
>       5.11            -0.2        4.91 ±  4%      -0.3        4.84 ±  2%  perf-profile.children.cycles-pp.__get_unmapped_area
>       2.51 ±  3%      -0.1        2.36 ±  3%      -0.2        2.32 ±  2%  perf-profile.children.cycles-pp.down_write_killable
>       0.61 ±  5%      -0.1        0.49 ±  7%      -0.1        0.50 ±  5%  perf-profile.children.cycles-pp.may_expand_vm
>       3.67            -0.1        3.55 ±  3%      -0.2        3.51 ±  3%  perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
>       1.25 ±  4%      -0.1        1.14 ±  5%      -0.1        1.19 ±  4%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>       2.85 ±  2%      -0.1        2.74 ±  2%      -0.2        2.66 ±  3%  perf-profile.children.cycles-pp.userfaultfd_unmap_complete
>       0.14 ± 11%      -0.1        0.08 ± 12%      -0.1        0.06 ± 14%  perf-profile.children.cycles-pp.arch_vma_name
>       0.42            -0.1        0.36 ±  4%      -0.0        0.40 ±  3%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
>       0.36 ±  6%      -0.1        0.31 ±  4%      -0.0        0.34 ± 11%  perf-profile.children.cycles-pp.brk_test
>       2.39            -0.1        2.34 ±  5%      -0.1        2.29        perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown_vmflags
>       0.25 ±  5%      -0.0        0.21 ±  9%      -0.0        0.23 ± 13%  perf-profile.children.cycles-pp.__rb_insert_augmented
>       1.92 ±  2%      -0.0        1.89 ±  4%      -0.1        1.80 ±  2%  perf-profile.children.cycles-pp.init_multi_vma_prep
>       0.31 ± 10%      -0.0        0.29 ± 10%      -0.0        0.26 ±  9%  perf-profile.children.cycles-pp.sched_setaffinity
>       0.14 ±  3%      -0.0        0.14 ±  7%      -0.0        0.11 ±  8%  perf-profile.children.cycles-pp.intel_idle
>       0.08 ± 10%      +0.0        0.12 ± 16%      +0.0        0.10 ±  9%  perf-profile.children.cycles-pp.mmap_region
>       0.09 ±  8%      +0.0        0.12 ± 15%      +0.0        0.11 ±  8%  perf-profile.children.cycles-pp.do_mmap
>       0.10 ± 14%      +0.0        0.15 ± 11%      +0.0        0.14 ± 13%  perf-profile.children.cycles-pp.anon_vma_interval_tree_remove
>       1.01 ±  5%      +0.1        1.08 ±  5%      +0.1        1.12        perf-profile.children.cycles-pp.rcu_all_qs
>       0.19 ±  5%      +0.1        0.30 ±  5%      +0.1        0.28 ±  8%  perf-profile.children.cycles-pp.rb_next
>       1.27 ±  4%      +0.1        1.40 ±  2%      +0.1        1.37 ±  5%  perf-profile.children.cycles-pp.sized_strscpy
>       0.42 ±  6%      +0.1        0.57 ±  6%      +0.2        0.63 ±  9%  perf-profile.children.cycles-pp.strnlen
>       0.48 ±  4%      +0.2        0.64 ±  5%      +0.2        0.65 ±  5%  perf-profile.children.cycles-pp.mas_wr_store_entry
>       1.80 ±  4%      +0.2        2.02 ±  2%      +0.2        2.00        perf-profile.children.cycles-pp.__cond_resched
>       0.00            +0.3        0.31 ± 12%      +0.3        0.32 ±  3%  perf-profile.children.cycles-pp.mas_next_setup
>       4.16 ±  3%      +0.3        4.48            +0.5        4.64 ±  3%  perf-profile.children.cycles-pp.down_write
>       0.40 ±  6%      +0.4        0.78 ±  4%      +0.4        0.79 ±  3%  perf-profile.children.cycles-pp.percpu_counter_add_batch
>       0.48 ±  4%      +0.4        0.92 ±  5%      +0.4        0.90 ±  3%  perf-profile.children.cycles-pp.mas_prev_setup
>       0.56 ±  6%      +0.5        1.02 ±  4%      +0.5        1.01 ±  4%  perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
>       0.79 ±  4%      +0.5        1.30 ±  3%      +0.5        1.30 ±  4%  perf-profile.children.cycles-pp.__vm_enough_memory
>       1.02 ±  2%      +0.7        1.72 ±  3%      +0.8        1.79 ±  4%  perf-profile.children.cycles-pp.mas_next_slot
>       0.00            +0.7        0.70 ±  6%      +0.7        0.68 ±  4%  perf-profile.children.cycles-pp.mas_next_range
>      79.62            +0.9       80.50            +0.5       80.13        perf-profile.children.cycles-pp.__do_sys_brk
>       1.10 ±  3%      +1.0        2.10 ±  3%      +0.8        1.94 ±  3%  perf-profile.children.cycles-pp.mas_prev
>       2.86 ±  3%      +1.3        4.12 ±  3%      +1.4        4.25        perf-profile.children.cycles-pp.vma_prepare
>       1.45 ±  4%      +1.3        2.79 ±  3%      +1.3        2.77 ±  3%  perf-profile.children.cycles-pp.mas_prev_slot
>      54.06            +1.8       55.82            +1.9       55.95        perf-profile.children.cycles-pp.do_brk_flags
>       0.00           +28.3       28.30           +28.9       28.88        perf-profile.children.cycles-pp.vma_expand
>       0.00           +32.6       32.58           +33.1       33.05        perf-profile.children.cycles-pp.vma_merge_new_range
>       5.90 ±  2%      -3.4        2.47 ±  3%      -3.4        2.47 ±  2%  perf-profile.self.cycles-pp.do_brk_flags
>       3.84 ±  2%      -0.9        2.90            -0.6        3.28 ±  3%  perf-profile.self.cycles-pp.mas_wr_store_type
>       9.85            -0.8        9.02            -1.2        8.63        perf-profile.self.cycles-pp.perf_event_mmap_output
>       5.26            -0.8        4.47 ±  2%      -0.8        4.49 ±  2%  perf-profile.self.cycles-pp.up_write
>       1.34 ±  2%      -0.6        0.75 ±  3%      -0.6        0.74 ±  5%  perf-profile.self.cycles-pp.can_vma_merge_after
>       2.86            -0.4        2.47 ±  5%      -0.2        2.70 ±  2%  perf-profile.self.cycles-pp.mas_store_prealloc
>       2.50 ±  2%      -0.3        2.22 ±  2%      -0.2        2.27 ±  2%  perf-profile.self.cycles-pp.mas_preallocate
>       5.02 ±  2%      -0.2        4.79            -0.3        4.69        perf-profile.self.cycles-pp.__do_sys_brk
>       2.19 ±  4%      -0.2        1.96 ±  3%      -0.2        1.95 ±  4%  perf-profile.self.cycles-pp.mas_leaf_max_gap
>       1.87 ±  3%      -0.2        1.66 ±  4%      -0.2        1.71 ±  6%  perf-profile.self.cycles-pp.perf_event_mmap_event
>       1.52 ±  3%      -0.2        1.33 ±  2%      -0.3        1.24 ±  2%  perf-profile.self.cycles-pp.perf_event_mmap
>       1.82 ±  3%      -0.1        1.68 ±  4%      -0.2        1.66 ±  3%  perf-profile.self.cycles-pp.down_write_killable
>       1.84 ±  2%      -0.1        1.74 ±  4%      -0.2        1.67 ±  2%  perf-profile.self.cycles-pp.init_multi_vma_prep
>       2.77 ±  2%      -0.1        2.67 ±  2%      -0.2        2.58 ±  3%  perf-profile.self.cycles-pp.userfaultfd_unmap_complete
>       1.18 ±  2%      -0.1        1.09            -0.1        1.11 ±  6%  perf-profile.self.cycles-pp.do_syscall_64
>       0.92 ±  3%      -0.1        0.84 ±  6%      -0.1        0.82 ±  5%  perf-profile.self.cycles-pp.__get_unmapped_area
>       2.30            -0.0        2.26 ±  5%      -0.1        2.21        perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown_vmflags
>       0.33 ±  4%      -0.0        0.30 ±  5%      -0.0        0.33 ±  6%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
>       1.08 ±  2%      -0.0        1.05 ±  4%      -0.1        1.00 ±  2%  perf-profile.self.cycles-pp.mas_find
>       0.14 ±  3%      -0.0        0.14 ±  7%      -0.0        0.11 ±  8%  perf-profile.self.cycles-pp.intel_idle
>       0.03 ± 70%      +0.1        0.08 ± 14%      +0.0        0.07 ± 15%  perf-profile.self.cycles-pp.anon_vma_interval_tree_remove
>       0.13 ±  7%      +0.1        0.22 ± 10%      +0.1        0.22 ± 11%  perf-profile.self.cycles-pp.rb_next
>       0.40 ±  7%      +0.1        0.50 ±  9%      +0.1        0.51 ±  6%  perf-profile.self.cycles-pp.mas_wr_store_entry
>       1.21 ±  4%      +0.1        1.32 ±  2%      +0.1        1.30 ±  5%  perf-profile.self.cycles-pp.sized_strscpy
>       0.38 ±  7%      +0.1        0.50 ±  7%      +0.2        0.57 ±  9%  perf-profile.self.cycles-pp.strnlen
>       0.95 ±  7%      +0.1        1.10 ±  4%      +0.1        1.08        perf-profile.self.cycles-pp.__cond_resched
>       0.63 ±  5%      +0.2        0.82 ±  9%      +0.2        0.83 ±  2%  perf-profile.self.cycles-pp.vma_prepare
>       2.98 ±  2%      +0.2        3.20            +0.3        3.32 ±  4%  perf-profile.self.cycles-pp.down_write
>       0.37 ±  6%      +0.2        0.60 ±  6%      +0.2        0.58 ±  4%  perf-profile.self.cycles-pp.__vm_enough_memory
>       0.00            +0.2        0.24 ± 11%      +0.2        0.24 ±  5%  perf-profile.self.cycles-pp.mas_next_setup
>       0.24 ±  6%      +0.3        0.54 ±  6%      +0.3        0.56 ±  4%  perf-profile.self.cycles-pp.percpu_counter_add_batch
>       0.32 ±  4%      +0.3        0.64 ±  5%      +0.3        0.62 ±  3%  perf-profile.self.cycles-pp.mas_prev_setup
>       0.38 ± 10%      +0.3        0.72 ±  6%      +0.3        0.73 ±  6%  perf-profile.self.cycles-pp.__anon_vma_interval_tree_remove
>       0.00            +0.4        0.41 ±  5%      +0.3        0.33 ±  9%  perf-profile.self.cycles-pp.mas_next_range
>       0.63 ±  5%      +0.5        1.17 ±  3%      +0.5        1.10 ±  5%  perf-profile.self.cycles-pp.mas_prev
>       0.87 ±  3%      +0.6        1.49 ±  3%      +0.7        1.61 ±  6%  perf-profile.self.cycles-pp.mas_next_slot
>       1.37 ±  4%      +1.3        2.64 ±  3%      +1.3        2.62 ±  4%  perf-profile.self.cycles-pp.mas_prev_slot
>       0.00            +1.3        1.30            +1.2        1.19 ±  4%  perf-profile.self.cycles-pp.vma_merge_new_range
>       0.00            +3.4        3.45 ±  2%      +3.4        3.43 ±  2%  perf-profile.self.cycles-pp.vma_expand
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/brk_test/aim9/300s
>
> commit:
>   fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
>   cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
>   2e71337ac26478 ("mm: explicitly enable an expand-only merge mode for brk()")
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10 2e71337ac2647889d3d9d76a5ce
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>    2667734            -5.6%    2518021            -6.2%    2503505        aim9.brk_test.ops_per_sec
>     196.00            +0.0%     196.00         +1038.8%       2231 ± 89%  meminfo.Inactive(file)
>      23.94            -8.7%      21.86 ±  2%      -6.0%      22.51        time.user_time
>       0.01 ± 47%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.06 ± 34%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.01 ± 47%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.06 ± 34%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>      49.00            +0.0%      49.00         +1039.3%     558.24 ± 89%  proc-vmstat.nr_inactive_file
>      49.00            +0.0%      49.00         +1039.3%     558.24 ± 89%  proc-vmstat.nr_zone_inactive_file
>     948658            +2.3%     970280            +3.2%     978780        proc-vmstat.pgalloc_normal
>     792310            -1.5%     780779            -1.7%     779104        proc-vmstat.pgfault
>     814343            +2.4%     833987            +3.0%     839063        proc-vmstat.pgfree
>  1.721e+09            +3.0%  1.773e+09            +2.6%  1.765e+09        perf-stat.i.branch-instructions
>       0.54            -5.4%       0.52            -4.8%       0.52        perf-stat.i.cpi
>  7.553e+09            +6.0%  8.003e+09            +5.5%  7.968e+09        perf-stat.i.instructions
>       1.86            +6.1%       1.97            +5.3%       1.96        perf-stat.i.ipc
>       2399            -1.1%       2372            -1.3%       2367        perf-stat.i.minor-faults
>       2399            -1.1%       2372            -1.3%       2367        perf-stat.i.page-faults
>       0.36 ±  2%      -0.0        0.35            +0.0        0.36        perf-stat.overall.branch-miss-rate%
>       0.55            -5.3%       0.52            -4.6%       0.52        perf-stat.overall.cpi
>       1.82            +5.6%       1.92            +4.8%       1.91        perf-stat.overall.ipc
>  1.715e+09            +3.0%  1.767e+09            +2.6%   1.76e+09        perf-stat.ps.branch-instructions
>  7.529e+09            +5.9%  7.977e+09            +5.5%  7.942e+09        perf-stat.ps.instructions
>       2391            -1.1%       2364            -1.3%       2359        perf-stat.ps.minor-faults
>       2391            -1.1%       2364            -1.3%       2359        perf-stat.ps.page-faults
>  2.275e+12            +5.8%  2.408e+12            +5.3%  2.395e+12        perf-stat.total.instructions
>       6.58 ±  2%      -6.6        0.00            -6.6        0.00        perf-profile.calltrace.cycles-pp.mas_store_prealloc.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       5.76 ±  2%      -5.8        0.00            -5.8        0.00        perf-profile.calltrace.cycles-pp.mas_preallocate.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      18.35            -1.3       17.10            -1.0       17.32        perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      15.92            -1.1       14.78 ±  2%      -0.9       15.03        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
>      11.03            -0.7       10.33            -0.7       10.36        perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
>       4.22 ±  3%      -0.4        3.79 ±  2%      -0.5        3.69        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       8.48            -0.4        8.08 ±  2%      -0.5        7.95        perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags
>       5.32            -0.3        4.98            -0.2        5.10 ±  3%  perf-profile.calltrace.cycles-pp.clear_bhb_loop.brk
>       5.38 ±  3%      -0.3        5.06            -0.5        4.89        perf-profile.calltrace.cycles-pp.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       1.16 ±  7%      -0.3        0.86 ±  5%      -0.3        0.90 ±  5%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.88 ± 14%      -0.2        0.64 ±  8%      -0.2        0.64 ±  4%  perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       5.56            -0.2        5.38            -0.2        5.34        perf-profile.calltrace.cycles-pp.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.74 ±  6%      -0.2        0.57 ±  6%      -0.1        0.62 ±  7%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64
>       5.09            -0.2        4.92            -0.2        4.90        perf-profile.calltrace.cycles-pp.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       3.73            -0.2        3.56 �  2%      -0.2        3.58 �  2%  perf-profile.calltrace.cycles-pp.thp_get_unmapped_area_vmflags.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64
>       1.25 �  2%      -0.2        1.08 �  9%      -0.1        1.13 �  4%  perf-profile.calltrace.cycles-pp.sized_strscpy.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
>       1.98 �  2%      -0.1        1.84 �  3%      -0.1        1.88 �  2%  perf-profile.calltrace.cycles-pp.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.55 �  2%      -0.1        0.42 � 44%      -0.1        0.46 � 44%  perf-profile.calltrace.cycles-pp.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.78 �  3%      -0.1        0.72 �  4%      -0.1        0.72 �  3%  perf-profile.calltrace.cycles-pp.mas_prev.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +0.6        0.56 �  5%      +0.1        0.09 �223%  perf-profile.calltrace.cycles-pp.anon_vma_interval_tree_insert.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +0.6        0.64            +0.6        0.60 �  7%  perf-profile.calltrace.cycles-pp.vma_adjust_trans_huge.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +0.7        0.69 �  8%      +0.7        0.70 �  8%  perf-profile.calltrace.cycles-pp.__anon_vma_interval_tree_remove.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +0.8        0.78 �  4%      +0.9        0.88 �  5%  perf-profile.calltrace.cycles-pp.mas_next_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +0.8        0.80 �  2%      +0.7        0.72 �  3%  perf-profile.calltrace.cycles-pp.mas_prev.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>      82.26            +0.8       83.07            +0.7       82.95        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +0.8        0.82 �  4%      +0.8        0.80 �  5%  perf-profile.calltrace.cycles-pp.can_vma_merge_after.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>      81.38            +0.8       82.20            +0.7       82.07        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +0.8        0.84 �  4%      +0.8        0.80 �  7%  perf-profile.calltrace.cycles-pp.mas_wr_slot_store.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
>      77.94            +1.0       78.92            +0.8       78.74        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +1.2        1.24 �  3%      +1.2        1.18 �  4%  perf-profile.calltrace.cycles-pp.mas_prev_slot.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00            +1.3        1.32 �  2%      +1.4        1.37 �  3%  perf-profile.calltrace.cycles-pp.mas_next_slot.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +1.4        1.38 �  2%      +1.4        1.42 �  4%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +1.4        1.40 �  4%      +1.4        1.38        perf-profile.calltrace.cycles-pp.up_write.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +1.6        1.64            +1.7        1.66 �  3%  perf-profile.calltrace.cycles-pp.up_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +1.8        1.82 �  5%      +1.7        1.73 �  3%  perf-profile.calltrace.cycles-pp.down_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +1.9        1.89 �  3%      +1.9        1.89 �  2%  perf-profile.calltrace.cycles-pp.init_multi_vma_prep.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +1.9        1.92 �  3%      +1.9        1.91        perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
>       0.00            +2.3        2.31            +2.3        2.31 �  3%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +2.7        2.68 �  2%      +2.6        2.62        perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +2.9        2.92 �  4%      +2.8        2.81        perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +2.9        2.93 �  2%      +2.9        2.94 �  3%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>      53.19            +3.0       56.14            +2.6       55.83        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +4.4        4.42            +4.4        4.40        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +7.1        7.09            +7.0        6.95        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00           +26.8       26.83           +26.3       26.31        perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00           +31.4       31.41           +30.8       30.83        perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       5.93            -1.3        4.62            -1.3        4.60        perf-profile.children.cycles-pp.mas_preallocate
>      18.48            -1.3       17.23            -1.0       17.46        perf-profile.children.cycles-pp.perf_event_mmap
>      16.41            -1.2       15.23            -0.9       15.47        perf-profile.children.cycles-pp.perf_event_mmap_event
>       3.46            -1.1        2.37            -1.1        2.38 ±  3%  perf-profile.children.cycles-pp.mas_wr_store_type
>      11.24            -0.7       10.52 ±  2%      -0.7       10.56        perf-profile.children.cycles-pp.perf_iterate_sb
>       4.29 ±  3%      -0.4        3.86 ±  2%      -0.5        3.76        perf-profile.children.cycles-pp.mas_walk
>       8.61            -0.4        8.21 ±  2%      -0.5        8.07        perf-profile.children.cycles-pp.perf_event_mmap_output
>       0.83 ±  7%      -0.4        0.47 ±  8%      -0.4        0.45 ± 10%  perf-profile.children.cycles-pp.may_expand_vm
>       3.82            -0.3        3.48 ±  3%      -0.4        3.41        perf-profile.children.cycles-pp.down_write
>       5.39            -0.3        5.06            -0.2        5.18 ±  3%  perf-profile.children.cycles-pp.clear_bhb_loop
>       1.36 ±  5%      -0.3        1.03 ±  5%      -0.3        1.08 ±  4%  perf-profile.children.cycles-pp.__vm_enough_memory
>       5.64 ±  3%      -0.3        5.32            -0.5        5.13        perf-profile.children.cycles-pp.mas_find
>       1.18 ±  5%      -0.3        0.88 ±  4%      -0.3        0.87 ±  3%  perf-profile.children.cycles-pp.can_vma_merge_after
>       0.57 ± 22%      -0.2        0.33 ± 12%      -0.2        0.36 ±  7%  perf-profile.children.cycles-pp.cap_vm_enough_memory
>       1.06 ± 11%      -0.2        0.83 ±  9%      -0.2        0.82 ±  5%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
>       5.30            -0.2        5.11            -0.2        5.10        perf-profile.children.cycles-pp.__get_unmapped_area
>       5.75            -0.2        5.56            -0.2        5.52        perf-profile.children.cycles-pp.check_brk_limits
>       0.82 ±  4%      -0.2        0.64 ±  4%      -0.1        0.70 ±  7%  perf-profile.children.cycles-pp.percpu_counter_add_batch
>       3.86            -0.2        3.69 ±  2%      -0.1        3.71 ±  2%  perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
>       1.32 ±  2%      -0.2        1.16 ±  8%      -0.1        1.20 ±  3%  perf-profile.children.cycles-pp.sized_strscpy
>       2.10 ±  2%      -0.1        1.96 ±  2%      -0.1        2.00 ±  3%  perf-profile.children.cycles-pp.down_write_killable
>       1.86 ±  3%      -0.1        1.74 ±  2%      -0.1        1.80 ±  4%  perf-profile.children.cycles-pp.__cond_resched
>       0.57 ±  7%      -0.1        0.45 ± 13%      -0.1        0.46 ±  2%  perf-profile.children.cycles-pp.strlen
>       2.78            -0.1        2.66 ±  2%      -0.1        2.64 ±  2%  perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown_vmflags
>       0.10 ± 13%      +0.0        0.11 ± 16%      +0.0        0.12 ± 11%  perf-profile.children.cycles-pp.vfs_read
>       0.10 ± 11%      +0.0        0.12 ± 13%      +0.0        0.14 ±  8%  perf-profile.children.cycles-pp.read
>       0.70 ±  2%      +0.1        0.81 ±  6%      +0.1        0.82 ±  7%  perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
>       0.68 ±  9%      +0.2        0.90 ±  3%      +0.2        0.86 ±  6%  perf-profile.children.cycles-pp.mas_wr_slot_store
>       0.34 ±  5%      +0.3        0.68 ±  6%      +0.3        0.68 ±  4%  perf-profile.children.cycles-pp.mas_prev_setup
>       0.00            +0.4        0.43 ±  2%      +0.4        0.41 ±  7%  perf-profile.children.cycles-pp.mas_next_setup
>       6.91 ±  2%      +0.5        7.37            +0.4        7.26        perf-profile.children.cycles-pp.mas_store_prealloc
>       0.92 ±  3%      +0.8        1.76 ±  3%      +0.8        1.69 ±  2%  perf-profile.children.cycles-pp.mas_prev
>      83.20            +0.9       84.05            +0.7       83.90        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      82.36            +0.9       83.23            +0.7       83.08        perf-profile.children.cycles-pp.do_syscall_64
>      78.68            +0.9       79.62            +0.8       79.46        perf-profile.children.cycles-pp.__do_sys_brk
>       0.00            +0.9        0.94 ±  3%      +1.1        1.05 ±  5%  perf-profile.children.cycles-pp.mas_next_range
>       1.35 ±  5%      +1.2        2.59 ±  3%      +1.2        2.52        perf-profile.children.cycles-pp.mas_prev_slot
>       0.84 ±  3%      +1.4        2.26 ±  3%      +1.4        2.27 ±  2%  perf-profile.children.cycles-pp.mas_next_slot
>      54.26            +2.3       56.54            +2.0       56.30        perf-profile.children.cycles-pp.do_brk_flags
>       0.00           +27.5       27.55           +27.0       27.00        perf-profile.children.cycles-pp.vma_expand
>       0.00           +31.9       31.86           +31.1       31.13        perf-profile.children.cycles-pp.vma_merge_new_range
>       6.50            -3.3        3.19 ±  2%      -3.1        3.35        perf-profile.self.cycles-pp.do_brk_flags
>       3.35            -1.1        2.25            -1.1        2.25 ±  2%  perf-profile.self.cycles-pp.mas_wr_store_type
>       5.31 ±  2%      -0.4        4.88            -0.2        5.14 ±  2%  perf-profile.self.cycles-pp.__do_sys_brk
>       4.22 ±  3%      -0.4        3.80 ±  2%      -0.5        3.70        perf-profile.self.cycles-pp.mas_walk
>       8.47            -0.4        8.07 ±  2%      -0.5        7.94        perf-profile.self.cycles-pp.perf_event_mmap_output
>       0.71 ±  8%      -0.3        0.38 ±  8%      -0.3        0.36 ± 12%  perf-profile.self.cycles-pp.may_expand_vm
>       5.32            -0.3        5.00            -0.2        5.11 ±  3%  perf-profile.self.cycles-pp.clear_bhb_loop
>       1.12 ±  5%      -0.3        0.82 ±  4%      -0.3        0.80 ±  5%  perf-profile.self.cycles-pp.can_vma_merge_after
>       2.62            -0.2        2.38 ±  5%      -0.3        2.32 ±  3%  perf-profile.self.cycles-pp.down_write
>       0.44 ± 28%      -0.2        0.20 ± 13%      -0.2        0.24 ±  7%  perf-profile.self.cycles-pp.cap_vm_enough_memory
>       2.50 ±  3%      -0.2        2.31 ±  2%      -0.2        2.28        perf-profile.self.cycles-pp.mas_preallocate
>       0.61 ±  9%      -0.2        0.42 ±  7%      -0.2        0.42 ±  6%  perf-profile.self.cycles-pp.__vm_enough_memory
>       1.26 ±  2%      -0.2        1.09 ±  9%      -0.1        1.14 ±  3%  perf-profile.self.cycles-pp.sized_strscpy
>       0.57 ±  5%      -0.1        0.44 ±  4%      -0.1        0.47 ±  8%  perf-profile.self.cycles-pp.percpu_counter_add_batch
>       2.25 ±  3%      -0.1        2.13 ±  3%      -0.0        2.25 ±  2%  perf-profile.self.cycles-pp.perf_event_mmap_event
>       2.70            -0.1        2.60 ±  2%      -0.1        2.58 ±  2%  perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown_vmflags
>       0.50 ±  8%      -0.1        0.40 ± 12%      -0.1        0.41 ±  2%  perf-profile.self.cycles-pp.strlen
>       0.57 ±  4%      -0.1        0.52 ±  4%      -0.0        0.55 ±  4%  perf-profile.self.cycles-pp.strnlen
>       1.40 ±  3%      -0.0        1.34 ±  2%      -0.1        1.33 ±  2%  perf-profile.self.cycles-pp.down_write_killable
>       0.01 ±223%      +0.1        0.06 ± 15%      +0.0        0.05 ± 48%  perf-profile.self.cycles-pp.anon_vma_interval_tree_remove
>       0.51 ±  4%      +0.1        0.60 ±  5%      +0.1        0.60 ±  7%  perf-profile.self.cycles-pp.__anon_vma_interval_tree_remove
>       2.87 ±  2%      +0.2        3.10 ±  2%      +0.2        3.09 ±  2%  perf-profile.self.cycles-pp.mas_store_prealloc
>       0.61 ±  8%      +0.2        0.84 ±  4%      +0.2        0.81 ±  7%  perf-profile.self.cycles-pp.mas_wr_slot_store
>       0.26 ±  7%      +0.3        0.56 ±  5%      +0.3        0.56 ±  7%  perf-profile.self.cycles-pp.mas_prev_setup
>       0.00            +0.3        0.33 ±  3%      +0.3        0.31 ± 10%  perf-profile.self.cycles-pp.mas_next_setup
>       0.53 ±  6%      +0.4        0.98 ±  3%      +0.4        0.94 ±  7%  perf-profile.self.cycles-pp.mas_prev
>       0.00            +0.6        0.56 ±  5%      +0.6        0.62 ±  5%  perf-profile.self.cycles-pp.mas_next_range
>       1.29 ±  4%      +1.2        2.46 ±  3%      +1.1        2.39        perf-profile.self.cycles-pp.mas_prev_slot
>       0.72 ±  4%      +1.4        2.07 ±  3%      +1.4        2.09        perf-profile.self.cycles-pp.mas_next_slot
>       0.00            +1.4        1.40 ±  4%      +1.2        1.24 ±  5%  perf-profile.self.cycles-pp.vma_merge_new_range
>       0.00            +3.6        3.55 ±  6%      +3.4        3.42 ±  2%  perf-profile.self.cycles-pp.vma_expand
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/brk_test/aim9/300s
>
> commit:
>   fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
>   cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
>   2e71337ac26478 ("mm: explicitly enable an expand-only merge mode for brk()")
>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10 2e71337ac2647889d3d9d76a5ce
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     201.54            +2.9%     207.44            +2.5%     206.52        time.system_time
>      97.58            -6.0%      91.75            -5.0%      92.66        time.user_time
>    1322908            -5.0%    1256536            -4.1%    1268145        aim9.brk_test.ops_per_sec
>     201.54            +2.9%     207.44            +2.5%     206.52        aim9.time.system_time
>      97.58            -6.0%      91.75            -5.0%      92.66        aim9.time.user_time
>       0.04 ± 82%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.10 ± 60%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>      90.66 ± 71%    +411.1%     463.37 ±113%    +160.5%     236.20 ± 12%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.blk_execute_rq
>     127.98 ± 86%    +586.2%     878.13 ±150%    +192.6%     374.47 ± 56%  perf-sched.wait_and_delay.max.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.blk_execute_rq
>       0.04 ± 82%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>     111.98 ± 31%    +323.3%     474.03 ±108%    +110.6%     235.86 ± 12%  perf-sched.wait_time.avg.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.blk_execute_rq
>       0.10 ± 60%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>     149.30 ± 58%    +495.3%     888.79 ±147%    +150.4%     373.80 ± 57%  perf-sched.wait_time.max.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.blk_execute_rq
>       0.30 ±  2%      -9.0%       0.27 ±  4%     -11.5%       0.27 ±  7%  perf-stat.i.MPKI
>   8.33e+08            +3.9%  8.654e+08            +4.5%  8.708e+08        perf-stat.i.branch-instructions
>       1.15            -0.1        1.09            -0.1        1.08        perf-stat.i.branch-miss-rate%
>   12964626            -1.9%   12711922            -2.6%   12624576        perf-stat.i.branch-misses
>       1.11            -7.4%       1.03            -7.9%       1.03        perf-stat.i.cpi
>  3.943e+09            +6.0%   4.18e+09            +6.7%  4.206e+09        perf-stat.i.instructions
>       0.91            +7.9%       0.98            +8.5%       0.99        perf-stat.i.ipc
>       0.29 ±  2%      -9.1%       0.27 ±  4%     -10.8%       0.26 ±  7%  perf-stat.overall.MPKI
>       1.56            -0.1        1.47            -0.1        1.45        perf-stat.overall.branch-miss-rate%
>       1.08            -6.8%       1.01            -7.2%       1.01        perf-stat.overall.cpi
>       0.92            +7.2%       0.99            +7.8%       0.99        perf-stat.overall.ipc
>  8.303e+08            +3.9%  8.627e+08            +4.5%  8.681e+08        perf-stat.ps.branch-instructions
>   12931205            -2.0%   12678170            -2.6%   12593410        perf-stat.ps.branch-misses
>   3.93e+09            +6.0%  4.167e+09            +6.7%  4.193e+09        perf-stat.ps.instructions
>  1.184e+12            +6.1%  1.256e+12            +6.7%  1.263e+12        perf-stat.total.instructions
>       7.16 ±  2%      -0.4        6.76 ±  4%      -0.3        6.83 ±  5%  perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.brk
>       5.72 ±  2%      -0.4        5.35 ±  3%      -0.2        5.53 ±  4%  perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
>       6.13 ±  2%      -0.3        5.84 ±  3%      -0.2        5.97 ±  4%  perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.83 ± 11%      -0.1        0.71 ±  5%      -0.1        0.76 ±  5%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +0.6        0.58 ±  5%      +0.6        0.57 ±  8%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
>      16.73 ±  2%      +0.6       17.34            +0.5       17.27 ±  4%  perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +0.7        0.66 ±  6%      +0.6        0.61 ± 45%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
>      24.21            +0.7       24.90            +0.5       24.71 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>      23.33            +0.7       24.05 ±  2%      +0.5       23.87 ±  3%  perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +0.8        0.82 ±  4%      +0.9        0.92 ± 11%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +0.9        0.87 ±  5%      +0.9        0.86 ±  6%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +1.1        1.07 ±  9%      +1.0        1.01 ± 14%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +1.1        1.10 ±  6%      +1.2        1.15 ± 10%  perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +2.3        2.26 ±  5%      +2.2        2.19 ±  5%  perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +7.6        7.56 ±  3%      +7.5        7.48 ±  4%  perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00            +8.6        8.62 ±  4%      +8.4        8.40 ±  4%  perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       7.74 ±  2%      -0.4        7.30 ±  4%      -0.4        7.38 ±  5%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       5.81 ±  2%      -0.4        5.43 ±  3%      -0.2        5.60 ±  4%  perf-profile.children.cycles-pp.perf_event_mmap_event
>       6.18 ±  2%      -0.3        5.88 ±  3%      -0.2        6.00 ±  4%  perf-profile.children.cycles-pp.perf_event_mmap
>       3.93            -0.2        3.73 ±  3%      -0.1        3.81 ±  4%  perf-profile.children.cycles-pp.perf_iterate_sb
>       0.22 ± 29%      -0.1        0.08 ± 17%      -0.1        0.09 ± 42%  perf-profile.children.cycles-pp.may_expand_vm
>       0.96 ±  3%      -0.1        0.83 ±  4%      -0.0        0.93 ± 11%  perf-profile.children.cycles-pp.vma_complete
>       0.61 ± 14%      -0.1        0.52 ±  7%      -0.0        0.57 ±  9%  perf-profile.children.cycles-pp.percpu_counter_add_batch
>       0.15 ±  7%      -0.1        0.08 ± 20%      -0.1        0.08 ± 25%  perf-profile.children.cycles-pp.brk_test
>       0.10 ± 11%      +0.0        0.10 ± 28%      +0.0        0.12 ± 10%  perf-profile.children.cycles-pp.run_posix_cpu_timers
>       0.08 ± 11%      +0.0        0.12 ± 14%      +0.0        0.12 ± 12%  perf-profile.children.cycles-pp.mas_prev_setup
>       0.00            +0.0        0.05 ± 46%      +0.1        0.08 ± 16%  perf-profile.children.cycles-pp.mas_next_setup
>       0.24 ± 19%      +0.1        0.31 ±  9%      +0.1        0.32 ±  9%  perf-profile.children.cycles-pp.mas_prev
>       0.17 ± 12%      +0.1        0.27 ± 10%      +0.0        0.19 ± 16%  perf-profile.children.cycles-pp.mas_wr_store_entry
>       0.00            +0.2        0.15 ± 11%      +0.2        0.17 ±  8%  perf-profile.children.cycles-pp.mas_next_range
>       0.19 ±  8%      +0.2        0.38 ± 10%      +0.2        0.41 ±  8%  perf-profile.children.cycles-pp.mas_next_slot
>       0.34 ± 17%      +0.3        0.64 ±  6%      +0.3        0.61 ±  6%  perf-profile.children.cycles-pp.mas_prev_slot
>      23.40            +0.7       24.12 ±  2%      +0.5       23.94 ±  3%  perf-profile.children.cycles-pp.__do_sys_brk
>       0.00            +7.6        7.59 ±  3%      +7.5        7.49 ±  4%  perf-profile.children.cycles-pp.vma_expand
>       0.00            +8.7        8.66 ±  4%      +8.5        8.46 ±  4%  perf-profile.children.cycles-pp.vma_merge_new_range
>       1.61 ± 10%      -0.9        0.69 ±  8%      -0.8        0.83 ± 14%  perf-profile.self.cycles-pp.do_brk_flags
>       7.64 ±  2%      -0.4        7.20 ±  4%      -0.4        7.28 ±  5%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.22 ± 30%      -0.1        0.08 ± 17%      -0.1        0.09 ± 42%  perf-profile.self.cycles-pp.may_expand_vm
>       0.57 ± 15%      -0.1        0.46 ±  6%      -0.0        0.53 ± 10%  perf-profile.self.cycles-pp.percpu_counter_add_batch
>       0.77 ±  7%      -0.1        0.69 ±  5%      -0.1        0.69 ±  5%  perf-profile.self.cycles-pp.perf_event_mmap_event
>       0.15 ±  7%      -0.1        0.08 ± 20%      -0.1        0.08 ± 24%  perf-profile.self.cycles-pp.brk_test
>       0.20 ±  5%      -0.0        0.18 ±  4%      +0.0        0.20 ±  9%  perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
>       0.10 ± 11%      +0.0        0.10 ± 28%      +0.0        0.12 ± 10%  perf-profile.self.cycles-pp.run_posix_cpu_timers
>       0.07 ± 18%      +0.0        0.10 ± 18%      +0.0        0.11 ± 11%  perf-profile.self.cycles-pp.mas_prev_setup
>       0.00            +0.1        0.09 ± 12%      +0.1        0.11 ±  9%  perf-profile.self.cycles-pp.mas_next_range
>       0.36 ±  8%      +0.1        0.45 ±  6%      +0.0        0.40 ± 11%  perf-profile.self.cycles-pp.perf_event_mmap
>       0.15 ± 13%      +0.1        0.25 ± 14%      +0.0        0.17 ± 16%  perf-profile.self.cycles-pp.mas_wr_store_entry
>       0.17 ± 11%      +0.2        0.37 ± 11%      +0.2        0.40 ±  9%  perf-profile.self.cycles-pp.mas_next_slot
>       0.34 ± 17%      +0.3        0.64 ±  6%      +0.3        0.61 ±  6%  perf-profile.self.cycles-pp.mas_prev_slot
>       0.00            +0.3        0.33 ±  5%      +0.3        0.30 ±  7%  perf-profile.self.cycles-pp.vma_merge_new_range
>       0.00            +0.8        0.81 ±  9%      +0.7        0.74 ±  9%  perf-profile.self.cycles-pp.vma_expand
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec -5.0% regression
  2024-10-11  7:26             ` Lorenzo Stoakes
@ 2024-10-15 19:56               ` Lorenzo Stoakes
  2024-10-17  2:58                 ` Oliver Sang
  0 siblings, 1 reply; 13+ messages in thread
From: Lorenzo Stoakes @ 2024-10-15 19:56 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Mark Brown,
	Liam R. Howlett, Vlastimil Babka, Bert Karwatzki, Jeff Xu,
	Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
	Paul E. McKenney, Paul Moore, Sidhartha Kumar,
	Suren Baghdasaryan, linux-mm, ying.huang, feng.tang, fengwei.yin

On Fri, Oct 11, 2024 at 08:26:37AM +0100, Lorenzo Stoakes wrote:
[snip]

> Thanks for testing, this suffices to rule this one out... I will try to get a
> functional and reliable performance environment locally so I can properly
> address this and then we can try something else.
>
> Thanks!
> Lorenzo
>

OK Oliver, could you try the below patch? I have got aim9.brk up and
running locally and for me this seems to address the issue.

This is against Andrew's tree [0] in the mm-unstable branch. It should
hopefully apply cleanly to -next also.

Very much appreciated, thanks!

[0]:https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/

----8<----
From cee7f4196247de0da2b7632838fd36aee8b77e13 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Tue, 15 Oct 2024 20:16:32 +0100
Subject: [PATCH] mm: add expand-only VMA merge mode and optimise
 do_brk_flags()

We know in advance that do_brk_flags() wants only to perform a VMA
expansion (if the prior VMA is compatible), and that we assume no mergeable
VMA follows it.

These are the semantics of this function prior to the recent rewrite of the
VMA merging logic, however we are now doing more work than necessary -
positioning the VMA iterator at the prior VMA and performing tasks that are
not required.

Add a new field to the vmg struct to permit merge flags and add a new merge
flag VMG_FLAG_JUST_EXPAND which implies this behaviour, and have
do_brk_flags() use this.

This fixes a reported performance regression in a brk() benchmarking suite.
---
 mm/mmap.c |  3 ++-
 mm/vma.c  | 23 +++++++++++++++--------
 mm/vma.h  | 16 ++++++++++++++++
 3 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 02f7b45c3076..b99ba4cac9fe 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1741,7 +1741,8 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 		VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));

 		vmg.prev = vma;
-		vma_iter_next_range(vmi);
+		/* vmi is positioned at prev, which this mode expects. */
+		vmg.merge_flags = VMG_FLAG_JUST_EXPAND;

 		if (vma_merge_new_range(&vmg))
 			goto out;
diff --git a/mm/vma.c b/mm/vma.c
index 749c4881fd60..69ce9e07ab11 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -562,6 +562,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 	pgoff_t pgoff = vmg->pgoff;
 	pgoff_t pglen = PHYS_PFN(end - start);
 	bool can_merge_left, can_merge_right;
+	bool just_expand = vmg->merge_flags & VMG_FLAG_JUST_EXPAND;

 	mmap_assert_write_locked(vmg->mm);
 	VM_WARN_ON(vmg->vma);
@@ -575,7 +576,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 		return NULL;

 	can_merge_left = can_vma_merge_left(vmg);
-	can_merge_right = can_vma_merge_right(vmg, can_merge_left);
+	can_merge_right = !just_expand && can_vma_merge_right(vmg, can_merge_left);

 	/* If we can merge with the next VMA, adjust vmg accordingly. */
 	if (can_merge_right) {
@@ -590,7 +591,11 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 		vmg->vma = prev;
 		vmg->pgoff = prev->vm_pgoff;

-		vma_prev(vmg->vmi); /* Equivalent to going to the previous range */
+		/* In expand-only case we are already positioned here. */
+		if (!just_expand) {
+			/* Equivalent to going to the previous range. */
+			vma_prev(vmg->vmi);
+		}
 	}

 	/*
@@ -604,12 +609,14 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 	}

 	/* If expansion failed, reset state. Allows us to retry merge later. */
-	vmg->vma = NULL;
-	vmg->start = start;
-	vmg->end = end;
-	vmg->pgoff = pgoff;
-	if (vmg->vma == prev)
-		vma_iter_set(vmg->vmi, start);
+	if (!just_expand) {
+		vmg->vma = NULL;
+		vmg->start = start;
+		vmg->end = end;
+		vmg->pgoff = pgoff;
+		if (vmg->vma == prev)
+			vma_iter_set(vmg->vmi, start);
+	}

 	return NULL;
 }
diff --git a/mm/vma.h b/mm/vma.h
index 82354fe5edd0..8f8548958e41 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -59,6 +59,19 @@ enum vma_merge_state {
 	VMA_MERGE_SUCCESS,
 };

+typedef unsigned long vma_merge_flags_t;
+
+ /*
+  * If we can expand, simply do so. We know there is nothing to merge to the
+  * right.
+  *
+  * Does not reset state upon failure to merge.
+  *
+  * IMPORTANT: The VMA iterator is assumed to be positioned at the previous VMA,
+  *            rather than at the gap.
+  */
+#define VMG_FLAG_JUST_EXPAND (1 << 0)
+
 /* Represents a VMA merge operation. */
 struct vma_merge_struct {
 	struct mm_struct *mm;
@@ -75,6 +88,7 @@ struct vma_merge_struct {
 	struct mempolicy *policy;
 	struct vm_userfaultfd_ctx uffd_ctx;
 	struct anon_vma_name *anon_name;
+	vma_merge_flags_t merge_flags;
 	enum vma_merge_state state;
 };

@@ -99,6 +113,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
 		.flags = flags_,					\
 		.pgoff = pgoff_,					\
 		.state = VMA_MERGE_START,				\
+		.merge_flags = 0,					\
 	}

 #define VMG_VMA_STATE(name, vmi_, prev_, vma_, start_, end_)	\
@@ -118,6 +133,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
 		.uffd_ctx = vma_->vm_userfaultfd_ctx,		\
 		.anon_name = anon_vma_name(vma_),		\
 		.state = VMA_MERGE_START,			\
+		.merge_flags = 0,				\
 	}

 #ifdef CONFIG_DEBUG_VM_MAPLE_TREE
--
2.46.2



* Re: [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec -5.0% regression
  2024-10-15 19:56               ` Lorenzo Stoakes
@ 2024-10-17  2:58                 ` Oliver Sang
  2024-10-17  8:54                   ` Lorenzo Stoakes
  0 siblings, 1 reply; 13+ messages in thread
From: Oliver Sang @ 2024-10-17  2:58 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Mark Brown,
	Liam R. Howlett, Vlastimil Babka, Bert Karwatzki, Jeff Xu,
	Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
	Paul E. McKenney, Paul Moore, Sidhartha Kumar,
	Suren Baghdasaryan, linux-mm, ying.huang, feng.tang, fengwei.yin,
	oliver.sang

[-- Attachment #1: Type: text/plain, Size: 11436 bytes --]

hi, Lorenzo,

On Tue, Oct 15, 2024 at 08:56:28PM +0100, Lorenzo Stoakes wrote:
> On Fri, Oct 11, 2024 at 08:26:37AM +0100, Lorenzo Stoakes wrote:
> [snip]
> 
> > Thanks for testing this suffices to rule this one out... I will try to get a
> > functional and reliable performance environment locally so I can properly
> > address this and then we can try something else.
> >
> > Thanks!
> > Lorenzo
> >
> 
> OK Oliver, could you try the below patch? I have got aim9.brk up and
> running locally and for me this seems to address the issue.
> 
> This is against Andrew's tree [0] in the mm-unstable branch. It should
> hopefully apply cleanly to -next also.

I found the patch still applies cleanly to cacded5e42, so the data below is
based on applying it there.

$ git log --oneline 9cecc5dc893886
9cecc5dc893886 mm: add expand-only VMA merge mode and optimise do_brk_flags()
cacded5e42b960 mm: avoid using vma_merge() for new VMAs
fc21959f74bc11 mm: abstract vma_expand() to use vma_merge_struct
...

again, if any patches in mm-unstable or -next affect this, please let me
know and I can re-apply the patch and run the tests again. thanks


with this patch, we do see performance recover, but not fully.

e.g. for
model: Granite Rapids
nr_node: 1
nr_cpu: 240
memory: 192G

we got a better score with the patch than with cacded5e42b960, but there is
still a 2.0% regression compared to fc21959f74bc11 (the parent of cacded5e42b960)

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-gnr-1ap1/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  9cecc5dc893886 ("mm: add expand-only VMA merge mode and optimise do_brk_flags()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 9cecc5dc89388676d1d0d47461c
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   3220697            -6.0%    3028867            -2.0%    3156931        aim9.brk_test.ops_per_sec


we see similar results on other platforms; the full data is attached as
fc21959f74bc11-cacded5e42b960-9cecc5dc893886


for
model: Emerald Rapids
nr_node: 4
nr_cpu: 256
memory: 256G
brand: INTEL(R) XEON(R) PLATINUM 8592+

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-emr-2sp1/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  9cecc5dc893886 ("mm: add expand-only VMA merge mode and optimise do_brk_flags()")
  
fc21959f74bc1138 cacded5e42b9609b07b22d80c10 9cecc5dc89388676d1d0d47461c
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   3669298            -6.5%    3430070            -2.7%    3571699        aim9.brk_test.ops_per_sec


for
model: Sapphire Rapids
nr_node: 2
nr_cpu: 224
memory: 512G
brand: Intel(R) Xeon(R) Platinum 8480CTDX

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  9cecc5dc893886 ("mm: add expand-only VMA merge mode and optimise do_brk_flags()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 9cecc5dc89388676d1d0d47461c
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   3540976            -6.4%    3314159            -2.6%    3449384        aim9.brk_test.ops_per_sec


for
model: Ice Lake
nr_node: 2
nr_cpu: 64
memory: 256G
brand: Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  9cecc5dc893886 ("mm: add expand-only VMA merge mode and optimise do_brk_flags()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 9cecc5dc89388676d1d0d47461c
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   2667734            -5.6%    2518021            -1.0%    2640850        aim9.brk_test.ops_per_sec


for
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
which is the machine the original report was made on

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  9cecc5dc893886 ("mm: add expand-only VMA merge mode and optimise do_brk_flags()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 9cecc5dc89388676d1d0d47461c
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   1322908            -5.0%    1256536            -1.6%    1301387        aim9.brk_test.ops_per_sec

> 
> Very much appreciated, thanks!
> 
> [0]:https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/
> 
> ----8<----
> From cee7f4196247de0da2b7632838fd36aee8b77e13 Mon Sep 17 00:00:00 2001
> From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Date: Tue, 15 Oct 2024 20:16:32 +0100
> Subject: [PATCH] mm: add expand-only VMA merge mode and optimise
>  do_brk_flags()
> 
> We know in advance that do_brk_flags() wants only to perform a VMA
> expansion (if the prior VMA is compatible), and that we assume no mergeable
> VMA follows it.
> 
> These are the semantics of this function prior to the recent rewrite of the
> VMA merging logic, however we are now doing more work than necessary -
> positioning the VMA iterator at the prior VMA and performing tasks that are
> not required.
> 
> Add a new field to the vmg struct to permit merge flags and add a new merge
> flag VMG_FLAG_JUST_EXPAND which implies this behaviour, and have
> do_brk_flags() use this.
> 
> This fixes a reported performance regression in a brk() benchmarking suite.
> ---
>  mm/mmap.c |  3 ++-
>  mm/vma.c  | 23 +++++++++++++++--------
>  mm/vma.h  | 16 ++++++++++++++++
>  3 files changed, 33 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 02f7b45c3076..b99ba4cac9fe 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1741,7 +1741,8 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
>  		VMG_STATE(vmg, mm, vmi, addr, addr + len, flags, PHYS_PFN(addr));
> 
>  		vmg.prev = vma;
> -		vma_iter_next_range(vmi);
> +		/* vmi is positioned at prev, which this mode expects. */
> +		vmg.merge_flags = VMG_FLAG_JUST_EXPAND;
> 
>  		if (vma_merge_new_range(&vmg))
>  			goto out;
> diff --git a/mm/vma.c b/mm/vma.c
> index 749c4881fd60..69ce9e07ab11 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -562,6 +562,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
>  	pgoff_t pgoff = vmg->pgoff;
>  	pgoff_t pglen = PHYS_PFN(end - start);
>  	bool can_merge_left, can_merge_right;
> +	bool just_expand = vmg->merge_flags & VMG_FLAG_JUST_EXPAND;
> 
>  	mmap_assert_write_locked(vmg->mm);
>  	VM_WARN_ON(vmg->vma);
> @@ -575,7 +576,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
>  		return NULL;
> 
>  	can_merge_left = can_vma_merge_left(vmg);
> -	can_merge_right = can_vma_merge_right(vmg, can_merge_left);
> +	can_merge_right = !just_expand && can_vma_merge_right(vmg, can_merge_left);
> 
>  	/* If we can merge with the next VMA, adjust vmg accordingly. */
>  	if (can_merge_right) {
> @@ -590,7 +591,11 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
>  		vmg->vma = prev;
>  		vmg->pgoff = prev->vm_pgoff;
> 
> -		vma_prev(vmg->vmi); /* Equivalent to going to the previous range */
> +		/* In expand-only case we are already positioned here. */
> +		if (!just_expand) {
> +			/* Equivalent to going to the previous range. */
> +			vma_prev(vmg->vmi);
> +		}
>  	}
> 
>  	/*
> @@ -604,12 +609,14 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
>  	}
> 
>  	/* If expansion failed, reset state. Allows us to retry merge later. */
> -	vmg->vma = NULL;
> -	vmg->start = start;
> -	vmg->end = end;
> -	vmg->pgoff = pgoff;
> -	if (vmg->vma == prev)
> -		vma_iter_set(vmg->vmi, start);
> +	if (!just_expand) {
> +		vmg->vma = NULL;
> +		vmg->start = start;
> +		vmg->end = end;
> +		vmg->pgoff = pgoff;
> +		if (vmg->vma == prev)
> +			vma_iter_set(vmg->vmi, start);
> +	}
> 
>  	return NULL;
>  }
> diff --git a/mm/vma.h b/mm/vma.h
> index 82354fe5edd0..8f8548958e41 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -59,6 +59,19 @@ enum vma_merge_state {
>  	VMA_MERGE_SUCCESS,
>  };
> 
> +typedef unsigned long vma_merge_flags_t;
> +
> + /*
> +  * If we can expand, simply do so. We know there is nothing to merge to the
> +  * right.
> +  *
> +  * Does not reset state upon failure to merge.
> +  *
> +  * IMPORTANT: The VMA iterator is assumed to be positioned at the previous VMA,
> +  *            rather than at the gap.
> +  */
> +#define VMG_FLAG_JUST_EXPAND (1 << 0)
> +
>  /* Represents a VMA merge operation. */
>  struct vma_merge_struct {
>  	struct mm_struct *mm;
> @@ -75,6 +88,7 @@ struct vma_merge_struct {
>  	struct mempolicy *policy;
>  	struct vm_userfaultfd_ctx uffd_ctx;
>  	struct anon_vma_name *anon_name;
> +	vma_merge_flags_t merge_flags;
>  	enum vma_merge_state state;
>  };
> 
> @@ -99,6 +113,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
>  		.flags = flags_,					\
>  		.pgoff = pgoff_,					\
>  		.state = VMA_MERGE_START,				\
> +		.merge_flags = 0,					\
>  	}
> 
>  #define VMG_VMA_STATE(name, vmi_, prev_, vma_, start_, end_)	\
> @@ -118,6 +133,7 @@ static inline pgoff_t vma_pgoff_offset(struct vm_area_struct *vma,
>  		.uffd_ctx = vma_->vm_userfaultfd_ctx,		\
>  		.anon_name = anon_vma_name(vma_),		\
>  		.state = VMA_MERGE_START,			\
> +		.merge_flags = 0,				\
>  	}
> 
>  #ifdef CONFIG_DEBUG_VM_MAPLE_TREE
> --
> 2.46.2

[-- Attachment #2: fc21959f74bc11-cacded5e42b960-9cecc5dc893886 --]
[-- Type: text/plain, Size: 101060 bytes --]

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-gnr-1ap1/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  9cecc5dc893886 ("mm: add expand-only VMA merge mode and optimise do_brk_flags()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 9cecc5dc89388676d1d0d47461c
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   3220697            -6.0%    3028867            -2.0%    3156931        aim9.brk_test.ops_per_sec
     24.58            -3.9%      23.63            +0.5%      24.71        time.user_time
    119459            -3.2%     115601            -0.5%     118822        proc-vmstat.nr_active_anon
    120943            -3.2%     117079            -0.5%     120301        proc-vmstat.nr_shmem
    119459            -3.2%     115601            -0.5%     118822        proc-vmstat.nr_zone_active_anon
     26.78 ± 11%      +3.2%      27.63 ± 29%     +36.7%      36.60 ± 20%  sched_debug.cfs_rq:/.removed.load_avg.stddev
     13.45 ± 11%      +4.4%      14.04 ± 30%     +36.7%      18.39 ± 20%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
     13.44 ± 11%      +4.5%      14.04 ± 30%     +36.8%      18.39 ± 20%  sched_debug.cfs_rq:/.removed.util_avg.stddev
      0.02 ±120%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      3.27 ±  5%   +5112.4%     170.40 ±218%   +5108.0%     170.26 ±218%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.20 ±188%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.02 ±120%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.20 ±188%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
 1.767e+09            +4.2%  1.841e+09            -0.5%  1.757e+09        perf-stat.i.branch-instructions
      0.45            -6.2%       0.42            -1.0%       0.44        perf-stat.i.cpi
 8.347e+09            +6.6%    8.9e+09            +0.9%  8.426e+09        perf-stat.i.instructions
      2.27            +6.6%       2.42            +1.3%       2.30        perf-stat.i.ipc
      0.03 ±  4%      -2.0%       0.03 ±  3%      -5.4%       0.03 ±  2%  perf-stat.overall.MPKI
      0.44            -5.9%       0.42            -1.5%       0.44        perf-stat.overall.cpi
      2.25            +6.2%       2.39            +1.6%       2.29        perf-stat.overall.ipc
 1.761e+09            +4.2%  1.834e+09            -0.5%  1.752e+09        perf-stat.ps.branch-instructions
 8.319e+09            +6.6%   8.87e+09            +1.0%  8.398e+09        perf-stat.ps.instructions
 2.519e+12            +6.4%   2.68e+12            +1.3%  2.552e+12        perf-stat.total.instructions
      7.07            -7.1        0.00            -7.1        0.00        perf-profile.calltrace.cycles-pp.mas_store_prealloc.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.30            -6.3        0.00            -6.3        0.00        perf-profile.calltrace.cycles-pp.mas_preallocate.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     18.35            -1.0       17.36            -0.7       17.68        perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     16.40            -0.9       15.47            -0.6       15.83        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
     10.17            -0.8        9.36            -0.4        9.76        perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags
     11.92            -0.8       11.12            -0.4       11.48        perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
      5.07 ±  3%      -0.2        4.84 ±  2%      -0.1        4.94        perf-profile.calltrace.cycles-pp.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.40 ±  3%      -0.2        5.18 ±  2%      -0.0        5.35        perf-profile.calltrace.cycles-pp.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      3.66 ±  2%      -0.2        3.50 ±  2%      -0.0        3.61 ±  2%  perf-profile.calltrace.cycles-pp.thp_get_unmapped_area_vmflags.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64
      1.66 ±  2%      -0.1        1.56 ±  3%      -0.0        1.64 ±  4%  perf-profile.calltrace.cycles-pp.up_write.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.68 ±  3%      -0.1        0.60 ±  5%      -0.1        0.60 ±  8%  perf-profile.calltrace.cycles-pp.kfree.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
      5.91 ±  2%      -0.1        5.85            +0.4        6.34        perf-profile.calltrace.cycles-pp.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      4.23 ±  2%      -0.0        4.21            +0.2        4.47 ±  2%  perf-profile.calltrace.cycles-pp.mas_walk.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.37 ± 70%      +0.3        0.67 ±  4%      +0.3        0.64 ±  5%  perf-profile.calltrace.cycles-pp.strlen.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
      0.49 ± 44%      +0.5        1.02 ±  5%      +0.6        1.04 ±  4%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     83.74            +0.5       84.28            +0.1       83.83        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.6        0.60 ±  6%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_next_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.7        0.65 ±  7%      +0.7        0.72 ±  7%  perf-profile.calltrace.cycles-pp.mas_wr_slot_store.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +0.7        0.68 ±  4%      +0.7        0.70 ±  3%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.7        0.68 ±  2%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_next_slot.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     80.24            +0.7       80.95            +0.1       80.38        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.7        0.74 ±  2%      +0.8        0.82 ±  4%  perf-profile.calltrace.cycles-pp.vma_adjust_trans_huge.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +0.8        0.75 ±  4%      +0.6        0.56 ±  4%  perf-profile.calltrace.cycles-pp.can_vma_merge_after.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.8        0.81 ±  3%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_prev.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.8        0.84 ±  5%      +0.9        0.89 ±  2%  perf-profile.calltrace.cycles-pp.__anon_vma_interval_tree_remove.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.3        1.30 ±  5%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_prev_slot.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +1.4        1.35 ±  4%      +2.0        2.03 ±  3%  perf-profile.calltrace.cycles-pp.up_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.6        1.60 ±  4%      +1.7        1.66 ±  3%  perf-profile.calltrace.cycles-pp.up_write.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.8        1.76 ±  2%      +1.8        1.82 ±  3%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
      0.00            +1.8        1.78 ±  2%      +2.0        1.99 ±  2%  perf-profile.calltrace.cycles-pp.init_multi_vma_prep.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +2.0        2.03            +2.1        2.13 ±  2%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +2.1        2.06 ±  3%      +2.0        2.04 ±  3%  perf-profile.calltrace.cycles-pp.down_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +2.3        2.29 ±  3%      +2.3        2.26 ±  3%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
     53.64            +2.6       56.21            +0.7       54.36        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +3.1        3.14 ±  2%      +3.3        3.27 ±  2%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +3.2        3.25            +3.4        3.43        perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +3.8        3.84            +4.1        4.05 ±  2%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +5.3        5.31 ±  2%      +5.6        5.56        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +6.1        6.07            +6.2        6.24 ±  2%  perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00           +27.7       27.74           +29.5       29.50        perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00           +32.4       32.43           +31.6       31.64        perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     18.49            -1.0       17.47            -0.7       17.81        perf-profile.children.cycles-pp.perf_event_mmap
      6.54            -1.0        5.54 ±  2%      -0.7        5.82        perf-profile.children.cycles-pp.mas_preallocate
      7.40            -1.0        6.40 ±  2%      -0.8        6.56        perf-profile.children.cycles-pp.mas_store_prealloc
      5.68            -1.0        4.72            -0.1        5.54 ±  2%  perf-profile.children.cycles-pp.up_write
     16.88            -0.9       15.93            -0.6       16.27        perf-profile.children.cycles-pp.perf_event_mmap_event
     10.35            -0.8        9.53            -0.4        9.92        perf-profile.children.cycles-pp.perf_event_mmap_output
     12.16            -0.8       11.35            -0.4       11.73        perf-profile.children.cycles-pp.perf_iterate_sb
      4.02 ±  2%      -0.7        3.32            -0.5        3.50        perf-profile.children.cycles-pp.mas_wr_store_type
      2.97            -0.6        2.37 ±  3%      -0.6        2.35 ±  3%  perf-profile.children.cycles-pp.mas_update_gap
      1.36 ±  8%      -0.6        0.80 ±  4%      -0.8        0.61 ±  5%  perf-profile.children.cycles-pp.can_vma_merge_after
      2.26 ±  2%      -0.5        1.80 ±  2%      -0.4        1.87 ±  3%  perf-profile.children.cycles-pp.mas_leaf_max_gap
      3.71 ±  2%      -0.3        3.44            -0.1        3.58 ±  2%  perf-profile.children.cycles-pp.vma_complete
      5.62 ±  3%      -0.2        5.40 ±  2%      -0.1        5.57        perf-profile.children.cycles-pp.check_brk_limits
      3.83 ±  2%      -0.2        3.65 ±  2%      -0.0        3.78 ±  2%  perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
      0.66 ±  7%      -0.1        0.55 ±  9%      -0.1        0.53 ±  6%  perf-profile.children.cycles-pp.may_expand_vm
      0.78 ±  3%      -0.1        0.69 ±  4%      -0.1        0.70 ±  6%  perf-profile.children.cycles-pp.kfree
      0.15 ± 12%      -0.1        0.08 ± 13%      -0.1        0.08 ±  8%  perf-profile.children.cycles-pp.arch_vma_name
      6.23 ±  2%      -0.1        6.17            +0.4        6.65        perf-profile.children.cycles-pp.mas_find
      4.32 ±  2%      -0.0        4.30            +0.2        4.56 ±  2%  perf-profile.children.cycles-pp.mas_walk
      0.23 ±  7%      +0.0        0.24 ± 18%      +0.0        0.26 ±  2%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.24 ±  7%      +0.0        0.24 ± 18%      +0.0        0.27 ±  2%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.81 ±  6%      +0.0        0.83 ±  3%      +0.1        0.91 ±  4%  perf-profile.children.cycles-pp.vma_adjust_trans_huge
      0.54 ±  5%      +0.0        0.58 ± 12%      +0.1        0.59 ±  4%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.57 ±  4%      +0.0        0.62 ± 12%      +0.1        0.63 ±  3%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.58 ±  7%      +0.1        0.66 ±  7%      +0.1        0.73 ±  7%  perf-profile.children.cycles-pp.mas_wr_slot_store
      0.19 ± 10%      +0.1        0.31 ± 10%      +0.1        0.32 ± 10%  perf-profile.children.cycles-pp.rb_next
      0.50 ±  4%      +0.1        0.62 ±  7%      +0.1        0.61 ±  9%  perf-profile.children.cycles-pp.mas_wr_store_entry
      0.40 ±  6%      +0.1        0.53 ±  6%      +0.1        0.47 ± 10%  perf-profile.children.cycles-pp.strnlen
      0.58 ± 13%      +0.2        0.75 ±  4%      +0.1        0.72 ±  5%  perf-profile.children.cycles-pp.strlen
      0.96 ±  6%      +0.2        1.14 ±  3%      +0.2        1.20 ±  4%  perf-profile.children.cycles-pp.rcu_all_qs
      0.68 ±  3%      +0.3        0.98 ±  5%      +0.4        1.05 ±  3%  perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
      1.77 ±  4%      +0.3        2.09            +0.4        2.14 ±  3%  perf-profile.children.cycles-pp.__cond_resched
      0.00            +0.4        0.36 ±  9%      +0.0        0.00        perf-profile.children.cycles-pp.mas_next_setup
      0.36 ±  8%      +0.4        0.76 ±  3%      +0.4        0.78 ±  4%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.48 ±  7%      +0.4        0.90 ±  6%      -0.0        0.47 ±  7%  perf-profile.children.cycles-pp.mas_prev_setup
      0.67 ±  9%      +0.6        1.24 ±  4%      +0.6        1.29 ±  3%  perf-profile.children.cycles-pp.__vm_enough_memory
      3.81            +0.6        4.39            +0.7        4.48        perf-profile.children.cycles-pp.down_write
     80.98            +0.7       81.64            +0.1       81.13        perf-profile.children.cycles-pp.__do_sys_brk
      1.05 ±  4%      +0.7        1.72 ±  3%      -0.0        1.00 ±  4%  perf-profile.children.cycles-pp.mas_next_slot
      0.00            +0.7        0.70 ±  6%      +0.0        0.00        perf-profile.children.cycles-pp.mas_next_range
      1.11 ±  4%      +1.0        2.10 ±  3%      -0.0        1.11 ±  4%  perf-profile.children.cycles-pp.mas_prev
      2.82 ±  3%      +1.2        4.07            +1.5        4.30 ±  2%  perf-profile.children.cycles-pp.vma_prepare
      1.54 ±  4%      +1.3        2.88 ±  3%      -0.0        1.53 ±  6%  perf-profile.children.cycles-pp.mas_prev_slot
     54.97            +1.6       56.61            -0.2       54.72        perf-profile.children.cycles-pp.do_brk_flags
      0.00           +28.6       28.64           +30.4       30.44        perf-profile.children.cycles-pp.vma_expand
      0.00           +32.9       32.91           +32.0       31.95        perf-profile.children.cycles-pp.vma_merge_new_range
      5.90 ±  2%      -3.5        2.37 ±  4%      -3.5        2.44 ±  3%  perf-profile.self.cycles-pp.do_brk_flags
      5.36 ±  2%      -1.0        4.38            -0.1        5.21 ±  2%  perf-profile.self.cycles-pp.up_write
     10.18            -0.8        9.36            -0.4        9.74        perf-profile.self.cycles-pp.perf_event_mmap_output
      3.86 ±  2%      -0.7        3.18            -0.5        3.33        perf-profile.self.cycles-pp.mas_wr_store_type
      1.28 ±  7%      -0.5        0.74 ±  4%      -0.7        0.55 ±  7%  perf-profile.self.cycles-pp.can_vma_merge_after
      3.02 ±  2%      -0.5        2.52 ±  4%      -0.4        2.57 ±  2%  perf-profile.self.cycles-pp.mas_store_prealloc
      2.19 ±  2%      -0.4        1.78 ±  2%      -0.4        1.82 ±  3%  perf-profile.self.cycles-pp.mas_leaf_max_gap
      5.03            -0.4        4.67            +0.1        5.09        perf-profile.self.cycles-pp.__do_sys_brk
      2.60 ±  4%      -0.3        2.27 ±  5%      -0.2        2.40 ±  2%  perf-profile.self.cycles-pp.mas_preallocate
      1.89 ±  4%      -0.3        1.59 ±  4%      -0.3        1.55 ±  5%  perf-profile.self.cycles-pp.perf_event_mmap_event
      1.71 ±  4%      -0.2        1.53 ±  3%      -0.1        1.59 ±  6%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.74 ±  3%      -0.2        0.57 ±  7%      -0.2        0.52 ±  6%  perf-profile.self.cycles-pp.mas_update_gap
      1.89 ±  4%      -0.2        1.73 ±  2%      +0.0        1.90        perf-profile.self.cycles-pp.init_multi_vma_prep
      1.58 ±  4%      -0.1        1.47 ±  3%      -0.1        1.50 ±  5%  perf-profile.self.cycles-pp.perf_event_mmap
      1.27 ±  2%      -0.1        1.16 ±  2%      -0.1        1.22 ±  4%  perf-profile.self.cycles-pp.vma_complete
      0.69 ±  2%      -0.1        0.61 ±  4%      -0.1        0.61 ±  7%  perf-profile.self.cycles-pp.kfree
      4.24 ±  2%      -0.0        4.20            +0.2        4.48 ±  2%  perf-profile.self.cycles-pp.mas_walk
      1.02            +0.0        1.05 ±  4%      +0.2        1.21 ±  4%  perf-profile.self.cycles-pp.mas_find
      0.55 ±  7%      +0.0        0.59 ±  8%      +0.1        0.65 ±  6%  perf-profile.self.cycles-pp.mas_wr_slot_store
      0.18 ± 16%      +0.0        0.23 ± 14%      +0.1        0.24 ±  5%  perf-profile.self.cycles-pp.cap_vm_enough_memory
      0.15 ± 10%      +0.1        0.24 ± 11%      +0.1        0.23 ± 10%  perf-profile.self.cycles-pp.rb_next
      0.58 ±  8%      +0.1        0.68 ±  5%      +0.2        0.74 ±  3%  perf-profile.self.cycles-pp.rcu_all_qs
      0.37 ±  5%      +0.1        0.50 ±  7%      +0.1        0.45 ± 10%  perf-profile.self.cycles-pp.strnlen
      0.54 ± 13%      +0.2        0.68 ±  4%      +0.1        0.66 ±  6%  perf-profile.self.cycles-pp.strlen
      1.01 ±  6%      +0.2        1.17 ±  2%      +0.2        1.17 ±  4%  perf-profile.self.cycles-pp.__cond_resched
      0.66 ±  6%      +0.2        0.83 ±  2%      +0.2        0.87 ±  5%  perf-profile.self.cycles-pp.vma_prepare
      0.46 ±  6%      +0.2        0.67 ±  7%      +0.3        0.74 ±  5%  perf-profile.self.cycles-pp.__anon_vma_interval_tree_remove
      0.34 ± 12%      +0.2        0.54 ±  3%      +0.2        0.58 ±  6%  perf-profile.self.cycles-pp.__vm_enough_memory
      0.00            +0.3        0.29 ± 10%      +0.0        0.00        perf-profile.self.cycles-pp.mas_next_setup
      0.32 ± 11%      +0.3        0.62 ±  7%      +0.0        0.32 ±  5%  perf-profile.self.cycles-pp.mas_prev_setup
      0.23 ±  7%      +0.3        0.55 ±  6%      +0.3        0.55 ±  3%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.00            +0.4        0.35 ±  7%      +0.0        0.00        perf-profile.self.cycles-pp.mas_next_range
      2.65 ±  3%      +0.4        3.00 ±  2%      +0.4        3.07        perf-profile.self.cycles-pp.down_write
      0.64 ±  5%      +0.6        1.21 ±  3%      -0.0        0.62 ±  5%  perf-profile.self.cycles-pp.mas_prev
      0.89 ±  5%      +0.7        1.54 ±  3%      -0.0        0.85 ±  5%  perf-profile.self.cycles-pp.mas_next_slot
      1.46 ±  4%      +1.3        2.72 ±  3%      -0.0        1.45 ±  6%  perf-profile.self.cycles-pp.mas_prev_slot
      0.00            +1.3        1.33 ±  2%      +0.9        0.86 ±  4%  perf-profile.self.cycles-pp.vma_merge_new_range
      0.00            +3.5        3.54 ±  3%      +3.7        3.73        perf-profile.self.cycles-pp.vma_expand

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-emr-2sp1/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  9cecc5dc893886 ("mm: add expand-only VMA merge mode and optimise do_brk_flags()")
  
fc21959f74bc1138 cacded5e42b9609b07b22d80c10 9cecc5dc89388676d1d0d47461c
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   3669298            -6.5%    3430070            -2.7%    3571699        aim9.brk_test.ops_per_sec
     23.53            -4.9%      22.38            -2.0%      23.06        time.user_time
    491107 ±  5%      -7.2%     455906 ±  6%      -5.1%     466069 ±  4%  meminfo.Active
    491011 ±  5%      -7.2%     455810 ±  6%      -5.1%     465957 ±  4%  meminfo.Active(anon)
    505666 ±  5%      -7.0%     470410 ±  5%      -5.0%     480514 ±  4%  meminfo.Shmem
     10118 ± 40%     -61.8%       3861 ± 40%     -24.4%       7653 ± 61%  numa-vmstat.node1.nr_slab_reclaimable
    121015 ±  6%      -7.3%     112196 ±  6%     -18.6%      98460 ± 20%  numa-vmstat.node3.nr_active_anon
    121371 ±  6%      -7.3%     112537 ±  6%     -18.6%      98831 ± 20%  numa-vmstat.node3.nr_shmem
    121015 ±  6%      -7.3%     112196 ±  6%     -18.6%      98460 ± 20%  numa-vmstat.node3.nr_zone_active_anon
     40474 ± 40%     -61.8%      15444 ± 40%     -24.4%      30612 ± 61%  numa-meminfo.node1.KReclaimable
     40474 ± 40%     -61.8%      15444 ± 40%     -24.4%      30612 ± 61%  numa-meminfo.node1.SReclaimable
    484115 ±  6%      -7.3%     448760 ±  6%     -18.7%     393817 ± 20%  numa-meminfo.node3.Active
    484083 ±  6%      -7.3%     448760 ±  6%     -18.7%     393798 ± 20%  numa-meminfo.node3.Active(anon)
    485577 ±  6%      -7.3%     450224 ±  6%     -18.6%     395333 ± 20%  numa-meminfo.node3.Shmem
    122753 ±  5%      -7.1%     113979 ±  6%      -5.1%     116468 ±  4%  proc-vmstat.nr_active_anon
    899298            -1.0%     890515            -0.7%     892993        proc-vmstat.nr_file_pages
    126417 ±  5%      -6.9%     117634 ±  5%      -5.0%     120109 ±  4%  proc-vmstat.nr_shmem
    122753 ±  5%      -7.1%     113979 ±  6%      -5.1%     116468 ±  4%  proc-vmstat.nr_zone_active_anon
    595.50 ± 22%     +53.6%     914.50 ± 12%     +34.1%     798.50 ± 37%  proc-vmstat.numa_hint_faults_local
     17958            -4.3%      17188 ±  2%      -5.1%      17041 ±  2%  proc-vmstat.pgactivate
      0.01 ± 52%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.01 ± 15%      +7.0%       0.01 ± 16%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.06 ± 69%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.01 ± 17%      -3.8%       0.01 ± 21%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
    400.06            +0.0%     400.07          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
     10.00            +0.0%      10.00          -100.0%       0.00        perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
    999.53            -0.0%     999.38          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ± 52%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
    400.05            +0.0%     400.06          -100.0%       0.00        perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.06 ± 69%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
    999.52            -0.0%     999.37          -100.0%       0.00        perf-sched.wait_time.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
 2.071e+09            +2.8%  2.128e+09            -1.5%   2.04e+09        perf-stat.i.branch-instructions
      0.48            -4.2%       0.46            +1.4%       0.48 ±  2%  perf-stat.i.cpi
 4.717e+09            -0.7%  4.686e+09            -0.9%  4.676e+09        perf-stat.i.cpu-cycles
 9.794e+09            +5.1%   1.03e+10            -0.1%  9.787e+09        perf-stat.i.instructions
      2.15            +5.8%       2.28            +0.5%       2.16        perf-stat.i.ipc
      0.34 ±  3%      -0.0        0.33            -0.0        0.33 ±  3%  perf-stat.overall.branch-miss-rate%
      0.48            -5.5%       0.46            -0.8%       0.48        perf-stat.overall.cpi
      2.08            +5.8%       2.20            +0.8%       2.09        perf-stat.overall.ipc
 2.063e+09            +2.8%   2.12e+09            -1.5%  2.032e+09        perf-stat.ps.branch-instructions
 4.703e+09            -0.7%  4.672e+09            -0.9%  4.662e+09        perf-stat.ps.cpu-cycles
 9.758e+09            +5.1%  1.026e+10            -0.1%  9.751e+09        perf-stat.ps.instructions
 2.944e+12            +5.5%  3.106e+12            +0.4%  2.957e+12        perf-stat.total.instructions
      6.54 ±  2%      -6.5        0.00            -6.5        0.00        perf-profile.calltrace.cycles-pp.mas_store_prealloc.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.22            -6.2        0.00            -6.2        0.00        perf-profile.calltrace.cycles-pp.mas_preallocate.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      9.69 ±  2%      -0.6        9.07            +0.3       10.01        perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags
     11.30 ±  2%      -0.6       10.71            +0.5       11.80        perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
     15.57            -0.5       15.05            +0.6       16.16        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
      3.61 ±  5%      -0.2        3.38 ±  4%      -0.3        3.33 ±  3%  perf-profile.calltrace.cycles-pp.thp_get_unmapped_area_vmflags.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64
      2.76            -0.1        2.62 ±  3%      -0.1        2.67 ±  2%  perf-profile.calltrace.cycles-pp.userfaultfd_unmap_complete.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.58 ±  3%      -0.1        0.44 ± 44%      -0.2        0.34 ± 70%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.brk
      0.84 ±  4%      -0.1        0.74 ±  8%      -0.1        0.75 ±  4%  perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.12 ±  5%      +0.2        1.29 ±  3%      +0.1        1.27 ±  3%  perf-profile.calltrace.cycles-pp.sized_strscpy.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
      0.65 ±  6%      +0.4        1.07 ±  5%      +0.5        1.11 ±  5%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.5        0.54 ±  4%      +0.6        0.56 ±  6%  perf-profile.calltrace.cycles-pp.mas_wr_slot_store.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +0.5        0.55 ±  4%      +0.2        0.18 ±141%  perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +0.7        0.66 ±  4%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_next_slot.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.7        0.68 ±  9%      +0.7        0.68 ±  4%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.7        0.68 ±  4%      +0.7        0.66 ±  3%  perf-profile.calltrace.cycles-pp.can_vma_merge_after.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.8        0.76 ±  2%      +0.8        0.76 ±  3%  perf-profile.calltrace.cycles-pp.vma_adjust_trans_huge.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +0.8        0.80 ±  3%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_prev.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.8        0.82 ±  3%      +0.9        0.88 ±  4%  perf-profile.calltrace.cycles-pp.__anon_vma_interval_tree_remove.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
     77.52            +1.0       78.50            +0.7       78.17        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +1.3        1.26 ±  3%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_prev_slot.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +1.3        1.35 ±  3%      +1.9        1.89 ±  2%  perf-profile.calltrace.cycles-pp.up_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.6        1.56 ±  2%      +1.6        1.59 ±  3%  perf-profile.calltrace.cycles-pp.up_write.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.7        1.72 ±  3%      +1.8        1.85 ±  3%  perf-profile.calltrace.cycles-pp.init_multi_vma_prep.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.9        1.87 ±  4%      +2.1        2.08 ±  3%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
      0.00            +2.1        2.07 ±  2%      +2.1        2.08 ±  5%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +2.1        2.14 ±  2%      +2.1        2.12 ±  2%  perf-profile.calltrace.cycles-pp.down_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +2.4        2.37 ±  2%      +2.7        2.72 ±  2%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
     51.80            +2.9       54.66            +2.0       53.75        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +3.0        3.02 ±  2%      +2.9        2.90 ±  3%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +3.1        3.06            +3.1        3.10 ±  2%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +3.9        3.86            +3.9        3.92 ±  2%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +5.0        5.01            +4.9        4.88        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +5.9        5.88            +6.3        6.35        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00           +27.1       27.13           +28.6       28.60        perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00           +31.6       31.63           +30.9       30.88        perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.46            -1.2        5.24            -1.4        5.11        perf-profile.children.cycles-pp.mas_preallocate
      5.54            -0.9        4.64            -0.3        5.27 ±  2%  perf-profile.children.cycles-pp.up_write
      3.99            -0.9        3.10 ±  2%      -1.0        2.98 ±  3%  perf-profile.children.cycles-pp.mas_wr_store_type
      9.85 ±  2%      -0.6        9.22            +0.3       10.18        perf-profile.children.cycles-pp.perf_event_mmap_output
      6.82 ±  2%      -0.6        6.22            -0.1        6.70        perf-profile.children.cycles-pp.mas_store_prealloc
      1.33 ±  5%      -0.6        0.75 ±  4%      -0.6        0.73 ±  4%  perf-profile.children.cycles-pp.can_vma_merge_after
     11.53 ±  2%      -0.6       10.96            +0.5       12.04        perf-profile.children.cycles-pp.perf_iterate_sb
     16.03            -0.5       15.50            +0.6       16.61        perf-profile.children.cycles-pp.perf_event_mmap_event
      2.65 ±  3%      -0.2        2.40 ±  3%      +0.2        2.82 ±  2%  perf-profile.children.cycles-pp.mas_update_gap
      2.18 ±  2%      -0.2        1.94 ±  3%      -0.0        2.15 ±  2%  perf-profile.children.cycles-pp.mas_leaf_max_gap
      3.72 ±  6%      -0.2        3.48 ±  4%      -0.3        3.44 ±  4%  perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
      3.52            -0.1        3.38 ±  2%      -0.1        3.41 ±  2%  perf-profile.children.cycles-pp.vma_complete
      0.62 ±  7%      -0.1        0.48 ±  9%      -0.1        0.53 ±  3%  perf-profile.children.cycles-pp.may_expand_vm
      1.92 ±  2%      -0.1        1.79 ±  3%      -0.0        1.92 ±  4%  perf-profile.children.cycles-pp.init_multi_vma_prep
      0.40 ±  6%      -0.1        0.35 ±  9%      -0.0        0.40 ±  3%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.35 ±  2%      -0.0        0.33 ±  7%      -0.1        0.29 ±  9%  perf-profile.children.cycles-pp.brk_test
      0.07 ± 18%      -0.0        0.06 ± 45%      -0.0        0.05 ±  7%  perf-profile.children.cycles-pp.elf_load
      0.18 ±  7%      -0.0        0.17 ± 23%      +0.1        0.28 ± 10%  perf-profile.children.cycles-pp.khugepaged_enter_vma
      0.44 ±  4%      -0.0        0.44 ±  5%      +0.0        0.46 ±  2%  perf-profile.children.cycles-pp.mas_destroy
      0.11 ± 20%      +0.0        0.14 ±  8%      +0.0        0.14 ± 11%  perf-profile.children.cycles-pp.anon_vma_interval_tree_remove
      0.52 ±  3%      +0.0        0.56 ±  4%      +0.1        0.60 ±  5%  perf-profile.children.cycles-pp.mas_wr_slot_store
      0.20 ± 11%      +0.1        0.28 ±  7%      +0.1        0.32 ±  9%  perf-profile.children.cycles-pp.rb_next
      0.49 ±  3%      +0.1        0.61 ±  4%      +0.1        0.57 ±  5%  perf-profile.children.cycles-pp.mas_wr_store_entry
      0.98 ±  4%      +0.1        1.11 ±  3%      +0.2        1.13 ±  6%  perf-profile.children.cycles-pp.rcu_all_qs
      0.39 ±  7%      +0.2        0.55 ±  7%      +0.1        0.53 ±  5%  perf-profile.children.cycles-pp.strnlen
      1.18 ±  5%      +0.2        1.37 ±  3%      +0.2        1.34 ±  3%  perf-profile.children.cycles-pp.sized_strscpy
      1.78 ±  3%      +0.3        2.04 ±  2%      +0.3        2.06 ±  4%  perf-profile.children.cycles-pp.__cond_resched
      0.00            +0.3        0.33 ±  4%      +0.0        0.00        perf-profile.children.cycles-pp.mas_next_setup
      0.41 ±  9%      +0.4        0.76 ±  7%      +0.4        0.77 ±  4%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.58 ±  4%      +0.4        0.96 ±  2%      +0.5        1.04 ±  3%  perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
      0.44 ± 17%      +0.4        0.85 ±  7%      -0.0        0.42 ±  4%  perf-profile.children.cycles-pp.mas_prev_setup
      4.11 ±  2%      +0.4        4.52            +0.4        4.52 ±  2%  perf-profile.children.cycles-pp.down_write
      0.74 ±  6%      +0.6        1.29 ±  5%      +0.6        1.35 ±  3%  perf-profile.children.cycles-pp.__vm_enough_memory
      0.00            +0.7        0.67 ±  6%      +0.0        0.00        perf-profile.children.cycles-pp.mas_next_range
      0.95 ±  5%      +0.7        1.64 ±  2%      +0.0        0.98 ±  3%  perf-profile.children.cycles-pp.mas_next_slot
     78.23            +0.9       79.17            +0.6       78.84        perf-profile.children.cycles-pp.__do_sys_brk
      1.02 ± 14%      +1.0        1.99 ±  4%      -0.0        1.01 ±  3%  perf-profile.children.cycles-pp.mas_prev
      2.89 ±  3%      +1.2        4.10            +1.2        4.14 ±  3%  perf-profile.children.cycles-pp.vma_prepare
      1.38 ± 12%      +1.3        2.73 ±  4%      +0.1        1.48 ±  4%  perf-profile.children.cycles-pp.mas_prev_slot
     53.08            +1.9       55.03            +1.0       54.10        perf-profile.children.cycles-pp.do_brk_flags
      0.00           +28.0       27.95           +29.5       29.47        perf-profile.children.cycles-pp.vma_expand
      0.00           +32.1       32.10           +31.2       31.24        perf-profile.children.cycles-pp.vma_merge_new_range
      5.69            -3.4        2.34 ±  3%      -3.4        2.34 ±  2%  perf-profile.self.cycles-pp.do_brk_flags
      5.22            -0.9        4.33 ±  2%      -0.3        4.95 ±  3%  perf-profile.self.cycles-pp.up_write
      3.82            -0.9        2.95 ±  3%      -1.0        2.83 ±  3%  perf-profile.self.cycles-pp.mas_wr_store_type
      9.68 ±  2%      -0.6        9.05            +0.3       10.00        perf-profile.self.cycles-pp.perf_event_mmap_output
      1.28 ±  5%      -0.6        0.69 ±  6%      -0.6        0.67 ±  4%  perf-profile.self.cycles-pp.can_vma_merge_after
      2.88 ±  3%      -0.4        2.44 ±  2%      -0.4        2.49        perf-profile.self.cycles-pp.mas_store_prealloc
      2.55            -0.3        2.22 ±  2%      -0.4        2.19 ±  3%  perf-profile.self.cycles-pp.mas_preallocate
      4.98 ±  2%      -0.3        4.70            -0.3        4.71        perf-profile.self.cycles-pp.__do_sys_brk
      2.15 ±  3%      -0.2        1.93 ±  3%      -0.0        2.12 ±  2%  perf-profile.self.cycles-pp.mas_leaf_max_gap
      1.82            -0.2        1.60 ±  4%      -0.1        1.67 ±  2%  perf-profile.self.cycles-pp.perf_event_mmap_event
      1.51 ±  4%      -0.2        1.31 ±  4%      -0.2        1.35 ±  6%  perf-profile.self.cycles-pp.perf_event_mmap
      1.85 ±  2%      -0.2        1.66 ±  3%      -0.1        1.77 ±  4%  perf-profile.self.cycles-pp.init_multi_vma_prep
      2.38 ±  7%      -0.2        2.20 ±  5%      -0.2        2.17 ±  3%  perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown_vmflags
      5.65 ±  2%      -0.1        5.50            -0.3        5.40 ±  2%  perf-profile.self.cycles-pp.brk
      2.77            -0.1        2.63 ±  3%      -0.1        2.67 ±  2%  perf-profile.self.cycles-pp.userfaultfd_unmap_complete
      0.75 ±  5%      -0.1        0.67 ±  4%      -0.1        0.66 ±  2%  perf-profile.self.cycles-pp.security_vm_enough_memory_mm
      0.28 ±  5%      -0.0        0.26 ±  8%      -0.1        0.22 ± 11%  perf-profile.self.cycles-pp.brk_test
      0.45 ±  7%      -0.0        0.45 ±  8%      +0.2        0.66 ±  5%  perf-profile.self.cycles-pp.mas_update_gap
      0.11 ± 12%      +0.0        0.11 ± 21%      +0.0        0.15 ± 10%  perf-profile.self.cycles-pp.khugepaged_enter_vma
      0.62 ±  4%      +0.0        0.65 ±  5%      +0.1        0.70 ±  4%  perf-profile.self.cycles-pp.rcu_all_qs
      0.03 ± 70%      +0.0        0.07 ± 14%      +0.0        0.07 ± 12%  perf-profile.self.cycles-pp.anon_vma_interval_tree_remove
      1.77 ±  4%      +0.1        1.82 ±  3%      +0.2        1.93 ±  3%  perf-profile.self.cycles-pp.perf_iterate_sb
      0.15 ± 12%      +0.1        0.20 ±  5%      +0.1        0.24 ± 10%  perf-profile.self.cycles-pp.rb_next
      0.40 ±  5%      +0.1        0.48 ±  5%      +0.0        0.45 ±  5%  perf-profile.self.cycles-pp.mas_wr_store_entry
      0.34 ±  6%      +0.2        0.50 ±  8%      +0.1        0.48 ±  4%  perf-profile.self.cycles-pp.strnlen
      0.66 ±  4%      +0.2        0.84 ±  5%      +0.1        0.79 ±  6%  perf-profile.self.cycles-pp.vma_prepare
      1.12 ±  5%      +0.2        1.30 ±  3%      +0.2        1.28 ±  3%  perf-profile.self.cycles-pp.sized_strscpy
      3.00 ±  2%      +0.2        3.19 ±  2%      +0.2        3.20 ±  3%  perf-profile.self.cycles-pp.down_write
      0.92 ±  4%      +0.2        1.13 ±  2%      +0.2        1.08 ±  4%  perf-profile.self.cycles-pp.__cond_resched
      0.00            +0.3        0.26 ±  7%      +0.0        0.00        perf-profile.self.cycles-pp.mas_next_setup
      0.28 ±  8%      +0.3        0.54 ±  8%      +0.2        0.53 ±  5%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.29 ± 12%      +0.3        0.58 ±  9%      +0.4        0.66 ±  4%  perf-profile.self.cycles-pp.__vm_enough_memory
      0.29 ± 24%      +0.3        0.58 ±  7%      -0.0        0.28 ±  8%  perf-profile.self.cycles-pp.mas_prev_setup
      0.40 ±  4%      +0.3        0.70 ±  2%      +0.3        0.74 ±  4%  perf-profile.self.cycles-pp.__anon_vma_interval_tree_remove
      0.00            +0.4        0.36 ±  6%      +0.0        0.00        perf-profile.self.cycles-pp.mas_next_range
      0.58 ± 14%      +0.5        1.12 ±  5%      +0.0        0.59 ±  4%  perf-profile.self.cycles-pp.mas_prev
      0.81 ±  4%      +0.7        1.48 ±  3%      +0.0        0.83 ±  3%  perf-profile.self.cycles-pp.mas_next_slot
      1.32 ± 11%      +1.3        2.59 ±  3%      +0.1        1.40 ±  5%  perf-profile.self.cycles-pp.mas_prev_slot
      0.00            +1.3        1.30 ±  4%      +1.0        1.03 ±  6%  perf-profile.self.cycles-pp.vma_merge_new_range
      0.00            +3.4        3.39            +3.9        3.88 ±  2%  perf-profile.self.cycles-pp.vma_expand

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  9cecc5dc893886 ("mm: add expand-only VMA merge mode and optimise do_brk_flags()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 9cecc5dc89388676d1d0d47461c
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   3540976            -6.4%    3314159            -2.6%    3449384        aim9.brk_test.ops_per_sec
     23.65            -5.8%      22.28            -3.4%      22.84        time.user_time
   1046452 ±  3%      +4.2%    1090314           +26.9%    1327644 ± 24%  sched_debug.cpu.avg_idle.max
    535529 ±  5%      +3.1%     552160 ±  2%     +30.0%     696343 ± 30%  sched_debug.cpu.max_idle_balance_cost.max
    111409 ±  2%      -5.1%     105748 ±  3%      -3.4%     107624        proc-vmstat.nr_active_anon
    114711 ±  2%      -5.0%     109006 ±  3%      -3.3%     110946        proc-vmstat.nr_shmem
    111409 ±  2%      -5.1%     105748 ±  3%      -3.4%     107624        proc-vmstat.nr_zone_active_anon
     17422 ±  2%      -5.3%      16494            -0.4%      17353        proc-vmstat.pgactivate
 1.999e+09            +3.2%  2.064e+09            -1.3%  1.972e+09        perf-stat.i.branch-instructions
   6526528 ±  3%      -1.2%    6446015            -7.6%    6028010 ±  4%  perf-stat.i.branch-misses
      0.47            -5.1%       0.44            -0.1%       0.47        perf-stat.i.cpi
 9.452e+09            +5.6%  9.983e+09            +0.1%  9.461e+09        perf-stat.i.instructions
      2.19            +5.8%       2.31            +0.7%       2.20        perf-stat.i.ipc
      0.33 ±  3%      -0.0        0.31            -0.0        0.30 ±  4%  perf-stat.overall.branch-miss-rate%
      0.47            -5.1%       0.45            -0.8%       0.47        perf-stat.overall.cpi
      2.12            +5.4%       2.23            +0.8%       2.13        perf-stat.overall.ipc
 1.991e+09            +3.2%  2.056e+09            -1.3%  1.964e+09        perf-stat.ps.branch-instructions
   6486215 ±  3%      -1.2%    6410482            -7.7%    5987617 ±  4%  perf-stat.ps.branch-misses
 9.417e+09            +5.6%  9.946e+09            +0.1%  9.426e+09        perf-stat.ps.instructions
 2.841e+12            +5.7%  3.002e+12            +0.4%  2.852e+12        perf-stat.total.instructions
      0.01 ± 42%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.02 ± 37%     -68.5%       0.01 ± 44%     -32.3%       0.01 ± 82%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.06 ± 56%     -30.8%       0.04 ±105%     +79.5%       0.11 ± 12%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.04 ± 66%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.05 ± 47%     -75.3%       0.01 ± 83%     -34.8%       0.03 ± 93%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.08 ± 45%     -38.0%       0.05 ± 98%     +56.1%       0.13 ± 11%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.01 ±  9%     +33.8%       0.02 ± 18%     +11.2%       0.01 ±  9%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.08 ± 57%     -69.7%       0.02 ±146%     -86.4%       0.01 ± 67%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.10 ± 37%     +45.2%       0.15 ±  8%      -3.3%       0.10 ± 21%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      7209 ±  3%      -7.8%       6648 ±  2%      -2.4%       7039 ±  2%  perf-sched.total_wait_and_delay.count.ms
      1533 ±  6%     -10.2%       1377            -5.1%       1454 ±  6%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ± 42%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.02 ± 37%     -68.5%       0.01 ± 44%     -32.3%       0.01 ± 82%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.04 ± 66%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.05 ± 47%     -75.3%       0.01 ± 83%     -34.8%       0.03 ± 93%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.61            -6.6        0.00            -6.6        0.00        perf-profile.calltrace.cycles-pp.mas_store_prealloc.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.20            -6.2        0.00            -6.2        0.00        perf-profile.calltrace.cycles-pp.mas_preallocate.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     17.96            -1.1       16.87            -0.1       17.84        perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     16.08            -1.0       15.10            +0.0       16.08        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
      9.85            -0.8        9.02            +0.1        9.94        perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags
     11.56            -0.8       10.73            +0.2       11.80 ±  2%  perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
      3.57 ±  2%      -0.1        3.43 ±  3%      -0.1        3.46        perf-profile.calltrace.cycles-pp.thp_get_unmapped_area_vmflags.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64
      2.35 ±  3%      -0.1        2.22 ±  3%      -0.2        2.16 ±  2%  perf-profile.calltrace.cycles-pp.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
     10.59 ±  2%      -0.1       10.48            -0.4       10.15 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.brk
      0.58 ±  4%      -0.1        0.48 ± 45%      -0.1        0.44 ± 44%  perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      5.67 ±  2%      -0.0        5.66 ±  4%      +0.4        6.02 ±  2%  perf-profile.calltrace.cycles-pp.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      3.93 ±  4%      +0.1        4.00 ±  4%      +0.2        4.12        perf-profile.calltrace.cycles-pp.mas_walk.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.64 ±  4%      +0.4        1.06 ±  2%      +0.5        1.10 ±  3%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.6        0.56 ±  5%      +0.6        0.58 ±  4%  perf-profile.calltrace.cycles-pp.mas_wr_store_entry.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +0.6        0.57 ±  6%      +0.6        0.60 ±  8%  perf-profile.calltrace.cycles-pp.mas_wr_slot_store.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +0.6        0.58 ±  7%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_next_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.7        0.69 ±  4%      +0.7        0.69 ±  6%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.7        0.70 ±  6%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_next_slot.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.7        0.73 ±  8%      +0.8        0.77 ±  5%  perf-profile.calltrace.cycles-pp.vma_adjust_trans_huge.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +0.7        0.74 ±  5%      +0.8        0.79 ±  6%  perf-profile.calltrace.cycles-pp.can_vma_merge_after.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.8        0.84 ±  2%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_prev.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +0.9        0.88 ±  5%      +0.9        0.89 ±  3%  perf-profile.calltrace.cycles-pp.__anon_vma_interval_tree_remove.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
     78.92            +0.9       79.81            +0.6       79.50        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +1.3        1.28 ±  2%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_prev_slot.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +1.4        1.42 ±  3%      +1.9        1.88 ±  3%  perf-profile.calltrace.cycles-pp.up_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.6        1.59 ±  4%      +1.6        1.62 ±  3%  perf-profile.calltrace.cycles-pp.up_write.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.8        1.80 ±  4%      +1.8        1.83 ±  4%  perf-profile.calltrace.cycles-pp.init_multi_vma_prep.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.9        1.89 ±  4%      +2.1        2.10 ±  2%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
      0.00            +2.1        2.06 ±  3%      +2.1        2.13        perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +2.1        2.12 ±  2%      +2.2        2.22 ±  3%  perf-profile.calltrace.cycles-pp.down_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +2.4        2.43 ±  4%      +2.7        2.70 ±  2%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
     52.76            +2.6       55.40            +1.8       54.61        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +3.0        2.98            +3.0        2.95        perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +3.1        3.11 ±  3%      +3.1        3.13 ±  2%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +3.9        3.90 ±  2%      +4.0        4.00        perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +5.0        4.96            +5.0        4.97        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +6.0        6.04 ±  2%      +6.4        6.44        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00           +27.5       27.47           +29.1       29.12        perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00           +32.1       32.09           +31.7       31.72        perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.44            -1.2        5.20            -1.2        5.20        perf-profile.children.cycles-pp.mas_preallocate
     18.11            -1.1       16.99            -0.1       17.98        perf-profile.children.cycles-pp.perf_event_mmap
      4.01 ±  2%      -1.0        3.06            -1.0        3.03        perf-profile.children.cycles-pp.mas_wr_store_type
     16.54            -0.9       15.60            +0.0       16.55        perf-profile.children.cycles-pp.perf_event_mmap_event
     10.02            -0.8        9.18            +0.1       10.10        perf-profile.children.cycles-pp.perf_event_mmap_output
      5.61            -0.8        4.77            -0.3        5.32 ±  2%  perf-profile.children.cycles-pp.up_write
     11.80            -0.8       10.97            +0.2       12.02 ±  2%  perf-profile.children.cycles-pp.perf_iterate_sb
      1.39            -0.6        0.81 ±  3%      -0.5        0.86 ±  5%  perf-profile.children.cycles-pp.can_vma_merge_after
      6.89            -0.5        6.38            -0.1        6.79        perf-profile.children.cycles-pp.mas_store_prealloc
      3.67 ±  2%      -0.3        3.41 ±  3%      -0.2        3.44 ±  2%  perf-profile.children.cycles-pp.vma_complete
      2.20 ±  4%      -0.2        1.97 ±  3%      -0.0        2.16 ±  2%  perf-profile.children.cycles-pp.mas_leaf_max_gap
      2.68 ±  3%      -0.2        2.47 ±  3%      +0.1        2.78 ±  2%  perf-profile.children.cycles-pp.mas_update_gap
      2.51 ±  3%      -0.1        2.36 ±  3%      -0.2        2.32 ±  2%  perf-profile.children.cycles-pp.down_write_killable
      0.61 ±  5%      -0.1        0.49 ±  7%      -0.1        0.51 ±  8%  perf-profile.children.cycles-pp.may_expand_vm
      1.25 ±  4%      -0.1        1.14 ±  5%      -0.1        1.18 ±  3%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.14 ± 11%      -0.1        0.08 ± 12%      -0.1        0.05 ± 46%  perf-profile.children.cycles-pp.arch_vma_name
      0.42            -0.1        0.36 ±  4%      -0.0        0.41 ±  4%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.36 ±  6%      -0.1        0.31 ±  4%      -0.1        0.30 ±  9%  perf-profile.children.cycles-pp.brk_test
      0.25 ±  5%      -0.0        0.21 ±  9%      -0.0        0.22 ±  4%  perf-profile.children.cycles-pp.__rb_insert_augmented
      0.41 ±  6%      -0.0        0.38 ±  5%      -0.0        0.36 ±  8%  perf-profile.children.cycles-pp.cap_vm_enough_memory
      0.61 ±  4%      -0.0        0.59 ±  3%      -0.0        0.57 ±  2%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      5.98 ±  2%      -0.0        5.97 ±  4%      +0.4        6.34 ±  2%  perf-profile.children.cycles-pp.mas_find
      0.18 ± 18%      -0.0        0.17 ± 13%      +0.1        0.25 ±  6%  perf-profile.children.cycles-pp.khugepaged_enter_vma
      0.14 ±  3%      -0.0        0.14 ±  7%      -0.0        0.12 ±  7%  perf-profile.children.cycles-pp.intel_idle
      0.08 ± 10%      +0.0        0.12 ± 16%      +0.0        0.10 ± 11%  perf-profile.children.cycles-pp.mmap_region
      0.09 ±  8%      +0.0        0.12 ± 15%      +0.0        0.11 ± 10%  perf-profile.children.cycles-pp.do_mmap
      0.10 ± 14%      +0.0        0.15 ± 11%      +0.1        0.16 ± 12%  perf-profile.children.cycles-pp.anon_vma_interval_tree_remove
      4.02 ±  4%      +0.1        4.08 ±  4%      +0.2        4.20        perf-profile.children.cycles-pp.mas_walk
      1.01 ±  5%      +0.1        1.08 ±  5%      +0.1        1.12 ±  2%  perf-profile.children.cycles-pp.rcu_all_qs
      0.19 ±  5%      +0.1        0.30 ±  5%      +0.1        0.31 ±  6%  perf-profile.children.cycles-pp.rb_next
      1.27 ±  4%      +0.1        1.40 ±  2%      +0.0        1.31 ±  4%  perf-profile.children.cycles-pp.sized_strscpy
      0.42 ±  6%      +0.1        0.57 ±  6%      +0.1        0.54 ±  7%  perf-profile.children.cycles-pp.strnlen
      0.48 ±  4%      +0.2        0.64 ±  5%      +0.2        0.65 ±  3%  perf-profile.children.cycles-pp.mas_wr_store_entry
      1.80 ±  4%      +0.2        2.02 ±  2%      +0.3        2.09        perf-profile.children.cycles-pp.__cond_resched
      0.00            +0.3        0.31 ± 12%      +0.0        0.00        perf-profile.children.cycles-pp.mas_next_setup
      4.16 ±  3%      +0.3        4.48            +0.5        4.68        perf-profile.children.cycles-pp.down_write
      0.40 ±  6%      +0.4        0.78 ±  4%      +0.4        0.77 ±  6%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.48 ±  4%      +0.4        0.92 ±  5%      -0.0        0.44 ±  9%  perf-profile.children.cycles-pp.mas_prev_setup
      0.56 ±  6%      +0.5        1.02 ±  4%      +0.5        1.04 ±  3%  perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
      0.79 ±  4%      +0.5        1.30 ±  3%      +0.5        1.32 ±  3%  perf-profile.children.cycles-pp.__vm_enough_memory
      1.02 ±  2%      +0.7        1.72 ±  3%      -0.0        0.98 ±  2%  perf-profile.children.cycles-pp.mas_next_slot
      0.00            +0.7        0.70 ±  6%      +0.0        0.00        perf-profile.children.cycles-pp.mas_next_range
     79.62            +0.9       80.50            +0.6       80.20        perf-profile.children.cycles-pp.__do_sys_brk
      1.10 ±  3%      +1.0        2.10 ±  3%      -0.0        1.06 ±  2%  perf-profile.children.cycles-pp.mas_prev
      2.86 ±  3%      +1.3        4.12 ±  3%      +1.4        4.24        perf-profile.children.cycles-pp.vma_prepare
      1.45 ±  4%      +1.3        2.79 ±  3%      -0.0        1.41 ±  4%  perf-profile.children.cycles-pp.mas_prev_slot
     54.06            +1.8       55.82            +0.9       54.96        perf-profile.children.cycles-pp.do_brk_flags
      0.00           +28.3       28.30           +30.0       30.00        perf-profile.children.cycles-pp.vma_expand
      0.00           +32.6       32.58           +32.1       32.11        perf-profile.children.cycles-pp.vma_merge_new_range
      5.90 ±  2%      -3.4        2.47 ±  3%      -3.5        2.38        perf-profile.self.cycles-pp.do_brk_flags
      3.84 ±  2%      -0.9        2.90            -1.0        2.88        perf-profile.self.cycles-pp.mas_wr_store_type
      9.85            -0.8        9.02            +0.1        9.95        perf-profile.self.cycles-pp.perf_event_mmap_output
      5.26            -0.8        4.47 ±  2%      -0.3        5.00        perf-profile.self.cycles-pp.up_write
      1.34 ±  2%      -0.6        0.75 ±  3%      -0.6        0.78 ±  5%  perf-profile.self.cycles-pp.can_vma_merge_after
      2.86            -0.4        2.47 ±  5%      -0.3        2.53 ±  2%  perf-profile.self.cycles-pp.mas_store_prealloc
      2.50 ±  2%      -0.3        2.22 ±  2%      -0.3        2.24 ±  4%  perf-profile.self.cycles-pp.mas_preallocate
      5.02 ±  2%      -0.2        4.79            -0.2        4.78 ±  2%  perf-profile.self.cycles-pp.__do_sys_brk
      2.19 ±  4%      -0.2        1.96 ±  3%      -0.1        2.13 ±  2%  perf-profile.self.cycles-pp.mas_leaf_max_gap
      1.87 ±  3%      -0.2        1.66 ±  4%      -0.2        1.67 ±  2%  perf-profile.self.cycles-pp.perf_event_mmap_event
      1.52 ±  3%      -0.2        1.33 ±  2%      -0.1        1.40 ±  3%  perf-profile.self.cycles-pp.perf_event_mmap
      1.82 ±  3%      -0.1        1.68 ±  4%      -0.2        1.66 ±  2%  perf-profile.self.cycles-pp.down_write_killable
      1.84 ±  2%      -0.1        1.74 ±  4%      -0.1        1.77 ±  4%  perf-profile.self.cycles-pp.init_multi_vma_prep
      1.18 ±  2%      -0.1        1.09            -0.1        1.10 ±  5%  perf-profile.self.cycles-pp.do_syscall_64
      5.62 ±  2%      -0.1        5.54            -0.2        5.37 ±  2%  perf-profile.self.cycles-pp.brk
      0.79 ±  4%      -0.1        0.73 ±  6%      -0.1        0.71 ±  3%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.33 ±  4%      -0.0        0.30 ±  5%      +0.0        0.34 ±  4%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      1.08 ±  2%      -0.0        1.05 ±  4%      +0.2        1.29 ±  6%  perf-profile.self.cycles-pp.mas_find
      0.61 ±  4%      -0.0        0.59 ±  3%      -0.0        0.56 ±  2%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.14 ±  3%      -0.0        0.14 ±  7%      -0.0        0.12 ±  7%  perf-profile.self.cycles-pp.intel_idle
      0.10 ± 16%      +0.0        0.10 ± 16%      +0.0        0.14 ± 10%  perf-profile.self.cycles-pp.khugepaged_enter_vma
      0.29 ± 10%      +0.0        0.33 ± 15%      +0.1        0.37 ±  8%  perf-profile.self.cycles-pp.security_mmap_addr
      0.46 ±  3%      +0.0        0.51 ±  8%      +0.2        0.62 ±  4%  perf-profile.self.cycles-pp.mas_update_gap
      0.03 ± 70%      +0.1        0.08 ± 14%      +0.0        0.08 ± 14%  perf-profile.self.cycles-pp.anon_vma_interval_tree_remove
      3.94 ±  4%      +0.1        4.00 ±  4%      +0.2        4.12 ±  2%  perf-profile.self.cycles-pp.mas_walk
      0.13 ±  7%      +0.1        0.22 ± 10%      +0.1        0.24 ±  4%  perf-profile.self.cycles-pp.rb_next
      0.40 ±  7%      +0.1        0.50 ±  9%      +0.1        0.52 ±  4%  perf-profile.self.cycles-pp.mas_wr_store_entry
      1.21 ±  4%      +0.1        1.32 ±  2%      +0.0        1.24 ±  5%  perf-profile.self.cycles-pp.sized_strscpy
      0.38 ±  7%      +0.1        0.50 ±  7%      +0.1        0.48 ±  8%  perf-profile.self.cycles-pp.strnlen
      0.95 ±  7%      +0.1        1.10 ±  4%      +0.2        1.13 ±  2%  perf-profile.self.cycles-pp.__cond_resched
      0.63 ±  5%      +0.2        0.82 ±  9%      +0.2        0.84 ±  3%  perf-profile.self.cycles-pp.vma_prepare
      2.98 ±  2%      +0.2        3.20            +0.3        3.31        perf-profile.self.cycles-pp.down_write
      0.37 ±  6%      +0.2        0.60 ±  6%      +0.2        0.59 ±  5%  perf-profile.self.cycles-pp.__vm_enough_memory
      0.00            +0.2        0.24 ± 11%      +0.0        0.00        perf-profile.self.cycles-pp.mas_next_setup
      0.24 ±  6%      +0.3        0.54 ±  6%      +0.3        0.55 ±  8%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.32 ±  4%      +0.3        0.64 ±  5%      -0.0        0.29 ±  9%  perf-profile.self.cycles-pp.mas_prev_setup
      0.38 ± 10%      +0.3        0.72 ±  6%      +0.3        0.72 ±  5%  perf-profile.self.cycles-pp.__anon_vma_interval_tree_remove
      0.00            +0.4        0.41 ±  5%      +0.0        0.00        perf-profile.self.cycles-pp.mas_next_range
      0.63 ±  5%      +0.5        1.17 ±  3%      -0.0        0.61 ±  4%  perf-profile.self.cycles-pp.mas_prev
      0.87 ±  3%      +0.6        1.49 ±  3%      -0.0        0.83 ±  3%  perf-profile.self.cycles-pp.mas_next_slot
      1.37 ±  4%      +1.3        2.64 ±  3%      -0.0        1.33 ±  5%  perf-profile.self.cycles-pp.mas_prev_slot
      0.00            +1.3        1.30            +1.3        1.26 ±  6%  perf-profile.self.cycles-pp.vma_merge_new_range
      0.00            +3.4        3.45 ±  2%      +4.0        3.96 ±  2%  perf-profile.self.cycles-pp.vma_expand

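A note on reading the perf-profile rows above: each row gives the base commit's share of cycles, then a (delta, value) pair for each of the two later commits. Although the column header says %change, the perf-profile deltas appear to be absolute differences in percentage points, not relative changes (for example 5.90 -> 2.47 is reported as -3.4). The sketch below is a minimal, unofficial check of that reading. The helper names `parse_profile_row` and `deltas_consistent` are hypothetical and are not part of LKP or perf. The 0.1 tolerance is an assumption to absorb rounding, since the report presumably rounds values and deltas independently.

```python
import re

# A value such as "5.90" or "2.47 ±  3%"; the stddev annotation is optional.
_NUM = r"(\d+(?:\.\d+)?)(?:\s*±\s*\d+%)?"
# A signed delta such as "-3.4" or "+27.5".
_DELTA = r"([+-]\d+(?:\.\d+)?)"
# base, (delta, value) for commit 2, (delta, value) for commit 3, metric name.
_ROW = re.compile(
    r"^\s*" + _NUM + r"\s+" + _DELTA + r"\s+" + _NUM
    + r"\s+" + _DELTA + r"\s+" + _NUM + r"\s+(\S+)\s*$"
)


def parse_profile_row(line):
    """Split one perf-profile row into base, deltas, values and metric."""
    m = _ROW.match(line)
    if m is None:
        raise ValueError("not a perf-profile row: %r" % line)
    base, d1, v1, d2, v2 = (float(g) for g in m.groups()[:5])
    return {"metric": m.group(6), "base": base,
            "deltas": (d1, d2), "values": (v1, v2)}


def deltas_consistent(row, tol=0.1):
    """True if each delta matches value - base within tol percentage points."""
    return all(abs((v - row["base"]) - d) <= tol
               for d, v in zip(row["deltas"], row["values"]))


if __name__ == "__main__":
    sample = ("      5.90 ±  2%      -3.4        2.47 ±  3%      -3.5"
              "        2.38        perf-profile.self.cycles-pp.do_brk_flags")
    row = parse_profile_row(sample)
    print(row["metric"], deltas_consistent(row))
```

The pattern only covers perf-profile rows. Rows whose change column is a relative percentage (such as aim9.brk_test.ops_per_sec) or whose values use scientific notation would need a different pattern.
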
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  9cecc5dc893886 ("mm: add expand-only VMA merge mode and optimise do_brk_flags()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 9cecc5dc89388676d1d0d47461c
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   2667734            -5.6%    2518021            -1.0%    2640850        aim9.brk_test.ops_per_sec
    196.00            +0.0%     196.00          +404.4%     988.58 ±168%  meminfo.Inactive(file)
     23.94            -8.7%      21.86 ±  2%      -4.6%      22.84        time.user_time
    948658            +2.3%     970280            +1.0%     958340        proc-vmstat.pgalloc_normal
    792310            -1.5%     780779            -0.1%     791672        proc-vmstat.pgfault
    814343            +2.4%     833987            +0.9%     821925        proc-vmstat.pgfree
      0.01 ± 47%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.06 ± 34%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
    399.96            -0.0%     399.92           -43.8%     224.93 ±  2%  perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
     10.00            +0.0%      10.00           +80.0%      18.00        perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ± 47%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
    399.95            -0.0%     399.92           -43.8%     224.93 ±  2%  perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.06 ± 34%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
 1.721e+09            +3.0%  1.773e+09            -0.2%  1.717e+09        perf-stat.i.branch-instructions
      0.54            -5.4%       0.52            -1.7%       0.54        perf-stat.i.cpi
 7.553e+09            +6.0%  8.003e+09            +1.2%  7.645e+09        perf-stat.i.instructions
      1.86            +6.1%       1.97            +2.0%       1.90        perf-stat.i.ipc
      0.36 ±  2%      -0.0        0.35            -0.0        0.34 ±  5%  perf-stat.overall.branch-miss-rate%
      0.55            -5.3%       0.52            -2.1%       0.54        perf-stat.overall.cpi
      1.82            +5.6%       1.92            +2.2%       1.86        perf-stat.overall.ipc
 1.715e+09            +3.0%  1.767e+09            -0.2%  1.711e+09        perf-stat.ps.branch-instructions
 7.529e+09            +5.9%  7.977e+09            +1.2%  7.621e+09        perf-stat.ps.instructions
 2.275e+12            +5.8%  2.408e+12            +1.7%  2.314e+12        perf-stat.total.instructions
      6.58 ±  2%      -6.6        0.00            -6.6        0.00        perf-profile.calltrace.cycles-pp.mas_store_prealloc.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.76 ±  2%      -5.8        0.00            -5.8        0.00        perf-profile.calltrace.cycles-pp.mas_preallocate.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     18.35            -1.3       17.10            -0.5       17.82        perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
     15.92            -1.1       14.78 ±  2%      -0.6       15.36        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
     11.03            -0.7       10.33            -0.3       10.71 ±  2%  perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
      4.22 ±  3%      -0.4        3.79 ±  2%      -0.2        4.00 ±  2%  perf-profile.calltrace.cycles-pp.mas_walk.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      8.48            -0.4        8.08 ±  2%      -0.2        8.32 ±  2%  perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap_event.perf_event_mmap.do_brk_flags
      5.32            -0.3        4.98            -0.2        5.08        perf-profile.calltrace.cycles-pp.clear_bhb_loop.brk
      5.38 ±  3%      -0.3        5.06            +0.0        5.39        perf-profile.calltrace.cycles-pp.mas_find.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      1.16 ±  7%      -0.3        0.86 ±  5%      -0.2        0.95 ±  4%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.88 ± 14%      -0.2        0.64 ±  8%      -0.1        0.75 ±  5%  perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.56            -0.2        5.38            +0.2        5.73        perf-profile.calltrace.cycles-pp.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.74 ±  6%      -0.2        0.57 ±  6%      -0.1        0.69 ±  3%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64
      5.09            -0.2        4.92            +0.1        5.22        perf-profile.calltrace.cycles-pp.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.73            -0.2        3.56 ±  2%      +0.1        3.79 ±  2%  perf-profile.calltrace.cycles-pp.thp_get_unmapped_area_vmflags.__get_unmapped_area.check_brk_limits.__do_sys_brk.do_syscall_64
      1.25 ±  2%      -0.2        1.08 ±  9%      -0.1        1.17 ±  7%  perf-profile.calltrace.cycles-pp.sized_strscpy.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk
      1.98 ±  2%      -0.1        1.84 ±  3%      +0.0        1.99 ±  4%  perf-profile.calltrace.cycles-pp.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.55 ±  2%      -0.1        0.42 ± 44%      +0.0        0.60 ±  8%  perf-profile.calltrace.cycles-pp.__cond_resched.down_write_killable.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.78 ±  3%      -0.1        0.72 ±  4%      -0.0        0.77 ±  3%  perf-profile.calltrace.cycles-pp.mas_prev.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.6        0.56 ±  5%      +0.5        0.46 ± 44%  perf-profile.calltrace.cycles-pp.anon_vma_interval_tree_insert.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +0.6        0.64            +0.6        0.64 ±  5%  perf-profile.calltrace.cycles-pp.vma_adjust_trans_huge.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +0.7        0.69 ±  8%      +0.7        0.66 ±  6%  perf-profile.calltrace.cycles-pp.__anon_vma_interval_tree_remove.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +0.8        0.78 ±  4%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_next_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.8        0.80 ±  2%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_prev.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
     82.26            +0.8       83.07            +0.7       82.94        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.8        0.82 ±  4%      +1.0        1.04 ±  3%  perf-profile.calltrace.cycles-pp.can_vma_merge_after.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
     81.38            +0.8       82.20            +0.6       82.03        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.8        0.84 ±  4%      +0.9        0.91 ±  6%  perf-profile.calltrace.cycles-pp.mas_wr_slot_store.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
     77.94            +1.0       78.92            +0.6       78.57        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +1.2        1.24 ±  3%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_prev_slot.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +1.3        1.32 ±  2%      +0.0        0.00        perf-profile.calltrace.cycles-pp.mas_next_slot.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.4        1.38 ±  2%      +1.4        1.39 ±  3%  perf-profile.calltrace.cycles-pp.down_write.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.4        1.40 ±  4%      +1.5        1.49 ±  4%  perf-profile.calltrace.cycles-pp.up_write.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.6        1.64            +1.7        1.74 ±  4%  perf-profile.calltrace.cycles-pp.up_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.8        1.82 ±  5%      +1.8        1.84 ±  2%  perf-profile.calltrace.cycles-pp.down_write.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.9        1.89 ±  3%      +2.0        2.04 ±  2%  perf-profile.calltrace.cycles-pp.init_multi_vma_prep.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.9        1.92 ±  3%      +2.0        2.01 ±  3%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
      0.00            +2.3        2.31            +2.3        2.33 ±  3%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +2.7        2.68 ±  2%      +2.8        2.78 ±  2%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +2.9        2.92 ±  4%      +3.0        2.96 ±  2%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +2.9        2.93 ±  2%      +2.8        2.84 ±  3%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
     53.19            +3.0       56.14            +1.2       54.34        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +4.4        4.42            +4.6        4.57        perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +7.1        7.09            +7.4        7.42        perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00           +26.8       26.83           +27.6       27.58        perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00           +31.4       31.41           +30.5       30.46        perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.93            -1.3        4.62            -1.2        4.76        perf-profile.children.cycles-pp.mas_preallocate
     18.48            -1.3       17.23            -0.5       17.96        perf-profile.children.cycles-pp.perf_event_mmap
     16.41            -1.2       15.23            -0.5       15.88        perf-profile.children.cycles-pp.perf_event_mmap_event
      3.46            -1.1        2.37            -1.1        2.40 ±  3%  perf-profile.children.cycles-pp.mas_wr_store_type
     11.24            -0.7       10.52 ±  2%      -0.3       10.93 ±  2%  perf-profile.children.cycles-pp.perf_iterate_sb
      4.29 ±  3%      -0.4        3.86 ±  2%      -0.2        4.06 ±  2%  perf-profile.children.cycles-pp.mas_walk
      8.61            -0.4        8.21 ±  2%      -0.2        8.44 ±  2%  perf-profile.children.cycles-pp.perf_event_mmap_output
      0.83 ±  7%      -0.4        0.47 ±  8%      -0.3        0.50 ±  7%  perf-profile.children.cycles-pp.may_expand_vm
      3.82            -0.3        3.48 ±  3%      -0.3        3.48 ±  3%  perf-profile.children.cycles-pp.down_write
      5.39            -0.3        5.06            -0.2        5.16        perf-profile.children.cycles-pp.clear_bhb_loop
      1.36 ±  5%      -0.3        1.03 ±  5%      -0.2        1.16 ±  3%  perf-profile.children.cycles-pp.__vm_enough_memory
      1.18 ±  5%      -0.3        0.88 ±  4%      -0.1        1.11 ±  2%  perf-profile.children.cycles-pp.can_vma_merge_after
      0.57 ± 22%      -0.2        0.33 ± 12%      -0.1        0.45 ±  8%  perf-profile.children.cycles-pp.cap_vm_enough_memory
      1.06 ± 11%      -0.2        0.83 ±  9%      -0.1        0.94 ±  4%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
      5.30            -0.2        5.11            +0.1        5.42        perf-profile.children.cycles-pp.__get_unmapped_area
      5.75            -0.2        5.56            +0.2        5.93        perf-profile.children.cycles-pp.check_brk_limits
      0.82 ±  4%      -0.2        0.64 ±  4%      -0.1        0.76 ±  3%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      3.86            -0.2        3.69 ±  2%      +0.1        3.93 ±  2%  perf-profile.children.cycles-pp.thp_get_unmapped_area_vmflags
      1.32 ±  2%      -0.2        1.16 ±  8%      -0.1        1.24 ±  6%  perf-profile.children.cycles-pp.sized_strscpy
      2.10 ±  2%      -0.1        1.96 ±  2%      +0.0        2.11 ±  3%  perf-profile.children.cycles-pp.down_write_killable
      1.86 ±  3%      -0.1        1.74 ±  2%      -0.0        1.83 ±  5%  perf-profile.children.cycles-pp.__cond_resched
      0.57 ±  7%      -0.1        0.45 ± 13%      -0.1        0.46 ±  7%  perf-profile.children.cycles-pp.strlen
      2.78            -0.1        2.66 ±  2%      +0.0        2.80 ±  2%  perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown_vmflags
      3.16            -0.1        3.10            -0.2        3.00 ±  3%  perf-profile.children.cycles-pp.vma_prepare
      0.72 ±  5%      -0.0        0.68 ±  9%      -0.1        0.66 ±  4%  perf-profile.children.cycles-pp.brk_test
      0.20 ± 11%      -0.0        0.19 ±  9%      +0.1        0.28 ± 11%  perf-profile.children.cycles-pp.khugepaged_enter_vma
      1.92 ±  2%      +0.0        1.96 ±  3%      +0.2        2.10 ±  3%  perf-profile.children.cycles-pp.init_multi_vma_prep
      0.70 ±  2%      +0.1        0.81 ±  6%      +0.1        0.80 ±  6%  perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
     95.56            +0.2       95.72            +0.4       95.92        perf-profile.children.cycles-pp.brk
      0.68 ±  9%      +0.2        0.90 ±  3%      +0.3        0.98 ±  6%  perf-profile.children.cycles-pp.mas_wr_slot_store
      0.34 ±  5%      +0.3        0.68 ±  6%      +0.0        0.34 ±  6%  perf-profile.children.cycles-pp.mas_prev_setup
      0.00            +0.4        0.43 ±  2%      +0.0        0.00        perf-profile.children.cycles-pp.mas_next_setup
      6.91 ±  2%      +0.5        7.37            +0.9        7.76        perf-profile.children.cycles-pp.mas_store_prealloc
      0.92 ±  3%      +0.8        1.76 ±  3%      -0.0        0.90 ±  4%  perf-profile.children.cycles-pp.mas_prev
     83.20            +0.9       84.05            +0.7       83.85        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     82.36            +0.9       83.23            +0.6       83.00        perf-profile.children.cycles-pp.do_syscall_64
     78.68            +0.9       79.62            +0.6       79.31        perf-profile.children.cycles-pp.__do_sys_brk
      0.00            +0.9        0.94 ±  3%      +0.0        0.00        perf-profile.children.cycles-pp.mas_next_range
      1.35 ±  5%      +1.2        2.59 ±  3%      +0.1        1.49 ±  3%  perf-profile.children.cycles-pp.mas_prev_slot
      0.84 ±  3%      +1.4        2.26 ±  3%      +0.1        0.93 ±  6%  perf-profile.children.cycles-pp.mas_next_slot
     54.26            +2.3       56.54            +0.4       54.70        perf-profile.children.cycles-pp.do_brk_flags
      0.00           +27.5       27.55           +28.3       28.34        perf-profile.children.cycles-pp.vma_expand
      0.00           +31.9       31.86           +30.8       30.80        perf-profile.children.cycles-pp.vma_merge_new_range
      6.50            -3.3        3.19 ±  2%      -2.8        3.65 ±  2%  perf-profile.self.cycles-pp.do_brk_flags
      3.35            -1.1        2.25            -1.1        2.28 ±  3%  perf-profile.self.cycles-pp.mas_wr_store_type
      5.31 ±  2%      -0.4        4.88            -0.2        5.13 ±  3%  perf-profile.self.cycles-pp.__do_sys_brk
      4.22 ±  3%      -0.4        3.80 ±  2%      -0.2        4.00 ±  2%  perf-profile.self.cycles-pp.mas_walk
      8.47            -0.4        8.07 ±  2%      -0.2        8.28 ±  2%  perf-profile.self.cycles-pp.perf_event_mmap_output
      0.71 ±  8%      -0.3        0.38 ±  8%      -0.3        0.39 ±  5%  perf-profile.self.cycles-pp.may_expand_vm
      5.32            -0.3        5.00            -0.2        5.10        perf-profile.self.cycles-pp.clear_bhb_loop
      1.12 ±  5%      -0.3        0.82 ±  4%      -0.1        1.04 ±  4%  perf-profile.self.cycles-pp.can_vma_merge_after
      2.62            -0.2        2.38 ±  5%      -0.2        2.40 ±  3%  perf-profile.self.cycles-pp.down_write
      0.44 ± 28%      -0.2        0.20 ± 13%      -0.1        0.33 ± 12%  perf-profile.self.cycles-pp.cap_vm_enough_memory
      2.50 ±  3%      -0.2        2.31 ±  2%      -0.1        2.42        perf-profile.self.cycles-pp.mas_preallocate
      0.61 ±  9%      -0.2        0.42 ±  7%      -0.2        0.45 ±  5%  perf-profile.self.cycles-pp.__vm_enough_memory
      1.26 ±  2%      -0.2        1.09 ±  9%      -0.1        1.17 ±  6%  perf-profile.self.cycles-pp.sized_strscpy
      0.57 ±  5%      -0.1        0.44 ±  4%      -0.0        0.52 ±  5%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      2.25 ±  3%      -0.1        2.13 ±  3%      -0.0        2.23 ±  3%  perf-profile.self.cycles-pp.perf_event_mmap_event
      2.70            -0.1        2.60 ±  2%      +0.0        2.74 ±  2%  perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown_vmflags
      0.50 ±  8%      -0.1        0.40 ± 12%      -0.1        0.41 ±  6%  perf-profile.self.cycles-pp.strlen
      0.57 ±  4%      -0.1        0.52 ±  4%      -0.0        0.56 ±  9%  perf-profile.self.cycles-pp.strnlen
      0.46 ±  7%      -0.0        0.45 ± 12%      +0.1        0.51 ±  5%  perf-profile.self.cycles-pp.check_brk_limits
      1.85 ±  2%      -0.0        1.84 ±  3%      +0.1        1.98 ±  3%  perf-profile.self.cycles-pp.init_multi_vma_prep
      0.60 ±  5%      +0.0        0.63            +0.1        0.65 ±  2%  perf-profile.self.cycles-pp.vma_adjust_trans_huge
      0.01 ±223%      +0.1        0.06 ± 15%      +0.0        0.05 ± 46%  perf-profile.self.cycles-pp.anon_vma_interval_tree_remove
      0.64 ±  7%      +0.1        0.72 ±  6%      +0.2        0.81 ±  5%  perf-profile.self.cycles-pp.mas_find
      0.51 ±  4%      +0.1        0.60 ±  5%      +0.1        0.60 ±  6%  perf-profile.self.cycles-pp.__anon_vma_interval_tree_remove
      2.87 ±  2%      +0.2        3.10 ±  2%      +0.4        3.24 ±  2%  perf-profile.self.cycles-pp.mas_store_prealloc
      0.61 ±  8%      +0.2        0.84 ±  4%      +0.3        0.91 ±  5%  perf-profile.self.cycles-pp.mas_wr_slot_store
      0.26 ±  7%      +0.3        0.56 ±  5%      +0.0        0.28 ±  4%  perf-profile.self.cycles-pp.mas_prev_setup
      0.00            +0.3        0.33 ±  3%      +0.0        0.00        perf-profile.self.cycles-pp.mas_next_setup
      0.53 ±  6%      +0.4        0.98 ±  3%      -0.0        0.51 ±  8%  perf-profile.self.cycles-pp.mas_prev
      0.00            +0.6        0.56 ±  5%      +0.0        0.00        perf-profile.self.cycles-pp.mas_next_range
      1.29 ±  4%      +1.2        2.46 ±  3%      +0.1        1.42 ±  4%  perf-profile.self.cycles-pp.mas_prev_slot
      0.72 ±  4%      +1.4        2.07 ±  3%      +0.1        0.79 ±  8%  perf-profile.self.cycles-pp.mas_next_slot
      0.00            +1.4        1.40 ±  4%      +1.3        1.29 ±  3%  perf-profile.self.cycles-pp.vma_merge_new_range
      0.00            +3.6        3.55 ±  6%      +3.7        3.66 ±  2%  perf-profile.self.cycles-pp.vma_expand
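
The rows above follow a fixed layout: base value, change, value, change, value, metric name, with an optional "± N%" stddev after each value. As a rough sketch (the helper name `parse_row` is hypothetical and not part of the lkp tooling), the two vma_expand rows from this table can be parsed and compared. In perf, "children" counts include cycles spent in callees while "self" counts do not, so the difference approximates the share attributed to functions called from vma_expand:

```python
import re

# Matches the optional "± N%" stddev annotation that may follow a value.
STDDEV = re.compile(r"±\s*\d+%")

def parse_row(line):
    """Split one comparison row into its three values, two changes and metric name."""
    tokens = STDDEV.sub("", line).split()
    base, chg1, val1, chg2, val2, metric = tokens
    return {
        "metric": metric,
        "values": (float(base), float(val1), float(val2)),
        "changes": (chg1, chg2),
    }

# Rows copied from the table above (columns: fc21959f74, cacded5e42, 9cecc5dc89).
children = parse_row(
    "0.00  +27.5  27.55  +28.3  28.34  perf-profile.children.cycles-pp.vma_expand"
)
self_only = parse_row(
    "0.00  +3.6  3.55 ±  6%  +3.7  3.66 ±  2%  perf-profile.self.cycles-pp.vma_expand"
)

# children - self: percentage points of cycles spent in callees of vma_expand.
callee_share = [
    round(c - s, 2) for c, s in zip(children["values"], self_only["values"])
]
print(callee_share)
```

Read this way, most of the cycles under vma_expand in this profile are spent in the functions it calls rather than in its own body.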

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/brk_test/aim9/300s

commit:
  fc21959f74bc11 ("mm: abstract vma_expand() to use vma_merge_struct")
  cacded5e42b960 ("mm: avoid using vma_merge() for new VMAs")
  9cecc5dc893886 ("mm: add expand-only VMA merge mode and optimise do_brk_flags()")

fc21959f74bc1138 cacded5e42b9609b07b22d80c10 9cecc5dc89388676d1d0d47461c
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     78524 ±113%     +27.6%     100212 ± 86%     +64.3%     128984 ± 72%  numa-meminfo.node0.AnonHugePages
     38.39 ±113%     +27.5%      48.94 ± 86%     +64.1%      63.00 ± 72%  numa-vmstat.node0.nr_anon_transparent_hugepages
      0.54 ± 14%      -6.6%       0.50 ± 23%     -38.6%       0.33 ± 32%  sched_debug.cpu.nr_running.max
      5949            -1.7%       5850 ±  2%      -4.0%       5713 ±  3%  vmstat.system.in
    201.54            +2.9%     207.44            +2.0%     205.54        time.system_time
     97.58            -6.0%      91.75            -4.0%      93.66        time.user_time
   1322908            -5.0%    1256536            -1.6%    1301387        aim9.brk_test.ops_per_sec
    201.54            +2.9%     207.44            +2.0%     205.54        aim9.time.system_time
     97.58            -6.0%      91.75            -4.0%      93.66        aim9.time.user_time
     61335            -2.5%      59800            -3.4%      59279        proc-vmstat.nr_active_anon
     62342            -2.4%      60875            -3.3%      60255        proc-vmstat.nr_shmem
     61335            -2.5%      59800            -3.4%      59279        proc-vmstat.nr_zone_active_anon
      0.04 ± 82%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.10 ± 60%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.04 ± 82%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
      0.10 ± 60%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
  8.33e+08            +3.9%  8.654e+08            -1.1%   8.24e+08        perf-stat.i.branch-instructions
      1.15            -0.1        1.09            -0.0        1.14 ±  2%  perf-stat.i.branch-miss-rate%
  12964626            -1.9%   12711922            -4.5%   12386373        perf-stat.i.branch-misses
      1.11            -7.4%       1.03            -2.0%       1.09        perf-stat.i.cpi
 4.277e+09            -1.1%  4.229e+09            -2.0%  4.191e+09        perf-stat.i.cpu-cycles
 3.943e+09            +6.0%   4.18e+09            +0.2%  3.951e+09        perf-stat.i.instructions
      0.91            +7.9%       0.98            +2.5%       0.93        perf-stat.i.ipc
      0.29 ±  2%      -9.1%       0.27 ±  4%      -5.7%       0.28 ±  5%  perf-stat.overall.MPKI
      1.56            -0.1        1.47            -0.1        1.50        perf-stat.overall.branch-miss-rate%
      1.08            -6.8%       1.01            -2.2%       1.06        perf-stat.overall.cpi
      0.92            +7.2%       0.99            +2.3%       0.94        perf-stat.overall.ipc
 8.303e+08            +3.9%  8.627e+08            -1.1%  8.214e+08        perf-stat.ps.branch-instructions
  12931205            -2.0%   12678170            -4.5%   12352666        perf-stat.ps.branch-misses
 4.263e+09            -1.1%  4.215e+09            -2.0%  4.177e+09        perf-stat.ps.cpu-cycles
  3.93e+09            +6.0%  4.167e+09            +0.2%  3.938e+09        perf-stat.ps.instructions
 1.184e+12            +6.1%  1.256e+12            +0.8%  1.194e+12        perf-stat.total.instructions
      7.16 ±  2%      -0.4        6.76 ±  4%      -0.6        6.55 ±  3%  perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.brk
      5.72 ±  2%      -0.4        5.35 ±  3%      +0.0        5.73        perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
      6.13 ±  2%      -0.3        5.84 ±  3%      +0.1        6.25        perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.83 ± 11%      -0.1        0.71 ±  5%      -0.1        0.74 ±  5%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.51 ±  5%      -0.0        2.48 ± 11%      +0.2        2.71 ±  4%  perf-profile.calltrace.cycles-pp.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
      1.32 ±  5%      -0.0        1.31 ± 14%      +0.1        1.45 ±  4%  perf-profile.calltrace.cycles-pp.tick_nohz_get_sleep_length.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry
      0.00            +0.6        0.58 ±  5%      +0.6        0.58 ±  5%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
     16.73 ±  2%      +0.6       17.34            +0.3       16.99        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.7        0.66 ±  6%      +0.7        0.67 ±  8%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
     24.51            +0.7       25.17 ±  2%      +0.6       25.14        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.brk
     24.21            +0.7       24.90            +0.6       24.84        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
     23.33            +0.7       24.05 ±  2%      +0.6       23.89        perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
      0.00            +0.8        0.82 ±  4%      +0.8        0.83 ±  3%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +0.9        0.87 ±  5%      +0.9        0.88 ±  2%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
      0.00            +1.1        1.07 ±  9%      +1.0        1.04 ± 13%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +1.1        1.10 ±  6%      +1.2        1.16 ±  4%  perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +2.3        2.26 ±  5%      +2.2        2.21 ±  2%  perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
      0.00            +7.6        7.56 ±  3%      +7.5        7.52        perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
      0.00            +8.6        8.62 ±  4%      +8.0        7.97        perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.74 ±  2%      -0.4        7.30 ±  4%      -0.6        7.14 ±  3%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      5.81 ±  2%      -0.4        5.43 ±  3%      -0.0        5.80        perf-profile.children.cycles-pp.perf_event_mmap_event
      6.18 ±  2%      -0.3        5.88 ±  3%      +0.1        6.29        perf-profile.children.cycles-pp.perf_event_mmap
      3.93            -0.2        3.73 ±  3%      +0.0        3.96        perf-profile.children.cycles-pp.perf_iterate_sb
      0.22 ± 29%      -0.1        0.08 ± 17%      -0.0        0.19 ± 69%  perf-profile.children.cycles-pp.may_expand_vm
      0.96 ±  3%      -0.1        0.83 ±  4%      -0.1        0.84 ±  2%  perf-profile.children.cycles-pp.vma_complete
      0.61 ± 14%      -0.1        0.52 ±  7%      -0.0        0.57 ±  5%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.15 ±  7%      -0.1        0.08 ± 20%      -0.1        0.09 ± 11%  perf-profile.children.cycles-pp.brk_test
      2.58 ±  5%      -0.0        2.54 ± 11%      +0.2        2.76 ±  4%  perf-profile.children.cycles-pp.menu_select
      0.27 ±  9%      -0.0        0.26 ±  8%      +0.1        0.35 ±  4%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.06 ± 49%      -0.0        0.06 ± 13%      +0.1        0.11 ± 14%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.28 ± 19%      -0.0        0.27 ± 16%      +0.1        0.36 ± 11%  perf-profile.children.cycles-pp.__softirqentry_text_end
      1.36 ±  5%      -0.0        1.35 ± 13%      +0.1        1.49 ±  3%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
      0.08 ± 11%      +0.0        0.12 ± 14%      -0.0        0.06 ± 48%  perf-profile.children.cycles-pp.mas_prev_setup
      0.17 ± 12%      +0.1        0.27 ± 10%      -0.0        0.16 ±  5%  perf-profile.children.cycles-pp.mas_wr_store_entry
      0.00            +0.2        0.15 ± 11%      +0.0        0.00        perf-profile.children.cycles-pp.mas_next_range
      0.19 ±  8%      +0.2        0.38 ± 10%      +0.0        0.22 ± 22%  perf-profile.children.cycles-pp.mas_next_slot
      0.34 ± 17%      +0.3        0.64 ±  6%      -0.0        0.33 ± 18%  perf-profile.children.cycles-pp.mas_prev_slot
      1.70 ± 12%      +0.5        2.18 ± 15%      +0.4        2.08 ± 10%  perf-profile.children.cycles-pp.mas_find
     25.16            +0.7       25.83 ±  2%      +0.6       25.79        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     24.86            +0.7       25.56 ±  2%      +0.6       25.49        perf-profile.children.cycles-pp.do_syscall_64
     23.40            +0.7       24.12 ±  2%      +0.6       23.95        perf-profile.children.cycles-pp.__do_sys_brk
      0.00            +7.6        7.59 ±  3%      +7.6        7.56        perf-profile.children.cycles-pp.vma_expand
      0.00            +8.7        8.66 ±  4%      +8.0        7.98        perf-profile.children.cycles-pp.vma_merge_new_range
      1.61 ± 10%      -0.9        0.69 ±  8%      -0.8        0.78 ±  6%  perf-profile.self.cycles-pp.do_brk_flags
      7.64 ±  2%      -0.4        7.20 ±  4%      -0.6        7.04 ±  3%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.22 ± 30%      -0.1        0.08 ± 17%      -0.0        0.19 ± 70%  perf-profile.self.cycles-pp.may_expand_vm
      0.57 ± 15%      -0.1        0.46 ±  6%      -0.0        0.52 ±  5%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.15 ±  7%      -0.1        0.08 ± 20%      -0.1        0.09 ± 11%  perf-profile.self.cycles-pp.brk_test
      0.20 ±  5%      -0.0        0.18 ±  4%      -0.0        0.19 ± 10%  perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
      0.19 ±  8%      -0.0        0.18 ±  8%      +0.0        0.23 ±  5%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.06 ± 48%      -0.0        0.05 ±  8%      +0.0        0.11 ± 17%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      0.09 ± 11%      -0.0        0.09 ± 18%      +0.1        0.15 ± 25%  perf-profile.self.cycles-pp.security_vm_enough_memory_mm
      0.07 ± 18%      +0.0        0.10 ± 18%      -0.0        0.05 ± 47%  perf-profile.self.cycles-pp.mas_prev_setup
      0.00            +0.1        0.09 ± 12%      +0.0        0.00        perf-profile.self.cycles-pp.mas_next_range
      0.36 ±  8%      +0.1        0.45 ±  6%      +0.1        0.48 ±  6%  perf-profile.self.cycles-pp.perf_event_mmap
      0.15 ± 13%      +0.1        0.25 ± 14%      -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.mas_wr_store_entry
      0.17 ± 11%      +0.2        0.37 ± 11%      +0.0        0.20 ± 23%  perf-profile.self.cycles-pp.mas_next_slot
      0.34 ± 17%      +0.3        0.64 ±  6%      -0.0        0.33 ± 18%  perf-profile.self.cycles-pp.mas_prev_slot
      0.00            +0.3        0.33 ±  5%      +0.2        0.22 ± 12%  perf-profile.self.cycles-pp.vma_merge_new_range
      0.00            +0.8        0.81 ±  9%      +0.7        0.69 ± 12%  perf-profile.self.cycles-pp.vma_expand
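
For reference, the %change column and the cpi/ipc rows in this table can be re-derived from the raw values. The sketch below assumes %change is the relative difference against the first column (fc21959f74bc11) and that cpi and ipc are the ratios of the perf-stat.ps cycle and instruction counts; the helper `pct_change` is hypothetical, not an lkp function:

```python
def pct_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0

# aim9.brk_test.ops_per_sec per commit, copied from the table above.
ops = {
    "fc21959f74bc11": 1322908,
    "cacded5e42b960": 1256536,
    "9cecc5dc893886": 1301387,
}
base = ops["fc21959f74bc11"]
for commit, value in ops.items():
    print(f"{commit}: {pct_change(base, value):+.1f}%")

# perf-stat.ps.cpu-cycles and perf-stat.ps.instructions per commit.
cycles = {"fc21959f74bc11": 4.263e9, "cacded5e42b960": 4.215e9, "9cecc5dc893886": 4.177e9}
instructions = {"fc21959f74bc11": 3.93e9, "cacded5e42b960": 4.167e9, "9cecc5dc893886": 3.938e9}
for commit in cycles:
    cpi = cycles[commit] / instructions[commit]
    ipc = instructions[commit] / cycles[commit]
    print(f"{commit}: cpi={cpi:.2f} ipc={ipc:.2f}")
```

Under these assumptions the results agree with the reported -5.0% and -1.6% for ops_per_sec and with the perf-stat.overall.cpi and perf-stat.overall.ipc rows. This makes the shape of the regression visible: cacded5e42b960 executes about 6% more instructions per second at a better IPC, yet completes fewer brk operations.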



* Re: [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec -5.0% regression
  2024-10-17  2:58                 ` Oliver Sang
@ 2024-10-17  8:54                   ` Lorenzo Stoakes
  2024-10-18  0:34                     ` Oliver Sang
  0 siblings, 1 reply; 13+ messages in thread
From: Lorenzo Stoakes @ 2024-10-17  8:54 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Mark Brown,
	Liam R. Howlett, Vlastimil Babka, Bert Karwatzki, Jeff Xu,
	Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
	Paul E. McKenney, Paul Moore, Sidhartha Kumar,
	Suren Baghdasaryan, linux-mm, ying.huang, feng.tang, fengwei.yin

On Thu, Oct 17, 2024 at 10:58:38AM +0800, Oliver Sang wrote:
> hi, Lorenzo,
>
> On Tue, Oct 15, 2024 at 08:56:28PM +0100, Lorenzo Stoakes wrote:
> > On Fri, Oct 11, 2024 at 08:26:37AM +0100, Lorenzo Stoakes wrote:
> > [snip]
> >
> > > Thanks for testing, this suffices to rule this one out... I will try to get a
> > > functional and reliable performance environment locally so I can properly
> > > address this and then we can try something else.
> > >
> > > Thanks!
> > > Lorenzo
> > >
> >
> > OK Oliver, could you try the below patch? I have got aim9.brk up and
> > running locally and for me this seems to address the issue.
> >
> > This is against Andrew's tree [0] in the mm-unstable branch. It should
> > hopefully apply cleanly to -next also.
>
> I found the patch can still be applied cleanly to cacded5e42, so the data
> below is still based on it applied there.
>
> $ git log --oneline 9cecc5dc893886
> 9cecc5dc893886 mm: add expand-only VMA merge mode and optimise do_brk_flags()
> cacded5e42b960 mm: avoid using vma_merge() for new VMAs
> fc21959f74bc11 mm: abstract vma_expand() to use vma_merge_struct
> ...
>
> again, if any patches in mm-unstable or -next affect this, please let me
> know and I can re-apply the patch and run the tests again. thanks
>
>
> with this patch, we do see performance recover, but not fully.
>
> e.g. for
> model: Granite Rapids
> nr_node: 1
> nr_cpu: 240
> memory: 192G
>
> we got a better score with the patch than with cacded5e42b960, but still a
> 2.0% regression relative to fc21959f74bc11 (the parent of cacded5e42b960)

Thanks for this. As far as I'm concerned this puts us into noise territory,
so we'll go with this as the solution!

A side-note, the brk2 test from the will-it-scale suite, written explicitly
to be more real-world representative, sees an actual performance
_improvement_ here (though small).

So overall I'm comfortable with this, we can revisit if anybody raises any
objection! The benefit in de-duplicating code is very significant.

Thanks for all your help, hugely appreciated!

Cheers, Lorenzo

[snip]



* Re: [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec -5.0% regression
  2024-10-17  8:54                   ` Lorenzo Stoakes
@ 2024-10-18  0:34                     ` Oliver Sang
  0 siblings, 0 replies; 13+ messages in thread
From: Oliver Sang @ 2024-10-18  0:34 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Mark Brown,
	Liam R. Howlett, Vlastimil Babka, Bert Karwatzki, Jeff Xu,
	Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
	Paul E. McKenney, Paul Moore, Sidhartha Kumar,
	Suren Baghdasaryan, linux-mm, ying.huang, feng.tang, fengwei.yin,
	oliver.sang

hi, Lorenzo,

On Thu, Oct 17, 2024 at 09:54:13AM +0100, Lorenzo Stoakes wrote:
> On Thu, Oct 17, 2024 at 10:58:38AM +0800, Oliver Sang wrote:

[...]

> 
> Thanks for this. As far as I'm concerned this puts us into noise territory,
> so we'll go with this as the solution!
> 
> A side-note, the brk2 test from the will-it-scale suite, written explicitly
> to be more real-world representative, sees an actual performance
> _improvement_ here (though small).
> 
> So overall I'm comfortable with this, we can revisit if anybody raises any
> objection! The benefit in de-duplicating code is very significant.

thanks a lot for all this information!

> 
> Thanks for all your help, hugely appreciated!

you are welcome :) it is our great pleasure!

> 
> Cheers, Lorenzo
> 
> [snip]



end of thread, other threads:[~2024-10-18  0:34 UTC | newest]

Thread overview: 13+ messages
2024-09-30  2:21 [linus:master] [mm] cacded5e42: aim9.brk_test.ops_per_sec -5.0% regression kernel test robot
2024-09-30  8:21 ` Lorenzo Stoakes
2024-10-08  8:31   ` Oliver Sang
2024-10-08  8:44     ` Lorenzo Stoakes
2024-10-09  6:44       ` Oliver Sang
2024-10-09  9:52         ` Lorenzo Stoakes
2024-10-09 21:24         ` Lorenzo Stoakes
2024-10-11  2:46           ` Oliver Sang
2024-10-11  7:26             ` Lorenzo Stoakes
2024-10-15 19:56               ` Lorenzo Stoakes
2024-10-17  2:58                 ` Oliver Sang
2024-10-17  8:54                   ` Lorenzo Stoakes
2024-10-18  0:34                     ` Oliver Sang
