Greetings,

FYI, we noticed a -40.2% regression of will-it-scale.per_thread_ops between
commit 524e00b36e8c ("mm: remove rb tree.") and commit e15e06a83923
("lib/test_maple_tree: add testing for maple tree") of mainline. The commits
between them are:

524e00b36e8c5 mm: remove rb tree.
0c563f1480435 proc: remove VMA rbtree use from nommu
d0cf3dd47f0d5 damon: convert __damon_va_three_regions to use the VMA iterator
c9dbe82cb99db kernel/fork: use maple tree for dup_mmap() during forking
3499a13168da6 mm/mmap: use maple tree for unmapped_area{_topdown}
7fdbd37da5c6f mm/mmap: use the maple tree for find_vma_prev() instead of the rbtree
be8432e7166ef mm/mmap: use the maple tree in find_vma() instead of the rbtree.
2e3af1db17442 mmap: use the VMA iterator in count_vma_pages_range()
f39af05949a42 mm: add VMA iterator
d4af56c5c7c67 mm: start tracking VMAs with maple tree
e15e06a839232 lib/test_maple_tree: add testing for maple tree

in testcase: will-it-scale
on test machine: 104 threads 2 sockets (Skylake) with 192G memory
with the following parameters:

	nr_task: 50%
	mode: thread
	test: mmap1
	cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
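For reference, the mmap1 testcase is a tight per-thread loop of mmap() and
munmap() over an anonymous mapping, so every iteration takes the mm's mmap lock
for write and inserts and removes a single VMA, which are the paths this series
moves from the rbtree to the maple tree (visible as mas_preallocate,
mas_store_prealloc and mt_find in the profile below). The snippet below is only
a minimal standalone sketch of that thread-mode workload, not the benchmark
itself: the 128MB mapping size and the per-thread counters are modeled on the
upstream mmap1.c (check the will-it-scale repository for the exact parameters),
and the reporting loop is a simplified stand-in for the harness.

/*
 * Illustrative sketch of will-it-scale's "mmap1" testcase in thread mode
 * (not the actual benchmark).  Each worker repeatedly mmap()s and munmap()s
 * an anonymous region and bumps a per-thread counter, roughly corresponding
 * to will-it-scale.per_thread_ops.
 *
 * Build: gcc -O2 -pthread mmap1_sketch.c -o mmap1_sketch
 * Run:   ./mmap1_sketch 52       # e.g. 50% of 104 CPUs, as in this report
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define MEMSIZE   (128UL * 1024 * 1024)	/* assumed mapping size, as in upstream mmap1 */
#define MAX_TASKS 1024

static volatile unsigned long long iterations[MAX_TASKS];

static void *worker(void *arg)
{
	volatile unsigned long long *count = arg;

	for (;;) {
		/* One VMA insert (mmap) and one VMA removal (munmap) per loop. */
		void *p = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		munmap(p, MEMSIZE);
		(*count)++;	/* not atomic; good enough for a sketch */
	}
	return NULL;
}

int main(int argc, char **argv)
{
	int nr_tasks = (argc > 1) ? atoi(argv[1]) : 1;
	pthread_t tid;

	if (nr_tasks < 1 || nr_tasks > MAX_TASKS)
		nr_tasks = 1;

	for (int i = 0; i < nr_tasks; i++)
		pthread_create(&tid, NULL, worker, (void *)&iterations[i]);

	/* Crude stand-in for the harness: print the aggregate rate every 5s. */
	for (;;) {
		unsigned long long before = 0, after = 0;

		for (int i = 0; i < nr_tasks; i++)
			before += iterations[i];
		sleep(5);
		for (int i = 0; i < nr_tasks; i++)
			after += iterations[i];
		printf("%llu ops/sec across %d threads\n",
		       (after - before) / 5, nr_tasks);
	}
}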
We could not determine the exact commit that introduced this regression because
some of the above commits failed to boot during bisection, but it looks related
to the maple tree code. Please check the following details:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-11/performance/x86_64-rhel-8.3/thread/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/mmap1/will-it-scale

commit:
  e15e06a839232 ("lib/test_maple_tree: add testing for maple tree")
  524e00b36e8c5 ("mm: remove rb tree.")

e15e06a8392321a1 524e00b36e8c547f5582eef3fb6
---------------- ---------------------------
       %stddev     %change         %stddev
           \          |                \
238680 -40.2% 142816  will-it-scale.52.threads
4589 -40.2% 2746  will-it-scale.per_thread_ops
238680 -40.2% 142816  will-it-scale.workload
0.28 -0.1 0.20 ± 3%  mpstat.cpu.all.usr%
7758 -1.6% 7636  proc-vmstat.nr_mapped
0.03 ± 14% +40.0% 0.05 ± 10%  time.system_time
35.87 ± 41% -17.2 18.71 ± 92%  turbostat.C1E%
14.11 ±105% +15.5 29.62 ± 52%  turbostat.C6%
466662 ± 3% +20.3% 561351 ± 3%  turbostat.POLL
42.33 +3.9% 44.00  turbostat.PkgTmp
838.08 ± 49% -50.7% 412.94 ± 19%  sched_debug.cfs_rq:/.load_avg.max
466231 ± 14% -53.4% 217040 ± 82%  sched_debug.cfs_rq:/.min_vruntime.min
-335910 +146.5% -828023  sched_debug.cfs_rq:/.spread0.min
602391 ± 4% +6.5% 641749 ± 4%  sched_debug.cpu.avg_idle.avg
26455 ± 7% +16.1% 30723 ± 6%  sched_debug.cpu.nr_switches.max
230323 ± 6% +42.4% 327946 ± 3%  numa-numastat.node0.local_node
257238 ± 2% +29.2% 332446  numa-numastat.node0.numa_hit
26826 ± 35% -83.1% 4532 ±138%  numa-numastat.node0.other_node
344370 ± 3% -26.8% 251981 ± 2%  numa-numastat.node1.local_node
351214 ± 2% -19.9% 281185  numa-numastat.node1.numa_hit
6779 ±139% +330.8% 29204 ± 21%  numa-numastat.node1.other_node
111776 ± 8% +43.9% 160892 ± 17%  numa-meminfo.node0.AnonHugePages
163879 ± 5% +34.9% 221083 ± 21%  numa-meminfo.node0.AnonPages
182360 ± 2% +39.7% 254705 ± 15%  numa-meminfo.node0.AnonPages.max
167687 ± 4% +33.0% 223029 ± 20%  numa-meminfo.node0.Inactive
165329 ± 4% +34.9% 223029 ± 20%  numa-meminfo.node0.Inactive(anon)
2357 ±131% -100.0% 0.00  numa-meminfo.node0.Inactive(file)
2087 ± 11% +22.1% 2548 ± 9%  numa-meminfo.node0.PageTables
170594 ± 7% -27.5% 123611 ± 23%  numa-meminfo.node1.AnonHugePages
238127 ± 3% -23.9% 181170 ± 25%  numa-meminfo.node1.AnonPages
278201 ± 3% -26.8% 203778 ± 22%  numa-meminfo.node1.AnonPages.max
244262 ± 2% -24.0% 185599 ± 25%  numa-meminfo.node1.Inactive
244206 ± 2% -24.1% 185419 ± 25%  numa-meminfo.node1.Inactive(anon)
20767 ± 64% -48.4% 10717 ±124%  numa-meminfo.node1.Mapped
40936 ± 5% +34.9% 55213 ± 21%  numa-vmstat.node0.nr_anon_pages
41317 ± 4% +34.8% 55700 ± 20%  numa-vmstat.node0.nr_inactive_anon
41317 ± 4% +34.8% 55700 ± 20%  numa-vmstat.node0.nr_zone_inactive_anon
257331 ± 2% +29.2% 332536  numa-vmstat.node0.numa_hit
230417 ± 5% +42.4% 328036 ± 3%  numa-vmstat.node0.numa_local
26826 ± 35% -83.1% 4532 ±138%  numa-vmstat.node0.numa_other
59518 ± 4% -24.0% 45237 ± 25%  numa-vmstat.node1.nr_anon_pages
61041 ± 3% -24.2% 46287 ± 25%  numa-vmstat.node1.nr_inactive_anon
5196 ± 64% -48.7% 2666 ±126%  numa-vmstat.node1.nr_mapped
61041 ± 3% -24.2% 46287 ± 25%  numa-vmstat.node1.nr_zone_inactive_anon
351314 ± 2% -20.0% 281191  numa-vmstat.node1.numa_hit
344470 ± 4% -26.8% 251987 ± 2%  numa-vmstat.node1.numa_local
6779 ±139% +330.8% 29204 ± 21%  numa-vmstat.node1.numa_other
3.12 ± 10% -25.7% 2.32 ± 2%  perf-stat.i.MPKI
3.111e+09 +4.4% 3.247e+09  perf-stat.i.branch-instructions
0.43 -0.0 0.39  perf-stat.i.branch-miss-rate%
13577850 -5.5% 12837395  perf-stat.i.branch-misses
38.85 ± 3% +4.6 43.44 ± 3%  perf-stat.i.cache-miss-rate%
47922345 ± 10% -21.9% 37423833 ± 2%  perf-stat.i.cache-references
9.42 -5.1% 8.94  perf-stat.i.cpi
0.02 -0.0 0.01  perf-stat.i.dTLB-load-miss-rate%
632005 -28.8% 449814  perf-stat.i.dTLB-load-misses
4.127e+09 +3.8% 4.282e+09  perf-stat.i.dTLB-loads
0.00 ± 7% -0.0 0.00 ± 11%  perf-stat.i.dTLB-store-miss-rate%
3.131e+08 +26.5% 3.962e+08  perf-stat.i.dTLB-stores
599587 ± 8% -20.0% 479492 ± 6%  perf-stat.i.iTLB-load-misses
2324378 -12.7% 2028806 ± 7%  perf-stat.i.iTLB-loads
1.54e+10 +5.4% 1.622e+10  perf-stat.i.instructions
25907 ± 7% +31.4% 34030 ± 6%  perf-stat.i.instructions-per-iTLB-miss
0.11 +5.4% 0.11  perf-stat.i.ipc
570.88 ± 8% -22.1% 444.53 ± 2%  perf-stat.i.metric.K/sec
72.60 +5.0% 76.20  perf-stat.i.metric.M/sec
90.37 +1.5 91.82  perf-stat.i.node-load-miss-rate%
7458505 ± 2% -27.2% 5431142 ± 3%  perf-stat.i.node-load-misses
795163 -39.1% 484036  perf-stat.i.node-loads
3.11 ± 10% -25.9% 2.31 ± 2%  perf-stat.overall.MPKI
0.44 -0.0 0.40  perf-stat.overall.branch-miss-rate%
38.72 ± 3% +4.5 43.24 ± 3%  perf-stat.overall.cache-miss-rate%
9.40 -5.1% 8.93  perf-stat.overall.cpi
0.02 -0.0 0.01  perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 6% -0.0 0.00 ± 11%  perf-stat.overall.dTLB-store-miss-rate%
25842 ± 7% +31.5% 33976 ± 6%  perf-stat.overall.instructions-per-iTLB-miss
0.11 +5.4% 0.11  perf-stat.overall.ipc
90.36 +1.4 91.81  perf-stat.overall.node-load-miss-rate%
19478525 +76.1% 34307144  perf-stat.overall.path-length
3.101e+09 +4.4% 3.236e+09  perf-stat.ps.branch-instructions
13536210 -5.5% 12794692  perf-stat.ps.branch-misses
47758992 ± 10% -21.9% 37302259 ± 2%  perf-stat.ps.cache-references
629957 -28.8% 448327  perf-stat.ps.dTLB-load-misses
4.113e+09 +3.8% 4.268e+09  perf-stat.ps.dTLB-loads
3.121e+08 +26.5% 3.949e+08  perf-stat.ps.dTLB-stores
597514 ± 8% -20.0% 477834 ± 6%  perf-stat.ps.iTLB-load-misses
2316405 -12.7% 2021878 ± 7%  perf-stat.ps.iTLB-loads
1.535e+10 +5.4% 1.617e+10  perf-stat.ps.instructions
7434434 ± 2% -27.2% 5412315 ± 3%  perf-stat.ps.node-load-misses
792675 -39.1% 482405  perf-stat.ps.node-loads
4.648e+12 +5.4% 4.9e+12  perf-stat.total.instructions
24.16 ± 66% -16.4 7.77 ±122%  perf-profile.calltrace.cycles-pp.mwait_idle_with_hints.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
24.16 ± 66% -16.4 7.77 ±122%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
33.88 ± 20% -9.6 24.32 ± 7%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
33.89 ± 20% -9.3 24.61 ± 6%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
33.05 ± 20% -8.9 24.13 ± 6%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
33.04 ± 20% -8.9 24.13 ± 6%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
33.05 ± 20% -8.9 24.14 ± 6%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
33.05 ± 20% -8.9 24.14 ± 6%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
33.05 ± 20% -8.9 24.14 ± 6%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
0.38 ± 70% +0.2 0.61 ± 2%  perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.6 0.56 ± 2%  perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
0.00 +0.6 0.57 ± 2%  perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.00 +0.6 0.60 ± 3%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
31.73 ± 10% +4.5 36.24 ± 2%  perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
31.50 ± 10% +4.6 36.05  perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
33.19 ± 10% +4.6 37.77 ± 2%  perf-profile.calltrace.cycles-pp.__munmap
32.39 ± 10% +4.6 36.97 ± 2%  perf-profile.calltrace.cycles-pp.down_write_killable.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
32.34 ± 10% +4.6 36.94 ± 2%  perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write_killable.__vm_munmap.__x64_sys_munmap.do_syscall_64
33.08 ± 10% +4.6 37.69 ± 2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
32.15 ± 10% +4.6 36.76  perf-profile.calltrace.cycles-pp.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
33.05 ± 10% +4.6 37.66 ± 2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
32.31 ± 10% +4.6 36.92 ± 2%  perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.__vm_munmap.__x64_sys_munmap
32.10 ± 10% +4.6 36.73  perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
32.98 ± 10% +4.6 37.61 ± 2%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
32.07 ± 10% +4.6 36.70  perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff.do_syscall_64
32.98 ± 10% +4.6 37.61 ± 2%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
32.86 ± 10% +4.7 37.56  perf-profile.calltrace.cycles-pp.__mmap
32.74 ± 10% +4.7 37.48  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
32.71 ± 10% +4.8 37.46  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
32.62 ± 10% +4.8 37.39  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
24.31 ± 66% -16.4 7.88 ±122%  perf-profile.children.cycles-pp.intel_idle
33.89 ± 20% -9.3 24.61 ± 6%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
33.89 ± 20% -9.3 24.61 ± 6%  perf-profile.children.cycles-pp.cpu_startup_entry
33.89 ± 20% -9.3 24.61 ± 6%  perf-profile.children.cycles-pp.do_idle
33.85 ± 20% -9.3 24.57 ± 6%  perf-profile.children.cycles-pp.mwait_idle_with_hints
33.88 ± 20% -9.3 24.60 ± 6%  perf-profile.children.cycles-pp.cpuidle_enter
33.88 ± 20% -9.3 24.60 ± 6%  perf-profile.children.cycles-pp.cpuidle_enter_state
33.88 ± 20% -9.3 24.61 ± 6%  perf-profile.children.cycles-pp.cpuidle_idle_call
33.05 ± 20% -8.9 24.14 ± 6%  perf-profile.children.cycles-pp.start_secondary
0.84 ± 25% -0.4 0.48 ± 16%  perf-profile.children.cycles-pp.start_kernel
0.84 ± 25% -0.4 0.48 ± 16%  perf-profile.children.cycles-pp.arch_call_rest_init
0.84 ± 25% -0.4 0.48 ± 16%  perf-profile.children.cycles-pp.rest_init
0.16 ± 12% -0.1 0.08  perf-profile.children.cycles-pp.unmap_region
0.14 ± 11% -0.0 0.10 ± 8%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.13 ± 12% -0.0 0.10  perf-profile.children.cycles-pp.syscall_return_via_sysret
0.00 +0.1 0.06 ± 13%  perf-profile.children.cycles-pp.mas_wr_node_store
0.00 +0.1 0.06 ± 7%  perf-profile.children.cycles-pp.memset_erms
0.00 +0.1 0.06 ± 7%  perf-profile.children.cycles-pp.mas_wr_modify
0.00 +0.1 0.07 ± 6%  perf-profile.children.cycles-pp.kmem_cache_free_bulk
0.53 ± 10% +0.1 0.61 ± 2%  perf-profile.children.cycles-pp.__do_munmap
0.00 +0.1 0.08 ± 5%  perf-profile.children.cycles-pp.mas_destroy
0.00 +0.1 0.09 ± 5%  perf-profile.children.cycles-pp.mt_find
0.00 +0.1 0.10  perf-profile.children.cycles-pp.mas_spanning_rebalance
0.00 +0.1 0.10 ± 4%  perf-profile.children.cycles-pp.mas_wr_spanning_store
0.00 +0.1 0.12 ± 4%  perf-profile.children.cycles-pp.mas_rev_awalk
0.00 +0.1 0.13  perf-profile.children.cycles-pp.mas_empty_area_rev
0.00 +0.1 0.14 ± 5%  perf-profile.children.cycles-pp.kmem_cache_alloc_bulk
0.00 +0.2 0.16 ± 5%  perf-profile.children.cycles-pp.mas_alloc_nodes
0.00 +0.2 0.17 ± 4%  perf-profile.children.cycles-pp.mas_preallocate
0.42 ± 15% +0.2 0.60 ± 3%  perf-profile.children.cycles-pp.do_mmap
0.06 ± 7% +0.2 0.27  perf-profile.children.cycles-pp.vma_link
0.20 ± 14% +0.2 0.41 ± 4%  perf-profile.children.cycles-pp.mmap_region
0.00 +0.3 0.35 ± 4%  perf-profile.children.cycles-pp.mas_store_prealloc
0.78 ± 8% +0.4 1.13 ± 2%  perf-profile.children.cycles-pp.rwsem_spin_on_owner
33.20 ± 10% +4.6 37.77 ± 2%  perf-profile.children.cycles-pp.__munmap
32.98 ± 10% +4.6 37.61 ± 2%  perf-profile.children.cycles-pp.__x64_sys_munmap
32.98 ± 10% +4.6 37.61 ± 2%  perf-profile.children.cycles-pp.__vm_munmap
32.86 ± 10% +4.7 37.56  perf-profile.children.cycles-pp.__mmap
32.62 ± 10% +4.8 37.40  perf-profile.children.cycles-pp.vm_mmap_pgoff
63.26 ± 10% +9.1 72.32 ± 2%  perf-profile.children.cycles-pp.osq_lock
64.54 ± 10% +9.2 73.72 ± 2%  perf-profile.children.cycles-pp.down_write_killable
64.44 ± 10% +9.2 73.66 ± 2%  perf-profile.children.cycles-pp.rwsem_down_write_slowpath
64.38 ± 10% +9.2 73.62 ± 2%  perf-profile.children.cycles-pp.rwsem_optimistic_spin
65.87 ± 10% +9.3 75.21 ± 2%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
65.79 ± 10% +9.4 75.15 ± 2%  perf-profile.children.cycles-pp.do_syscall_64
33.85 ± 20% -9.3 24.57 ± 6%  perf-profile.self.cycles-pp.mwait_idle_with_hints
0.29 ± 19% -0.1 0.14 ± 3%  perf-profile.self.cycles-pp.rwsem_optimistic_spin
0.13 ± 10% -0.0 0.09 ± 9%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.09 ± 9% -0.0 0.05 ± 8%  perf-profile.self.cycles-pp.down_write_killable
0.13 ± 12% -0.0 0.10  perf-profile.self.cycles-pp.syscall_return_via_sysret
0.00 +0.1 0.06  perf-profile.self.cycles-pp.memset_erms
0.00 +0.1 0.06 ± 13%  perf-profile.self.cycles-pp.kmem_cache_free_bulk
0.00 +0.1 0.06 ± 7%  perf-profile.self.cycles-pp.kmem_cache_alloc_bulk
0.00 +0.1 0.08  perf-profile.self.cycles-pp.mt_find
0.00 +0.1 0.11 ± 4%  perf-profile.self.cycles-pp.mas_rev_awalk
0.76 ± 8% +0.4 1.12 ± 2%  perf-profile.self.cycles-pp.rwsem_spin_on_owner
62.94 ± 10% +9.0 71.91 ± 2%  perf-profile.self.cycles-pp.osq_lock

If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot
| Link: https://lore.kernel.org/oe-lkp/202212191714.524e00b3-yujie.liu@intel.com

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://01.org/lkp