Hello,

FYI, we noticed a -3.4% regression of vm-scalability.throughput due to commit:

commit: 7e12beb8ca2ac98b2ec42e0ea4b76cdc93b58654 ("migrate_pages: batch flushing TLB")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: vm-scalability
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
with following parameters:

	runtime: 300s
	size: 512G
	test: anon-cow-rand-mt
	cpufreq_governor: performance

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/

If you fix the issue, kindly add following tag
| Reported-by: kernel test robot
| Link: https://lore.kernel.org/oe-lkp/202303192325.ecbaf968-yujie.liu@intel.com

Details are as below:

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-11/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/512G/lkp-csl-2sp3/anon-cow-rand-mt/vm-scalability

commit:
  ebe75e4751 ("migrate_pages: share more code between _unmap and _move")
  7e12beb8ca ("migrate_pages: batch flushing TLB")

ebe75e4751063dce 7e12beb8ca2ac98b2ec42e0ea4b
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     57522            -3.3%      55603        vm-scalability.median
   5513665            -3.4%    5328506        vm-scalability.throughput
    203067 ±  3%      -8.6%     185675 ±  2%  vm-scalability.time.involuntary_context_switches
  68459282 ±  6%     +42.1%   97269013 ±  2%  vm-scalability.time.minor_page_faults
      9007            -1.8%       8844        vm-scalability.time.percent_of_cpu_this_job_got
      1170 ±  3%     +58.2%       1852 ±  3%  vm-scalability.time.system_time
     26342            -4.6%      25132        vm-scalability.time.user_time
     11275 ±  5%    +364.1%      52332 ±  7%  vm-scalability.time.voluntary_context_switches
 1.658e+09            -3.4%  1.601e+09        vm-scalability.workload
     51013 ± 40%     -67.5%
     16584 ±125%  numa-numastat.node1.other_node
     20056 ±  2%      +8.9%      21844 ±  3%  numa-vmstat.node1.nr_slab_unreclaimable
     51013 ± 40%     -67.5%      16584 ±125%  numa-vmstat.node1.numa_other
      2043 ±  3%     +10.5%       2257        vmstat.system.cs
    540820 ±  2%    +186.2%    1547747 ±  8%  vmstat.system.in
      0.00 ±157%      +0.0        0.00 ±  6%  mpstat.cpu.all.iowait%
      2.59            +1.7        4.27 ±  4%  mpstat.cpu.all.irq%
      4.03 ±  3%      +2.3        6.36 ±  3%  mpstat.cpu.all.sys%
      5870 ± 64%     -48.0%       3051 ±  7%  numa-meminfo.node0.Active
    195543 ±  3%      -7.2%     181529 ±  4%  numa-meminfo.node0.Slab
     80226 ±  2%      +8.9%      87378 ±  3%  numa-meminfo.node1.SUnreclaim
  40406018 ±  7%     +66.5%   67272793 ±  2%  proc-vmstat.numa_hint_faults
  20211075 ±  7%     +66.8%   33722069 ±  2%  proc-vmstat.numa_hint_faults_local
  40555366 ±  7%     +66.3%   67430626 ±  2%  proc-vmstat.numa_pte_updates
  69364615 ±  6%     +41.5%   98184580 ±  2%  proc-vmstat.pgfault
    210031 ±  8%    +126.2%     475135 ± 99%  turbostat.C1
 1.382e+09 ±  2%    +140.0%  3.317e+09 ±  5%  turbostat.IRQ
      8771 ±  6%    +466.6%      49695 ±  7%  turbostat.POLL
     87.01            -2.6%      84.76        turbostat.RAMWatt
    145904 ±  2%     -22.2%     113504 ± 11%  sched_debug.cfs_rq:/.min_vruntime.stddev
    841.83 ±  2%     -13.8%     725.47 ±  6%  sched_debug.cfs_rq:/.runnable_avg.min
    549777 ±  9%     -49.2%     279239 ± 34%  sched_debug.cfs_rq:/.spread0.avg
    659447 ±  8%     -36.7%     417735 ± 22%  sched_debug.cfs_rq:/.spread0.max
    145800 ±  2%     -22.1%     113612 ± 11%  sched_debug.cfs_rq:/.spread0.stddev
    785.23 ±  6%     -14.6%     670.61 ± 10%  sched_debug.cfs_rq:/.util_avg.min
     67.96 ±  5%     +22.7%      83.40 ± 10%  sched_debug.cfs_rq:/.util_avg.stddev
    246549 ±  7%     -15.1%     209367 ±  7%  sched_debug.cpu.avg_idle.avg
      1592           +10.8%       1763 ±  3%  sched_debug.cpu.clock_task.stddev
     32106 ± 10%     -17.6%      26468 ±  8%  sched_debug.cpu.nr_switches.max
      1910 ±  6%     +31.0%       2503        sched_debug.cpu.nr_switches.min
      5664 ± 10%     -16.6%       4723 ±  7%  sched_debug.cpu.nr_switches.stddev
      0.18 ±  4%      +0.0        0.23 ±  3%  perf-stat.i.branch-miss-rate%
   8939520 ±  4%     +61.3%   14417578 ±  3%  perf-stat.i.branch-misses
     66.18            -1.7       64.47        perf-stat.i.cache-miss-rate%
      1927 ±  3%     +11.0%       2139        perf-stat.i.context-switches
    158.85           +10.7%     175.92 ±  3%
                                              perf-stat.i.cpu-migrations
      0.04 ±  6%      +0.0        0.05 ± 11%  perf-stat.i.dTLB-load-miss-rate%
   4916471 ±  7%     +39.7%    6870029 ±  9%  perf-stat.i.dTLB-load-misses
      9.10            -0.4        8.71        perf-stat.i.dTLB-store-miss-rate%
 5.311e+08            -4.1%  5.095e+08        perf-stat.i.dTLB-store-misses
   2438160 ±  2%    +161.5%    6374895 ±  7%  perf-stat.i.iTLB-load-misses
    115315 ±  2%     +62.0%     186840 ±  7%  perf-stat.i.iTLB-loads
     43163 ±  5%     -25.7%      32083 ± 26%  perf-stat.i.instructions-per-iTLB-miss
      0.34 ± 37%     -63.2%       0.13 ± 27%  perf-stat.i.major-faults
    226565 ±  6%     +41.4%     320417 ±  2%  perf-stat.i.minor-faults
     50.56            +1.7       52.22        perf-stat.i.node-load-miss-rate%
 1.165e+08            +3.7%  1.208e+08        perf-stat.i.node-load-misses
  1.13e+08            -3.6%  1.089e+08        perf-stat.i.node-loads
 2.678e+08            -3.6%  2.582e+08        perf-stat.i.node-store-misses
 2.655e+08            -4.2%  2.543e+08        perf-stat.i.node-stores
    226565 ±  6%     +41.4%     320418 ±  2%  perf-stat.i.page-faults
      0.08 ±  4%      +0.0        0.12 ±  4%  perf-stat.overall.branch-miss-rate%
     67.13            -1.8       65.28        perf-stat.overall.cache-miss-rate%
    367.93            +2.6%     377.43        perf-stat.overall.cycles-between-cache-misses
      0.04 ±  7%      +0.0        0.05 ± 10%  perf-stat.overall.dTLB-load-miss-rate%
      9.38            -0.4        8.97        perf-stat.overall.dTLB-store-miss-rate%
     95.49            +1.7       97.16        perf-stat.overall.iTLB-load-miss-rate%
     20560 ±  3%     -61.9%       7826 ±  7%  perf-stat.overall.instructions-per-iTLB-miss
     50.76            +1.8       52.60        perf-stat.overall.node-load-miss-rate%
      9205            +3.1%       9485        perf-stat.overall.path-length
   8892515 ±  4%     +62.1%   14412101 ±  3%  perf-stat.ps.branch-misses
      1927 ±  3%     +11.1%       2142        perf-stat.ps.context-switches
    158.37           +11.2%     176.03 ±  3%  perf-stat.ps.cpu-migrations
   4902779 ±  7%     +40.2%    6871859 ±  9%  perf-stat.ps.dTLB-load-misses
 5.295e+08            -4.1%  5.077e+08        perf-stat.ps.dTLB-store-misses
   2428324 ±  2%    +163.0%    6385873 ±  7%  perf-stat.ps.iTLB-load-misses
    114618 ±  2%     +62.5%     186290 ±  7%  perf-stat.ps.iTLB-loads
      0.34 ± 37%     -63.2%       0.13 ± 27%  perf-stat.ps.major-faults
    226036 ±  6%     +41.8%     320615 ±  2%  perf-stat.ps.minor-faults
 1.162e+08            +3.7%  1.205e+08        perf-stat.ps.node-load-misses
 1.127e+08            -3.6%  1.086e+08
                                              perf-stat.ps.node-loads
  2.67e+08            -3.6%  2.573e+08        perf-stat.ps.node-store-misses
 2.647e+08            -4.3%  2.534e+08        perf-stat.ps.node-stores
    226036 ±  6%     +41.8%     320615 ±  2%  perf-stat.ps.page-faults
      0.00            +0.6        0.60 ±  8%  perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush
      0.00            +0.6        0.64 ±  7%  perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
      0.00            +0.9        0.90 ± 10%  perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
      0.00            +1.9        1.86 ±  9%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_access
      0.00            +1.9        1.87 ±  8%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_access
      0.00            +1.9        1.94 ±  8%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.do_access
      0.00            +2.6        2.59 ±  9%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.do_access
      0.00            +2.8        2.80 ±  8%  perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush
      3.43 ± 13%      +6.5        9.88 ±  7%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      3.46 ± 13%      +6.5        9.94 ±  7%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
      3.15 ± 13%      +6.5        9.69 ±  7%  perf-profile.calltrace.cycles-pp.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      4.06 ± 11%      +6.7       10.71 ±  7%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
      3.82 ± 13%      +6.7       10.48 ±  7%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
      3.83 ± 13%      +6.7       10.49 ±  7%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
      2.57 ± 13%      +6.9        9.46 ±  7%  perf-profile.calltrace.cycles-pp.migrate_misplaced_page.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      2.36 ± 13%      +6.9        9.28 ±  7%  perf-profile.calltrace.cycles-pp.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page.__handle_mm_fault
      2.36 ± 13%      +6.9        9.29 ±  7%  perf-profile.calltrace.cycles-pp.migrate_pages.migrate_misplaced_page.do_numa_page.__handle_mm_fault.handle_mm_fault
      0.00            +7.5        7.50 ±  7%  perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch
      0.00            +7.6        7.56 ±  7%  perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch.migrate_pages
      0.00            +7.6        7.57 ±  8%  perf-profile.calltrace.cycles-pp.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch.migrate_pages.migrate_misplaced_page
      0.00            +7.6        7.57 ±  7%  perf-profile.calltrace.cycles-pp.try_to_unmap_flush.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
      1.55 ± 13%      -1.1        0.42 ±  9%  perf-profile.children.cycles-pp.rmap_walk_anon
      1.30 ± 15%      -1.0        0.30 ±  9%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      1.11 ± 12%      -0.9        0.18 ±  6%  perf-profile.children.cycles-pp.try_to_migrate_one
      1.17 ± 12%      -0.9        0.26 ±  8%  perf-profile.children.cycles-pp.try_to_migrate
      1.30 ± 12%      -0.9        0.42 ±  8%  perf-profile.children.cycles-pp.migrate_folio_unmap
      1.08 ± 13%      -0.9        0.21 ± 11%  perf-profile.children.cycles-pp._raw_spin_lock
      0.46 ± 14%      -0.3        0.14 ± 13%  perf-profile.children.cycles-pp.page_vma_mapped_walk
      0.35 ± 13%      -0.2        0.11 ± 11%  perf-profile.children.cycles-pp.remove_migration_pte
      0.14 ± 21%      -0.1        0.07 ± 11%  perf-profile.children.cycles-pp.folio_lruvec_lock_irq
      0.14 ± 21%      -0.1        0.08 ± 10%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.33 ±  3%      -0.0        0.30        perf-profile.children.cycles-pp.lrand48_r@plt
      0.06
           ± 14%      +0.0        0.08 ±  9%  perf-profile.children.cycles-pp.mt_find
      0.06 ± 14%      +0.0        0.08 ± 11%  perf-profile.children.cycles-pp.find_vma
      0.00            +0.1        0.06 ±  9%  perf-profile.children.cycles-pp.folio_migrate_flags
      0.06 ±  8%      +0.1        0.12 ±  8%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
      0.03 ± 81%      +0.1        0.10 ±  8%  perf-profile.children.cycles-pp.uncharge_batch
      0.00            +0.1        0.07 ±  8%  perf-profile.children.cycles-pp.native_sched_clock
      0.06 ± 10%      +0.1        0.13 ±  8%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.03 ± 81%      +0.1        0.10 ± 10%  perf-profile.children.cycles-pp.__folio_put
      0.03 ± 81%      +0.1        0.10 ± 10%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge
      0.16 ± 12%      +0.1        0.24 ± 10%  perf-profile.children.cycles-pp.up_read
      0.04 ± 50%      +0.1        0.12 ±  8%  perf-profile.children.cycles-pp.task_work_run
      0.01 ±200%      +0.1        0.09 ± 12%  perf-profile.children.cycles-pp.page_counter_uncharge
      0.23 ± 18%      +0.1        0.31 ±  8%  perf-profile.children.cycles-pp.folio_batch_move_lru
      0.00            +0.1        0.08 ± 10%  perf-profile.children.cycles-pp.sched_clock_cpu
      0.23 ± 18%      +0.1        0.31 ±  8%  perf-profile.children.cycles-pp.lru_add_drain
      0.23 ± 18%      +0.1        0.31 ±  8%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      0.19 ± 12%      +0.1        0.28 ± 11%  perf-profile.children.cycles-pp.down_read_trylock
      0.05 ±  7%      +0.1        0.14 ±  8%  perf-profile.children.cycles-pp.mem_cgroup_migrate
      0.00            +0.1        0.09 ± 10%  perf-profile.children.cycles-pp._find_next_bit
      0.03 ± 82%      +0.1        0.12 ±  8%  perf-profile.children.cycles-pp.change_pte_range
      0.03 ± 82%      +0.1        0.12 ±  8%  perf-profile.children.cycles-pp.task_numa_work
      0.03 ± 82%      +0.1        0.12 ±  8%  perf-profile.children.cycles-pp.change_prot_numa
      0.03 ± 82%      +0.1        0.12 ±  8%  perf-profile.children.cycles-pp.change_protection_range
      0.03 ± 82%      +0.1        0.12 ±  8%  perf-profile.children.cycles-pp.change_pmd_range
      0.02 ±123%      +0.1        0.12 ±  6%  perf-profile.children.cycles-pp.irqtime_account_irq
      0.07 ± 13%      +0.1        0.18 ± 24%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.02 ±122%      +0.1        0.13 ±  6%
                                              perf-profile.children.cycles-pp.page_counter_charge
      0.18 ± 12%      +0.1        0.30 ±  9%  perf-profile.children.cycles-pp.folio_copy
      0.17 ± 13%      +0.1        0.30 ±  9%  perf-profile.children.cycles-pp.copy_page
      0.09 ±  4%      +0.1        0.24 ±  9%  perf-profile.children.cycles-pp.sync_regs
      0.27 ± 11%      +0.2        0.51 ±  8%  perf-profile.children.cycles-pp.move_to_new_folio
      0.27 ± 11%      +0.2        0.51 ±  8%  perf-profile.children.cycles-pp.migrate_folio_extra
      0.10 ±  9%      +0.3        0.40 ±  7%  perf-profile.children.cycles-pp.native_irq_return_iret
      0.07 ± 12%      +0.4        0.47 ±  9%  perf-profile.children.cycles-pp.__default_send_IPI_dest_field
      0.00            +0.4        0.44 ±  9%  perf-profile.children.cycles-pp.native_flush_tlb_local
      0.09 ±  9%      +0.5        0.62 ±  9%  perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
      0.09 ± 10%      +1.2        1.32 ±  7%  perf-profile.children.cycles-pp.flush_tlb_func
      0.26 ± 12%      +1.6        1.85 ±  9%  perf-profile.children.cycles-pp.llist_reverse_order
      0.42 ± 11%      +2.4        2.86 ±  8%  perf-profile.children.cycles-pp.llist_add_batch
      0.42 ± 11%      +3.3        3.76 ±  8%  perf-profile.children.cycles-pp.__sysvec_call_function
      0.42 ± 11%      +3.3        3.76 ±  8%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.44 ± 11%      +3.5        3.90 ±  8%  perf-profile.children.cycles-pp.sysvec_call_function
      0.58 ±  6%      +4.4        4.95 ±  8%  perf-profile.children.cycles-pp.asm_sysvec_call_function
      3.44 ± 13%      +6.5        9.89 ±  7%  perf-profile.children.cycles-pp.__handle_mm_fault
      3.47 ± 13%      +6.5        9.95 ±  7%  perf-profile.children.cycles-pp.handle_mm_fault
      3.15 ± 13%      +6.5        9.69 ±  7%  perf-profile.children.cycles-pp.do_numa_page
      0.94 ± 12%      +6.7        7.59 ±  7%  perf-profile.children.cycles-pp.on_each_cpu_cond_mask
      0.94 ± 12%      +6.7        7.59 ±  7%  perf-profile.children.cycles-pp.smp_call_function_many_cond
      3.83 ± 13%      +6.7       10.49 ±  7%  perf-profile.children.cycles-pp.do_user_addr_fault
      3.84 ± 13%      +6.7       10.50 ±  7%  perf-profile.children.cycles-pp.exc_page_fault
      4.08 ± 11%      +6.7       10.76 ±  7%  perf-profile.children.cycles-pp.asm_exc_page_fault
      2.57 ± 13%      +6.9        9.46 ±  7%
                                              perf-profile.children.cycles-pp.migrate_misplaced_page
      2.36 ± 13%      +6.9        9.28 ±  7%  perf-profile.children.cycles-pp.migrate_pages_batch
      2.36 ± 13%      +6.9        9.29 ±  7%  perf-profile.children.cycles-pp.migrate_pages
      0.00            +7.6        7.57 ±  7%  perf-profile.children.cycles-pp.try_to_unmap_flush
      0.00            +7.6        7.57 ±  7%  perf-profile.children.cycles-pp.arch_tlbbatch_flush
     67.74 ±  4%      -8.5       59.28 ±  2%  perf-profile.self.cycles-pp.do_access
      1.19 ± 15%      -0.9        0.28 ±  9%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.03 ± 81%      +0.0        0.08 ± 10%  perf-profile.self.cycles-pp.change_pte_range
      0.09 ±  4%      +0.0        0.14 ± 20%  perf-profile.self.cycles-pp._raw_spin_lock
      0.15 ± 12%      +0.0        0.20 ± 10%  perf-profile.self.cycles-pp.up_read
      0.00            +0.1        0.05 ±  8%  perf-profile.self.cycles-pp.try_to_migrate_one
      0.00            +0.1        0.07 ±  8%  perf-profile.self.cycles-pp._find_next_bit
      0.00            +0.1        0.07 ±  8%  perf-profile.self.cycles-pp.native_sched_clock
      0.00            +0.1        0.07 ± 12%  perf-profile.self.cycles-pp.page_counter_uncharge
      0.17 ± 13%      +0.1        0.27 ±  9%  perf-profile.self.cycles-pp.copy_page
      0.01 ±200%      +0.1        0.11 ±  8%  perf-profile.self.cycles-pp.page_counter_charge
      0.09 ±  4%      +0.1        0.24 ±  9%  perf-profile.self.cycles-pp.sync_regs
      0.10 ±  9%      +0.3        0.39 ±  8%  perf-profile.self.cycles-pp.native_irq_return_iret
      0.07 ± 12%      +0.4        0.47 ±  9%  perf-profile.self.cycles-pp.__default_send_IPI_dest_field
      0.00            +0.4        0.44 ± 10%  perf-profile.self.cycles-pp.native_flush_tlb_local
      0.08 ± 13%      +0.5        0.62 ±  7%  perf-profile.self.cycles-pp.__flush_smp_call_function_queue
      0.06 ± 15%      +0.8        0.88 ±  7%  perf-profile.self.cycles-pp.flush_tlb_func
      0.26 ± 12%      +1.6        1.85 ±  9%  perf-profile.self.cycles-pp.llist_reverse_order
      0.36 ± 11%      +2.0        2.40 ±  8%  perf-profile.self.cycles-pp.llist_add_batch
      0.38 ± 13%      +3.1        3.49 ±  7%  perf-profile.self.cycles-pp.smp_call_function_many_cond

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file    # if come across any failure that blocks the test,
                                                # please remove ~/.lkp and /lkp dir to run from a clean state.

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
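P.S. For anyone re-deriving the %change column of the comparison tables above, a minimal sketch (not part of the lkp tooling; the headline values are taken verbatim from the vm-scalability table in this report):

```python
def pct_change(base, patched):
    """Relative change of the patched commit's value vs. the base commit, in percent."""
    return (patched - base) / base * 100.0

# Headline numbers from the table (ebe75e4751 -> 7e12beb8ca).
throughput = pct_change(5513665, 5328506)
median = pct_change(57522, 55603)

print(f"vm-scalability.throughput: {throughput:+.1f}%")  # -3.4%, the reported regression
print(f"vm-scalability.median:     {median:+.1f}%")      # -3.3%
```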