Greetings,

FYI, we noticed a -53.3% regression of will-it-scale.per_thread_ops due to commit:


commit: 5df397dec7c4c08c23bd14f162f1228836faa4ce ("mm: delay page_remove_rmap() until after the TLB has been flushed")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: will-it-scale
on test machine: 144 threads 4 sockets Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz (Haswell-EX) with 512G memory
with the following parameters:

	nr_task: 16
	mode: thread
	test: page_fault3
	cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both process-based and thread-based tests in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale


Details are as below:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-11/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-hsw-4ex1/page_fault3/will-it-scale

commit:
  7cc8f9c714 ("mm: mmu_gather: prepare to gather encoded page pointers with flags")
  5df397dec7 ("mm: delay page_remove_rmap() until after the TLB has been flushed")

7cc8f9c7146a5c2d 5df397dec7c4c08c23bd14f162f
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   4833919 ±  2%     -53.3%    2256919 ±  5%  will-it-scale.16.threads
     87.36            +4.3%      91.08        will-it-scale.16.threads_idle
    302119 ±  2%     -53.3%     141057 ±  5%  will-it-scale.per_thread_ops
   4833919 ±  2%     -53.3%    2256919 ±  5%  will-it-scale.workload
   3489403 ±  3%     -44.9%    1922542 ±  2%  numa-numastat.node0.local_node
   3533773 ±  2%     -43.5%    1995825 ±  2%  numa-numastat.node0.numa_hit
      2.76 ±  3%       +1.2       3.99 ±  8%  mpstat.cpu.all.irq%
      8.14             -4.3       3.89 ±  3%  mpstat.cpu.all.sys%
      1.55 ±  2%       -0.7       0.89 ±  5%  mpstat.cpu.all.usr%
     14653 ±  9%     -19.9%      11733 ±  8%  numa-meminfo.node3.Active
     14653 ±  9%     -19.9%      11733 ±  8%  numa-meminfo.node3.Active(anon)
     16893 ±  9%     -23.3%      12958 ±  8%  numa-meminfo.node3.Shmem
     13.83 ±  2%     -36.1%       8.83 ±  4%  vmstat.procs.r
     10277 ±  2%     -48.3%       5312 ±  2%  vmstat.system.cs
    388169          +267.4%    1426305 ±  6%  vmstat.system.in
   3533791 ±  2%     -43.5%    1995854 ±  2%  numa-vmstat.node0.numa_hit
   3489421 ±  3%     -44.9%    1922571 ±  2%  numa-vmstat.node0.numa_local
      3662 ±  9%     -19.9%       2933 ±  8%  numa-vmstat.node3.nr_active_anon
      4208 ±  9%     -23.2%       3233 ±  7%  numa-vmstat.node3.nr_shmem
      3662 ±  9%     -19.9%       2933 ±  8%  numa-vmstat.node3.nr_zone_active_anon
 2.279e+09 ±  6%   -1.4e+09  8.495e+08 ± 19%  syscalls.sys_write.noise.100%
 2.286e+09 ±  6%   -1.4e+09  8.607e+08 ± 18%  syscalls.sys_write.noise.2%
 2.284e+09 ±  6%   -1.4e+09  8.585e+08 ± 18%  syscalls.sys_write.noise.25%
 2.286e+09 ±  6%   -1.4e+09  8.606e+08 ± 18%  syscalls.sys_write.noise.5%
 2.282e+09 ±  6%   -1.4e+09  8.556e+08 ± 18%  syscalls.sys_write.noise.50%
 2.281e+09 ±  6%   -1.4e+09  8.524e+08 ± 18%  syscalls.sys_write.noise.75%
    345.00           -31.3%     237.17 ±  6%  turbostat.Avg_MHz
     13.98             -4.0      10.01 ±  7%  turbostat.Busy%
      2473            -4.0%       2373        turbostat.Bzy_MHz
    645971 ±  7%     -85.2%      95702 ± 10%  turbostat.C1
      0.21 ±  4%       -0.2       0.01        turbostat.C1%
  40618322           +40.0%   56882712 ± 28%  turbostat.C1E
 1.489e+08          +426.2%  7.835e+08 ±  7%  turbostat.IRQ
     51.67            -6.1%      48.50 ±  2%  turbostat.PkgTmp
    266729            +2.5%     273503        proc-vmstat.nr_mapped
      1700            -3.9%       1634        proc-vmstat.nr_page_table_pages
   4452235           -34.9%    2899429 ±  2%  proc-vmstat.numa_hit
   4216797           -36.8%    2664694 ±  2%  proc-vmstat.numa_local
    272667 ±  6%     -12.1%     239725 ±  4%  proc-vmstat.numa_pte_updates
    548806            -1.5%     540706        proc-vmstat.pgactivate
   4543721           -34.2%    2989971 ±  2%  proc-vmstat.pgalloc_normal
 1.456e+09 ±  2%     -53.3%  6.802e+08 ±  5%  proc-vmstat.pgfault
   4496329           -35.1%    2917809 ±  2%  proc-vmstat.pgfree
     47639 ±  3%      -3.8%      45842        proc-vmstat.pgreuse
      0.21 ± 19%     -40.7%       0.13 ± 20%  sched_debug.cfs_rq:/.h_nr_running.avg
      0.39 ±  8%     -19.5%       0.31 ±  9%  sched_debug.cfs_rq:/.h_nr_running.stddev
    308353 ± 22%     -51.8%     148647 ± 16%  sched_debug.cfs_rq:/.min_vruntime.avg
   1414517 ± 14%     -76.7%     330200 ±  7%  sched_debug.cfs_rq:/.min_vruntime.max
    433797 ± 16%     -72.0%     121251 ± 11%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.21 ± 19%     -40.6%       0.13 ± 20%  sched_debug.cfs_rq:/.nr_running.avg
      0.39 ±  7%     -19.3%       0.31 ±  9%  sched_debug.cfs_rq:/.nr_running.stddev
    214.54 ± 18%     -41.7%     125.11 ± 13%  sched_debug.cfs_rq:/.runnable_avg.avg
      1005 ±  5%     -31.8%     686.13 ±  2%  sched_debug.cfs_rq:/.runnable_avg.max
    343.01 ±  7%     -39.9%     206.25 ±  7%  sched_debug.cfs_rq:/.runnable_avg.stddev
   -834456           -79.4%    -172083        sched_debug.cfs_rq:/.spread0.avg
  -1120300           -73.5%    -297377        sched_debug.cfs_rq:/.spread0.min
    433810 ± 16%     -72.0%     121253 ± 11%  sched_debug.cfs_rq:/.spread0.stddev
    214.28 ± 18%     -41.6%     125.05 ± 13%  sched_debug.cfs_rq:/.util_avg.avg
    995.30 ±  5%     -31.1%     686.07 ±  2%  sched_debug.cfs_rq:/.util_avg.max
    342.65 ±  7%     -39.8%     206.23 ±  7%  sched_debug.cfs_rq:/.util_avg.stddev
    156.53 ± 19%     -66.3%      52.75 ± 23%  sched_debug.cfs_rq:/.util_est_enqueued.avg
    932.66           -31.6%     637.87 ±  6%  sched_debug.cfs_rq:/.util_est_enqueued.max
    307.80 ±  8%     -52.2%     147.21 ± 12%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
    209592 ± 12%     -40.2%     125325 ±  5%  sched_debug.cpu.avg_idle.stddev
      2125 ±  7%     +78.2%       3787 ±  4%  sched_debug.cpu.clock_task.stddev
    699.47 ±  4%     -33.2%     467.30 ± 19%  sched_debug.cpu.curr->pid.avg
      2013 ±  2%     -16.6%       1679 ±  9%  sched_debug.cpu.curr->pid.stddev
      0.11 ±  3%     -31.6%       0.08 ± 18%  sched_debug.cpu.nr_running.avg
      0.31 ±  2%     -18.7%       0.25 ± 10%  sched_debug.cpu.nr_running.stddev
     12135 ±  8%     -39.9%       7293 ±  6%  sched_debug.cpu.nr_switches.avg
     18199 ±  6%     -36.4%      11579 ± 12%  sched_debug.cpu.nr_switches.stddev
 3.131e+09 ±  2%     -39.8%  1.886e+09 ±  3%  perf-stat.i.branch-instructions
  44126077 ±  2%     -20.4%   35115264 ± 24%  perf-stat.i.branch-misses
   5927448 ±  2%     -45.8%    3214508 ±  4%  perf-stat.i.cache-misses
     10244 ±  2%     -49.0%       5226 ±  2%  perf-stat.i.context-switches
      3.26 ±  2%     +10.1%       3.59 ±  5%  perf-stat.i.cpi
 4.882e+10           -33.1%  3.268e+10 ±  8%  perf-stat.i.cpu-cycles
    161.87           +85.0%     299.50 ±  6%  perf-stat.i.cpu-migrations
      8264 ±  3%     +23.6%      10215 ±  5%  perf-stat.i.cycles-between-cache-misses
      0.54 ±  2%       -0.1       0.42 ± 20%  perf-stat.i.dTLB-load-miss-rate%
  23833616 ±  4%     -52.0%   11438937 ± 24%  perf-stat.i.dTLB-load-misses
 4.386e+09 ±  2%     -38.3%  2.706e+09 ±  5%  perf-stat.i.dTLB-loads
      3.08             -0.8       2.29 ±  7%  perf-stat.i.dTLB-store-miss-rate%
  99573880 ±  2%     -53.4%   46450324 ±  5%  perf-stat.i.dTLB-store-misses
 3.126e+09           -35.8%  2.005e+09 ± 12%  perf-stat.i.dTLB-stores
     84.12            -70.2      13.94 ± 14%  perf-stat.i.iTLB-load-miss-rate%
  12215802 ±  2%     -53.0%    5736925 ± 18%  perf-stat.i.iTLB-load-misses
   2291312 ±  2%   +1452.8%   35580378 ±  4%  perf-stat.i.iTLB-loads
 1.503e+10 ±  2%     -39.1%  9.153e+09 ±  3%  perf-stat.i.instructions
      1238           +33.1%       1648 ± 15%  perf-stat.i.instructions-per-iTLB-miss
      0.31 ±  2%      -8.7%       0.28 ±  5%  perf-stat.i.ipc
      0.34           -33.1%       0.23 ±  8%  perf-stat.i.metric.GHz
      1664 ±  2%     -24.0%       1265 ± 19%  perf-stat.i.metric.K/sec
     73.89 ±  2%     -38.0%      45.81 ±  7%  perf-stat.i.metric.M/sec
   4813959 ±  2%     -53.3%    2249535 ±  5%  perf-stat.i.minor-faults
     17.08 ±  3%       +9.2      26.28 ±  9%  perf-stat.i.node-load-miss-rate%
    173197 ±  4%     +26.8%     219640 ±  8%  perf-stat.i.node-load-misses
    798176 ±  4%     -25.8%     592390 ±  4%  perf-stat.i.node-loads
      1.12 ±  4%       +1.1       2.22 ± 13%  perf-stat.i.node-store-miss-rate%
   4880735 ±  2%     -52.4%    2323565 ±  5%  perf-stat.i.node-stores
   4813959 ±  2%     -53.3%    2249535 ±  5%  perf-stat.i.page-faults
      3.25 ±  2%      +9.6%       3.56 ±  5%  perf-stat.overall.cpi
      8243 ±  3%     +23.2%      10151 ±  5%  perf-stat.overall.cycles-between-cache-misses
      0.54 ±  2%       -0.1       0.42 ± 20%  perf-stat.overall.dTLB-load-miss-rate%
      3.09             -0.8       2.29 ±  7%  perf-stat.overall.dTLB-store-miss-rate%
     84.20            -70.4      13.81 ± 14%  perf-stat.overall.iTLB-load-miss-rate%
      1230           +33.5%       1642 ± 15%  perf-stat.overall.instructions-per-iTLB-miss
      0.31 ±  2%      -8.6%       0.28 ±  5%  perf-stat.overall.ipc
     17.82 ±  2%       +9.2      27.04 ±  8%  perf-stat.overall.node-load-miss-rate%
      1.07 ±  3%       +1.2       2.22 ± 11%  perf-stat.overall.node-store-miss-rate%
    939946           +30.6%    1227132        perf-stat.overall.path-length
  3.12e+09 ±  2%     -39.8%   1.88e+09 ±  3%  perf-stat.ps.branch-instructions
  43993862 ±  2%     -20.5%   34990521 ± 24%  perf-stat.ps.branch-misses
   5907169 ±  2%     -45.8%    3203367 ±  4%  perf-stat.ps.cache-misses
     10208 ±  2%     -49.0%       5209 ±  2%  perf-stat.ps.context-switches
 4.866e+10           -33.1%  3.257e+10 ±  8%  perf-stat.ps.cpu-cycles
    161.37           +84.9%     298.43 ±  6%  perf-stat.ps.cpu-migrations
  23752290 ±  4%     -52.0%   11400060 ± 24%  perf-stat.ps.dTLB-load-misses
 4.371e+09 ±  2%     -38.3%  2.697e+09 ±  5%  perf-stat.ps.dTLB-loads
  99240240 ±  2%     -53.3%   46295983 ±  5%  perf-stat.ps.dTLB-store-misses
 3.115e+09           -35.8%  1.998e+09 ± 12%  perf-stat.ps.dTLB-stores
  12173085 ±  2%     -53.0%    5717673 ± 18%  perf-stat.ps.iTLB-load-misses
   2283336 ±  2%   +1453.1%   35462398 ±  4%  perf-stat.ps.iTLB-loads
 1.498e+10 ±  2%     -39.1%  9.122e+09 ±  3%  perf-stat.ps.instructions
   4797701 ±  2%     -53.3%    2242024 ±  5%  perf-stat.ps.minor-faults
    172442 ±  4%     +26.9%     218745 ±  8%  perf-stat.ps.node-load-misses
    795453 ±  4%     -25.8%     590316 ±  4%  perf-stat.ps.node-loads
   4864155 ±  2%     -52.4%    2315716 ±  5%  perf-stat.ps.node-stores
   4797701 ±  2%     -53.3%    2242024 ±  5%  perf-stat.ps.page-faults
 4.543e+12 ±  2%     -39.1%  2.767e+12 ±  3%  perf-stat.total.instructions
     54.05 ±  8%      -10.8      43.22 ±  7%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
     36.62 ±  9%       -6.2      30.46 ±  7%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     36.90 ±  9%       -6.0      30.86 ±  7%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
     49.64 ±  8%       -5.8      43.88 ±  7%  perf-profile.calltrace.cycles-pp.testcase
     13.85 ±  9%       -4.8       9.06 ±  8%  perf-profile.calltrace.cycles-pp.down_read_trylock.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      5.79 ±  9%       -2.1       3.66 ±  9%  perf-profile.calltrace.cycles-pp.up_read.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      5.32 ±  8%       -1.8       3.55 ± 10%  perf-profile.calltrace.cycles-pp.mt_find.find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      5.36 ±  8%       -1.7       3.66 ± 10%  perf-profile.calltrace.cycles-pp.find_vma.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      4.32 ±  9%       -1.2       3.10 ±  7%  perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_page_fault.testcase
      1.57 ±  8%       -0.3       1.27 ±  7%  perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase
      1.72 ±  7%       -0.3       1.43 ± 10%  perf-profile.calltrace.cycles-pp.error_entry.testcase
      1.14 ± 11%       +0.2       1.39 ±  5%  perf-profile.calltrace.cycles-pp.do_set_pte.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
      1.46 ± 10%       +0.3       1.75 ±  5%  perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      1.33 ±  7%       +0.5       1.78 ±  6%  perf-profile.calltrace.cycles-pp.__filemap_get_folio.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
      1.53 ±  7%       +0.5       2.03 ±  6%  perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault
      1.80 ±  8%       +0.5       2.32 ±  6%  perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault
      1.88 ±  7%       +0.6       2.45 ±  6%  perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.00             +0.8       0.78 ±  4%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.up_read
      0.00             +0.8       0.79 ±  4%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.up_read.do_user_addr_fault
      0.00             +0.8       0.80 ±  6%  perf-profile.calltrace.cycles-pp.__default_send_IPI_dest_field.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
      0.00             +0.8       0.81 ±  4%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.up_read.do_user_addr_fault.exc_page_fault
      0.00             +0.8       0.85 ± 10%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.mt_find
      0.00             +0.9       0.85 ±  8%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.__handle_mm_fault
      0.00             +0.9       0.86 ± 10%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.mt_find.find_vma
      0.00             +0.9       0.86 ±  8%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.__handle_mm_fault.handle_mm_fault
      0.00             +0.9       0.88 ±  3%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.up_read.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00             +0.9       0.88 ± 10%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.mt_find.find_vma.do_user_addr_fault
      0.00             +0.9       0.88 ±  8%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.00             +0.9       0.90 ±  7%  perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
      0.00             +0.9       0.94 ± 65%  perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
      0.00             +1.0       0.96 ±  9%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      0.00             +1.0       0.96 ± 10%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.mt_find.find_vma.do_user_addr_fault.exc_page_fault
      0.00             +1.0       0.97 ± 65%  perf-profile.calltrace.cycles-pp.flush_tlb_func.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
      4.12 ±  7%       +1.0       5.11 ±  6%  perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      0.00             +1.7       1.68 ±  9%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.down_read_trylock
      0.00             +1.7       1.70 ±  9%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.down_read_trylock.do_user_addr_fault
      0.00             +1.7       1.74 ±  9%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.down_read_trylock.do_user_addr_fault.exc_page_fault
      0.00             +1.9       1.91 ± 10%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.down_read_trylock.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00             +2.1       2.15 ±  6%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault
      0.00             +2.2       2.17 ±  6%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault
      0.00             +2.2       2.22 ±  6%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00             +2.5       2.47 ±  6%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.00             +2.6       2.63 ±  8%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.testcase
      0.00             +2.7       2.65 ±  8%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.testcase
      0.00             +2.7       2.72 ±  8%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.testcase
      0.00             +3.0       3.00 ± 21%  perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range.zap_pmd_range
      0.00             +3.0       3.00 ± 21%  perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range.zap_pmd_range.unmap_page_range
      0.00             +3.4       3.36 ±  9%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.testcase
      1.74 ±  9%       +4.1       5.80 ±  7%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
      1.74 ±  9%       +4.1       5.80 ±  7%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      1.74 ±  9%       +4.1       5.80 ±  7%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      1.74 ±  9%       +4.1       5.80 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      1.72 ±  9%       +4.1       5.78 ±  7%  perf-profile.calltrace.cycles-pp.do_mas_align_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.75 ±  9%       +4.1       5.81 ±  7%  perf-profile.calltrace.cycles-pp.__munmap
      1.70 ±  9%       +4.1       5.76 ±  7%  perf-profile.calltrace.cycles-pp.unmap_region.do_mas_align_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      1.69 ±  9%       +4.1       5.76 ±  7%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_mas_align_munmap.__vm_munmap
      1.69 ±  9%       +4.1       5.76 ±  7%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_mas_align_munmap
      1.69 ±  9%       +4.1       5.76 ±  7%  perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_mas_align_munmap.__vm_munmap.__x64_sys_munmap
      1.66 ±  9%       +4.1       5.74 ±  7%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
      0.00             +4.3       4.35 ±  8%  perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      0.00             +8.2       8.17 ±  6%  perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function
      0.00             +8.5       8.49 ±  5%  perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
     47.84 ±  8%       -9.1      38.75 ±  7%  perf-profile.children.cycles-pp.asm_exc_page_fault
     53.90 ±  8%       -6.9      47.01 ±  7%  perf-profile.children.cycles-pp.testcase
     36.96 ±  9%       -6.0      30.91 ±  7%  perf-profile.children.cycles-pp.exc_page_fault
     36.70 ±  9%       -5.9      30.76 ±  7%  perf-profile.children.cycles-pp.do_user_addr_fault
     13.87 ±  9%       -4.6       9.24 ±  8%  perf-profile.children.cycles-pp.down_read_trylock
      5.80 ±  8%       -2.1       3.72 ±  8%  perf-profile.children.cycles-pp.up_read
      5.33 ±  8%       -1.7       3.63 ± 10%  perf-profile.children.cycles-pp.mt_find
      5.37 ±  8%       -1.7       3.67 ± 10%  perf-profile.children.cycles-pp.find_vma
      4.52 ±  9%       -0.9       3.61 ±  7%  perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
      1.58 ±  8%       -0.3       1.28 ±  7%  perf-profile.children.cycles-pp.__irqentry_text_end
      1.84 ±  7%       -0.3       1.56 ± 10%  perf-profile.children.cycles-pp.error_entry
      0.65 ±  8%       -0.2       0.47 ±  6%  perf-profile.children.cycles-pp.page_remove_rmap
      0.13 ± 16%       -0.1       0.06 ± 14%  perf-profile.children.cycles-pp.rwsem_down_read_slowpath
      0.35 ±  8%       -0.0       0.30 ±  9%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.07 ± 10%       -0.0       0.06 ± 13%  perf-profile.children.cycles-pp.__traceiter_sched_switch
      0.07 ± 12%       -0.0       0.06 ± 13%  perf-profile.children.cycles-pp.perf_trace_sched_switch
      0.09 ± 13%       +0.0       0.12 ±  9%  perf-profile.children.cycles-pp.__cond_resched
      0.18 ±  9%       +0.0       0.23 ±  6%  perf-profile.children.cycles-pp.__might_resched
      0.12 ±  9%       +0.0       0.16 ± 11%  perf-profile.children.cycles-pp.noop_dirty_folio
      0.08 ± 10%       +0.1       0.13 ± 20%  perf-profile.children.cycles-pp.arch_scale_freq_tick
      0.00             +0.1       0.05 ±  8%  perf-profile.children.cycles-pp.__alloc_pages
      0.05 ± 47%       +0.1       0.10 ± 26%  perf-profile.children.cycles-pp.update_sg_lb_stats
      0.06 ± 28%       +0.1       0.13 ± 23%  perf-profile.children.cycles-pp.native_apic_mem_write
      0.08 ± 11%       +0.1       0.15 ± 33%  perf-profile.children.cycles-pp.update_sd_lb_stats
      0.09 ± 12%       +0.1       0.16 ± 33%  perf-profile.children.cycles-pp.find_busiest_group
      0.01 ±223%       +0.1       0.08 ± 20%  perf-profile.children.cycles-pp.kthread
      0.01 ±223%       +0.1       0.08 ± 17%  perf-profile.children.cycles-pp.ret_from_fork
      0.16 ± 11%       +0.1       0.28 ± 28%  perf-profile.children.cycles-pp.rebalance_domains
      0.59 ±  9%       +0.1       0.71 ±  6%  perf-profile.children.cycles-pp.xas_load
      0.32 ±  8%       +0.2       0.49 ± 27%  perf-profile.children.cycles-pp.__softirqentry_text_start
      0.15 ± 37%       +0.2       0.33 ± 11%  perf-profile.children.cycles-pp.irqtime_account_irq
      1.16 ± 11%       +0.2       1.40 ±  5%  perf-profile.children.cycles-pp.do_set_pte
      0.00             +0.2       0.25 ±  7%  perf-profile.children.cycles-pp.llist_reverse_order
      1.46 ± 10%       +0.3       1.76 ±  5%  perf-profile.children.cycles-pp.finish_fault
      0.47 ± 13%       +0.3       0.80 ± 22%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.00             +0.4       0.36 ±  9%  perf-profile.children.cycles-pp.llist_add_batch
      1.36 ±  7%       +0.5       1.83 ±  6%  perf-profile.children.cycles-pp.__filemap_get_folio
      0.00             +0.5       0.49 ±  5%  perf-profile.children.cycles-pp.tlb_flush_rmaps
      1.54 ±  7%       +0.5       2.04 ±  6%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
      1.80 ±  8%       +0.5       2.34 ±  6%  perf-profile.children.cycles-pp.shmem_fault
      1.88 ±  7%       +0.6       2.45 ±  6%  perf-profile.children.cycles-pp.__do_fault
      0.04 ± 71%       +0.8       0.80 ±  6%  perf-profile.children.cycles-pp.__default_send_IPI_dest_field
      0.05 ± 46%       +0.9       0.90 ±  7%  perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
      4.14 ±  7%       +1.0       5.13 ±  6%  perf-profile.children.cycles-pp.do_fault
      0.12 ± 18%       +2.9       3.01 ± 21%  perf-profile.children.cycles-pp.on_each_cpu_cond_mask
      0.12 ± 18%       +2.9       3.01 ± 21%  perf-profile.children.cycles-pp.smp_call_function_many_cond
      1.74 ±  9%       +4.1       5.80 ±  7%  perf-profile.children.cycles-pp.__vm_munmap
      1.74 ±  9%       +4.1       5.80 ±  7%  perf-profile.children.cycles-pp.__x64_sys_munmap
      1.73 ±  9%       +4.1       5.79 ±  7%  perf-profile.children.cycles-pp.do_mas_align_munmap
      1.75 ±  9%       +4.1       5.81 ±  7%  perf-profile.children.cycles-pp.__munmap
      1.70 ±  9%       +4.1       5.77 ±  7%  perf-profile.children.cycles-pp.unmap_region
      1.69 ±  9%       +4.1       5.76 ±  7%  perf-profile.children.cycles-pp.unmap_page_range
      1.69 ±  9%       +4.1       5.76 ±  7%  perf-profile.children.cycles-pp.zap_pmd_range
      1.69 ±  9%       +4.1       5.77 ±  7%  perf-profile.children.cycles-pp.unmap_vmas
      1.69 ±  9%       +4.1       5.76 ±  7%  perf-profile.children.cycles-pp.zap_pte_range
      2.02 ± 10%       +4.1       6.11 ±  6%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      2.02 ± 10%       +4.1       6.11 ±  6%  perf-profile.children.cycles-pp.do_syscall_64
      0.12 ± 17%       +4.2       4.35 ±  8%  perf-profile.children.cycles-pp.flush_tlb_mm_range
      0.11 ±  9%      +11.2      11.32 ±  6%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.12 ± 10%      +11.3      11.42 ±  6%  perf-profile.children.cycles-pp.__sysvec_call_function
      0.15 ± 11%      +11.6      11.74 ±  6%  perf-profile.children.cycles-pp.sysvec_call_function
      0.00            +12.5      12.54 ±  5%  perf-profile.children.cycles-pp.native_flush_tlb_one_user
      0.08 ± 13%      +12.9      13.00 ±  5%  perf-profile.children.cycles-pp.flush_tlb_func
      0.25 ±  6%      +13.0      13.23 ±  6%  perf-profile.children.cycles-pp.asm_sysvec_call_function
     13.72 ±  9%       -6.4       7.30 ±  8%  perf-profile.self.cycles-pp.down_read_trylock
      5.74 ±  8%       -2.9       2.83 ± 10%  perf-profile.self.cycles-pp.up_read
      5.27 ±  8%       -2.6       2.67 ± 10%  perf-profile.self.cycles-pp.mt_find
      4.89 ± 11%       -2.2       2.66 ± 10%  perf-profile.self.cycles-pp.__handle_mm_fault
      6.78 ±  8%       -1.6       5.15 ±  7%  perf-profile.self.cycles-pp.testcase
      4.48 ±  9%       -0.9       3.58 ±  7%  perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
      1.58 ±  8%       -0.3       1.27 ±  7%  perf-profile.self.cycles-pp.__irqentry_text_end
      1.53 ±  7%       -0.3       1.28 ±  8%  perf-profile.self.cycles-pp.error_entry
      0.54 ± 11%       -0.2       0.38 ± 11%  perf-profile.self.cycles-pp.___perf_sw_event
      0.49 ± 10%       -0.1       0.39 ±  9%  perf-profile.self.cycles-pp.xas_load
      0.28 ± 10%       -0.1       0.18 ±  7%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.23 ± 12%       -0.1       0.14 ± 19%  perf-profile.self.cycles-pp.exc_page_fault
      0.25 ±  9%       -0.1       0.16 ±  5%  perf-profile.self.cycles-pp.page_remove_rmap
      0.30 ±  6%       -0.1       0.23 ±  9%  perf-profile.self.cycles-pp.__perf_sw_event
      0.19 ± 14%       -0.1       0.13 ± 16%  perf-profile.self.cycles-pp.mem_cgroup_from_task
      0.24 ± 11%       -0.1       0.18 ±  6%  perf-profile.self.cycles-pp.shmem_fault
      0.18 ±  9%       -0.1       0.13 ±  5%  perf-profile.self.cycles-pp.shmem_get_folio_gfp
      0.15 ±  8%       -0.0       0.11 ±  8%  perf-profile.self.cycles-pp.__count_memcg_events
      0.13 ± 11%       -0.0       0.10 ± 13%  perf-profile.self.cycles-pp.do_set_pte
      0.09 ±  9%       -0.0       0.07 ± 10%  perf-profile.self.cycles-pp.current_time
      0.08 ±  7%       -0.0       0.06 ± 11%  perf-profile.self.cycles-pp.xas_start
      0.07 ± 28%       +0.0       0.11 ± 20%  perf-profile.self.cycles-pp.irqtime_account_irq
      0.08 ± 10%       +0.1       0.13 ± 20%  perf-profile.self.cycles-pp.arch_scale_freq_tick
      0.06 ± 19%       +0.1       0.12 ± 22%  perf-profile.self.cycles-pp.native_apic_mem_write
      0.02 ±141%       +0.1       0.08 ± 28%  perf-profile.self.cycles-pp.update_sg_lb_stats
      0.00             +0.1       0.06        perf-profile.self.cycles-pp.sysvec_call_function
      0.06 ± 46%       +0.1       0.14 ± 29%  perf-profile.self.cycles-pp.ktime_get_update_offsets_now
      0.00             +0.1       0.08 ± 11%  perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
      0.30 ±  8%       +0.2       0.46 ±  8%  perf-profile.self.cycles-pp.do_user_addr_fault
      0.15 ± 45%       +0.2       0.33 ± 44%  perf-profile.self.cycles-pp.ktime_get
      0.00             +0.2       0.25 ±  7%  perf-profile.self.cycles-pp.llist_reverse_order
      0.00             +0.3       0.33 ±  6%  perf-profile.self.cycles-pp.__flush_smp_call_function_queue
      0.00             +0.4       0.36 ±  8%  perf-profile.self.cycles-pp.llist_add_batch
      0.00             +0.5       0.47 ±  7%  perf-profile.self.cycles-pp.flush_tlb_func
      0.00             +0.7       0.68 ± 12%  perf-profile.self.cycles-pp.smp_call_function_many_cond
      0.04 ± 71%       +0.8       0.80 ±  6%  perf-profile.self.cycles-pp.__default_send_IPI_dest_field
      0.00            +12.5      12.52 ±  6%  perf-profile.self.cycles-pp.native_flush_tlb_one_user


If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot
| Link: https://lore.kernel.org/oe-lkp/202212051534.852804af-yujie.liu@intel.com


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

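A note on reading the comparison table: judging from the numbers, the %change column is the relative change of the second commit's mean against the first, while metrics that are already percentages (names ending in "%") show a plain difference in percentage points. The sketch below recomputes three rows under that assumption. The function names are hypothetical and are not part of lkp-tests; the inputs are the rounded means from the report, so a recomputed last digit can differ from what the robot printed.

```python
def relative_change(base: float, patched: float) -> float:
    """Percent change of `patched` relative to `base` (assumed %change rule)."""
    return (patched - base) / base * 100.0


def point_change(base: float, patched: float) -> float:
    """Absolute difference, assumed for metrics that are already percentages."""
    return patched - base


# (metric, base mean, patched mean), values copied from the report
rows = [
    ("will-it-scale.per_thread_ops", 302119, 141057),
    ("vmstat.system.in", 388169, 1426305),
    ("perf-stat.i.iTLB-load-miss-rate%", 84.12, 13.94),
]

for name, base, patched in rows:
    if name.endswith("%"):
        # percentage metrics: report the difference in points
        print(f"{point_change(base, patched):+8.1f}   {name}")
    else:
        # everything else: report the relative change in percent
        print(f"{relative_change(base, patched):+7.1f}%   {name}")
```
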
-- 
0-DAY CI Kernel Test Service
https://01.org/lkp