Greetings,

FYI, we noticed a -1.4% regression of will-it-scale.per_thread_ops due to commit:

commit: e498078ae9447c12f6ef1a060639428200bbf29f ("[PATCH 2/2] mm: prevent gup_fast from racing with COW during fork")
url: https://github.com/0day-ci/linux/commits/Jason-Gunthorpe/Add-a-seqcount-between-gup_fast-and-copy_page_range/20201024-082022
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 40a03b750bb3ded71a0f21a0b7dfbf3b24068dcb

in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with the following parameters:

	nr_task: 100%
	mode: thread
	test: futex2
	cpufreq_governor: performance
	ucode: 0x5002f01

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

If you fix the issue, kindly add the following tag
Reported-by: kernel test robot

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # the job file is attached to this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/thread/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/futex2/will-it-scale/0x5002f01

commit:
  9cb7c9fd44 ("mm: reorganize internal_get_user_pages_fast()")
  e498078ae9 ("mm: prevent gup_fast from racing with COW during fork")

9cb7c9fd44c983fd e498078ae9447c12f6ef1a06063
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   2922312            -1.4%    2882609        will-it-scale.per_thread_ops
 5.611e+08            -1.4%  5.535e+08        will-it-scale.workload
    698642 ±  6%      +9.2%     763213 ±  5%  numa-meminfo.node2.MemUsed
    100.67 ± 65%     +64.0%     165.08 ±  2%  sched_debug.cfs_rq:/.removed.load_avg.max
     16926 ± 28%     -44.1%       9461 ± 55%  softirqs.NET_RX
     34.47            -8.0%      31.71        boot-time.boot
      5747            -9.2%       5219        boot-time.idle
    221431 ±  4%     +11.7%     247354 ±  5%  proc-vmstat.numa_pte_updates
   1120636            +1.7%    1139742        proc-vmstat.pgalloc_normal
      1145 ±  7%     +17.3%       1343 ±  8%  slabinfo.task_group.active_objs
      1145 ±  7%     +17.3%       1343 ±  8%  slabinfo.task_group.num_objs
      4299 ± 15%     +33.1%       5722 ± 19%  interrupts.CPU101.CAL:Function_call_interrupts
      5734 ± 18%     +39.5%       7998 ± 31%  interrupts.CPU122.CAL:Function_call_interrupts
      4635 ± 10%     +29.9%       6020 ± 16%  interrupts.CPU146.CAL:Function_call_interrupts
    345.50 ±  8%     +66.6%     575.75 ± 28%  interrupts.CPU146.RES:Rescheduling_interrupts
    314.50 ±  3%     +50.5%     473.25 ± 38%  interrupts.CPU153.RES:Rescheduling_interrupts
      4399 ± 15%     +23.3%       5425 ± 16%  interrupts.CPU171.CAL:Function_call_interrupts
    470.75 ± 23%     -29.3%     333.00 ± 11%  interrupts.CPU24.RES:Rescheduling_interrupts
    322.50 ±  5%     +11.2%     358.50 ±  3%  interrupts.CPU55.RES:Rescheduling_interrupts
    317.00 ±  3%      +7.4%     340.50 ±  3%  interrupts.CPU59.RES:Rescheduling_interrupts
    308.75          +107.9%     641.75 ± 58%  interrupts.CPU7.RES:Rescheduling_interrupts
      2883 ±  4%      +7.8%       3108        interrupts.CPU92.TLB:TLB_shootdowns
  1.64e+08            -3.2%  1.588e+08        perf-stat.i.branch-misses
  13403934 ± 35%     -35.8%    8604014 ±  3%  perf-stat.i.cache-references
      0.00 ±  8%       -0.0       0.00 ± 14%  perf-stat.i.dTLB-load-miss-rate%
   1471544 ±  3%     -52.6%     698114 ±  4%  perf-stat.i.dTLB-load-misses
      1.37 ± 17%     +31.1%       1.80 ±  3%  perf-stat.i.metric.K/sec
      0.02 ± 35%     -35.3%       0.02 ±  3%  perf-stat.overall.MPKI
      0.20             -0.0       0.19        perf-stat.overall.branch-miss-rate%
      0.00 ±  3%       -0.0       0.00 ±  4%  perf-stat.overall.dTLB-load-miss-rate%
      0.00             -0.0       0.00        perf-stat.overall.dTLB-store-miss-rate%
 1.635e+08            -3.2%  1.582e+08        perf-stat.ps.branch-misses
  13412014 ± 35%     -35.6%    8635337 ±  3%  perf-stat.ps.cache-references
   1480248 ±  4%     -52.2%     707692 ±  4%  perf-stat.ps.dTLB-load-misses
     23.09             -0.4      22.69        perf-profile.calltrace.cycles-pp.gup_pgd_range.internal_get_user_pages_fast.get_futex_key.futex_wait_setup.futex_wait
     14.33             -0.2      14.15        perf-profile.calltrace.cycles-pp.__entry_text_start.syscall
      2.29             -0.1       2.19        perf-profile.calltrace.cycles-pp.hash_futex.futex_wait_setup.futex_wait.do_futex.__x64_sys_futex
      2.97             -0.1       2.88        perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_wait_setup.futex_wait.do_futex.__x64_sys_futex
      3.95             -0.1       3.88        perf-profile.calltrace.cycles-pp.try_grab_compound_head.gup_pgd_range.internal_get_user_pages_fast.get_futex_key.futex_wait_setup
      1.81             -0.0       1.78        perf-profile.calltrace.cycles-pp.get_user_pages_fast.get_futex_key.futex_wait_setup.futex_wait.do_futex
      1.96             -0.0       1.93        perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      1.14             +0.1       1.22        perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.syscall
     70.66             +0.2      70.82        perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     31.58             +0.6      32.17        perf-profile.calltrace.cycles-pp.internal_get_user_pages_fast.get_futex_key.futex_wait_setup.futex_wait.do_futex
     40.59             +0.6      41.19        perf-profile.calltrace.cycles-pp.get_futex_key.futex_wait_setup.futex_wait.do_futex.__x64_sys_futex
     23.37             -0.4      22.94        perf-profile.children.cycles-pp.gup_pgd_range
      2.38             -0.1       2.28        perf-profile.children.cycles-pp.hash_futex
      9.24             -0.1       9.15        perf-profile.children.cycles-pp.__entry_text_start
      3.07             -0.1       2.99        perf-profile.children.cycles-pp._raw_spin_lock
      3.95             -0.1       3.89        perf-profile.children.cycles-pp.try_grab_compound_head
      2.08             -0.0       2.04        perf-profile.children.cycles-pp.syscall_enter_from_user_mode
      0.06 ±  7%       +0.0       0.07 ±  5%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
      0.40             +0.0       0.45        perf-profile.children.cycles-pp.is_valid_gup_flags
      0.00             +0.1       0.05        perf-profile.children.cycles-pp.irqtime_account_irq
      0.00             +0.1       0.05        perf-profile.children.cycles-pp.irq_exit_rcu
      1.15             +0.1       1.24        perf-profile.children.cycles-pp.exit_to_user_mode_prepare
     70.86             +0.2      71.02        perf-profile.children.cycles-pp.__x64_sys_futex
     40.93             +0.5      41.47        perf-profile.children.cycles-pp.get_futex_key
     31.73             +0.7      32.40        perf-profile.children.cycles-pp.internal_get_user_pages_fast
     19.00             -0.4      18.59        perf-profile.self.cycles-pp.gup_pgd_range
      7.76 ±  2%       -0.2       7.52        perf-profile.self.cycles-pp.futex_wait_setup
      7.45             -0.2       7.21        perf-profile.self.cycles-pp.get_futex_key
      6.89             -0.1       6.77        perf-profile.self.cycles-pp.futex_wait
      2.25             -0.1       2.16        perf-profile.self.cycles-pp.hash_futex
      3.04             -0.1       2.96        perf-profile.self.cycles-pp._raw_spin_lock
      1.95             -0.0       1.91        perf-profile.self.cycles-pp.syscall_enter_from_user_mode
      0.28 ±  2%       +0.0       0.30        perf-profile.self.cycles-pp.is_valid_gup_flags
      1.30             +0.0       1.35        perf-profile.self.cycles-pp.get_user_pages_fast
      1.05             +0.1       1.13        perf-profile.self.cycles-pp.exit_to_user_mode_prepare
      1.98             +0.1       2.07        perf-profile.self.cycles-pp.do_futex
      8.14             +1.1       9.25        perf-profile.self.cycles-pp.internal_get_user_pages_fast


                          will-it-scale.per_thread_ops

  [ASCII plot: per-sample values on a y-axis from 1.2e+06 to 3e+06; all samples lie in the top band, between 2.8e+06 and 3e+06]

  [*] bisect-good sample
  [O] bisect-bad sample


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen
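For readers unfamiliar with the futex2 testcase: the call chain in the profile (futex_wait -> futex_wait_setup -> get_futex_key -> internal_get_user_pages_fast) is consistent with each thread looping on a shared FUTEX_WAIT that returns immediately because the expected value does not match. The C sketch below is an assumption reconstructed from that call chain, not the will-it-scale source; `futex_word` and `futex_wait_mismatch_loop` are hypothetical names. Because it omits FUTEX_PRIVATE_FLAG, every call should take the get_user_pages_fast path inside get_futex_key, whose self time grows from 8.14 to 9.25 in the table above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical futex word; it stays 0 for the whole run. */
static uint32_t futex_word;

/*
 * Sketch of a futex2-style inner loop (an assumption, not the will-it-scale
 * source). FUTEX_WAIT is issued without FUTEX_PRIVATE_FLAG, so the kernel
 * treats the futex as shared and resolves its key via get_user_pages_fast().
 * The expected value (1) never matches futex_word (0), so each call returns
 * -1 with errno == EAGAIN instead of sleeping.
 *
 * Returns how many of the `iterations` calls failed with EAGAIN.
 */
static unsigned long futex_wait_mismatch_loop(unsigned long iterations)
{
	unsigned long eagain = 0;

	for (unsigned long i = 0; i < iterations; i++) {
		long ret = syscall(SYS_futex, &futex_word, FUTEX_WAIT, 1,
				   NULL, NULL, 0);
		if (ret == -1 && errno == EAGAIN)
			eagain++;
	}
	return eagain;
}
```

Running many threads through such a loop at once would mimic mode: thread with nr_task: 100%, where every thread hammers the same gup_fast path.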
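The %change column in the comparison table has two meanings. For counters such as will-it-scale.per_thread_ops it is the relative change, (new - old) / old * 100. For rows whose values are already percentages (the perf-profile.* rows and the *-rate% rows) it is the absolute difference in percentage points; for example, 23.09 -> 22.69 is shown as -0.4. In addition, will-it-scale.workload appears to equal per_thread_ops multiplied by the 192 tasks (nr_task: 100% on a 192-thread machine): 2922312 * 192 = 561083904, which is reported as 5.611e+08. The helpers below use hypothetical names and are only a reading aid, not part of lkp-tests.

```c
#include <assert.h>

/* Relative change in percent, as shown for counters like per_thread_ops. */
static double pct_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}

/* Absolute change in percentage points, as shown for perf-profile rows. */
static double point_change(double base, double patched)
{
	return patched - base;
}

/* Assumed relation (inferred from the numbers): workload = ops per thread * tasks. */
static double total_workload(double per_thread_ops, unsigned int nr_tasks)
{
	return per_thread_ops * nr_tasks;
}
```

For example, pct_change(2922312, 2882609) is about -1.36, which the report rounds to -1.4%.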