Greetings,

FYI, we noticed a 43.4% improvement in vm-scalability.throughput due to commit:

commit: 4d8191276e029a0ea7ef58f329006972551dbe29 ("[PATCH 2/2] mm: memcg: add a new MEMCG_UPDATE_BATCH")
url: https://github.com/0day-ci/linux/commits/Feng-Tang/mm-page_counter-relayout-structure-to-reduce-false-sharing/20201229-223627
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git dea8dcf2a9fa8cc540136a6cd885c3beece16ec3

in testcase: vm-scalability
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with the following parameters:

        runtime: 300s
        size: 1T
        test: lru-shm
        cpufreq_governor: performance
        ucode: 0x5003003

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run job.yaml
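For readers without the vm-scalability source at hand: the lru-shm case stresses shmem-backed pages, and the profiles below are dominated by the shmem fault path (shmem_fault -> shmem_getpage_gfp -> mem_cgroup_charge). The following is a rough, hypothetical standalone sketch of that access pattern, not the actual vm-scalability code; NR_THREADS and BYTES_PER_THREAD are arbitrary placeholders (the real job runs 192 tasks over a 1T size).

/*
 * Hypothetical sketch of an lru-shm style workload: threads fault in
 * shared anonymous (shmem-backed) memory, one minor fault per page.
 * Build with: gcc -O2 -pthread sketch.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define NR_THREADS 8                    /* placeholder; the real job uses 192 tasks */
#define BYTES_PER_THREAD (256UL << 20)  /* placeholder; the real job covers 1T total */

static void *toucher(void *arg)
{
        (void)arg;
        /* MAP_SHARED|MAP_ANONYMOUS is shmem-backed, so every first touch
         * goes through shmem_fault() and charges the memory cgroup. */
        char *buf = mmap(NULL, BYTES_PER_THREAD, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
                perror("mmap");
                return NULL;
        }
        for (size_t off = 0; off < BYTES_PER_THREAD; off += 4096)
                buf[off] = 1;           /* one minor page fault per page */
        munmap(buf, BYTES_PER_THREAD);
        return NULL;
}

int main(void)
{
        pthread_t tid[NR_THREADS];

        for (int i = 0; i < NR_THREADS; i++)
                pthread_create(&tid[i], NULL, toucher, NULL);
        for (int i = 0; i < NR_THREADS; i++)
                pthread_join(tid[i], NULL);
        return 0;
}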
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/1T/lkp-csl-2ap4/lru-shm/vm-scalability/0x5003003

commit:
  f13e623fa8 ("mm: page_counter: relayout structure to reduce false sharing")
  4d8191276e ("mm: memcg: add a new MEMCG_UPDATE_BATCH")

f13e623fa86ab7de 4d8191276e029a0ea7ef58f3290
---------------- ---------------------------
  fail:runs  %reproduction  fail:runs
      |           |             |
     0:4         58%          2:4   perf-profile.calltrace.cycles-pp.sync_regs.error_entry.do_access
     0:4         69%          3:4   perf-profile.calltrace.cycles-pp.error_entry.do_access
     3:4         17%          3:4   perf-profile.children.cycles-pp.error_entry
     0:4          3%          0:4   perf-profile.self.cycles-pp.error_entry

   %stddev    %change    %stddev
       \         |          \
      0.02 ± 2%    -38.1%    0.01    vm-scalability.free_time
      318058 ± 4%    +41.6%    450322    vm-scalability.median
      1.29 ± 20%    +0.6    1.87 ± 18%    vm-scalability.median_stddev%
      60698434 ± 4%    +43.4%    87053351    vm-scalability.throughput
      55128 ± 6%    -25.3%    41169 ± 2%    vm-scalability.time.involuntary_context_switches
      7.077e+08    +8.8%    7.698e+08    vm-scalability.time.minor_page_faults
      3381 ± 4%    -24.5%    2551    vm-scalability.time.percent_of_cpu_this_job_got
      8274 ± 4%    -32.2%    5609    vm-scalability.time.system_time
      2079 ± 3%    +8.7%    2260 ± 3%    vm-scalability.time.user_time
      67274    +7.1%    72042    vm-scalability.time.voluntary_context_switches
      3.17e+09    +8.8%    3.448e+09    vm-scalability.workload
      14.50 ± 5%    -4.6    9.88    mpstat.cpu.all.sys%
      79.75    +5.3%    84.00    vmstat.cpu.id
      35.50 ± 4%    -23.2%    27.25    vmstat.procs.r
      8528    +2.1%    8707    vmstat.system.cs
      92369 ± 4%    -16.8%    76889 ± 2%    meminfo.Active
      91324 ± 4%    -16.9%    75846 ± 2%    meminfo.Active(anon)
      8538039 ± 5%    -23.3%    6548365 ± 3%    meminfo.Mapped
      20654 ± 4%    -9.7%    18643    meminfo.PageTables
      1.759e+08 ± 3%    +9.7%    1.929e+08    numa-numastat.node1.local_node
      1.759e+08 ± 3%    +9.7%    1.93e+08    numa-numastat.node1.numa_hit
      1.76e+08    +11.7%    1.967e+08    numa-numastat.node3.local_node
      1.761e+08    +11.7%    1.967e+08    numa-numastat.node3.numa_hit
      0.83 ± 20%    -70.0%    0.25 ±137%    sched_debug.cfs_rq:/.load_avg.min
      4209545 ± 11%    -23.4%    3224267 ± 10%    sched_debug.cfs_rq:/.min_vruntime.avg
      4780801 ± 12%    -24.9%    3589252 ± 11%    sched_debug.cfs_rq:/.min_vruntime.max
      3665900 ± 10%    -21.0%    2895741 ± 10%    sched_debug.cfs_rq:/.min_vruntime.min
      801.36 ± 11%    +18.7%    950.87 ± 10%    sched_debug.cfs_rq:/.util_est_enqueued.max
      -22.25    -31.6%    -15.21    sched_debug.cpu.nr_uninterruptible.min
      262824 ± 42%    +68.3%    442338 ± 20%    numa-meminfo.node0.AnonPages.max
      2362824 ± 15%    -30.3%    1645765 ± 11%    numa-meminfo.node0.Mapped
      1983339 ± 7%    -24.3%    1501294 ± 4%    numa-meminfo.node1.Mapped
      4331 ± 22%    -25.1%    3244 ± 2%    numa-meminfo.node1.PageTables
      2047399 ± 4%    -19.5%    1648724 ± 11%    numa-meminfo.node2.Mapped
      80277 ± 6%    -17.8%    65959 ± 6%    numa-meminfo.node3.Active
      80217 ± 6%    -18.3%    65547 ± 5%    numa-meminfo.node3.Active(anon)
      7105 ± 4%    +13.5%    8061 ± 12%    numa-meminfo.node3.KernelStack
      2001243 ± 4%    -19.7%    1607335 ± 6%    numa-meminfo.node3.Mapped
      22819 ± 4%    -17.0%    18944 ± 3%    proc-vmstat.nr_active_anon
      2132359 ± 5%    -23.4%    1632702    proc-vmstat.nr_mapped
      5216 ± 5%    -10.9%    4645    proc-vmstat.nr_page_table_pages
      22819 ± 4%    -17.0%    18944 ± 3%    proc-vmstat.nr_zone_active_anon
      7.104e+08    +8.8%    7.725e+08    proc-vmstat.numa_hit
      7.101e+08    +8.8%    7.723e+08    proc-vmstat.numa_local
      54433 ± 6%    -19.1%    44024    proc-vmstat.pgactivate
      7.114e+08    +8.7%    7.736e+08    proc-vmstat.pgalloc_normal
      7.09e+08    +8.8%    7.71e+08    proc-vmstat.pgfault
      7.114e+08    +8.7%    7.735e+08    proc-vmstat.pgfree
      268108 ± 2%    +8.6%    291111    proc-vmstat.pgreuse
      584988 ± 17%    -29.6%    412076 ± 13%    numa-vmstat.node0.nr_mapped
      497690 ± 8%    -23.6%    380252    numa-vmstat.node1.nr_mapped
      1081 ± 23%    -25.4%    806.50 ± 6%    numa-vmstat.node1.nr_page_table_pages
      90657677 ± 3%    +9.5%    99278002    numa-vmstat.node1.numa_hit
      90503652 ± 3%    +9.5%    99112443    numa-vmstat.node1.numa_local
      507882 ± 4%    -20.7%    402874 ± 12%    numa-vmstat.node2.nr_mapped
      20043 ± 6%    -18.3%    16379 ± 5%    numa-vmstat.node3.nr_active_anon
      7112 ± 5%    +13.2%    8050 ± 12%    numa-vmstat.node3.nr_kernel_stack
      501812 ± 4%    -20.6%    398460 ± 5%    numa-vmstat.node3.nr_mapped
      20043 ± 6%    -18.3%    16379 ± 5%    numa-vmstat.node3.nr_zone_active_anon
      90627456    +11.8%    1.013e+08    numa-vmstat.node3.numa_hit
      90496033    +11.9%    1.012e+08    numa-vmstat.node3.numa_local
      131566 ± 16%    -27.3%    95685 ± 13%    numa-vmstat.node3.numa_other
      33495 ± 2%    +8.5%    36327 ± 2%    softirqs.CPU122.SCHED
      33428 ± 2%    +10.9%    37058 ± 2%    softirqs.CPU123.SCHED
      32980 ± 3%    +10.1%    36319    softirqs.CPU127.SCHED
      33546 ± 2%    +8.7%    36459 ± 2%    softirqs.CPU132.SCHED
      33179 ± 5%    +10.0%    36510    softirqs.CPU135.SCHED
      31601 ± 8%    +15.5%    36501    softirqs.CPU136.SCHED
      33601 ± 2%    +8.4%    36421 ± 2%    softirqs.CPU138.SCHED
      32180 ± 2%    +13.0%    36361    softirqs.CPU142.SCHED
      32887 ± 4%    +12.0%    36840 ± 2%    softirqs.CPU143.SCHED
      33194 ± 3%    +7.2%    35597 ± 2%    softirqs.CPU162.SCHED
      33042 ± 4%    +10.6%    36534    softirqs.CPU27.SCHED
      32749 ± 2%    +11.6%    36545    softirqs.CPU36.SCHED
      33101 ± 2%    +10.3%    36497    softirqs.CPU42.SCHED
      32471 ± 2%    +11.1%    36075    softirqs.CPU44.SCHED
      32250 ± 3%    +10.3%    35566 ± 2%    softirqs.CPU46.SCHED
      32214 ± 6%    +12.4%    36197    softirqs.CPU47.SCHED
      7658 ± 13%    +30.4%    9987 ± 21%    softirqs.CPU67.RCU
      7589 ± 13%    +27.7%    9692 ± 12%    softirqs.CPU69.RCU
      0.05 ±136%    -86.7%    0.01 ± 39%    perf-sched.sch_delay.max.ms.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      0.00 ±101%    +222.2%    0.01 ± 39%    perf-sched.sch_delay.max.ms.io_schedule.__lock_page_killable.filemap_fault.__do_fault
      5.68 ± 60%    -98.3%    0.10 ± 45%    perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.24 ± 73%    +8.4e+05%    2049 ±173%    perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_select
      5030 ± 14%    +52.1%    7652 ± 12%    perf-sched.total_wait_and_delay.max.ms
      5029 ± 14%    +50.6%    7576 ± 11%    perf-sched.total_wait_time.max.ms
      5.51 ± 18%    -50.9%    2.71 ± 49%    perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.98 ± 18%    -46.1%    0.53 ± 48%    perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      4.82 ± 5%    +14.3%    5.51 ± 8%    perf-sched.wait_and_delay.avg.ms.preempt_schedule_common._cond_resched.stop_one_cpu.affine_move_task.__set_cpus_allowed_ptr
      214.13 ± 12%    +46.1%    312.79 ± 8%    perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
      1146 ± 4%    -44.9%    632.00 ± 12%    perf-sched.wait_and_delay.count.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      3107 ± 7%    -37.9%    1930 ± 8%    perf-sched.wait_and_delay.count.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.[unknown]
      67.75 ± 10%    -29.9%    47.50 ± 3%    perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
      1185 ± 26%    +70.3%    2019 ± 21%    perf-sched.wait_and_delay.max.ms.preempt_schedule_common._cond_resched.stop_one_cpu.affine_move_task.__set_cpus_allowed_ptr
      2862 ± 23%    +48.4%    4248 ± 28%    perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
      4513 ± 10%    +54.1%    6953 ± 9%    perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork
      5.50 ± 18%    -51.0%    2.70 ± 49%    perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.96 ± 17%    -47.3%    0.51 ± 50%    perf-sched.wait_time.avg.ms.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      4.82 ± 5%    +14.3%    5.51 ± 8%    perf-sched.wait_time.avg.ms.preempt_schedule_common._cond_resched.stop_one_cpu.affine_move_task.__set_cpus_allowed_ptr
      0.06 ± 21%    +182.5%    0.16 ± 73%    perf-sched.wait_time.avg.ms.preempt_schedule_common._cond_resched.stop_one_cpu.sched_exec.bprm_execve
      213.87 ± 12%    +46.1%    312.55 ± 8%    perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
      1.50 ± 58%    +2098.6%    33.07 ±139%    perf-sched.wait_time.avg.ms.schedule_timeout.__skb_wait_for_more_packets.unix_dgram_recvmsg.__sys_recvfrom
      8.74 ± 3%    -24.6%    6.59 ± 6%    perf-sched.wait_time.avg.ms.sigsuspend.__x64_sys_rt_sigsuspend.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2303 ± 6%    -19.1%    1864 ± 15%    perf-sched.wait_time.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
      16.35 ± 27%    -20.4%    13.01    perf-sched.wait_time.max.ms.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      1185 ± 26%    +70.3%    2019 ± 21%    perf-sched.wait_time.max.ms.preempt_schedule_common._cond_resched.stop_one_cpu.affine_move_task.__set_cpus_allowed_ptr
      2862 ± 23%    +48.4%    4248 ± 28%    perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
      1.50 ± 58%    +2098.6%    33.07 ±139%    perf-sched.wait_time.max.ms.schedule_timeout.__skb_wait_for_more_packets.unix_dgram_recvmsg.__sys_recvfrom
      533.83 ± 18%    -56.5%    232.10 ± 37%    perf-sched.wait_time.max.ms.sigsuspend.__x64_sys_rt_sigsuspend.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4512 ± 10%    +54.1%    6953 ± 9%    perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork
      1.473e+10    +8.3%    1.596e+10    perf-stat.i.branch-instructions
      67391842    -13.6%    58252334 ± 2%    perf-stat.i.cache-misses
      1.82    -10.5%    1.63 ± 2%    perf-stat.i.cpi
      1.141e+11 ± 3%    -20.9%    9.022e+10    perf-stat.i.cpu-cycles
      1585 ± 2%    -4.5%    1514 ± 3%    perf-stat.i.cycles-between-cache-misses
      1.505e+10    +8.1%    1.626e+10    perf-stat.i.dTLB-loads
      4.219e+09    +7.7%    4.544e+09    perf-stat.i.dTLB-stores
      5.387e+10    +8.1%    5.822e+10    perf-stat.i.instructions
      0.56    +11.1%    0.63 ± 2%    perf-stat.i.ipc
      0.59 ± 3%    -21.4%    0.47    perf-stat.i.metric.GHz
      178.65    +7.4%    191.79    perf-stat.i.metric.M/sec
      2221317    +9.1%    2423232    perf-stat.i.minor-faults
      5686701 ± 3%    -20.6%    4517487 ± 6%    perf-stat.i.node-load-misses
      2717590 ± 5%    -44.2%    1517657 ± 3%    perf-stat.i.node-store-misses
      8459610 ± 2%    +10.0%    9304231    perf-stat.i.node-stores
      2221319    +9.1%    2423234    perf-stat.i.page-faults
      2.13 ± 4%    -27.1%    1.55    perf-stat.overall.cpi
      1693 ± 3%    -8.6%    1547 ± 2%    perf-stat.overall.cycles-between-cache-misses
      6488    +8.4%    7033 ± 4%    perf-stat.overall.instructions-per-iTLB-miss
      0.47 ± 4%    +36.9%    0.65    perf-stat.overall.ipc
      82.36    -3.7    78.65    perf-stat.overall.node-load-miss-rate%
      24.27 ± 5%    -10.3    13.95 ± 3%    perf-stat.overall.node-store-miss-rate%
      1.522e+10    +7.4%    1.634e+10    perf-stat.ps.branch-instructions
      69718706    -14.4%    59676985 ± 2%    perf-stat.ps.cache-misses
      1.181e+11 ± 3%    -21.8%    9.228e+10    perf-stat.ps.cpu-cycles
      1.553e+10    +7.1%    1.663e+10    perf-stat.ps.dTLB-loads
      3543187 ± 4%    +8.0%    3825044 ± 4%    perf-stat.ps.dTLB-store-misses
      4.332e+09    +6.9%    4.633e+09    perf-stat.ps.dTLB-stores
      5.554e+10    +7.2%    5.953e+10    perf-stat.ps.instructions
      1.46    +18.1%    1.72 ± 9%    perf-stat.ps.major-faults
      2307245    +7.9%    2489992    perf-stat.ps.minor-faults
      5847320 ± 3%    -21.4%    4594154 ± 6%    perf-stat.ps.node-load-misses
      2817678 ± 5%    -45.0%    1550607 ± 3%    perf-stat.ps.node-store-misses
      8791418    +8.8%    9563343    perf-stat.ps.node-stores
      2307247    +7.9%    2489994    perf-stat.ps.page-faults
      1.706e+13    +8.0%    1.843e+13    perf-stat.total.instructions
      1903 ± 39%    +114.3%    4079 ± 53%    interrupts.CPU1.CAL:Function_call_interrupts
      1773 ± 38%    +57.2%    2787 ± 32%    interrupts.CPU102.CAL:Function_call_interrupts
      1683 ± 44%    +146.8%    4155 ± 74%    interrupts.CPU113.CAL:Function_call_interrupts
      1700 ± 46%    +140.0%    4081 ± 75%    interrupts.CPU115.CAL:Function_call_interrupts
      73.75 ± 11%    +103.1%    149.75 ± 68%    interrupts.CPU115.RES:Rescheduling_interrupts
      63.00 ± 8%    +121.0%    139.25 ± 52%    interrupts.CPU117.RES:Rescheduling_interrupts
      59.75 ± 11%    +137.7%    142.00 ± 70%    interrupts.CPU118.RES:Rescheduling_interrupts
      1658 ± 44%    +115.0%    3566 ± 50%    interrupts.CPU119.CAL:Function_call_interrupts
      1700 ± 44%    +60.4%    2727 ± 31%    interrupts.CPU12.CAL:Function_call_interrupts
      1979 ± 36%    +202.4%    5986 ± 39%    interrupts.CPU148.CAL:Function_call_interrupts
      1415 ± 24%    -39.1%    862.50 ± 37%    interrupts.CPU154.NMI:Non-maskable_interrupts
      1415 ± 24%    -39.1%    862.50 ± 37%    interrupts.CPU154.PMI:Performance_monitoring_interrupts
      2737 ± 58%    -47.0%    1449 ± 9%    interrupts.CPU156.NMI:Non-maskable_interrupts
      2737 ± 58%    -47.0%    1449 ± 9%    interrupts.CPU156.PMI:Performance_monitoring_interrupts
      5.50 ± 71%    +818.2%    50.50 ± 99%    interrupts.CPU166.TLB:TLB_shootdowns
      1705 ± 12%    -29.8%    1197 ± 20%    interrupts.CPU169.NMI:Non-maskable_interrupts
      1705 ± 12%    -29.8%    1197 ± 20%    interrupts.CPU169.PMI:Performance_monitoring_interrupts
      1664 ± 44%    +51.8%    2527 ± 23%    interrupts.CPU18.CAL:Function_call_interrupts
      1620 ± 7%    -16.6%    1351 ± 6%    interrupts.CPU182.NMI:Non-maskable_interrupts
      1620 ± 7%    -16.6%    1351 ± 6%    interrupts.CPU182.PMI:Performance_monitoring_interrupts
      1704 ± 11%    -20.5%    1354 ± 4%    interrupts.CPU188.NMI:Non-maskable_interrupts
      1704 ± 11%    -20.5%    1354 ± 4%    interrupts.CPU188.PMI:Performance_monitoring_interrupts
      1662 ± 44%    +107.1%    3442 ± 52%    interrupts.CPU19.CAL:Function_call_interrupts
      1706 ± 10%    -25.6%    1269 ± 29%    interrupts.CPU191.NMI:Non-maskable_interrupts
      1706 ± 10%    -25.6%    1269 ± 29%    interrupts.CPU191.PMI:Performance_monitoring_interrupts
      275.00 ± 63%    -55.6%    122.00 ± 40%    interrupts.CPU24.RES:Rescheduling_interrupts
      205.25 ± 47%    -40.1%    123.00 ± 63%    interrupts.CPU26.RES:Rescheduling_interrupts
      365.00 ± 49%    -74.0%    95.00 ± 57%    interrupts.CPU27.RES:Rescheduling_interrupts
      196.75 ± 42%    -54.8%    89.00 ± 47%    interrupts.CPU28.RES:Rescheduling_interrupts
      1535 ± 25%    -40.4%    914.75 ± 37%    interrupts.CPU3.NMI:Non-maskable_interrupts
      1535 ± 25%    -40.4%    914.75 ± 37%    interrupts.CPU3.PMI:Performance_monitoring_interrupts
      283.75 ± 75%    -49.9%    142.25 ± 87%    interrupts.CPU30.RES:Rescheduling_interrupts
      444.25 ± 83%    -61.1%    172.75 ±112%    interrupts.CPU35.RES:Rescheduling_interrupts
      243.50 ± 77%    -59.1%    99.50 ± 71%    interrupts.CPU39.RES:Rescheduling_interrupts
      1719 ± 45%    +81.2%    3115 ± 44%    interrupts.CPU4.CAL:Function_call_interrupts
      232.00 ± 69%    -57.7%    98.25 ± 66%    interrupts.CPU41.RES:Rescheduling_interrupts
      313.50 ± 84%    -71.5%    89.50 ± 69%    interrupts.CPU46.RES:Rescheduling_interrupts
      7.75 ±121%    +638.7%    57.25 ±121%    interrupts.CPU52.TLB:TLB_shootdowns
      3.75 ± 34%    +1753.3%    69.50 ± 60%    interrupts.CPU59.TLB:TLB_shootdowns
      1887 ± 16%    -35.3%    1221 ± 27%    interrupts.CPU62.NMI:Non-maskable_interrupts
      1887 ± 16%    -35.3%    1221 ± 27%    interrupts.CPU62.PMI:Performance_monitoring_interrupts
      1728 ± 13%    -18.2%    1414 ± 7%    interrupts.CPU65.NMI:Non-maskable_interrupts
      1728 ± 13%    -18.2%    1414 ± 7%    interrupts.CPU65.PMI:Performance_monitoring_interrupts
      2274 ± 45%    -38.0%    1411 ± 6%    interrupts.CPU67.NMI:Non-maskable_interrupts
      2274 ± 45%    -38.0%    1411 ± 6%    interrupts.CPU67.PMI:Performance_monitoring_interrupts
      2155 ± 7%    +20.1%    2587 ± 17%    interrupts.CPU73.CAL:Function_call_interrupts
      120.50 ± 71%    -52.1%    57.75 ± 4%    interrupts.CPU81.RES:Rescheduling_interrupts
      148.50 ± 89%    -62.5%    55.75 ± 6%    interrupts.CPU82.RES:Rescheduling_interrupts
      157.75 ± 86%    -58.2%    66.00 ± 17%    interrupts.CPU84.RES:Rescheduling_interrupts
      2192 ± 8%    +26.9%    2782 ± 26%    interrupts.CPU87.CAL:Function_call_interrupts
      236.50 ±118%    -76.8%    54.75 ± 5%    interrupts.CPU88.RES:Rescheduling_interrupts
      165.25 ±100%    -66.4%    55.50 ± 7%    interrupts.CPU91.RES:Rescheduling_interrupts
      1844 ± 25%    +54.6%    2852 ± 34%    interrupts.CPU94.CAL:Function_call_interrupts
      1798 ± 44%    +131.5%    4164 ± 76%    interrupts.CPU97.CAL:Function_call_interrupts
      962.75 ± 22%    +47.1%    1416 ± 5%    interrupts.CPU97.NMI:Non-maskable_interrupts
      962.75 ± 22%    +47.1%    1416 ± 5%    interrupts.CPU97.PMI:Performance_monitoring_interrupts
      22.80 ± 60%    -22.6    0.25 ±173%    perf-profile.calltrace.cycles-pp.asm_exc_page_fault
      22.75 ± 60%    -22.5    0.25 ±173%    perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
      22.73 ± 60%    -22.5    0.25 ±173%    perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      22.50 ± 60%    -22.3    0.24 ±173%    perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      42.69 ± 12%    -17.0    25.71 ± 6%    perf-profile.calltrace.cycles-pp.shmem_add_to_page_cache.shmem_getpage_gfp.shmem_fault.__do_fault.do_fault
      40.06 ± 12%    -16.5    23.53 ± 6%    perf-profile.calltrace.cycles-pp.mem_cgroup_charge.shmem_add_to_page_cache.shmem_getpage_gfp.shmem_fault.__do_fault
      56.69 ± 9%    -14.3    42.35 ± 4%    perf-profile.calltrace.cycles-pp.shmem_getpage_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault
      56.89 ± 9%    -14.3    42.63 ± 4%    perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault
      56.92 ± 9%    -14.2    42.68 ± 4%    perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      61.30 ± 8%    -13.5    47.82 ± 3%    perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      61.64 ± 8%    -13.5    48.18 ± 3%    perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      15.26 ± 14%    -5.6    9.71 ± 7%    perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.mem_cgroup_charge.shmem_add_to_page_cache.shmem_getpage_gfp.shmem_fault
      2.72 ± 2%    -0.6    2.09 ± 7%    perf-profile.calltrace.cycles-pp.__pagevec_lru_add.lru_cache_add.shmem_getpage_gfp.shmem_fault.__do_fault
      2.85 ± 2%    -0.6    2.25 ± 7%    perf-profile.calltrace.cycles-pp.lru_cache_add.shmem_getpage_gfp.shmem_fault.__do_fault.do_fault
      1.54 ± 7%    -0.5    1.07 ± 10%    perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.lock_page_lruvec_irqsave.__pagevec_lru_add.lru_cache_add.shmem_getpage_gfp
      1.49 ± 7%    -0.5    1.02 ± 11%    perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.lock_page_lruvec_irqsave.__pagevec_lru_add.lru_cache_add
      1.54 ± 7%    -0.5    1.08 ± 10%    perf-profile.calltrace.cycles-pp.lock_page_lruvec_irqsave.__pagevec_lru_add.lru_cache_add.shmem_getpage_gfp.shmem_fault
      1.42 ± 4%    +0.3    1.72 ± 3%    perf-profile.calltrace.cycles-pp.shmem_alloc_page.shmem_alloc_and_acct_page.shmem_getpage_gfp.shmem_fault.__do_fault
      0.62 ± 19%    +0.4    1.06 ± 2%    perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma.shmem_alloc_page.shmem_alloc_and_acct_page
      0.12 ±173%    +0.4    0.57 ± 2%    perf-profile.calltrace.cycles-pp.propagate_protected_usage.page_counter_try_charge.try_charge.mem_cgroup_charge.shmem_add_to_page_cache
      1.78 ± 8%    +0.5    2.23 ± 4%    perf-profile.calltrace.cycles-pp.try_charge.mem_cgroup_charge.shmem_add_to_page_cache.shmem_getpage_gfp.shmem_fault
      1.21 ± 23%    +0.5    1.68 ± 3%    perf-profile.calltrace.cycles-pp.page_counter_try_charge.try_charge.mem_cgroup_charge.shmem_add_to_page_cache.shmem_getpage_gfp
      1.03 ± 22%    +0.5    1.52 ± 3%    perf-profile.calltrace.cycles-pp.alloc_pages_vma.shmem_alloc_page.shmem_alloc_and_acct_page.shmem_getpage_gfp.shmem_fault
      1.94 ± 6%    +0.5    2.43 ± 4%    perf-profile.calltrace.cycles-pp.shmem_alloc_and_acct_page.shmem_getpage_gfp.shmem_fault.__do_fault.do_fault
      0.00    +0.5    0.53 ± 2%    perf-profile.calltrace.cycles-pp.unlock_page.filemap_map_pages.do_fault.__handle_mm_fault.handle_mm_fault
      0.00    +0.5    0.53 ± 3%    perf-profile.calltrace.cycles-pp.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma
      0.15 ±173%    +0.6    0.70 ± 11%    perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
      0.15 ±173%    +0.6    0.70 ± 11%    perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      0.15 ±173%    +0.6    0.70 ± 11%    perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      0.15 ±173%    +0.6    0.70 ± 11%    perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      0.15 ±173%    +0.6    0.70 ± 11%    perf-profile.calltrace.cycles-pp.__munmap
      0.78 ± 18%    +0.6    1.35 ± 3%    perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.alloc_pages_vma.shmem_alloc_page.shmem_alloc_and_acct_page.shmem_getpage_gfp
      0.17 ±173%    +0.7    0.88 ± 3%    perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma.shmem_alloc_page
      2.69 ± 3%    +0.8    3.52 ± 3%    perf-profile.calltrace.cycles-pp.filemap_map_pages.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      4.14 ± 3%    +1.3    5.39 ± 5%    perf-profile.calltrace.cycles-pp.clear_page_erms.shmem_getpage_gfp.shmem_fault.__do_fault.do_fault
      8.12 ± 26%    +7.6    15.71 ± 4%    perf-profile.calltrace.cycles-pp.do_rw_once
      42.71 ± 12%    -16.9    25.78 ± 6%    perf-profile.children.cycles-pp.shmem_add_to_page_cache
      40.19 ± 12%    -16.6    23.62 ± 6%    perf-profile.children.cycles-pp.mem_cgroup_charge
      56.70 ± 9%    -14.3    42.40 ± 4%    perf-profile.children.cycles-pp.shmem_getpage_gfp
      56.90 ± 9%    -14.2    42.67 ± 4%    perf-profile.children.cycles-pp.shmem_fault
      56.93 ± 9%    -14.2    42.71 ± 4%    perf-profile.children.cycles-pp.__do_fault
      61.33 ± 8%    -13.4    47.91 ± 3%    perf-profile.children.cycles-pp.do_fault
      62.71 ± 8%    -13.4    49.34 ± 3%    perf-profile.children.cycles-pp.handle_mm_fault
      61.69 ± 8%    -13.4    48.32 ± 3%    perf-profile.children.cycles-pp.__handle_mm_fault
      63.34 ± 8%    -13.2    50.13 ± 3%    perf-profile.children.cycles-pp.do_user_addr_fault
      63.41 ± 8%    -13.2    50.21 ± 3%    perf-profile.children.cycles-pp.exc_page_fault
      64.44 ± 7%    -12.3    52.11 ± 3%    perf-profile.children.cycles-pp.asm_exc_page_fault
      15.47 ± 13%    -5.7    9.78 ± 7%    perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
      2.67 ± 5%    -1.6    1.02 ± 5%    perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      1.73 ± 6%    -1.0    0.69 ± 4%    perf-profile.children.cycles-pp.__mod_memcg_state
      2.77 ± 5%    -0.7    2.11 ± 7%    perf-profile.children.cycles-pp.__pagevec_lru_add
      2.90 ± 5%    -0.6    2.27 ± 7%    perf-profile.children.cycles-pp.lru_cache_add
      1.58 ± 8%    -0.5    1.10 ± 10%    perf-profile.children.cycles-pp.lock_page_lruvec_irqsave
      1.69 ± 8%    -0.5    1.23 ± 10%    perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      1.58 ± 8%    -0.4    1.14 ± 10%    perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      0.62 ± 24%    -0.3    0.31 ± 4%    perf-profile.children.cycles-pp.page_remove_rmap
      1.13 ± 6%    -0.2    0.94 ± 6%    perf-profile.children.cycles-pp.__count_memcg_events
      1.12 ± 2%    -0.1    0.99 ± 3%    perf-profile.children.cycles-pp.page_add_file_rmap
      0.55 ± 4%    -0.1    0.46 ± 7%    perf-profile.children.cycles-pp.mem_cgroup_charge_statistics
      1.36 ± 3%    -0.1    1.27 ± 3%    perf-profile.children.cycles-pp.finish_fault
      0.11 ± 11%    -0.0    0.09 ± 5%    perf-profile.children.cycles-pp.obj_cgroup_charge
      0.06 ± 9%    +0.0    0.07    perf-profile.children.cycles-pp.__might_sleep
      0.09 ± 5%    +0.0    0.10 ± 4%    perf-profile.children.cycles-pp.___might_sleep
      0.10 ± 5%    +0.0    0.11 ± 4%    perf-profile.children.cycles-pp.cap_vm_enough_memory
      0.10 ± 10%    +0.0    0.13 ± 3%    perf-profile.children.cycles-pp.xas_start
      0.08 ± 5%    +0.0    0.10 ± 8%    perf-profile.children.cycles-pp.vmacache_find
      0.09 ± 9%    +0.0    0.11 ± 4%    perf-profile.children.cycles-pp.find_vma
      0.09 ± 8%    +0.0    0.12 ± 10%    perf-profile.children.cycles-pp.xas_find_conflict
      0.06 ± 9%    +0.0    0.08 ± 5%    perf-profile.children.cycles-pp.cgroup_throttle_swaprate
      0.10 ± 9%    +0.0    0.12 ± 4%    perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.11 ± 7%    +0.0    0.14 ± 5%    perf-profile.children.cycles-pp.security_vm_enough_memory_mm
      0.12    +0.0    0.16 ± 9%    perf-profile.children.cycles-pp.shmem_pseudo_vma_init
      0.16 ± 6%    +0.0    0.21 ± 5%    perf-profile.children.cycles-pp.find_get_entry
      0.17 ± 4%    +0.0    0.22 ± 5%    perf-profile.children.cycles-pp.find_lock_entry
      0.00    +0.1    0.05 ± 9%    perf-profile.children.cycles-pp.__slab_alloc
      0.00    +0.1    0.05 ± 9%    perf-profile.children.cycles-pp.___slab_alloc
      0.24 ± 5%    +0.1    0.30 ± 5%    perf-profile.children.cycles-pp.___perf_sw_event
      0.01 ±173%    +0.1    0.07 ± 34%    perf-profile.children.cycles-pp.update_curr
      0.29 ± 4%    +0.1    0.36 ± 2%    perf-profile.children.cycles-pp.xas_find
      0.34 ± 9%    +0.1    0.42 ± 8%    perf-profile.children.cycles-pp.__mod_node_page_state
      0.33 ± 4%    +0.1    0.42 ± 3%    perf-profile.children.cycles-pp.__perf_sw_event
      0.29 ± 6%    +0.1    0.38 ± 5%    perf-profile.children.cycles-pp._raw_spin_lock
      0.36 ± 13%    +0.1    0.45 ± 3%    perf-profile.children.cycles-pp.xas_load
      0.41 ± 12%    +0.1    0.52 ± 7%    perf-profile.children.cycles-pp.__mod_lruvec_state
      0.31 ± 24%    +0.1    0.43 ± 8%    perf-profile.children.cycles-pp.xas_store
      0.42    +0.1    0.55 ± 3%    perf-profile.children.cycles-pp.rmqueue_bulk
      0.21 ± 3%    +0.1    0.35 ± 25%    perf-profile.children.cycles-pp.task_tick_fair
      0.63    +0.1    0.77 ± 4%    perf-profile.children.cycles-pp.sync_regs
      0.33 ± 5%    +0.2    0.49 ± 8%    perf-profile.children.cycles-pp.lock_page_memcg
      0.75    +0.2    0.92 ± 2%    perf-profile.children.cycles-pp.rmqueue
      0.51 ± 6%    +0.2    0.69 ± 2%    perf-profile.children.cycles-pp.unlock_page
      0.94    +0.2    1.12 ± 2%    perf-profile.children.cycles-pp.get_page_from_freelist
      1.31    +0.2    1.55 ± 2%    perf-profile.children.cycles-pp.alloc_pages_vma
      1.24    +0.2    1.49 ± 3%    perf-profile.children.cycles-pp.__alloc_pages_nodemask
      1.45    +0.3    1.74 ± 3%    perf-profile.children.cycles-pp.shmem_alloc_page
      1.40 ± 6%    +0.4    1.76 ± 4%    perf-profile.children.cycles-pp.page_counter_try_charge
      0.27 ± 88%    +0.4    0.70 ± 11%    perf-profile.children.cycles-pp.__munmap
      1.82 ± 6%    +0.4    2.26 ± 4%    perf-profile.children.cycles-pp.try_charge
      1.98 ± 4%    +0.5    2.46 ± 3%    perf-profile.children.cycles-pp.shmem_alloc_and_acct_page
      3.39 ± 6%    +0.5    3.90 ± 5%    perf-profile.children.cycles-pp.native_irq_return_iret
      2.81 ± 2%    +0.8    3.65 ± 2%    perf-profile.children.cycles-pp.filemap_map_pages
      4.27 ± 3%    +1.3    5.52 ± 5%    perf-profile.children.cycles-pp.clear_page_erms
      7.79 ± 27%    +7.5    15.33 ± 4%    perf-profile.children.cycles-pp.do_rw_once
      22.16 ± 13%    -11.2    10.93 ± 8%    perf-profile.self.cycles-pp.mem_cgroup_charge
      15.33 ± 13%    -5.7    9.63 ± 8%    perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
      1.72 ± 6%    -1.0    0.67 ± 4%    perf-profile.self.cycles-pp.__mod_memcg_state
      0.94 ± 4%    -0.6    0.34 ± 11%    perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      1.58 ± 8%    -0.4    1.14 ± 10%    perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      1.13 ± 5%    -0.2    0.93 ± 7%    perf-profile.self.cycles-pp.__count_memcg_events
      0.08 ± 5%    -0.1    0.03 ±100%    perf-profile.self.cycles-pp.obj_cgroup_charge
      0.09 ± 5%    +0.0    0.10 ± 4%    perf-profile.self.cycles-pp.___might_sleep
      0.07    +0.0    0.09 ± 7%    perf-profile.self.cycles-pp.xas_create
      0.11 ± 4%    +0.0    0.13 ± 3%    perf-profile.self.cycles-pp.asm_exc_page_fault
      0.10    +0.0    0.12 ± 6%    perf-profile.self.cycles-pp.do_fault
      0.09 ± 10%    +0.0    0.11 ± 4%    perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.08 ± 5%    +0.0    0.10 ± 7%    perf-profile.self.cycles-pp.xas_find_conflict
      0.15 ± 5%    +0.0    0.17 ± 4%    perf-profile.self.cycles-pp.do_user_addr_fault
      0.11 ± 4%    +0.0    0.13 ± 3%    perf-profile.self.cycles-pp.xas_find
      0.24 ± 2%    +0.0    0.27    perf-profile.self.cycles-pp.rmqueue
      0.11 ± 4%    +0.0    0.14 ± 11%    perf-profile.self.cycles-pp.shmem_pseudo_vma_init
      0.08 ± 10%    +0.0    0.11 ± 6%    perf-profile.self.cycles-pp.page_add_file_rmap
      0.03 ±100%    +0.0    0.06    perf-profile.self.cycles-pp.cap_vm_enough_memory
      0.03 ±100%    +0.0    0.07 ± 7%    perf-profile.self.cycles-pp.cgroup_throttle_swaprate
      0.19 ± 6%    +0.0    0.23 ± 4%    perf-profile.self.cycles-pp.___perf_sw_event
      0.23 ± 4%    +0.0    0.27 ± 4%    perf-profile.self.cycles-pp._raw_spin_lock
      0.15 ± 5%    +0.1    0.20 ± 7%    perf-profile.self.cycles-pp.__alloc_pages_nodemask
      0.01 ±173%    +0.1    0.07 ± 24%    perf-profile.self.cycles-pp.task_tick_fair
      0.22 ± 3%    +0.1    0.28    perf-profile.self.cycles-pp.handle_mm_fault
      0.28    +0.1    0.35 ± 5%    perf-profile.self.cycles-pp.__handle_mm_fault
      0.22 ± 5%    +0.1    0.28 ± 3%    perf-profile.self.cycles-pp.alloc_set_pte
      0.29 ± 3%    +0.1    0.36 ± 2%    perf-profile.self.cycles-pp.rmqueue_bulk
      0.20 ± 16%    +0.1    0.27 ± 10%    perf-profile.self.cycles-pp.shmem_fault
      0.16 ± 13%    +0.1    0.23 ± 6%    perf-profile.self.cycles-pp.xas_store
      0.27 ± 13%    +0.1    0.34 ± 5%    perf-profile.self.cycles-pp.xas_load
      0.33 ± 9%    +0.1    0.41 ± 8%    perf-profile.self.cycles-pp.__mod_node_page_state
      0.38 ± 2%    +0.1    0.46 ± 8%    perf-profile.self.cycles-pp.__pagevec_lru_add
      0.22 ± 7%    +0.1    0.31 ± 2%    perf-profile.self.cycles-pp.shmem_add_to_page_cache
      0.45 ± 4%    +0.1    0.55 ± 6%    perf-profile.self.cycles-pp.try_charge
      0.62    +0.1    0.76 ± 4%    perf-profile.self.cycles-pp.sync_regs
      0.33 ± 4%    +0.2    0.48 ± 8%    perf-profile.self.cycles-pp.lock_page_memcg
      0.49 ± 7%    +0.2    0.65 ± 2%    perf-profile.self.cycles-pp.unlock_page
      0.20 ± 16%    +0.2    0.39 ± 12%    perf-profile.self.cycles-pp.shmem_alloc_and_acct_page
      0.83 ± 6%    +0.3    1.16 ± 5%    perf-profile.self.cycles-pp.page_counter_try_charge
      3.39 ± 6%    +0.5    3.90 ± 5%    perf-profile.self.cycles-pp.native_irq_return_iret
      1.89    +0.6    2.45 ± 2%    perf-profile.self.cycles-pp.filemap_map_pages
      4.24 ± 3%    +1.2    5.45 ± 5%    perf-profile.self.cycles-pp.clear_page_erms
      4.46 ± 2%    +1.4    5.86 ± 5%    perf-profile.self.cycles-pp.shmem_getpage_gfp
      4.07 ± 26%    +3.7    7.75 ± 3%    perf-profile.self.cycles-pp.do_access
      6.29 ± 28%    +6.2    12.50 ± 3%    perf-profile.self.cycles-pp.do_rw_once
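The biggest movers in these profiles line up with the commit title: mem_cgroup_charge self time drops from 22.16% to 10.93%, __mod_memcg_state from 1.72% to 0.67%, and perf-stat.ps.node-store-misses falls 45%, i.e. far fewer contended writes to shared memcg counter cachelines, with the reclaimed cycles going back to the workload itself (do_access and do_rw_once roughly double). The patch body is not quoted in this report, so purely as a hypothetical illustration of the per-CPU batching principle the commit title names (the kernel already batches charges along these lines via MEMCG_CHARGE_BATCH and the per-CPU stock consumed by try_charge()), here is a minimal userspace sketch; BATCH, global_usage, local_stock and charge_one_page are invented names:

/*
 * Hypothetical sketch of batched counter updates: each thread pre-charges
 * BATCH pages with a single atomic update to the shared counter, then
 * serves the next BATCH-1 charges from a thread-local stock.  A larger
 * batch means proportionally fewer writes to the shared cacheline.
 */
#include <stdatomic.h>

#define BATCH 64                        /* invented value; larger batch = less contention */

static atomic_long global_usage;        /* stands in for the shared page_counter */
static _Thread_local long local_stock;  /* stands in for the per-CPU stock */

static void charge_one_page(void)
{
        if (local_stock == 0) {
                /* slow path: one contended atomic op covers BATCH charges */
                atomic_fetch_add(&global_usage, BATCH);
                local_stock = BATCH;
        }
        local_stock--;                  /* fast path: thread-local, no sharing */
}

int main(void)
{
        for (int i = 0; i < 1000; i++)
                charge_one_page();
        /* 1000 charges trigger only 16 shared-counter updates (16*64 = 1024) */
        return atomic_load(&global_usage) == 1024 ? 0 : 1;
}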
vm-scalability.throughput

  [ASCII trend chart damaged in transit and omitted: bisect-bad (O) samples from the patched commit cluster around 0.8-1.0e+08, bisect-good (+) samples from the base commit around 6-7e+07, consistent with the +43.4% in the table above.]

vm-scalability.free_time

  [ASCII trend chart damaged in transit and omitted: bisect-bad (O) samples sit near 0.01, bisect-good (+) samples near 0.02, consistent with the -38.1% in the table above.]

[*] bisect-good sample
[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

Thanks,
Oliver Sang