* [linus:master] [mm] 8d59d2214c: vm-scalability.throughput -36.6% regression
From: kernel test robot @ 2024-01-22 8:39 UTC (permalink / raw)
To: Yosry Ahmed
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Johannes Weiner,
Domenico Cerasuolo, Shakeel Butt, Chris Li, Greg Thelen,
Ivan Babrou, Michal Hocko, Michal Koutny, Muchun Song,
Roman Gushchin, Tejun Heo, Waiman Long, Wei Xu, cgroups,
linux-mm, ying.huang, feng.tang, fengwei.yin, oliver.sang
Hi Yosry Ahmed,

per your suggestion in
https://lore.kernel.org/all/CAJD7tkameJBrJQxRj+ibKL6-yd-i0wyoyv2cgZdh3ZepA1p7wA@mail.gmail.com/
"I think it would be useful to know if there are
regressions/improvements in other microbenchmarks, at least to
investigate whether they represent real regressions."

we are still reporting the two regressions below to you, just FYI, as
observed in our microbenchmark tests.
(We also captured a will-it-scale::fallocate regression, but we ignore
it here per your commit message.)
Hello,
kernel test robot noticed a -36.6% regression of vm-scalability.throughput on:
commit: 8d59d2214c2362e7a9d185d80b613e632581af7b ("mm: memcg: make stats flushing threshold per-memcg")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
testcase: vm-scalability
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:
runtime: 300s
size: 1T
test: lru-shm
cpufreq_governor: performance
test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
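
For context: the lru-shm case is essentially a shared-memory fault workload;
many tasks touch a large shmem mapping for the first time, so the kernel
spends its time in shmem_fault()/shmem_alloc_and_add_folio()/folio_add_lru()
(visible in the profiles below). A minimal stand-alone sketch of that access
pattern (the sizes and the memfd name are illustrative, not taken from the
actual vm-scalability sources):

	#define _GNU_SOURCE
	#include <unistd.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t sz = 256UL << 20; /* illustrative; the job spreads size=1T over all tasks */
		int fd = memfd_create("lru-shm-sketch", 0);
		char *p;

		if (fd < 0 || ftruncate(fd, sz))
			return 1;
		p = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;
		/* each first touch faults in a shmem folio, charges it to the
		 * memcg and adds it to the LRU, the paths that regressed */
		for (size_t off = 0; off < sz; off += 4096) /* assume 4 KiB pages */
			p[off] = 1;
		munmap(p, sz);
		return 0;
	}
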
In addition to that, the commit also has significant impact on the following tests:
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -32.3% regression |
| test machine | 104 threads 2 sockets (Skylake) with 192G memory |
| test parameters | cpufreq_governor=performance |
| | mode=process |
| | nr_task=50% |
| | test=tlb_flush2 |
+------------------+----------------------------------------------------------------------------------------------------+
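
For context: the tlb_flush2 profile shown further down indicates that each
process repeatedly faults anonymous pages in and then zaps them with
madvise(MADV_DONTNEED), so the memcg charge and uncharge paths both run at a
very high rate. A rough stand-alone sketch of that loop (region size and
iteration count are illustrative, not taken from the actual will-it-scale
sources):

	#include <unistd.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t sz = 64UL << 20;	/* illustrative region size */
		char *p = mmap(NULL, sz, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return 1;
		for (int i = 0; i < 1000; i++) {
			/* fault the pages in: do_anonymous_page(), memcg charge */
			for (size_t off = 0; off < sz; off += 4096)
				p[off] = 1;
			/* zap them again: zap_page_range_single(), TLB flush,
			 * and memcg uncharge on release_pages() */
			madvise(p, sz, MADV_DONTNEED);
		}
		munmap(p, sz);
		return 0;
	}
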

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202401221624.cb53a8ca-oliver.sang@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240122/202401221624.cb53a8ca-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/1T/lkp-cpl-4sp2/lru-shm/vm-scalability
commit:
e0bf1dc859 ("mm: memcg: move vmstats structs definition above flushing code")
8d59d2214c ("mm: memcg: make stats flushing threshold per-memcg")
e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.01 +86.7% 0.02 vm-scalability.free_time
946447 -37.8% 588327 vm-scalability.median
2.131e+08 -36.6% 1.351e+08 vm-scalability.throughput
284.74 +6.3% 302.62 vm-scalability.time.elapsed_time
284.74 +6.3% 302.62 vm-scalability.time.elapsed_time.max
30485 +14.8% 34987 vm-scalability.time.involuntary_context_switches
1893 +43.6% 2718 vm-scalability.time.percent_of_cpu_this_job_got
3855 +67.7% 6467 vm-scalability.time.system_time
1537 +14.5% 1760 vm-scalability.time.user_time
120009 -5.6% 113290 vm-scalability.time.voluntary_context_switches
6.46 +3.5 9.95 mpstat.cpu.all.sys%
21.22 +38.8% 29.46 vmstat.procs.r
0.01 ± 20% +1887.0% 0.18 ±203% perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.01 ± 28% +63.3% 0.01 ± 29% perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
113624 ± 5% +14.0% 129566 ± 3% meminfo.Active
113476 ± 5% +14.0% 129417 ± 3% meminfo.Active(anon)
3987746 +46.0% 5821636 meminfo.Mapped
16345 +14.6% 18729 meminfo.PageTables
474.17 ± 3% -88.9% 52.50 ±125% perf-c2c.DRAM.local
483.17 ± 5% -79.3% 99.83 ± 70% perf-c2c.DRAM.remote
1045 ± 5% -71.9% 294.00 ± 63% perf-c2c.HITM.local
119.50 ± 10% -78.8% 25.33 ± 20% perf-c2c.HITM.remote
392.33 +35.4% 531.17 turbostat.Avg_MHz
10.35 +3.7 14.00 turbostat.Busy%
90.56 -3.7 86.86 turbostat.C1%
0.28 ± 5% -31.5% 0.19 turbostat.IPC
481.33 +2.5% 493.38 turbostat.PkgWatt
999019 ± 3% +44.4% 1442651 ± 2% numa-meminfo.node0.Mapped
1005687 ± 4% +44.1% 1449402 ± 3% numa-meminfo.node1.Mapped
3689 ± 3% +21.7% 4490 ± 7% numa-meminfo.node1.PageTables
980589 ± 2% +42.3% 1395777 ± 2% numa-meminfo.node2.Mapped
96484 ± 5% +22.0% 117715 ± 4% numa-meminfo.node3.Active
96430 ± 5% +22.1% 117694 ± 4% numa-meminfo.node3.Active(anon)
991367 ± 3% +42.7% 1414337 ± 4% numa-meminfo.node3.Mapped
251219 ± 3% +44.8% 363745 ± 2% numa-vmstat.node0.nr_mapped
253252 ± 2% +44.6% 366087 ± 3% numa-vmstat.node1.nr_mapped
927.67 ± 3% +21.9% 1130 ± 7% numa-vmstat.node1.nr_page_table_pages
248171 ± 2% +42.5% 353541 ± 4% numa-vmstat.node2.nr_mapped
24188 ± 5% +21.6% 29410 ± 4% numa-vmstat.node3.nr_active_anon
245825 ± 2% +45.5% 357622 ± 3% numa-vmstat.node3.nr_mapped
1038 ± 11% +17.8% 1224 ± 6% numa-vmstat.node3.nr_page_table_pages
24188 ± 5% +21.6% 29410 ± 4% numa-vmstat.node3.nr_zone_active_anon
28376 ± 5% +14.0% 32338 ± 3% proc-vmstat.nr_active_anon
993504 +46.6% 1456136 proc-vmstat.nr_mapped
4060 +15.5% 4691 proc-vmstat.nr_page_table_pages
28376 ± 5% +14.0% 32338 ± 3% proc-vmstat.nr_zone_active_anon
1.066e+09 -2.0% 1.045e+09 proc-vmstat.numa_hit
1.065e+09 -2.0% 1.044e+09 proc-vmstat.numa_local
5659 +5.6% 5978 proc-vmstat.unevictable_pgs_culled
34604288 +3.7% 35898496 proc-vmstat.unevictable_pgs_scanned
1223376 ± 14% +119.1% 2680582 ± 9% sched_debug.cfs_rq:/.avg_vruntime.avg
1673909 ± 14% +97.6% 3308254 ± 8% sched_debug.cfs_rq:/.avg_vruntime.max
810795 ± 15% +145.8% 1993289 ± 9% sched_debug.cfs_rq:/.avg_vruntime.min
156233 ± 8% +55.1% 242331 ± 6% sched_debug.cfs_rq:/.avg_vruntime.stddev
1223376 ± 14% +119.1% 2680582 ± 9% sched_debug.cfs_rq:/.min_vruntime.avg
1673909 ± 14% +97.6% 3308254 ± 8% sched_debug.cfs_rq:/.min_vruntime.max
810795 ± 15% +145.8% 1993289 ± 9% sched_debug.cfs_rq:/.min_vruntime.min
156233 ± 8% +55.1% 242331 ± 6% sched_debug.cfs_rq:/.min_vruntime.stddev
126445 ± 3% -11.0% 112493 ± 4% sched_debug.cpu.avg_idle.stddev
1447 ± 15% +32.0% 1910 ± 9% sched_debug.cpu.nr_switches.min
0.71 +13.4% 0.80 perf-stat.i.MPKI
2.343e+10 -7.9% 2.157e+10 perf-stat.i.branch-instructions
0.36 -0.0 0.35 perf-stat.i.branch-miss-rate%
30833194 -7.3% 28584190 perf-stat.i.branch-misses
26.04 -1.4 24.66 perf-stat.i.cache-miss-rate%
51345490 ± 3% +40.7% 72258633 ± 3% perf-stat.i.cache-misses
1.616e+08 ± 6% +58.6% 2.562e+08 ± 6% perf-stat.i.cache-references
1.29 +9.4% 1.42 perf-stat.i.cpi
8.394e+10 +33.7% 1.122e+11 perf-stat.i.cpu-cycles
505.77 -2.6% 492.52 perf-stat.i.cpu-migrations
0.03 +0.0 0.03 ± 2% perf-stat.i.dTLB-load-miss-rate%
2.335e+10 -7.4% 2.162e+10 perf-stat.i.dTLB-loads
0.03 +0.0 0.03 perf-stat.i.dTLB-store-miss-rate%
3948344 -8.0% 3633633 perf-stat.i.dTLB-store-misses
6.549e+09 -7.0% 6.09e+09 perf-stat.i.dTLB-stores
17546602 -22.8% 13551001 perf-stat.i.iTLB-load-misses
2552560 -2.6% 2485876 perf-stat.i.iTLB-loads
8.367e+10 -7.5% 7.737e+10 perf-stat.i.instructions
4706 +7.7% 5070 perf-stat.i.instructions-per-iTLB-miss
0.81 -12.0% 0.72 perf-stat.i.ipc
1.59 ± 3% -22.3% 1.23 ± 4% perf-stat.i.major-faults
0.37 +34.2% 0.49 perf-stat.i.metric.GHz
233.98 -6.9% 217.90 perf-stat.i.metric.M/sec
3619177 -9.5% 3276556 perf-stat.i.minor-faults
74.28 +4.8 79.04 perf-stat.i.node-load-miss-rate%
2898733 ± 4% +49.0% 4320557 perf-stat.i.node-load-misses
1928237 ± 4% -11.9% 1698426 perf-stat.i.node-loads
13383344 ± 2% +4.7% 14013398 ± 3% perf-stat.i.node-stores
3619179 -9.5% 3276558 perf-stat.i.page-faults
0.61 ± 3% +52.5% 0.94 ± 3% perf-stat.overall.MPKI
31.95 ± 2% -3.6 28.34 ± 3% perf-stat.overall.cache-miss-rate%
1.00 +45.0% 1.45 perf-stat.overall.cpi
0.07 +0.0 0.08 ± 4% perf-stat.overall.dTLB-load-miss-rate%
87.62 -2.6 85.05 perf-stat.overall.iTLB-load-miss-rate%
4778 +20.2% 5745 perf-stat.overall.instructions-per-iTLB-miss
1.00 -31.0% 0.69 perf-stat.overall.ipc
59.75 ± 3% +11.8 71.59 perf-stat.overall.node-load-miss-rate%
5145 +1.8% 5239 perf-stat.overall.path-length
2.405e+10 -6.3% 2.252e+10 perf-stat.ps.branch-instructions
31203502 -6.4% 29219514 perf-stat.ps.branch-misses
52696784 ± 3% +43.4% 75547948 ± 3% perf-stat.ps.cache-misses
1.652e+08 ± 6% +61.7% 2.672e+08 ± 7% perf-stat.ps.cache-references
8.584e+10 +36.3% 1.17e+11 perf-stat.ps.cpu-cycles
506.29 -2.0% 496.05 perf-stat.ps.cpu-migrations
2.395e+10 -5.9% 2.254e+10 perf-stat.ps.dTLB-loads
4059043 -6.2% 3806002 perf-stat.ps.dTLB-store-misses
6.688e+09 -5.7% 6.308e+09 perf-stat.ps.dTLB-stores
17944396 -21.8% 14028927 perf-stat.ps.iTLB-load-misses
2534093 -2.7% 2465233 perf-stat.ps.iTLB-loads
8.575e+10 -6.0% 8.059e+10 perf-stat.ps.instructions
1.60 ± 3% -23.2% 1.23 ± 4% perf-stat.ps.major-faults
3726053 -7.7% 3439511 perf-stat.ps.minor-faults
2942507 ± 4% +52.0% 4472428 perf-stat.ps.node-load-misses
1980077 ± 4% -10.4% 1774633 perf-stat.ps.node-loads
13780660 ± 2% +6.8% 14716100 ± 3% perf-stat.ps.node-stores
3726055 -7.7% 3439513 perf-stat.ps.page-faults
37.11 -6.7 30.40 ± 6% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
21.14 -3.8 17.36 ± 7% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
21.05 -3.8 17.29 ± 7% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
21.05 -3.8 17.29 ± 7% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
21.05 -3.8 17.29 ± 7% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
21.00 -3.8 17.25 ± 7% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
20.70 -3.7 17.00 ± 7% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
20.69 -3.7 16.99 ± 7% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
20.64 -3.7 16.95 ± 7% perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
9.51 ± 3% -1.9 7.57 ± 2% perf-profile.calltrace.cycles-pp.do_rw_once
4.54 -1.4 3.19 perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
2.83 -0.9 1.96 perf-profile.calltrace.cycles-pp.next_uptodate_folio.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
3.90 -0.6 3.34 ± 5% perf-profile.calltrace.cycles-pp.clear_page_erms.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
4.44 ± 6% -0.5 3.98 ± 3% perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
1.17 ± 3% -0.4 0.73 ± 6% perf-profile.calltrace.cycles-pp.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
1.42 ± 2% -0.4 0.99 ± 2% perf-profile.calltrace.cycles-pp.shmem_alloc_folio.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
1.32 ± 2% -0.4 0.91 perf-profile.calltrace.cycles-pp.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
1.19 ± 2% -0.4 0.82 perf-profile.calltrace.cycles-pp.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio.shmem_get_folio_gfp
0.96 ± 2% -0.3 0.65 ± 2% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio
0.98 ± 2% -0.3 0.68 ± 4% perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.do_access
1.21 +0.5 1.69 ± 5% perf-profile.calltrace.cycles-pp.__munmap
1.21 +0.5 1.69 ± 5% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
1.21 +0.5 1.69 ± 5% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
1.21 +0.5 1.69 ± 5% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
1.21 +0.5 1.69 ± 5% perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
1.21 +0.5 1.69 ± 5% perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.21 +0.5 1.69 ± 5% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
1.20 +0.5 1.68 ± 5% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
1.20 +0.5 1.68 ± 5% perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
1.20 +0.5 1.68 ± 5% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
1.20 +0.5 1.69 ± 5% perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
1.18 +0.5 1.67 ± 6% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
0.84 ± 2% +0.6 1.43 ± 5% perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp
0.58 ± 3% +0.6 1.18 ± 5% perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
0.00 +0.8 0.79 ± 4% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range
0.00 +1.0 1.02 ± 5% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.lru_add_fn.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio
0.00 +1.1 1.08 ± 4% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range
0.00 +1.5 1.46 ± 5% perf-profile.calltrace.cycles-pp.__count_memcg_events.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
3.29 ± 3% +1.9 5.19 perf-profile.calltrace.cycles-pp.finish_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
3.02 ± 4% +2.0 5.00 perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_read_fault.do_fault.__handle_mm_fault
2.84 ± 4% +2.0 4.86 perf-profile.calltrace.cycles-pp.folio_add_file_rmap_range.set_pte_range.finish_fault.do_read_fault.do_fault
2.73 ± 4% +2.0 4.77 perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.folio_add_file_rmap_range.set_pte_range.finish_fault.do_read_fault
1.48 ± 4% +2.1 3.56 ± 2% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.folio_add_file_rmap_range.set_pte_range.finish_fault
0.57 ± 4% +2.8 3.35 ± 2% perf-profile.calltrace.cycles-pp.__count_memcg_events.mem_cgroup_commit_charge.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp
1.96 ± 5% +2.9 4.86 ± 2% perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
3.65 ± 2% +3.1 6.77 ± 2% perf-profile.calltrace.cycles-pp.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
0.80 ± 4% +3.1 3.92 ± 3% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp
2.68 ± 3% +3.4 6.08 ± 2% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
7.71 ± 6% +3.9 11.66 ± 2% perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
67.18 +6.3 73.46 ± 3% perf-profile.calltrace.cycles-pp.do_access
1.46 ± 9% +7.1 8.57 ± 16% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio
1.50 ± 9% +7.1 8.61 ± 16% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp
1.38 ± 10% +7.1 8.51 ± 16% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru
51.46 +7.6 59.08 ± 3% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
2.98 ± 5% +7.7 10.66 ± 14% perf-profile.calltrace.cycles-pp.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
2.84 ± 6% +7.7 10.56 ± 14% perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
34.18 +8.5 42.68 ± 4% perf-profile.calltrace.cycles-pp.__do_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
34.14 +8.5 42.64 ± 4% perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_read_fault.do_fault.__handle_mm_fault
33.95 +8.6 42.51 ± 4% perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault.do_fault
42.88 +8.8 51.70 ± 4% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
42.34 +9.0 51.30 ± 4% perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
42.29 +9.0 51.28 ± 4% perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
45.07 +9.6 54.62 ± 4% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
44.95 +9.6 54.53 ± 4% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
43.72 +9.9 53.64 ± 4% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
17.28 ± 2% +13.8 31.05 ± 6% perf-profile.calltrace.cycles-pp.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
21.14 -3.8 17.36 ± 7% perf-profile.children.cycles-pp.cpu_startup_entry
21.14 -3.8 17.36 ± 7% perf-profile.children.cycles-pp.do_idle
21.14 -3.8 17.36 ± 7% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
21.09 -3.8 17.33 ± 7% perf-profile.children.cycles-pp.cpuidle_idle_call
21.05 -3.8 17.29 ± 7% perf-profile.children.cycles-pp.start_secondary
20.79 -3.7 17.07 ± 7% perf-profile.children.cycles-pp.cpuidle_enter
20.78 -3.7 17.07 ± 7% perf-profile.children.cycles-pp.cpuidle_enter_state
20.72 -3.7 17.02 ± 7% perf-profile.children.cycles-pp.acpi_idle_enter
20.71 -3.7 17.01 ± 7% perf-profile.children.cycles-pp.acpi_safe_halt
20.79 -3.6 17.19 ± 6% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
11.52 -3.1 8.42 perf-profile.children.cycles-pp.do_rw_once
4.62 -1.4 3.24 perf-profile.children.cycles-pp.filemap_map_pages
2.89 -0.9 2.00 perf-profile.children.cycles-pp.next_uptodate_folio
3.98 -0.6 3.39 ± 5% perf-profile.children.cycles-pp.clear_page_erms
4.46 ± 6% -0.5 3.99 ± 3% perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
1.18 ± 4% -0.4 0.74 ± 6% perf-profile.children.cycles-pp.shmem_inode_acct_blocks
1.44 ± 2% -0.4 1.00 ± 2% perf-profile.children.cycles-pp.shmem_alloc_folio
1.40 -0.4 0.99 perf-profile.children.cycles-pp.alloc_pages_mpol
1.27 -0.4 0.90 perf-profile.children.cycles-pp.__alloc_pages
1.01 ± 2% -0.3 0.68 perf-profile.children.cycles-pp.get_page_from_freelist
1.02 ± 2% -0.3 0.70 ± 4% perf-profile.children.cycles-pp.sync_regs
0.77 ± 2% -0.3 0.51 perf-profile.children.cycles-pp.rmqueue
0.81 ± 2% -0.2 0.60 perf-profile.children.cycles-pp.__perf_sw_event
0.53 ± 3% -0.2 0.34 ± 2% perf-profile.children.cycles-pp.__rmqueue_pcplist
0.68 ± 2% -0.2 0.50 ± 5% perf-profile.children.cycles-pp.__mod_lruvec_state
0.65 ± 6% -0.2 0.47 ± 2% perf-profile.children.cycles-pp._raw_spin_lock
0.47 ± 3% -0.2 0.29 ± 2% perf-profile.children.cycles-pp.rmqueue_bulk
0.65 ± 2% -0.2 0.49 perf-profile.children.cycles-pp.___perf_sw_event
0.64 ± 4% -0.1 0.49 ± 5% perf-profile.children.cycles-pp.xas_load
0.54 -0.1 0.39 ± 4% perf-profile.children.cycles-pp.__mod_node_page_state
0.49 ± 2% -0.1 0.35 ± 3% perf-profile.children.cycles-pp.lock_vma_under_rcu
0.54 ± 5% -0.1 0.40 ± 2% perf-profile.children.cycles-pp.xas_find
0.39 ± 4% -0.1 0.28 ± 3% perf-profile.children.cycles-pp.__pte_offset_map_lock
0.39 ± 3% -0.1 0.29 ± 3% perf-profile.children.cycles-pp.xas_descend
0.32 ± 4% -0.1 0.22 ± 8% perf-profile.children.cycles-pp.__dquot_alloc_space
0.30 ± 3% -0.1 0.22 ± 3% perf-profile.children.cycles-pp.mas_walk
0.20 ± 13% -0.1 0.13 ± 5% perf-profile.children.cycles-pp.shmem_recalc_inode
0.26 ± 2% -0.1 0.19 ± 3% perf-profile.children.cycles-pp.filemap_get_entry
0.18 ± 5% -0.1 0.12 ± 5% perf-profile.children.cycles-pp.xas_find_conflict
0.28 ± 4% -0.1 0.22 ± 8% perf-profile.children.cycles-pp.__x64_sys_execve
0.28 ± 4% -0.1 0.22 ± 8% perf-profile.children.cycles-pp.do_execveat_common
0.28 ± 4% -0.1 0.22 ± 8% perf-profile.children.cycles-pp.execve
0.29 ± 3% -0.1 0.24 ± 8% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.16 ± 5% -0.1 0.11 ± 8% perf-profile.children.cycles-pp.error_entry
0.14 ± 5% -0.0 0.09 ± 8% perf-profile.children.cycles-pp.__percpu_counter_limited_add
0.15 ± 5% -0.0 0.10 ± 10% perf-profile.children.cycles-pp.inode_add_bytes
0.07 ± 6% -0.0 0.02 ± 99% perf-profile.children.cycles-pp.__folio_throttle_swaprate
0.10 -0.0 0.06 ± 13% perf-profile.children.cycles-pp.security_vm_enough_memory_mm
0.18 ± 7% -0.0 0.14 ± 13% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.16 ± 5% -0.0 0.12 perf-profile.children.cycles-pp.handle_pte_fault
0.17 ± 7% -0.0 0.12 ± 4% perf-profile.children.cycles-pp.xas_start
0.14 ± 6% -0.0 0.10 ± 3% perf-profile.children.cycles-pp.__pte_offset_map
0.07 ± 5% -0.0 0.03 ± 70% perf-profile.children.cycles-pp.policy_nodemask
0.16 ± 4% -0.0 0.13 ± 12% perf-profile.children.cycles-pp.folio_mark_accessed
0.19 ± 4% -0.0 0.16 ± 8% perf-profile.children.cycles-pp.bprm_execve
0.11 ± 9% -0.0 0.08 ± 6% perf-profile.children.cycles-pp.down_read_trylock
0.16 ± 6% -0.0 0.13 ± 5% perf-profile.children.cycles-pp.percpu_counter_add_batch
0.11 ± 6% -0.0 0.08 ± 6% perf-profile.children.cycles-pp.up_read
0.15 ± 7% -0.0 0.12 ± 13% perf-profile.children.cycles-pp.folio_unlock
0.10 ± 4% -0.0 0.07 ± 6% perf-profile.children.cycles-pp.__libc_fork
0.07 ± 6% -0.0 0.04 ± 45% perf-profile.children.cycles-pp.ksys_read
0.10 ± 3% -0.0 0.07 ± 11% perf-profile.children.cycles-pp.kernel_clone
0.09 ± 5% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
0.09 ± 5% -0.0 0.06 ± 11% perf-profile.children.cycles-pp.__x64_sys_openat
0.08 ± 8% -0.0 0.06 ± 11% perf-profile.children.cycles-pp.do_filp_open
0.08 ± 8% -0.0 0.06 ± 11% perf-profile.children.cycles-pp.path_openat
0.07 -0.0 0.04 ± 45% perf-profile.children.cycles-pp.vfs_read
0.09 ± 4% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.__do_sys_clone
0.10 ± 6% -0.0 0.08 ± 6% perf-profile.children.cycles-pp.pte_offset_map_nolock
0.08 ± 8% -0.0 0.06 ± 11% perf-profile.children.cycles-pp.do_sys_openat2
0.07 ± 5% -0.0 0.04 ± 45% perf-profile.children.cycles-pp.copy_process
0.16 ± 5% -0.0 0.14 ± 6% perf-profile.children.cycles-pp.exec_binprm
0.10 ± 6% -0.0 0.08 ± 7% perf-profile.children.cycles-pp.__vm_enough_memory
0.16 ± 4% -0.0 0.14 ± 6% perf-profile.children.cycles-pp.search_binary_handler
0.08 -0.0 0.06 ± 9% perf-profile.children.cycles-pp.__irqentry_text_end
0.09 ± 5% -0.0 0.07 ± 7% perf-profile.children.cycles-pp._compound_head
0.15 ± 5% -0.0 0.13 ± 7% perf-profile.children.cycles-pp.xas_create
0.15 ± 4% -0.0 0.14 ± 8% perf-profile.children.cycles-pp.load_elf_binary
0.12 ± 4% -0.0 0.10 ± 3% perf-profile.children.cycles-pp.kmem_cache_alloc_lru
0.05 ± 8% +0.0 0.08 ± 8% perf-profile.children.cycles-pp.propagate_protected_usage
0.25 ± 2% +0.0 0.30 ± 4% perf-profile.children.cycles-pp.page_counter_try_charge
0.02 ±141% +0.0 0.06 ± 7% perf-profile.children.cycles-pp.mod_objcg_state
0.00 +0.1 0.07 ± 14% perf-profile.children.cycles-pp.tlb_finish_mmu
1.25 +0.5 1.72 ± 5% perf-profile.children.cycles-pp.unmap_vmas
1.24 +0.5 1.71 ± 5% perf-profile.children.cycles-pp.zap_pte_range
1.24 +0.5 1.71 ± 5% perf-profile.children.cycles-pp.unmap_page_range
1.24 +0.5 1.71 ± 5% perf-profile.children.cycles-pp.zap_pmd_range
1.21 +0.5 1.69 ± 5% perf-profile.children.cycles-pp.__munmap
1.22 +0.5 1.71 ± 5% perf-profile.children.cycles-pp.__vm_munmap
1.21 +0.5 1.70 ± 5% perf-profile.children.cycles-pp.__x64_sys_munmap
1.25 +0.5 1.74 ± 5% perf-profile.children.cycles-pp.do_vmi_align_munmap
1.25 +0.5 1.74 ± 5% perf-profile.children.cycles-pp.do_vmi_munmap
1.22 +0.5 1.72 ± 5% perf-profile.children.cycles-pp.unmap_region
0.85 ± 2% +0.6 1.44 ± 5% perf-profile.children.cycles-pp.lru_add_fn
0.60 ± 3% +0.6 1.20 ± 4% perf-profile.children.cycles-pp.page_remove_rmap
3.30 ± 3% +1.9 5.20 perf-profile.children.cycles-pp.finish_fault
3.04 ± 4% +2.0 5.01 perf-profile.children.cycles-pp.set_pte_range
2.85 ± 4% +2.0 4.87 perf-profile.children.cycles-pp.folio_add_file_rmap_range
1.97 ± 5% +2.9 4.88 ± 2% perf-profile.children.cycles-pp.mem_cgroup_commit_charge
3.69 ± 2% +3.1 6.80 ± 2% perf-profile.children.cycles-pp.shmem_add_to_page_cache
7.74 ± 6% +3.9 11.69 ± 2% perf-profile.children.cycles-pp.__mem_cgroup_charge
0.80 ± 4% +4.0 4.85 ± 3% perf-profile.children.cycles-pp.__count_memcg_events
6.12 ± 3% +6.1 12.18 perf-profile.children.cycles-pp.__mod_lruvec_page_state
2.99 ± 3% +6.6 9.56 ± 2% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
61.44 +6.7 68.11 ± 3% perf-profile.children.cycles-pp.do_access
1.58 ± 9% +7.1 8.72 ± 16% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
1.45 ± 9% +7.2 8.63 ± 16% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
1.53 ± 9% +7.2 8.72 ± 16% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
2.98 ± 5% +7.7 10.67 ± 14% perf-profile.children.cycles-pp.folio_add_lru
2.86 ± 6% +7.8 10.63 ± 14% perf-profile.children.cycles-pp.folio_batch_move_lru
49.12 +8.3 57.47 ± 3% perf-profile.children.cycles-pp.asm_exc_page_fault
34.19 +8.5 42.68 ± 4% perf-profile.children.cycles-pp.__do_fault
34.15 +8.5 42.65 ± 4% perf-profile.children.cycles-pp.shmem_fault
33.99 +8.6 42.54 ± 4% perf-profile.children.cycles-pp.shmem_get_folio_gfp
43.06 +8.8 51.84 ± 4% perf-profile.children.cycles-pp.__handle_mm_fault
42.43 +8.9 51.37 ± 4% perf-profile.children.cycles-pp.do_fault
42.38 +9.0 51.34 ± 4% perf-profile.children.cycles-pp.do_read_fault
45.26 +9.5 54.78 ± 4% perf-profile.children.cycles-pp.exc_page_fault
45.15 +9.5 54.69 ± 4% perf-profile.children.cycles-pp.do_user_addr_fault
43.91 +9.9 53.80 ± 4% perf-profile.children.cycles-pp.handle_mm_fault
17.31 ± 2% +13.8 31.07 ± 5% perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
12.24 -4.5 7.76 ± 3% perf-profile.self.cycles-pp.shmem_get_folio_gfp
17.96 -3.3 14.66 ± 4% perf-profile.self.cycles-pp.acpi_safe_halt
10.95 -3.2 7.74 perf-profile.self.cycles-pp.do_rw_once
5.96 -1.4 4.58 ± 2% perf-profile.self.cycles-pp.do_access
2.40 -0.8 1.64 perf-profile.self.cycles-pp.next_uptodate_folio
3.92 -0.6 3.36 ± 5% perf-profile.self.cycles-pp.clear_page_erms
4.40 ± 6% -0.5 3.95 ± 3% perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
1.52 ± 2% -0.4 1.10 ± 2% perf-profile.self.cycles-pp.filemap_map_pages
1.02 ± 2% -0.3 0.70 ± 4% perf-profile.self.cycles-pp.sync_regs
0.50 ± 7% -0.2 0.27 ± 5% perf-profile.self.cycles-pp.shmem_inode_acct_blocks
0.63 ± 5% -0.2 0.46 ± 3% perf-profile.self.cycles-pp._raw_spin_lock
0.42 ± 2% -0.1 0.27 ± 2% perf-profile.self.cycles-pp.rmqueue_bulk
0.52 -0.1 0.38 ± 4% perf-profile.self.cycles-pp.__mod_node_page_state
0.56 ± 2% -0.1 0.42 perf-profile.self.cycles-pp.___perf_sw_event
0.31 ± 3% -0.1 0.20 ± 2% perf-profile.self.cycles-pp.shmem_add_to_page_cache
0.38 ± 4% -0.1 0.28 perf-profile.self.cycles-pp.__handle_mm_fault
0.36 ± 4% -0.1 0.26 ± 2% perf-profile.self.cycles-pp.xas_descend
0.30 ± 2% -0.1 0.22 ± 2% perf-profile.self.cycles-pp.mas_walk
0.33 ± 3% -0.1 0.26 ± 10% perf-profile.self.cycles-pp.lru_add_fn
0.20 ± 3% -0.1 0.14 ± 5% perf-profile.self.cycles-pp.asm_exc_page_fault
0.21 ± 5% -0.1 0.15 ± 6% perf-profile.self.cycles-pp.get_page_from_freelist
0.26 ± 9% -0.1 0.20 ± 15% perf-profile.self.cycles-pp.xas_store
0.16 ± 7% -0.1 0.11 ± 6% perf-profile.self.cycles-pp.__perf_sw_event
0.18 ± 2% -0.1 0.13 ± 5% perf-profile.self.cycles-pp.__alloc_pages
0.22 ± 4% -0.1 0.17 ± 4% perf-profile.self.cycles-pp.handle_mm_fault
0.20 ± 8% -0.1 0.14 ± 5% perf-profile.self.cycles-pp.xas_find
0.15 ± 6% -0.0 0.10 ± 7% perf-profile.self.cycles-pp.error_entry
0.17 ± 2% -0.0 0.12 ± 6% perf-profile.self.cycles-pp.__dquot_alloc_space
0.17 ± 6% -0.0 0.13 ± 10% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.22 ± 4% -0.0 0.18 ± 9% perf-profile.self.cycles-pp.xas_load
0.23 ± 4% -0.0 0.19 ± 10% perf-profile.self.cycles-pp.zap_pte_range
0.12 ± 7% -0.0 0.08 ± 10% perf-profile.self.cycles-pp.__percpu_counter_limited_add
0.14 ± 3% -0.0 0.09 ± 7% perf-profile.self.cycles-pp.rmqueue
0.15 ± 6% -0.0 0.10 ± 9% perf-profile.self.cycles-pp.__mod_lruvec_state
0.15 ± 2% -0.0 0.11 ± 6% perf-profile.self.cycles-pp.do_user_addr_fault
0.12 ± 7% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.folio_add_lru
0.16 ± 7% -0.0 0.12 ± 4% perf-profile.self.cycles-pp.xas_start
0.06 ± 7% -0.0 0.02 ± 99% perf-profile.self.cycles-pp.finish_fault
0.16 ± 4% -0.0 0.12 ± 12% perf-profile.self.cycles-pp.folio_mark_accessed
0.11 ± 8% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.__pte_offset_map_lock
0.13 ± 6% -0.0 0.09 ± 4% perf-profile.self.cycles-pp.__pte_offset_map
0.11 ± 9% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.down_read_trylock
0.12 ± 3% -0.0 0.09 ± 5% perf-profile.self.cycles-pp.do_read_fault
0.09 ± 5% -0.0 0.06 ± 7% perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
0.16 ± 4% -0.0 0.12 ± 4% perf-profile.self.cycles-pp.percpu_counter_add_batch
0.11 ± 6% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.shmem_alloc_and_add_folio
0.08 ± 8% -0.0 0.05 perf-profile.self.cycles-pp.xas_find_conflict
0.12 ± 4% -0.0 0.09 ± 7% perf-profile.self.cycles-pp.folio_add_file_rmap_range
0.10 ± 6% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.up_read
0.12 ± 4% -0.0 0.10 ± 6% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.09 ± 4% -0.0 0.07 ± 7% perf-profile.self.cycles-pp.exc_page_fault
0.13 ± 6% -0.0 0.10 ± 9% perf-profile.self.cycles-pp.page_remove_rmap
0.08 -0.0 0.06 ± 8% perf-profile.self.cycles-pp.__irqentry_text_end
0.19 ± 5% -0.0 0.17 ± 5% perf-profile.self.cycles-pp.cgroup_rstat_updated
0.09 ± 6% -0.0 0.07 ± 5% perf-profile.self.cycles-pp.set_pte_range
0.07 ± 5% -0.0 0.05 ± 7% perf-profile.self.cycles-pp._compound_head
0.08 -0.0 0.06 ± 9% perf-profile.self.cycles-pp.lock_vma_under_rcu
0.05 ± 8% +0.0 0.08 ± 8% perf-profile.self.cycles-pp.propagate_protected_usage
2.93 ± 4% +0.4 3.35 ± 3% perf-profile.self.cycles-pp.__mod_lruvec_page_state
0.77 ± 7% +1.5 2.23 ± 3% perf-profile.self.cycles-pp.__mem_cgroup_charge
0.75 ± 4% +4.0 4.80 ± 3% perf-profile.self.cycles-pp.__count_memcg_events
2.83 ± 3% +6.6 9.40 ± 2% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
1.45 ± 9% +7.2 8.63 ± 16% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/tlb_flush2/will-it-scale
commit:
e0bf1dc859 ("mm: memcg: move vmstats structs definition above flushing code")
8d59d2214c ("mm: memcg: make stats flushing threshold per-memcg")
e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6
---------------- ---------------------------
%stddev %change %stddev
\ | \
4.05 -1.2 2.81 mpstat.cpu.all.usr%
193.83 ± 6% +69.3% 328.17 ± 8% perf-c2c.DRAM.local
1216 ± 8% +27.1% 1546 ± 6% perf-c2c.DRAM.remote
150.33 ± 13% -40.0% 90.17 ± 13% perf-c2c.HITM.remote
0.04 -25.0% 0.03 turbostat.IPC
316.16 -1.5% 311.47 turbostat.PkgWatt
30.54 +4.9% 32.04 turbostat.RAMWatt
2132437 -32.3% 1444430 will-it-scale.52.processes
41008 -32.3% 27776 will-it-scale.per_process_ops
2132437 -32.3% 1444430 will-it-scale.workload
3.113e+08 ± 3% -31.7% 2.125e+08 ± 4% numa-numastat.node0.local_node
3.114e+08 ± 3% -31.7% 2.126e+08 ± 4% numa-numastat.node0.numa_hit
3.322e+08 ± 2% -32.5% 2.243e+08 ± 3% numa-numastat.node1.local_node
3.323e+08 ± 2% -32.5% 2.243e+08 ± 3% numa-numastat.node1.numa_hit
3.114e+08 ± 3% -31.7% 2.126e+08 ± 4% numa-vmstat.node0.numa_hit
3.113e+08 ± 3% -31.7% 2.125e+08 ± 4% numa-vmstat.node0.numa_local
3.323e+08 ± 2% -32.5% 2.243e+08 ± 3% numa-vmstat.node1.numa_hit
3.322e+08 ± 2% -32.5% 2.243e+08 ± 3% numa-vmstat.node1.numa_local
0.00 ± 19% -61.1% 0.00 ± 31% perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
217.07 ± 11% -46.4% 116.39 ± 23% perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
218.50 ± 6% +19.1% 260.33 ± 4% perf-sched.wait_and_delay.count.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
217.06 ± 11% -46.4% 116.38 ± 23% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
6.436e+08 -32.1% 4.369e+08 proc-vmstat.numa_hit
6.435e+08 -32.1% 4.368e+08 proc-vmstat.numa_local
6.432e+08 -32.1% 4.368e+08 proc-vmstat.pgalloc_normal
1.286e+09 -32.1% 8.726e+08 proc-vmstat.pgfault
6.432e+08 -32.1% 4.367e+08 proc-vmstat.pgfree
170696 ± 8% +3.4% 176515 ± 8% sched_debug.cpu.clock.avg
170703 ± 8% +3.4% 176522 ± 8% sched_debug.cpu.clock.max
170689 ± 8% +3.4% 176508 ± 8% sched_debug.cpu.clock.min
169431 ± 8% +3.4% 175248 ± 8% sched_debug.cpu.clock_task.avg
169630 ± 8% +3.4% 175429 ± 8% sched_debug.cpu.clock_task.max
162542 ± 8% +3.5% 168260 ± 8% sched_debug.cpu.clock_task.min
170690 ± 8% +3.4% 176508 ± 8% sched_debug.cpu_clk
170117 ± 8% +3.4% 175938 ± 8% sched_debug.ktime
171259 ± 8% +3.4% 177078 ± 8% sched_debug.sched_clk
4.06 +80.8% 7.34 perf-stat.i.MPKI
4.066e+09 -23.3% 3.12e+09 perf-stat.i.branch-instructions
0.57 -0.0 0.55 perf-stat.i.branch-miss-rate%
23478297 -25.0% 17605102 perf-stat.i.branch-misses
17.25 +7.0 24.27 perf-stat.i.cache-miss-rate%
82715093 ± 2% +35.9% 1.124e+08 perf-stat.i.cache-misses
4.795e+08 ± 2% -3.4% 4.63e+08 perf-stat.i.cache-references
7.14 +32.9% 9.49 perf-stat.i.cpi
134.85 -1.2% 133.29 perf-stat.i.cpu-migrations
1760 ± 2% -26.5% 1294 perf-stat.i.cycles-between-cache-misses
0.26 -0.0 0.24 perf-stat.i.dTLB-load-miss-rate%
13461491 -31.7% 9190211 perf-stat.i.dTLB-load-misses
5.141e+09 -24.1% 3.902e+09 perf-stat.i.dTLB-loads
0.45 -0.0 0.44 perf-stat.i.dTLB-store-miss-rate%
12934403 -32.2% 8773143 perf-stat.i.dTLB-store-misses
2.841e+09 -29.9% 1.992e+09 perf-stat.i.dTLB-stores
14.76 +1.4 16.18 ± 4% perf-stat.i.iTLB-load-miss-rate%
7454399 ± 2% -22.7% 5760387 ± 4% perf-stat.i.iTLB-load-misses
43026423 -30.6% 29840650 perf-stat.i.iTLB-loads
2.042e+10 -24.7% 1.538e+10 perf-stat.i.instructions
0.14 -24.6% 0.11 perf-stat.i.ipc
815.65 -20.2% 651.03 perf-stat.i.metric.K/sec
120.43 -24.3% 91.11 perf-stat.i.metric.M/sec
4264808 -32.2% 2892980 perf-stat.i.minor-faults
11007315 ± 2% +39.7% 15375516 perf-stat.i.node-load-misses
1459152 ± 6% +45.1% 2116827 ± 5% perf-stat.i.node-loads
7872989 ± 2% -26.2% 5812458 perf-stat.i.node-store-misses
4264808 -32.2% 2892980 perf-stat.i.page-faults
4.05 +80.4% 7.31 perf-stat.overall.MPKI
0.58 -0.0 0.57 perf-stat.overall.branch-miss-rate%
17.25 +7.0 24.27 perf-stat.overall.cache-miss-rate%
7.13 +32.7% 9.46 perf-stat.overall.cpi
1759 ± 2% -26.5% 1294 perf-stat.overall.cycles-between-cache-misses
0.26 -0.0 0.23 perf-stat.overall.dTLB-load-miss-rate%
0.45 -0.0 0.44 perf-stat.overall.dTLB-store-miss-rate%
14.77 +1.4 16.18 ± 4% perf-stat.overall.iTLB-load-miss-rate%
0.14 -24.7% 0.11 perf-stat.overall.ipc
2882666 +11.2% 3206246 perf-stat.overall.path-length
4.052e+09 -23.3% 3.11e+09 perf-stat.ps.branch-instructions
23421504 -25.0% 17574476 perf-stat.ps.branch-misses
82419384 ± 2% +35.9% 1.12e+08 perf-stat.ps.cache-misses
4.778e+08 ± 2% -3.4% 4.614e+08 perf-stat.ps.cache-references
134.44 -1.1% 132.98 perf-stat.ps.cpu-migrations
13415064 -31.7% 9160067 perf-stat.ps.dTLB-load-misses
5.124e+09 -24.1% 3.89e+09 perf-stat.ps.dTLB-loads
12889609 -32.2% 8744145 perf-stat.ps.dTLB-store-misses
2.831e+09 -29.9% 1.986e+09 perf-stat.ps.dTLB-stores
7428050 ± 2% -22.7% 5741276 ± 4% perf-stat.ps.iTLB-load-misses
42877049 -30.6% 29741122 perf-stat.ps.iTLB-loads
2.035e+10 -24.7% 1.533e+10 perf-stat.ps.instructions
4250034 -32.2% 2883410 perf-stat.ps.minor-faults
10968228 ± 2% +39.7% 15322266 perf-stat.ps.node-load-misses
1454274 ± 6% +45.1% 2109746 ± 5% perf-stat.ps.node-loads
7845298 ± 2% -26.2% 5792864 perf-stat.ps.node-store-misses
4250034 -32.2% 2883410 perf-stat.ps.page-faults
6.147e+12 -24.7% 4.631e+12 perf-stat.total.instructions
26.77 -1.8 24.93 ± 3% perf-profile.calltrace.cycles-pp.intel_idle_ibrs.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
26.75 -1.8 24.92 ± 2% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
26.75 -1.8 24.92 ± 2% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
26.84 -1.8 25.00 ± 3% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
26.75 -1.8 24.92 ± 2% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
26.75 -1.8 24.92 ± 2% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
26.75 -1.8 24.92 ± 2% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
27.05 -1.8 25.29 ± 3% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
13.02 ± 2% -1.4 11.60 ± 4% perf-profile.calltrace.cycles-pp.testcase
5.54 ± 5% -1.0 4.52 ± 3% perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single
1.37 ± 2% -0.9 0.51 ± 58% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.__madvise
10.38 ± 3% -0.8 9.54 ± 2% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
2.38 ± 2% -0.8 1.63 ± 3% perf-profile.calltrace.cycles-pp.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush
4.02 ± 3% -0.7 3.32 ± 3% perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
1.92 ± 4% -0.4 1.49 ± 2% perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_page_fault.testcase
1.36 ± 2% -0.4 0.99 perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase
1.30 ± 10% -0.4 0.94 ± 6% perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault.handle_mm_fault
1.50 ± 11% -0.3 1.19 ± 5% perf-profile.calltrace.cycles-pp.uncharge_folio.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
1.13 ± 3% -0.3 0.83 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
0.71 ± 3% -0.3 0.43 ± 44% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__madvise
1.02 ± 3% -0.3 0.75 perf-profile.calltrace.cycles-pp.flush_tlb_func.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
0.97 ± 3% -0.3 0.72 perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
0.77 ± 2% -0.2 0.58 ± 2% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
0.71 ± 2% -0.1 0.60 ± 3% perf-profile.calltrace.cycles-pp.propagate_protected_usage.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge_list.release_pages
1.20 +0.1 1.34 perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
1.10 ± 2% +0.2 1.28 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
1.04 ± 2% +0.2 1.24 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
0.83 +0.2 1.07 ± 2% perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
0.81 ± 2% +0.3 1.08 perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single
0.88 ± 10% +0.3 1.16 ± 4% perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault.handle_mm_fault
0.71 ± 2% +0.3 1.00 perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range
0.76 ± 3% +0.3 1.09 ± 2% perf-profile.calltrace.cycles-pp.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.73 ± 3% +0.3 1.07 ± 2% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault.handle_mm_fault
0.00 +0.6 0.55 ± 2% perf-profile.calltrace.cycles-pp.__count_memcg_events.mem_cgroup_commit_charge.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault
6.60 ± 4% +0.6 7.18 ± 3% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
6.54 ± 4% +0.6 7.13 ± 3% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
0.00 +0.7 0.74 ± 3% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single
0.00 +0.8 0.79 ± 2% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range
0.00 +0.8 0.79 ± 3% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.lru_add_fn.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
0.00 +0.8 0.80 ± 3% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault
5.80 ± 5% +0.8 6.60 ± 3% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
0.00 +0.8 0.82 perf-profile.calltrace.cycles-pp.__count_memcg_events.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush
0.69 ± 4% +0.9 1.59 ± 2% perf-profile.calltrace.cycles-pp.__count_memcg_events.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
30.43 +1.1 31.57 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
29.22 +1.5 30.69 perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
29.05 +1.5 30.56 perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
22.56 ± 2% +2.3 24.87 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single
22.36 ± 2% +2.3 24.70 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
22.11 ± 2% +2.4 24.55 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush
22.70 +2.6 25.35 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
22.38 +2.7 25.08 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
24.10 +2.7 26.82 perf-profile.calltrace.cycles-pp.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
24.09 +2.7 26.82 perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise
24.07 +2.7 26.79 perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior
22.14 +2.8 24.93 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
59.76 +2.9 62.64 perf-profile.calltrace.cycles-pp.__madvise
57.63 +3.5 61.10 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
57.27 +3.6 60.85 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
56.41 +3.8 60.20 perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
56.37 +3.8 60.17 perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
55.94 +3.9 59.88 perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
55.85 +4.0 59.82 perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
26.75 -1.8 24.92 ± 2% perf-profile.children.cycles-pp.start_secondary
26.98 -1.8 25.22 ± 3% perf-profile.children.cycles-pp.intel_idle_ibrs
27.05 -1.8 25.29 ± 3% perf-profile.children.cycles-pp.cpu_startup_entry
27.05 -1.8 25.29 ± 3% perf-profile.children.cycles-pp.do_idle
27.05 -1.8 25.29 ± 3% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
27.05 -1.8 25.29 ± 3% perf-profile.children.cycles-pp.cpuidle_enter
27.05 -1.8 25.29 ± 3% perf-profile.children.cycles-pp.cpuidle_enter_state
27.05 -1.8 25.29 ± 3% perf-profile.children.cycles-pp.cpuidle_idle_call
13.66 ± 2% -1.3 12.38 perf-profile.children.cycles-pp.testcase
5.55 ± 5% -1.0 4.52 ± 3% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
2.39 ± 2% -0.8 1.63 ± 3% perf-profile.children.cycles-pp.page_counter_uncharge
4.03 ± 3% -0.7 3.32 ± 3% perf-profile.children.cycles-pp.uncharge_batch
1.96 ± 4% -0.4 1.52 ± 2% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
1.30 -0.4 0.94 ± 2% perf-profile.children.cycles-pp.error_entry
1.36 ± 2% -0.4 0.99 perf-profile.children.cycles-pp.__irqentry_text_end
1.30 ± 10% -0.4 0.94 ± 6% perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
1.51 ± 11% -0.3 1.19 ± 5% perf-profile.children.cycles-pp.uncharge_folio
1.14 ± 3% -0.3 0.84 perf-profile.children.cycles-pp.flush_tlb_mm_range
1.02 ± 3% -0.3 0.75 perf-profile.children.cycles-pp.flush_tlb_func
0.98 ± 3% -0.3 0.72 perf-profile.children.cycles-pp.native_flush_tlb_one_user
0.73 ± 2% -0.2 0.52 ± 2% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.69 ± 2% -0.2 0.50 ± 2% perf-profile.children.cycles-pp.native_irq_return_iret
0.79 ± 2% -0.2 0.60 ± 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.51 ± 2% -0.1 0.38 ± 2% perf-profile.children.cycles-pp.sync_regs
0.41 ± 3% -0.1 0.29 ± 3% perf-profile.children.cycles-pp.__perf_sw_event
0.44 ± 2% -0.1 0.32 ± 2% perf-profile.children.cycles-pp.vma_alloc_folio
0.72 ± 2% -0.1 0.61 ± 3% perf-profile.children.cycles-pp.propagate_protected_usage
0.39 -0.1 0.28 ± 2% perf-profile.children.cycles-pp.alloc_pages_mpol
0.35 ± 3% -0.1 0.25 ± 3% perf-profile.children.cycles-pp.__alloc_pages
0.34 ± 2% -0.1 0.24 ± 4% perf-profile.children.cycles-pp.___perf_sw_event
0.30 ± 3% -0.1 0.21 ± 5% perf-profile.children.cycles-pp.lock_vma_under_rcu
0.32 ± 2% -0.1 0.24 perf-profile.children.cycles-pp.entry_SYSCALL_64
0.12 ± 4% -0.1 0.03 ± 70% perf-profile.children.cycles-pp.down_read
0.25 ± 3% -0.1 0.18 ± 4% perf-profile.children.cycles-pp.mas_walk
0.25 ± 3% -0.1 0.18 ± 2% perf-profile.children.cycles-pp.get_page_from_freelist
0.17 ± 4% -0.1 0.11 ± 3% perf-profile.children.cycles-pp.__pte_offset_map_lock
0.14 ± 3% -0.0 0.10 ± 3% perf-profile.children.cycles-pp.clear_page_erms
0.17 ± 2% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.find_vma_prev
0.13 ± 2% -0.0 0.09 perf-profile.children.cycles-pp.percpu_counter_add_batch
0.11 ± 4% -0.0 0.07 ± 10% perf-profile.children.cycles-pp.__cond_resched
0.13 ± 2% -0.0 0.10 ± 7% perf-profile.children.cycles-pp.free_pages_and_swap_cache
0.06 ± 7% -0.0 0.03 ± 70% perf-profile.children.cycles-pp.unmap_vmas
0.11 ± 3% -0.0 0.08 ± 6% perf-profile.children.cycles-pp.free_unref_page_list
0.06 -0.0 0.03 ± 70% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.09 ± 7% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.free_swap_cache
0.09 ± 7% -0.0 0.07 ± 7% perf-profile.children.cycles-pp.__munmap
0.09 ± 8% -0.0 0.06 ± 6% perf-profile.children.cycles-pp._raw_spin_lock
0.09 ± 5% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.handle_pte_fault
0.08 ± 8% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.do_vmi_munmap
0.08 ± 4% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.__mod_lruvec_state
0.07 ± 6% -0.0 0.05 ± 8% perf-profile.children.cycles-pp.rmqueue
0.07 ± 9% -0.0 0.05 ± 7% perf-profile.children.cycles-pp.unmap_region
0.08 ± 8% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.__vm_munmap
0.08 ± 8% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.__x64_sys_munmap
0.08 ± 8% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.do_vmi_align_munmap
0.08 ± 5% -0.0 0.07 ± 7% perf-profile.children.cycles-pp.try_charge_memcg
1.27 +0.1 1.40 perf-profile.children.cycles-pp.unmap_page_range
1.17 +0.2 1.32 perf-profile.children.cycles-pp.zap_pmd_range
1.12 +0.2 1.29 perf-profile.children.cycles-pp.zap_pte_range
0.84 +0.2 1.07 ± 2% perf-profile.children.cycles-pp.lru_add_fn
0.81 ± 2% +0.3 1.08 perf-profile.children.cycles-pp.page_remove_rmap
0.89 ± 10% +0.3 1.16 ± 4% perf-profile.children.cycles-pp.mem_cgroup_commit_charge
0.77 ± 3% +0.3 1.09 ± 2% perf-profile.children.cycles-pp.folio_add_new_anon_rmap
6.62 ± 4% +0.6 7.19 ± 3% perf-profile.children.cycles-pp.exc_page_fault
6.56 ± 4% +0.6 7.14 ± 3% perf-profile.children.cycles-pp.do_user_addr_fault
1.44 ± 2% +0.6 2.08 ± 2% perf-profile.children.cycles-pp.__mod_lruvec_page_state
5.80 ± 5% +0.8 6.61 ± 3% perf-profile.children.cycles-pp.handle_mm_fault
30.44 +1.1 31.58 perf-profile.children.cycles-pp.tlb_finish_mmu
29.23 +1.5 30.69 perf-profile.children.cycles-pp.tlb_batch_pages_flush
29.19 +1.5 30.66 perf-profile.children.cycles-pp.release_pages
1.63 ± 5% +1.5 3.13 ± 2% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
1.32 ± 4% +1.6 2.97 ± 2% perf-profile.children.cycles-pp.__count_memcg_events
24.12 +2.7 26.84 perf-profile.children.cycles-pp.lru_add_drain
24.12 +2.7 26.84 perf-profile.children.cycles-pp.lru_add_drain_cpu
24.09 +2.7 26.81 perf-profile.children.cycles-pp.folio_batch_move_lru
59.80 +2.9 62.68 perf-profile.children.cycles-pp.__madvise
57.82 +3.4 61.26 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
57.44 +3.5 60.99 perf-profile.children.cycles-pp.do_syscall_64
56.41 +3.8 60.20 perf-profile.children.cycles-pp.__x64_sys_madvise
56.37 +3.8 60.17 perf-profile.children.cycles-pp.do_madvise
55.94 +3.9 59.88 perf-profile.children.cycles-pp.madvise_vma_behavior
55.85 +4.0 59.82 perf-profile.children.cycles-pp.zap_page_range_single
45.26 +5.0 50.23 perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
44.75 +5.0 49.80 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
44.26 +5.2 49.50 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
26.98 -1.8 25.22 ± 3% perf-profile.self.cycles-pp.intel_idle_ibrs
1.67 ± 3% -0.6 1.02 ± 3% perf-profile.self.cycles-pp.page_counter_uncharge
1.92 ± 5% -0.4 1.49 ± 2% perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
1.47 ± 2% -0.4 1.06 ± 2% perf-profile.self.cycles-pp.testcase
1.36 ± 2% -0.4 0.99 perf-profile.self.cycles-pp.__irqentry_text_end
1.30 -0.4 0.94 perf-profile.self.cycles-pp.error_entry
1.30 ± 10% -0.4 0.94 ± 6% perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
1.18 ± 8% -0.3 0.86 ± 6% perf-profile.self.cycles-pp.uncharge_batch
1.50 ± 11% -0.3 1.19 ± 5% perf-profile.self.cycles-pp.uncharge_folio
0.98 ± 3% -0.3 0.72 perf-profile.self.cycles-pp.native_flush_tlb_one_user
0.71 ± 2% -0.2 0.51 ± 2% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.69 ± 2% -0.2 0.50 ± 2% perf-profile.self.cycles-pp.native_irq_return_iret
0.50 ± 4% -0.2 0.30 ± 5% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.75 ± 2% -0.2 0.56 ± 2% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.51 ± 2% -0.1 0.38 ± 2% perf-profile.self.cycles-pp.sync_regs
0.35 ± 3% -0.1 0.23 ± 2% perf-profile.self.cycles-pp.folio_batch_move_lru
0.36 ± 5% -0.1 0.24 ± 2% perf-profile.self.cycles-pp.lru_add_fn
0.39 ± 2% -0.1 0.27 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.72 ± 2% -0.1 0.61 ± 3% perf-profile.self.cycles-pp.propagate_protected_usage
0.45 -0.1 0.34 ± 2% perf-profile.self.cycles-pp.release_pages
0.54 ± 4% -0.1 0.45 ± 4% perf-profile.self.cycles-pp.__mod_lruvec_page_state
0.30 ± 2% -0.1 0.21 ± 3% perf-profile.self.cycles-pp.___perf_sw_event
0.52 ± 5% -0.1 0.43 ± 5% perf-profile.self.cycles-pp.folio_lruvec_lock_irqsave
0.28 ± 3% -0.1 0.21 perf-profile.self.cycles-pp.entry_SYSCALL_64
0.25 ± 3% -0.1 0.18 ± 4% perf-profile.self.cycles-pp.mas_walk
0.24 ± 2% -0.1 0.17 ± 4% perf-profile.self.cycles-pp.__handle_mm_fault
0.16 ± 4% -0.1 0.10 ± 9% perf-profile.self.cycles-pp.zap_pte_range
0.14 ± 4% -0.0 0.10 ± 4% perf-profile.self.cycles-pp.clear_page_erms
0.08 ± 6% -0.0 0.03 ± 70% perf-profile.self.cycles-pp.__cond_resched
0.13 -0.0 0.09 perf-profile.self.cycles-pp.percpu_counter_add_batch
0.14 ± 5% -0.0 0.11 ± 3% perf-profile.self.cycles-pp.handle_mm_fault
0.11 ± 3% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.do_user_addr_fault
0.08 ± 6% -0.0 0.04 ± 44% perf-profile.self.cycles-pp.__perf_sw_event
0.07 ± 10% -0.0 0.04 ± 44% perf-profile.self.cycles-pp.tlb_finish_mmu
0.09 ± 7% -0.0 0.06 ± 6% perf-profile.self.cycles-pp.free_swap_cache
0.08 ± 7% -0.0 0.05 ± 8% perf-profile.self.cycles-pp.lock_vma_under_rcu
0.09 ± 8% -0.0 0.06 ± 6% perf-profile.self.cycles-pp._raw_spin_lock
0.07 ± 7% -0.0 0.04 ± 44% perf-profile.self.cycles-pp.asm_exc_page_fault
0.10 ± 3% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.page_remove_rmap
0.08 ± 6% -0.0 0.05 ± 8% perf-profile.self.cycles-pp.flush_tlb_mm_range
0.08 ± 6% -0.0 0.06 ± 9% perf-profile.self.cycles-pp.do_anonymous_page
0.08 ± 7% -0.0 0.06 ± 6% perf-profile.self.cycles-pp.unmap_page_range
0.08 ± 5% -0.0 0.06 ± 6% perf-profile.self.cycles-pp.__alloc_pages
0.08 ± 6% -0.0 0.06 ± 8% perf-profile.self.cycles-pp.do_madvise
0.07 ± 10% -0.0 0.05 ± 8% perf-profile.self.cycles-pp.up_read
1.58 ± 6% +1.5 3.09 ± 2% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
1.27 ± 5% +1.7 2.93 ± 2% perf-profile.self.cycles-pp.__count_memcg_events
44.25 +5.2 49.50 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [linus:master] [mm] 8d59d2214c: vm-scalability.throughput -36.6% regression
From: Yosry Ahmed @ 2024-01-22 21:39 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Johannes Weiner,
    Domenico Cerasuolo, Shakeel Butt, Chris Li, Greg Thelen,
    Ivan Babrou, Michal Hocko, Michal Koutny, Muchun Song,
    Roman Gushchin, Tejun Heo, Waiman Long, Wei Xu, cgroups,
    linux-mm, ying.huang, feng.tang, fengwei.yin

[-- Attachment #1: Type: text/plain, Size: 3423 bytes --]

On Mon, Jan 22, 2024 at 12:39 AM kernel test robot
<oliver.sang@intel.com> wrote:
>
> kernel test robot noticed a -36.6% regression of vm-scalability.throughput on:
>
> commit: 8d59d2214c2362e7a9d185d80b613e632581af7b ("mm: memcg: make stats flushing threshold per-memcg")
[..]
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add the following tags:
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202401221624.cb53a8ca-oliver.sang@intel.com

Thanks for reporting this. We have had these patches running on O(10K)
machines in our production for a while now, and there haven't been any
complaints (at least not yet). OTOH, we do see significant CPU savings
on reading memcg stats.

That being said, I think we can improve the performance here by caching
pointers to the parent_memcg->vmstats_percpu and memcg->vmstats in
struct memcg_vmstats_percpu. This should significantly reduce the memory
fetches in the loop in memcg_rstat_updated().
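
Concretely, each per-CPU stats struct would cache, at memcg allocation
time, a pointer to the same-CPU struct of its parent memcg and a pointer
to its own memcg's shared vmstats, so the hot loop chases one
pre-resolved pointer per level instead of re-deriving the per-CPU
address from the mem_cgroup on every iteration. Schematically (a
simplified excerpt of the attached patch, not the complete change):

	/* cached once in mem_cgroup_alloc(), valid for the memcg's lifetime */
	statc->parent = parent ? per_cpu_ptr(parent->vmstats_percpu, cpu) : NULL;
	statc->vmstats = memcg->vmstats;

	/* before: parent_mem_cgroup() walk plus a per-cpu lookup per level */
	for (; memcg; memcg = parent_mem_cgroup(memcg))
		x = __this_cpu_add_return(memcg->vmstats_percpu->stats_updates,
					  abs(val));

	/* after: walk the pre-resolved same-CPU chain directly */
	for (statc = this_cpu_ptr(memcg->vmstats_percpu); statc;
	     statc = statc->parent)
		statc->stats_updates += abs(val);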
Oliver, would you be able to test if the attached patch helps? It's
based on 8d59d2214c236.

[..]

[-- Attachment #2: 0001-mm-memcg-optimize-parent-iteration-in-memcg_rstat_up.patch --]
[-- Type: application/octet-stream, Size: 4006 bytes --]

From 8d04c38137c71d1577a8576fb75db07f3bf92491 Mon Sep 17 00:00:00 2001
From: Yosry Ahmed <yosryahmed@google.com>
Date: Mon, 22 Jan 2024 21:35:29 +0000
Subject: [PATCH] mm: memcg: optimize parent iteration in memcg_rstat_updated()

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/memcontrol.c | 45 ++++++++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c5aa0c2cb68b2..b5ec4a8413215 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -634,6 +634,10 @@ struct memcg_vmstats_percpu {

 	/* Stats updates since the last flush */
 	unsigned int stats_updates;
+
+	/* Cached pointers for fast updates in memcg_rstat_updated() */
+	struct memcg_vmstats_percpu *parent;
+	struct memcg_vmstats *vmstats;
 };

 struct memcg_vmstats {
@@ -698,36 +702,34 @@ static void memcg_stats_unlock(void)
 }

-static bool memcg_should_flush_stats(struct mem_cgroup *memcg)
+static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
 {
-	return atomic64_read(&memcg->vmstats->stats_updates) >
+	return atomic64_read(&vmstats->stats_updates) >
 		MEMCG_CHARGE_BATCH * num_online_cpus();
 }

 static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 {
+	struct memcg_vmstats_percpu *statc;
 	int cpu = smp_processor_id();
-	unsigned int x;

 	if (!val)
 		return;

 	cgroup_rstat_updated(memcg->css.cgroup, cpu);
-
-	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
-		x = __this_cpu_add_return(memcg->vmstats_percpu->stats_updates,
-					  abs(val));
-
-		if (x < MEMCG_CHARGE_BATCH)
+	statc = this_cpu_ptr(memcg->vmstats_percpu);
+	for (; statc; statc = statc->parent) {
+		statc->stats_updates += abs(val);
+		if (statc->stats_updates < MEMCG_CHARGE_BATCH)
 			continue;

 		/*
 		 * If @memcg is already flush-able, increasing stats_updates is
 		 * redundant. Avoid the overhead of the atomic update.
 		 */
-		if (!memcg_should_flush_stats(memcg))
-			atomic64_add(x, &memcg->vmstats->stats_updates);
-		__this_cpu_write(memcg->vmstats_percpu->stats_updates, 0);
+		if (!memcg_vmstats_needs_flush(statc->vmstats))
+			atomic64_add(x, &statc->vmstats->stats_updates);
+		statc->stats_updates = 0;
 	}
 }

@@ -751,7 +753,7 @@ static void do_flush_stats(void)

 void mem_cgroup_flush_stats(void)
 {
-	if (memcg_should_flush_stats(root_mem_cgroup))
+	if (memcg_vmstats_needs_flush(root_mem_cgroup->vmstats))
 		do_flush_stats();
 }

@@ -765,7 +767,7 @@ void mem_cgroup_flush_stats_ratelimited(void)
 static void flush_memcg_stats_dwork(struct work_struct *w)
 {
 	/*
-	 * Deliberately ignore memcg_should_flush_stats() here so that flushing
+	 * Deliberately ignore memcg_vmstats_needs_flush() here so that flushing
 	 * in latency-sensitive paths is as cheap as possible.
 	 */
 	do_flush_stats();
@@ -5453,10 +5455,11 @@ static void mem_cgroup_free(struct mem_cgroup *memcg)
 	__mem_cgroup_free(memcg);
 }

-static struct mem_cgroup *mem_cgroup_alloc(void)
+static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
 {
+	struct memcg_vmstats_percpu *statc, *pstatc;
 	struct mem_cgroup *memcg;
-	int node;
+	int node, cpu;
 	int __maybe_unused i;
 	long error = -ENOMEM;

@@ -5480,6 +5483,14 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	if (!memcg->vmstats_percpu)
 		goto fail;

+	for_each_possible_cpu(cpu) {
+		if (parent)
+			pstatc = per_cpu_ptr(parent->vmstats_percpu, cpu);
+		statc = per_cpu_ptr(memcg->vmstats_percpu, cpu);
+		statc->parent = parent ? pstatc : NULL;
+		statc->vmstats = memcg->vmstats;
+	}
+
 	for_each_node(node)
 		if (alloc_mem_cgroup_per_node_info(memcg, node))
 			goto fail;
@@ -5525,7 +5536,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	struct mem_cgroup *memcg, *old_memcg;

 	old_memcg = set_active_memcg(parent);
-	memcg = mem_cgroup_alloc();
+	memcg = mem_cgroup_alloc(parent);
 	set_active_memcg(old_memcg);
 	if (IS_ERR(memcg))
 		return ERR_CAST(memcg);
--
2.43.0.429.g432eaa2c6b-goog
* Re: [linus:master] [mm] 8d59d2214c: vm-scalability.throughput -36.6% regression
  2024-01-22 21:39 ` Yosry Ahmed
@ 2024-01-23  7:21   ` Oliver Sang
  2024-01-23  7:42     ` Yosry Ahmed
  0 siblings, 1 reply; 6+ messages in thread
From: Oliver Sang @ 2024-01-23 7:21 UTC (permalink / raw)
To: Yosry Ahmed
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Johannes Weiner,
	Domenico Cerasuolo, Shakeel Butt, Chris Li, Greg Thelen,
	Ivan Babrou, Michal Hocko, Michal Koutny, Muchun Song,
	Roman Gushchin, Tejun Heo, Waiman Long, Wei Xu, cgroups,
	linux-mm, ying.huang, feng.tang, fengwei.yin, oliver.sang

hi, Yosry Ahmed,

On Mon, Jan 22, 2024 at 01:39:19PM -0800, Yosry Ahmed wrote:
[..]
> Oliver, would you be able to test if the attached patch helps? It's
> based on 8d59d2214c236.

the patch failed to compile:

build_errors:
- "mm/memcontrol.c:731:38: error: 'x' undeclared (first use in this function)"
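if it helps: judging from the error position, this looks like the leftover
'x' in the new memcg_rstat_updated() -- the patch drops the 'unsigned int x;'
declaration but this hunk still uses it:

	+		if (!memcg_vmstats_needs_flush(statc->vmstats))
	+			atomic64_add(x, &statc->vmstats->stats_updates);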
* Re: [linus:master] [mm] 8d59d2214c: vm-scalability.throughput -36.6% regression
  2024-01-23  7:21 ` Oliver Sang
@ 2024-01-23  7:42   ` Yosry Ahmed
  2024-01-24  8:26     ` Oliver Sang
  0 siblings, 1 reply; 6+ messages in thread
From: Yosry Ahmed @ 2024-01-23 7:42 UTC (permalink / raw)
To: Oliver Sang
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Johannes Weiner,
	Domenico Cerasuolo, Shakeel Butt, Chris Li, Greg Thelen,
	Ivan Babrou, Michal Hocko, Michal Koutny, Muchun Song,
	Roman Gushchin, Tejun Heo, Waiman Long, Wei Xu, cgroups,
	linux-mm, ying.huang, feng.tang, fengwei.yin

[-- Attachment #1: Type: text/plain, Size: 379 bytes --]

> > Oliver, would you be able to test if the attached patch helps? It's
> > based on 8d59d2214c236.
>
> the patch failed to compile:
>
> build_errors:
> - "mm/memcontrol.c:731:38: error: 'x' undeclared (first use in this function)"

Apologies, apparently I sent the patch with some pending diff in my
tree that I hadn't committed. Please find a fixed patch attached.

Thanks.

[-- Attachment #2: 0001-mm-memcg-optimize-parent-iteration-in-memcg_rstat_up.patch --]
[-- Type: application/octet-stream, Size: 4036 bytes --]

From 1b00b4e0bbc215fcebb9d3d45e5d63135b7b7e89 Mon Sep 17 00:00:00 2001
From: Yosry Ahmed <yosryahmed@google.com>
Date: Mon, 22 Jan 2024 21:35:29 +0000
Subject: [PATCH] mm: memcg: optimize parent iteration in memcg_rstat_updated()

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/memcontrol.c | 46 +++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c5aa0c2cb68b2..d6a9d6dad2f00 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -634,6 +634,10 @@ struct memcg_vmstats_percpu {

 	/* Stats updates since the last flush */
 	unsigned int stats_updates;
+
+	/* Cached pointers for fast updates in memcg_rstat_updated() */
+	struct memcg_vmstats_percpu *parent;
+	struct memcg_vmstats *vmstats;
 };

 struct memcg_vmstats {
@@ -698,36 +702,35 @@ static void memcg_stats_unlock(void)
 }

-static bool memcg_should_flush_stats(struct mem_cgroup *memcg)
+static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
 {
-	return atomic64_read(&memcg->vmstats->stats_updates) >
+	return atomic64_read(&vmstats->stats_updates) >
 		MEMCG_CHARGE_BATCH * num_online_cpus();
 }

 static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 {
+	struct memcg_vmstats_percpu *statc;
 	int cpu = smp_processor_id();
-	unsigned int x;

 	if (!val)
 		return;

 	cgroup_rstat_updated(memcg->css.cgroup, cpu);
-
-	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
-		x = __this_cpu_add_return(memcg->vmstats_percpu->stats_updates,
-					  abs(val));
-
-		if (x < MEMCG_CHARGE_BATCH)
+	statc = this_cpu_ptr(memcg->vmstats_percpu);
+	for (; statc; statc = statc->parent) {
+		statc->stats_updates += abs(val);
+		if (statc->stats_updates < MEMCG_CHARGE_BATCH)
 			continue;

 		/*
 		 * If @memcg is already flush-able, increasing stats_updates is
 		 * redundant. Avoid the overhead of the atomic update.
 		 */
-		if (!memcg_should_flush_stats(memcg))
-			atomic64_add(x, &memcg->vmstats->stats_updates);
-		__this_cpu_write(memcg->vmstats_percpu->stats_updates, 0);
+		if (!memcg_vmstats_needs_flush(statc->vmstats))
+			atomic64_add(statc->stats_updates,
+				     &statc->vmstats->stats_updates);
+		statc->stats_updates = 0;
 	}
 }

@@ -751,7 +754,7 @@ static void do_flush_stats(void)

 void mem_cgroup_flush_stats(void)
 {
-	if (memcg_should_flush_stats(root_mem_cgroup))
+	if (memcg_vmstats_needs_flush(root_mem_cgroup->vmstats))
 		do_flush_stats();
 }

@@ -765,7 +768,7 @@ void mem_cgroup_flush_stats_ratelimited(void)
 static void flush_memcg_stats_dwork(struct work_struct *w)
 {
 	/*
-	 * Deliberately ignore memcg_should_flush_stats() here so that flushing
+	 * Deliberately ignore memcg_vmstats_needs_flush() here so that flushing
 	 * in latency-sensitive paths is as cheap as possible.
 	 */
 	do_flush_stats();
@@ -5453,10 +5456,11 @@ static void mem_cgroup_free(struct mem_cgroup *memcg)
 	__mem_cgroup_free(memcg);
 }

-static struct mem_cgroup *mem_cgroup_alloc(void)
+static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
 {
+	struct memcg_vmstats_percpu *statc, *pstatc;
 	struct mem_cgroup *memcg;
-	int node;
+	int node, cpu;
 	int __maybe_unused i;
 	long error = -ENOMEM;

@@ -5480,6 +5484,14 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	if (!memcg->vmstats_percpu)
 		goto fail;

+	for_each_possible_cpu(cpu) {
+		if (parent)
+			pstatc = per_cpu_ptr(parent->vmstats_percpu, cpu);
+		statc = per_cpu_ptr(memcg->vmstats_percpu, cpu);
+		statc->parent = parent ? pstatc : NULL;
+		statc->vmstats = memcg->vmstats;
+	}
+
 	for_each_node(node)
 		if (alloc_mem_cgroup_per_node_info(memcg, node))
 			goto fail;
@@ -5525,7 +5537,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	struct mem_cgroup *memcg, *old_memcg;

 	old_memcg = set_active_memcg(parent);
-	memcg = mem_cgroup_alloc();
+	memcg = mem_cgroup_alloc(parent);
 	set_active_memcg(old_memcg);
 	if (IS_ERR(memcg))
 		return ERR_CAST(memcg);
--
2.43.0.429.g432eaa2c6b-goog
* Re: [linus:master] [mm] 8d59d2214c: vm-scalability.throughput -36.6% regression
  2024-01-23  7:42 ` Yosry Ahmed
@ 2024-01-24  8:26   ` Oliver Sang
  2024-01-24  9:11     ` Yosry Ahmed
  0 siblings, 1 reply; 6+ messages in thread
From: Oliver Sang @ 2024-01-24 8:26 UTC (permalink / raw)
To: Yosry Ahmed
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Johannes Weiner,
	Domenico Cerasuolo, Shakeel Butt, Chris Li, Greg Thelen,
	Ivan Babrou, Michal Hocko, Michal Koutny, Muchun Song,
	Roman Gushchin, Tejun Heo, Waiman Long, Wei Xu, cgroups,
	linux-mm, ying.huang, feng.tang, fengwei.yin, oliver.sang

[-- Attachment #1: Type: text/plain, Size: 3943 bytes --]

hi, Yosry Ahmed,

On Mon, Jan 22, 2024 at 11:42:04PM -0800, Yosry Ahmed wrote:
> > > Oliver, would you be able to test if the attached patch helps? It's
> > > based on 8d59d2214c236.
> >
> > the patch failed to compile:
> >
> > build_errors:
> > - "mm/memcontrol.c:731:38: error: 'x' undeclared (first use in this function)"
>
> Apologies, apparently I sent the patch with some pending diff in my
> tree that I hadn't committed. Please find a fixed patch attached.

the regression disappears after applying the patch.

Tested-by: kernel test robot <oliver.sang@intel.com>

for the 1st regression we reported (details are attached as vm-scalability):

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/1T/lkp-cpl-4sp2/lru-shm/vm-scalability

commit:
  e0bf1dc859fdd mm: memcg: move vmstats structs definition above flushing code
  8d59d2214c236 mm: memcg: make stats flushing threshold per-memcg
  0cba55e237ba6 mm: memcg: optimize parent iteration in memcg_rstat_updated()

e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6 0cba55e237ba61489c0a29f7d27
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    946447           -37.8%     588327        -1.1%     936279        vm-scalability.median
 2.131e+08           -36.6%  1.351e+08        -1.4%  2.102e+08        vm-scalability.throughput

for the 2nd regression (details are attached as will-it-scale-tlb_flush2):

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/tlb_flush2/will-it-scale

commit:
  e0bf1dc859fdd mm: memcg: move vmstats structs definition above flushing code
  8d59d2214c236 mm: memcg: make stats flushing threshold per-memcg
  0cba55e237ba6 mm: memcg: optimize parent iteration in memcg_rstat_updated()

e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6 0cba55e237ba61489c0a29f7d27
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   2132437           -32.3%    1444430        +0.9%    2151460        will-it-scale.52.processes
     41008           -32.3%      27776        +0.9%      41373        will-it-scale.per_process_ops

interestingly, it also helps the will-it-scale::fallocate tests, which we
ignored in the original report (details are attached as will-it-scale-fallocate1):

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/fallocate1/will-it-scale

commit:
  e0bf1dc859fdd mm: memcg: move vmstats structs definition above flushing code
  8d59d2214c236 mm: memcg: make stats flushing threshold per-memcg
  0cba55e237ba6 mm: memcg: optimize parent iteration in memcg_rstat_updated()

e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6 0cba55e237ba61489c0a29f7d27
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   5426049           -33.8%    3590953        +3.3%    5605429        will-it-scale.224.processes
     24222           -33.8%      16030        +3.3%      25023        will-it-scale.per_process_ops

>
> Thanks.

[-- Attachment #2: vm-scalability --]
[-- Type: text/plain, Size: 82648 bytes --]

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/1T/lkp-cpl-4sp2/lru-shm/vm-scalability

commit:
  e0bf1dc859fdd mm: memcg: move vmstats structs definition above flushing code
  8d59d2214c236 mm: memcg: make stats flushing threshold per-memcg
  0cba55e237ba6 mm: memcg: optimize parent iteration in memcg_rstat_updated()

e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6 0cba55e237ba61489c0a29f7d27
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
      0.01           +86.7%       0.02        +3.7%       0.01        vm-scalability.free_time
    946447           -37.8%     588327        -1.1%     936279        vm-scalability.median
 2.131e+08           -36.6%  1.351e+08        -1.4%  2.102e+08        vm-scalability.throughput
    284.74            +6.3%     302.62        +1.5%     288.98        vm-scalability.time.elapsed_time
    284.74            +6.3%     302.62        +1.5%     288.98        vm-scalability.time.elapsed_time.max
     30485           +14.8%      34987        +0.1%      30514        vm-scalability.time.involuntary_context_switches
      1893           +43.6%       2718        -0.1%       1891        vm-scalability.time.percent_of_cpu_this_job_got
      3855           +67.7%       6467        +1.7%       3922        vm-scalability.time.system_time
      1537           +14.5%       1760        +0.5%       1545        vm-scalability.time.user_time
    120009            -5.6%     113290        -0.4%     119542        vm-scalability.time.voluntary_context_switches
      6.46            +3.5        9.95        +0.0        6.48        mpstat.cpu.all.sys%
     21.22           +38.8%      29.46        -0.3%      21.14        vmstat.procs.r
      9376            +0.5%       9422        +2.6%       9621        vmstat.system.cs
    233326            -0.6%     231877        -1.2%     230635        vmstat.system.in
    113624 ±  5%     +14.0%     129566 ±  3%    -11.8%     100234 ±  3%  meminfo.Active
    113476 ±  5%     +14.0%     129417 ±  3%    -11.8%     100070 ±  3%  meminfo.Active(anon)
   3987746           +46.0%    5821636        -0.8%    3954895        meminfo.Mapped
     16345           +14.6%      18729        -2.4%      15952 ±  2%  meminfo.PageTables
    474.17 ±  3%     -88.9%      52.50 ±125%    -98.2%       8.67 ± 31%  perf-c2c.DRAM.local
    483.17 ±  5%     -79.3%      99.83 ± 70%    -87.3%      61.17 ± 21%  perf-c2c.DRAM.remote
      1045 ±  5%     -71.9%     294.00 ± 63%    -80.7%     202.17 ±  6%  perf-c2c.HITM.local
    119.50 ± 10%     -78.8%      25.33 ± 20%    -79.8%      24.17 ± 28%  perf-c2c.HITM.remote
    392.33           +35.4%     531.17        -0.1%     392.00        turbostat.Avg_MHz
     10.35            +3.7       14.00        -0.0       10.34        turbostat.Busy%
     90.56            -3.7       86.86        +0.0       90.57        turbostat.C1%
      0.28 ±  5%     -31.5%       0.19        +0.0%       0.28 ±  2%  turbostat.IPC
    481.33            +2.5%     493.38        -0.3%     480.09        turbostat.PkgWatt
    999019 ±  3%     +44.4%    1442651 ±  2%     -1.4%     984731 ±  4%  numa-meminfo.node0.Mapped
   1005687 ±  4%     +44.1%    1449402 ±  3%     +2.3%    1029138        numa-meminfo.node1.Mapped
      3689 ±  3%     +21.7%       4490 ±  7%     +5.7%       3899 ±  7%  numa-meminfo.node1.PageTables
    980589 ±  2%     +42.3%    1395777 ±  2%     +1.6%     996328 ±  3%  numa-meminfo.node2.Mapped
     96484 ±  5%     +22.0%     117715 ±  4%     -9.0%      87779 ±  4%  numa-meminfo.node3.Active
     96430 ±  5%     +22.1%     117694 ±  4%     -9.0%      87737 ±  4%  numa-meminfo.node3.Active(anon)
    991367 ±  3%     +42.7%    1414337 ±  4%     -0.7%     984261 ±  2%  numa-meminfo.node3.Mapped
    251219 ±  3%     +44.8%     363745 ±  2%     -2.9%     244018 ±  5%  numa-vmstat.node0.nr_mapped
    253252 ±  2%     +44.6%     366087 ±  3%     +0.8%     255216        numa-vmstat.node1.nr_mapped
    927.67 ±  3%     +21.9%       1130 ±  7%     +4.4%     968.41 ±  7%  numa-vmstat.node1.nr_page_table_pages
    248171 ±  2%     +42.5%     353541 ±  4%     -0.7%     246429 ±  3%  numa-vmstat.node2.nr_mapped
     24188 ±  5%     +21.6%      29410 ±  4%     -9.2%      21963 ±  4%  numa-vmstat.node3.nr_active_anon
    245825 ±  2%     +45.5%     357622 ±  3%     -0.2%     245258 ±  3%  numa-vmstat.node3.nr_mapped
      1038 ± 11%     +17.8%       1224 ±  6%     -4.5%     992.13 ±  2%  numa-vmstat.node3.nr_page_table_pages
     24188 ±  5%     +21.6%      29410 ±  4%     -9.2%      21963 ±  4%  numa-vmstat.node3.nr_zone_active_anon
    284.74            +6.3%     302.62        +1.5%     288.98        time.elapsed_time
    284.74            +6.3%     302.62        +1.5%     288.98        time.elapsed_time.max
     30485           +14.8%      34987        +0.1%      30514        time.involuntary_context_switches
    448.67 ±  3%     -18.4%     366.00 ±  4%     -2.5%     437.67 ±  2%  time.major_page_faults
      1893           +43.6%       2718        -0.1%       1891        time.percent_of_cpu_this_job_got
      3855           +67.7%       6467        +1.7%       3922        time.system_time
      1537           +14.5%       1760        +0.5%       1545        time.user_time
    120009            -5.6%     113290        -0.4%     119542        time.voluntary_context_switches
     28376 ±  5%     +14.0%      32338 ±  3%    -11.8%      25021 ±  3%  proc-vmstat.nr_active_anon
    993504           +46.6%    1456136        -0.8%     985427        proc-vmstat.nr_mapped
      4060           +15.5%       4691        -2.1%       3977        proc-vmstat.nr_page_table_pages
     28376 ±  5%     +14.0%      32338 ±  3%    -11.8%      25021 ±  3%  proc-vmstat.nr_zone_active_anon
 1.066e+09            -2.0%  1.045e+09        -0.0%  1.066e+09        proc-vmstat.numa_hit
 1.065e+09            -2.0%  1.044e+09        +0.0%  1.065e+09        proc-vmstat.numa_local
     69848 ±  2%      -1.5%      68819 ±  2%    -13.1%      60717 ±  2%  proc-vmstat.pgactivate
      5659            +5.6%       5978        +1.0%       5713        proc-vmstat.unevictable_pgs_culled
  34604288            +3.7%   35898496        +1.1%   35001600        proc-vmstat.unevictable_pgs_scanned
      0.08 ±111%     +14.5%       0.09 ± 85%   +351.6%       0.36 ± 46%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
      0.01 ± 35%     +20.5%       0.01 ± 30%    -49.3%       0.01 ± 17%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.01 ± 20%   +1887.0%       0.18 ±203%    +61.1%       0.01 ± 51%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      6.73 ±221%     +10.1%       7.41 ±102%  +1086.8%      79.86 ±111%  perf-sched.sch_delay.max.ms.pipe_write.vfs_write.ksys_write.do_syscall_64
     20.95 ±118%     +66.6%      34.90 ± 45%   +372.6%      99.03 ± 24%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
     10.50 ±172%     -78.5%       2.26 ±205%   +824.8%      97.06 ± 88%  perf-sched.sch_delay.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
      0.01 ± 28%     +63.3%       0.01 ± 29%    +30.0%       0.01 ± 24%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
      4539 ±  6%      -0.0%       4538 ±  3%    -12.0%       3994 ±  2%  perf-sched.total_wait_and_delay.max.ms
      4539 ±  6%      -0.0%       4538 ±  3%    -12.0%       3994 ±  2%  perf-sched.total_wait_time.max.ms
    524.54 ± 91%      +0.1%     525.21 ± 91%    -91.3%      45.89 ±  2%  perf-sched.wait_and_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
    524.53 ± 91%      +0.1%     524.88 ± 91%    -91.3%      45.88 ±  2%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
   1223376 ± 14%    +119.1%    2680582 ±  9%     +1.4%    1239953 ± 14%  sched_debug.cfs_rq:/.avg_vruntime.avg
   1673909 ± 14%     +97.6%    3308254 ±  8%     -0.6%    1663719 ± 12%  sched_debug.cfs_rq:/.avg_vruntime.max
    810795 ± 15%    +145.8%    1993289 ±  9%     +0.1%     811327 ± 18%  sched_debug.cfs_rq:/.avg_vruntime.min
    156233 ±  8%     +55.1%     242331 ±  6%     +6.4%     166243 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.17 ± 36%     +10.4%       0.19 ± 25%    -76.3%       0.04 ± 10%  sched_debug.cfs_rq:/.h_nr_running.avg
      0.28 ± 17%      -2.6%       0.27 ±  8%    -29.5%       0.20 ±  7%  sched_debug.cfs_rq:/.h_nr_running.stddev
   1223376 ± 14%    +119.1%    2680582 ±  9%     +1.4%    1239953 ± 14%  sched_debug.cfs_rq:/.min_vruntime.avg
   1673909 ± 14%     +97.6%    3308254 ±  8%     -0.6%    1663719 ± 12%  sched_debug.cfs_rq:/.min_vruntime.max
    810795 ± 15%    +145.8%    1993289 ±  9%     +0.1%     811327 ± 18%  sched_debug.cfs_rq:/.min_vruntime.min
    156233 ±  8%     +55.1%     242331 ±  6%     +6.4%     166243 ±  8%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.17 ± 36%     +10.6%       0.19 ± 25%    -76.5%       0.04 ± 10%  sched_debug.cfs_rq:/.nr_running.avg
      0.28 ± 16%      -2.0%       0.27 ±  8%    -29.8%       0.19 ±  8%  sched_debug.cfs_rq:/.nr_running.stddev
    247.44 ± 17%      -1.9%     242.83 ±  9%    -34.5%     162.07 ±  7%  sched_debug.cfs_rq:/.runnable_avg.stddev
    182.32 ± 33%      +8.6%     197.96 ± 25%    -71.3%      52.23 ± 11%  sched_debug.cfs_rq:/.util_avg.avg
    245.26 ± 17%      -1.9%     240.57 ± 10%    -35.1%     159.09 ±  6%  sched_debug.cfs_rq:/.util_avg.stddev
     31.39 ± 29%     +19.1%      37.39 ± 23%    -73.0%       8.47 ± 31%  sched_debug.cfs_rq:/.util_est_enqueued.avg
    126445 ±  3%     -11.0%     112493 ±  4%    -12.3%     110951 ±  8%  sched_debug.cpu.avg_idle.stddev
      5970 ± 46%     -24.7%       4497 ± 54%    -95.0%     300.50 ±  6%  sched_debug.cpu.curr->pid.avg
      7221 ± 31%     -15.2%       6125 ± 28%    -62.3%       2719 ±  9%  sched_debug.cpu.curr->pid.stddev
      0.25 ± 21%      +0.1%       0.25 ± 12%    -39.2%       0.15 ±  8%  sched_debug.cpu.nr_running.stddev
      1447 ± 15%     +32.0%       1910 ±  9%     +3.3%       1494 ± 14%  sched_debug.cpu.nr_switches.min
      0.71           +13.4%       0.80        -1.8%       0.69 ±  2%  perf-stat.i.MPKI
 2.343e+10            -7.9%  2.157e+10        -1.2%  2.315e+10        perf-stat.i.branch-instructions
      0.36            -0.0        0.35        -0.0        0.35        perf-stat.i.branch-miss-rate%
  30833194            -7.3%   28584190        -0.8%   30598705        perf-stat.i.branch-misses
     26.04            -1.4       24.66        -0.5       25.53 ±  2%  perf-stat.i.cache-miss-rate%
  51345490 ±  3%     +40.7%   72258633 ±  3%     +1.1%   51925265 ±  5%  perf-stat.i.cache-misses
 1.616e+08 ±  6%     +58.6%  2.562e+08 ±  6%     +4.9%  1.695e+08 ± 10%  perf-stat.i.cache-references
      9297            +0.6%       9355        +2.8%       9558        perf-stat.i.context-switches
      1.29            +9.4%       1.42        -0.9%       1.28        perf-stat.i.cpi
 8.394e+10           +33.7%  1.122e+11        -0.6%  8.344e+10        perf-stat.i.cpu-cycles
    505.77            -2.6%     492.52        -1.2%     499.66        perf-stat.i.cpu-migrations
      0.03            +0.0        0.03 ±  2%     -0.0        0.03 ±  2%  perf-stat.i.dTLB-load-miss-rate%
 2.335e+10            -7.4%  2.162e+10        -0.9%  2.315e+10        perf-stat.i.dTLB-loads
      0.03            +0.0        0.03        -0.0        0.03        perf-stat.i.dTLB-store-miss-rate%
   3948344            -8.0%    3633633        -2.0%    3867670        perf-stat.i.dTLB-store-misses
 6.549e+09            -7.0%   6.09e+09        -0.3%  6.528e+09        perf-stat.i.dTLB-stores
  17546602           -22.8%   13551001       -15.2%   14872025        perf-stat.i.iTLB-load-misses
   2552560            -2.6%    2485876        +0.1%    2555872        perf-stat.i.iTLB-loads
 8.367e+10            -7.5%  7.737e+10        -0.9%  8.288e+10        perf-stat.i.instructions
      4706            +7.7%       5070        +4.6%       4922        perf-stat.i.instructions-per-iTLB-miss
      0.81           -12.0%       0.72        +0.5%       0.82        perf-stat.i.ipc
      1.59 ±  3%     -22.3%       1.23 ±  4%     -4.5%       1.52 ±  3%  perf-stat.i.major-faults
      0.37           +34.2%       0.49        -0.4%       0.37        perf-stat.i.metric.GHz
    233.98            -6.9%     217.90        -0.7%     232.33        perf-stat.i.metric.M/sec
   3619177            -9.5%    3276556        -2.3%    3535780        perf-stat.i.minor-faults
     74.28            +4.8       79.04        +0.5       74.78        perf-stat.i.node-load-miss-rate%
   2898733 ±  4%     +49.0%    4320557        -3.5%    2796977 ±  6%  perf-stat.i.node-load-misses
   1928237 ±  4%     -11.9%    1698426        -0.4%    1920388 ±  6%  perf-stat.i.node-loads
  13383344 ±  2%      +4.7%   14013398 ±  3%     -0.3%   13338644 ±  3%  perf-stat.i.node-stores
   3619179            -9.5%    3276558        -2.3%    3535782        perf-stat.i.page-faults
      0.61 ±  3%     +52.5%       0.94 ±  3%     +2.1%       0.63 ±  5%  perf-stat.overall.MPKI
     31.95 ±  2%      -3.6       28.34 ±  3%     -1.0       30.92 ±  4%  perf-stat.overall.cache-miss-rate%
      1.00           +45.0%       1.45        +0.3%       1.00        perf-stat.overall.cpi
      0.07            +0.0        0.08 ±  4%     +0.0        0.07 ±  2%  perf-stat.overall.dTLB-load-miss-rate%
      0.06            -0.0        0.06        -0.0        0.06        perf-stat.overall.dTLB-store-miss-rate%
     87.62            -2.6       85.05        -1.9       85.75        perf-stat.overall.iTLB-load-miss-rate%
      4778           +20.2%       5745       +17.3%       5604        perf-stat.overall.instructions-per-iTLB-miss
      1.00           -31.0%       0.69        -0.3%       1.00        perf-stat.overall.ipc
     59.75 ±  3%     +11.8       71.59        -0.8       58.91 ±  5%  perf-stat.overall.node-load-miss-rate%
      5145            +1.8%       5239        +1.2%       5208        perf-stat.overall.path-length
 2.405e+10            -6.3%  2.252e+10        -0.3%  2.396e+10        perf-stat.ps.branch-instructions
  31203502            -6.4%   29219514        -0.3%   31124801        perf-stat.ps.branch-misses
  52696784 ±  3%     +43.4%   75547948 ±  3%     +1.9%   53714277 ±  5%  perf-stat.ps.cache-misses
 1.652e+08 ±  6%     +61.7%  2.672e+08 ±  7%     +5.7%  1.746e+08 ± 11%  perf-stat.ps.cache-references
      9279            +0.5%       9326        +2.6%       9525        perf-stat.ps.context-switches
 8.584e+10           +36.3%   1.17e+11        +0.1%  8.594e+10        perf-stat.ps.cpu-cycles
    506.29            -2.0%     496.05        -0.7%     502.50        perf-stat.ps.cpu-migrations
 2.395e+10            -5.9%  2.254e+10        -0.1%  2.393e+10        perf-stat.ps.dTLB-loads
   4059043            -6.2%    3806002        -1.1%    4012385        perf-stat.ps.dTLB-store-misses
 6.688e+09            -5.7%  6.308e+09        +0.4%  6.714e+09        perf-stat.ps.dTLB-stores
  17944396           -21.8%   14028927       -14.9%   15276611        perf-stat.ps.iTLB-load-misses
   2534093            -2.7%    2465233        +0.2%    2538321        perf-stat.ps.iTLB-loads
 8.575e+10            -6.0%  8.059e+10        -0.2%  8.561e+10        perf-stat.ps.instructions
      1.60 ±  3%     -23.2%       1.23 ±  4%     -3.9%       1.54 ±  2%  perf-stat.ps.major-faults
   3726053            -7.7%    3439511        -1.4%    3674617        perf-stat.ps.minor-faults
   2942507 ±  4%     +52.0%    4472428        -2.9%    2857607 ±  6%  perf-stat.ps.node-load-misses
   1980077 ±  4%     -10.4%    1774633        +0.5%    1989918 ±  6%  perf-stat.ps.node-loads
  13780660 ±  2%      +6.8%   14716100 ±  3%     +0.6%   13865246 ±  3%  perf-stat.ps.node-stores
   3726055            -7.7%    3439513        -1.4%    3674618        perf-stat.ps.page-faults
 2.447e+13            -0.2%  2.443e+13        +1.2%  2.477e+13        perf-stat.total.instructions
     37.11            -6.7       30.40 ±  6%     +3.0       40.09        perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
     21.14            -3.8       17.36 ±  7%     +3.1       24.20 ±  2%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
     21.05            -3.8       17.29 ±  7%     +3.0       24.08 ±  2%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     21.05            -3.8       17.29 ±  7%     +3.0       24.09 ±  2%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     21.05            -3.8       17.29 ±  7%     +3.0       24.09 ±  2%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
     21.00            -3.8       17.25 ±  7%     +3.0       24.02 ±  2%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     20.70            -3.7       17.00 ±  7%     +2.9       23.59        perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
     20.69            -3.7       16.99 ±  7%     +2.9       23.57        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     20.64            -3.7       16.95 ±  7%     +2.8       23.48        perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      9.51 ±  3%      -1.9        7.57 ±  2%     -1.1        8.44        perf-profile.calltrace.cycles-pp.do_rw_once
      4.54            -1.4        3.19        -0.5        4.08 ±  2%  perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
      2.83            -0.9        1.96        -0.3        2.55 ±  2%  perf-profile.calltrace.cycles-pp.next_uptodate_folio.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
      0.75 ±  2%      -0.6        0.17 ±141%     -0.1        0.68 ±  3%  perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio
      3.90            -0.6        3.34 ±  5%     -0.5        3.37 ±  2%  perf-profile.calltrace.cycles-pp.clear_page_erms.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
      4.44 ±  6%      -0.5
3.98 ± 3% -0.8 3.68 ± 2% perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault 2.96 ± 4% -0.4 2.52 ± 26% +2.4 5.40 ± 4% perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call 1.17 ± 3% -0.4 0.73 ± 6% -0.2 1.01 ± 4% perf-profile.calltrace.cycles-pp.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault 1.42 ± 2% -0.4 0.99 ± 2% -0.1 1.28 ± 2% perf-profile.calltrace.cycles-pp.shmem_alloc_folio.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault 1.32 ± 2% -0.4 0.91 -0.1 1.19 ± 2% perf-profile.calltrace.cycles-pp.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault 0.61 ± 6% -0.4 0.23 ±141% +0.5 1.09 ± 3% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues 1.19 ± 2% -0.4 0.82 -0.1 1.07 ± 3% perf-profile.calltrace.cycles-pp.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio.shmem_get_folio_gfp 2.24 ± 6% -0.3 1.91 ± 24% +1.8 4.08 ± 4% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state 0.92 ± 12% -0.3 0.60 ± 74% +0.8 1.69 ± 7% perf-profile.calltrace.cycles-pp.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt 0.96 ± 2% -0.3 0.65 ± 2% -0.1 0.86 ± 3% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio 0.98 ± 2% -0.3 0.68 ± 4% -0.1 0.89 ± 2% perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.do_access 0.76 ± 7% -0.3 0.49 ± 75% +0.6 1.37 ± 4% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt 0.77 ± 7% -0.3 0.49 ± 75% +0.6 1.38 ± 4% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt 0.80 ± 17% -0.2 0.57 ± 74% +0.8 1.63 ± 12% perf-profile.calltrace.cycles-pp.release_pages.__folio_batch_release.shmem_undo_range.shmem_evict_inode.evict 1.99 ± 6% -0.2 1.77 ± 21% +1.5 3.44 ± 4% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm 1.99 ± 6% -0.2 1.77 ± 21% +1.5 3.44 ± 4% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm 1.99 ± 6% -0.2 1.77 ± 21% +1.5 3.44 ± 4% perf-profile.calltrace.cycles-pp.ret_from_fork_asm 1.95 ± 6% -0.2 1.74 ± 21% +1.4 3.37 ± 4% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 1.60 ± 7% -0.2 1.38 ± 25% +1.4 3.00 ± 3% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter 1.60 ± 7% -0.2 1.38 ± 25% +1.4 2.99 ± 3% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt 1.96 ± 6% -0.2 1.74 ± 21% +1.4 3.39 ± 4% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 1.92 ± 6% -0.2 1.71 ± 21% +1.4 3.32 ± 4% perf-profile.calltrace.cycles-pp.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread.ret_from_fork 1.92 ± 6% -0.2 1.71 ± 21% +1.4 3.32 ± 4% 
perf-profile.calltrace.cycles-pp.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread 2.18 ± 17% -0.2 1.97 ± 30% +2.2 4.39 ± 12% perf-profile.calltrace.cycles-pp.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat 2.18 ± 17% -0.2 1.97 ± 30% +2.2 4.39 ± 12% perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat 2.18 ± 17% -0.2 1.97 ± 30% +2.2 4.39 ± 12% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat 2.18 ± 17% -0.2 1.97 ± 30% +2.2 4.39 ± 12% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlinkat 2.18 ± 17% -0.2 1.97 ± 30% +2.2 4.39 ± 12% perf-profile.calltrace.cycles-pp.unlinkat 2.18 ± 17% -0.2 1.97 ± 30% +2.2 4.39 ± 12% perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe 2.18 ± 17% -0.2 1.97 ± 30% +2.2 4.39 ± 12% perf-profile.calltrace.cycles-pp.shmem_evict_inode.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64 2.17 ± 17% -0.2 1.97 ± 30% +2.2 4.38 ± 12% perf-profile.calltrace.cycles-pp.shmem_undo_range.shmem_evict_inode.evict.do_unlinkat.__x64_sys_unlinkat 1.81 ± 6% -0.2 1.61 ± 22% +1.4 3.18 ± 4% perf-profile.calltrace.cycles-pp.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread 1.81 ± 6% -0.2 1.61 ± 22% +1.4 3.18 ± 4% perf-profile.calltrace.cycles-pp.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work 1.81 ± 6% -0.2 1.61 ± 22% +1.4 3.18 ± 4% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work 1.76 ± 6% -0.2 1.57 ± 22% +1.3 3.10 ± 4% perf-profile.calltrace.cycles-pp.memcpy_toio.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm 1.79 ± 6% -0.2 1.60 ± 22% +1.4 3.15 ± 4% perf-profile.calltrace.cycles-pp.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail 1.79 ± 6% -0.2 1.60 ± 22% +1.4 3.15 ± 4% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit 1.79 ± 6% -0.2 1.60 ± 22% +1.4 3.15 ± 4% perf-profile.calltrace.cycles-pp.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail 1.79 ± 6% -0.2 1.60 ± 22% +1.4 3.15 ± 4% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit 1.80 ± 6% -0.2 1.60 ± 22% +1.4 3.16 ± 4% perf-profile.calltrace.cycles-pp.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty 1.80 ± 6% -0.2 1.60 ± 22% +1.4 3.16 ± 4% perf-profile.calltrace.cycles-pp.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb 1.36 ± 10% -0.2 1.18 ± 28% +1.2 2.56 ± 5% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt 0.83 ± 17% -0.2 0.67 ± 53% +0.8 1.66 ± 12% 
perf-profile.calltrace.cycles-pp.filemap_remove_folio.truncate_inode_folio.shmem_undo_range.shmem_evict_inode.evict 0.81 ± 17% -0.1 0.66 ± 53% +0.8 1.65 ± 12% perf-profile.calltrace.cycles-pp.__folio_batch_release.shmem_undo_range.shmem_evict_inode.evict.do_unlinkat 1.99 ± 5% -0.1 1.84 ± 22% +1.4 3.40 ± 5% perf-profile.calltrace.cycles-pp.write 1.98 ± 5% -0.1 1.84 ± 22% +1.4 3.40 ± 5% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 1.98 ± 5% -0.1 1.84 ± 22% +1.4 3.40 ± 5% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write 1.98 ± 5% -0.1 1.84 ± 22% +1.4 3.40 ± 5% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 1.98 ± 5% -0.1 1.84 ± 22% +1.4 3.40 ± 5% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write 1.96 ± 5% -0.1 1.83 ± 22% +1.4 3.37 ± 5% perf-profile.calltrace.cycles-pp.console_flush_all.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write 1.96 ± 5% -0.1 1.83 ± 22% +1.4 3.37 ± 5% perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write.vfs_write 1.96 ± 5% -0.1 1.83 ± 22% +1.4 3.37 ± 5% perf-profile.calltrace.cycles-pp.vprintk_emit.devkmsg_emit.devkmsg_write.vfs_write.ksys_write 1.96 ± 5% -0.1 1.83 ± 22% +1.4 3.37 ± 5% perf-profile.calltrace.cycles-pp.devkmsg_emit.devkmsg_write.vfs_write.ksys_write.do_syscall_64 1.96 ± 5% -0.1 1.83 ± 22% +1.4 3.37 ± 5% perf-profile.calltrace.cycles-pp.devkmsg_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe 1.80 ± 5% -0.1 1.69 ± 22% +1.3 3.13 ± 5% perf-profile.calltrace.cycles-pp.serial8250_console_write.console_flush_all.console_unlock.vprintk_emit.devkmsg_emit 0.55 ± 47% -0.1 0.45 ± 74% +0.7 1.27 ± 12% perf-profile.calltrace.cycles-pp.__filemap_remove_folio.filemap_remove_folio.truncate_inode_folio.shmem_undo_range.shmem_evict_inode 0.98 ± 17% -0.1 0.89 ± 30% +1.0 1.95 ± 12% perf-profile.calltrace.cycles-pp.truncate_inode_folio.shmem_undo_range.shmem_evict_inode.evict.do_unlinkat 1.38 ± 5% -0.1 1.31 ± 20% +1.0 2.38 ± 6% perf-profile.calltrace.cycles-pp.wait_for_lsr.serial8250_console_write.console_flush_all.console_unlock.vprintk_emit 1.18 ± 5% -0.1 1.12 ± 20% +0.9 2.04 ± 6% perf-profile.calltrace.cycles-pp.io_serial_in.wait_for_lsr.serial8250_console_write.console_flush_all.console_unlock 0.00 +0.0 0.00 +0.6 0.59 ± 13% perf-profile.calltrace.cycles-pp.ktime_get.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt 0.00 +0.0 0.00 +0.6 0.61 ± 11% perf-profile.calltrace.cycles-pp.find_lock_entries.shmem_undo_range.shmem_evict_inode.evict.do_unlinkat 0.00 +0.0 0.00 +0.7 0.66 ± 8% perf-profile.calltrace.cycles-pp.perf_adjust_freq_unthr_context.perf_event_task_tick.scheduler_tick.update_process_times.tick_sched_handle 0.00 +0.0 0.00 +0.7 0.66 ± 5% perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt 0.00 +0.0 0.00 +0.7 0.66 ± 8% perf-profile.calltrace.cycles-pp.perf_event_task_tick.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler 0.00 +0.0 0.00 +0.8 0.76 ± 12% perf-profile.calltrace.cycles-pp.free_unref_page_list.release_pages.__folio_batch_release.shmem_undo_range.shmem_evict_inode 0.00 +0.1 0.09 ±223% +0.8 0.80 ± 12% perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt 0.00 +0.1 0.09 ±223% +0.8 0.82 
± 7% perf-profile.calltrace.cycles-pp.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter 1.21 +0.5 1.69 ± 5% -0.2 1.04 ± 5% perf-profile.calltrace.cycles-pp.__munmap 1.20 +0.5 1.68 ± 5% -0.2 1.03 ± 5% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap 1.20 +0.5 1.68 ± 5% -0.2 1.03 ± 5% perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 1.20 +0.5 1.68 ± 5% -0.2 1.03 ± 5% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap 1.21 +0.5 1.69 ± 5% -0.2 1.04 ± 5% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 1.21 +0.5 1.69 ± 5% -0.2 1.04 ± 5% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 1.21 +0.5 1.69 ± 5% -0.2 1.04 ± 5% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 1.21 +0.5 1.69 ± 5% -0.2 1.04 ± 5% perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64 1.21 +0.5 1.69 ± 5% -0.2 1.04 ± 5% perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe 1.21 +0.5 1.69 ± 5% -0.2 1.04 ± 5% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap 1.20 +0.5 1.69 ± 5% -0.2 1.04 ± 5% perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap 1.18 +0.5 1.67 ± 6% -0.2 1.01 ± 6% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region 0.84 ± 2% +0.6 1.43 ± 5% -0.1 0.78 ± 2% perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp 0.58 ± 3% +0.6 1.18 ± 5% -0.2 0.36 ± 71% perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas 0.00 +0.8 0.79 ± 4% +0.0 0.00 perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range 0.00 +1.0 1.02 ± 5% +0.0 0.00 perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.lru_add_fn.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio 0.00 +1.1 1.08 ± 4% +0.0 0.00 perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range 0.00 +1.5 1.46 ± 5% +0.0 0.00 perf-profile.calltrace.cycles-pp.__count_memcg_events.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 3.29 ± 3% +1.9 5.19 -0.3 3.00 ± 2% perf-profile.calltrace.cycles-pp.finish_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault 3.02 ± 4% +2.0 5.00 -0.3 2.77 ± 2% perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_read_fault.do_fault.__handle_mm_fault 2.84 ± 4% +2.0 4.86 -0.2 2.60 ± 2% perf-profile.calltrace.cycles-pp.folio_add_file_rmap_range.set_pte_range.finish_fault.do_read_fault.do_fault 2.73 ± 4% +2.0 4.77 -0.2 2.50 ± 2% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.folio_add_file_rmap_range.set_pte_range.finish_fault.do_read_fault 1.48 ± 4% +2.1 3.56 ± 2% -0.1 1.36 ± 3% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.folio_add_file_rmap_range.set_pte_range.finish_fault 0.57 ± 4% +2.8 3.35 ± 2% -0.2 0.36 ± 70% 
perf-profile.calltrace.cycles-pp.__count_memcg_events.mem_cgroup_commit_charge.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp 1.96 ± 5% +2.9 4.86 ± 2% -0.3 1.65 ± 3% perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault 3.65 ± 2% +3.1 6.77 ± 2% -0.4 3.29 ± 3% perf-profile.calltrace.cycles-pp.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault 0.80 ± 4% +3.1 3.92 ± 3% -0.0 0.77 ± 4% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp 2.68 ± 3% +3.4 6.08 ± 2% -0.2 2.48 ± 5% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault 7.71 ± 6% +3.9 11.66 ± 2% -1.3 6.46 ± 2% perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault 67.18 +6.3 73.46 ± 3% -8.0 59.16 perf-profile.calltrace.cycles-pp.do_access 1.46 ± 9% +7.1 8.57 ± 16% -0.0 1.44 ± 10% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio 1.50 ± 9% +7.1 8.61 ± 16% -0.0 1.48 ± 10% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp 1.38 ± 10% +7.1 8.51 ± 16% -0.0 1.38 ± 11% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru 51.46 +7.6 59.08 ± 3% -5.7 45.73 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access 2.98 ± 5% +7.7 10.66 ± 14% -0.1 2.84 ± 5% perf-profile.calltrace.cycles-pp.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault 2.84 ± 6% +7.7 10.56 ± 14% -0.1 2.72 ± 5% perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault 34.18 +8.5 42.68 ± 4% -4.1 30.12 perf-profile.calltrace.cycles-pp.__do_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault 34.14 +8.5 42.64 ± 4% -4.1 30.09 perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_read_fault.do_fault.__handle_mm_fault 33.95 +8.6 42.51 ± 4% -4.0 29.91 perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault.do_fault 42.88 +8.8 51.70 ± 4% -4.9 38.00 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 42.34 +9.0 51.30 ± 4% -4.8 37.50 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault 42.29 +9.0 51.28 ± 4% -4.8 37.46 perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 45.07 +9.6 54.62 ± 4% -5.1 39.97 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access 44.95 +9.6 54.53 ± 4% -5.1 39.85 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access 43.72 +9.9 53.64 ± 4% -5.0 38.76 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access 17.28 ± 2% +13.8 31.05 ± 6% -2.1 15.20 ± 2% perf-profile.calltrace.cycles-pp.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault 21.14 -3.8 17.36 ± 7% +3.1 24.20 ± 2% perf-profile.children.cycles-pp.cpu_startup_entry 21.14 -3.8 
17.36 ± 7% +3.1 24.20 ± 2% perf-profile.children.cycles-pp.do_idle 21.14 -3.8 17.36 ± 7% +3.1 24.20 ± 2% perf-profile.children.cycles-pp.secondary_startup_64_no_verify 21.09 -3.8 17.33 ± 7% +3.0 24.13 ± 2% perf-profile.children.cycles-pp.cpuidle_idle_call 21.05 -3.8 17.29 ± 7% +3.0 24.09 ± 2% perf-profile.children.cycles-pp.start_secondary 20.79 -3.7 17.07 ± 7% +2.9 23.69 perf-profile.children.cycles-pp.cpuidle_enter 20.78 -3.7 17.07 ± 7% +2.9 23.67 perf-profile.children.cycles-pp.cpuidle_enter_state 20.71 -3.7 17.01 ± 7% +2.9 23.57 perf-profile.children.cycles-pp.acpi_safe_halt 20.72 -3.7 17.02 ± 7% +2.9 23.59 perf-profile.children.cycles-pp.acpi_idle_enter 20.79 -3.6 17.19 ± 6% +2.6 23.34 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 11.52 -3.1 8.42 -1.2 10.31 ± 2% perf-profile.children.cycles-pp.do_rw_once 4.62 -1.4 3.24 -0.5 4.16 ± 2% perf-profile.children.cycles-pp.filemap_map_pages 2.89 -0.9 2.00 -0.3 2.61 ± 3% perf-profile.children.cycles-pp.next_uptodate_folio 3.98 -0.6 3.39 ± 5% -0.5 3.43 ± 2% perf-profile.children.cycles-pp.clear_page_erms 4.46 ± 6% -0.5 3.99 ± 3% -0.8 3.70 ± 2% perf-profile.children.cycles-pp.get_mem_cgroup_from_mm 1.18 ± 4% -0.4 0.74 ± 6% -0.2 1.02 ± 4% perf-profile.children.cycles-pp.shmem_inode_acct_blocks 1.44 ± 2% -0.4 1.00 ± 2% -0.1 1.29 ± 2% perf-profile.children.cycles-pp.shmem_alloc_folio 1.40 -0.4 0.99 -0.1 1.26 ± 2% perf-profile.children.cycles-pp.alloc_pages_mpol 6.86 -0.4 6.47 ± 5% -0.8 6.07 ± 3% perf-profile.children.cycles-pp.native_irq_return_iret 1.27 -0.4 0.90 -0.1 1.14 ± 3% perf-profile.children.cycles-pp.__alloc_pages 3.06 ± 4% -0.4 2.68 ± 17% +1.9 4.91 ± 3% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 1.01 ± 2% -0.3 0.68 -0.1 0.89 ± 3% perf-profile.children.cycles-pp.get_page_from_freelist 1.02 ± 2% -0.3 0.70 ± 4% -0.1 0.93 ± 2% perf-profile.children.cycles-pp.sync_regs 2.34 ± 5% -0.3 2.09 ± 15% +1.3 3.69 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.77 ± 2% -0.3 0.51 -0.1 0.70 ± 3% perf-profile.children.cycles-pp.rmqueue 2.34 ± 5% -0.3 2.08 ± 15% +1.3 3.68 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt 2.00 ± 6% -0.2 1.78 ± 21% +1.5 3.45 ± 4% perf-profile.children.cycles-pp.ret_from_fork 2.00 ± 6% -0.2 1.78 ± 21% +1.5 3.45 ± 4% perf-profile.children.cycles-pp.ret_from_fork_asm 1.99 ± 6% -0.2 1.77 ± 21% +1.5 3.44 ± 4% perf-profile.children.cycles-pp.kthread 1.95 ± 6% -0.2 1.74 ± 21% +1.4 3.37 ± 4% perf-profile.children.cycles-pp.process_one_work 0.81 ± 2% -0.2 0.60 -0.1 0.72 ± 4% perf-profile.children.cycles-pp.__perf_sw_event 2.04 ± 7% -0.2 1.83 ± 17% +1.2 3.21 ± 5% perf-profile.children.cycles-pp.__hrtimer_run_queues 1.96 ± 6% -0.2 1.74 ± 21% +1.4 3.39 ± 4% perf-profile.children.cycles-pp.worker_thread 1.92 ± 6% -0.2 1.71 ± 21% +1.4 3.32 ± 4% perf-profile.children.cycles-pp.drm_fb_helper_damage_work 1.92 ± 6% -0.2 1.71 ± 21% +1.4 3.32 ± 4% perf-profile.children.cycles-pp.drm_fbdev_generic_helper_fb_dirty 2.18 ± 17% -0.2 1.97 ± 30% +2.2 4.39 ± 12% perf-profile.children.cycles-pp.__x64_sys_unlinkat 2.18 ± 17% -0.2 1.97 ± 30% +2.2 4.39 ± 12% perf-profile.children.cycles-pp.do_unlinkat 2.18 ± 17% -0.2 1.97 ± 30% +2.2 4.39 ± 12% perf-profile.children.cycles-pp.evict 2.18 ± 17% -0.2 1.97 ± 30% +2.2 4.39 ± 12% perf-profile.children.cycles-pp.unlinkat 2.18 ± 17% -0.2 1.97 ± 30% +2.2 4.39 ± 12% perf-profile.children.cycles-pp.shmem_evict_inode 2.18 ± 17% -0.2 1.97 ± 30% +2.2 4.39 ± 12% perf-profile.children.cycles-pp.shmem_undo_range 1.81 ± 6% -0.2 1.61 ± 22% +1.4 3.18 ± 4% 
perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
1.81 ± 6% -0.2 1.61 ± 22% +1.4 3.18 ± 4% perf-profile.children.cycles-pp.drm_atomic_commit
1.81 ± 6% -0.2 1.61 ± 22% +1.4 3.18 ± 4% perf-profile.children.cycles-pp.drm_atomic_helper_commit
0.53 ± 3% -0.2 0.34 ± 2% -0.1 0.47 ± 4% perf-profile.children.cycles-pp.__rmqueue_pcplist
1.79 ± 6% -0.2 1.60 ± 22% +1.4 3.15 ± 4% perf-profile.children.cycles-pp.drm_fb_memcpy
1.79 ± 6% -0.2 1.60 ± 22% +1.4 3.15 ± 4% perf-profile.children.cycles-pp.memcpy_toio
1.79 ± 6% -0.2 1.60 ± 22% +1.4 3.15 ± 4% perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail_rpm
1.79 ± 6% -0.2 1.60 ± 22% +1.4 3.15 ± 4% perf-profile.children.cycles-pp.ast_primary_plane_helper_atomic_update
1.79 ± 6% -0.2 1.60 ± 22% +1.4 3.15 ± 4% perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
1.80 ± 6% -0.2 1.60 ± 22% +1.4 3.16 ± 4% perf-profile.children.cycles-pp.commit_tail
1.80 ± 6% -0.2 1.60 ± 22% +1.4 3.16 ± 4% perf-profile.children.cycles-pp.ast_mode_config_helper_atomic_commit_tail
1.50 ± 9% -0.2 1.31 ± 18% +0.7 2.23 ± 6% perf-profile.children.cycles-pp.tick_nohz_highres_handler
0.68 ± 2% -0.2 0.50 ± 5% +0.0 0.69 ± 3% perf-profile.children.cycles-pp.__mod_lruvec_state
0.65 ± 6% -0.2 0.47 ± 2% +0.0 0.69 ± 5% perf-profile.children.cycles-pp._raw_spin_lock
0.47 ± 3% -0.2 0.29 ± 2% -0.1 0.42 ± 3% perf-profile.children.cycles-pp.rmqueue_bulk
1.32 ± 5% -0.2 1.16 ± 15% +0.6 1.88 ± 3% perf-profile.children.cycles-pp.update_process_times
0.65 ± 2% -0.2 0.49 -0.1 0.58 ± 4% perf-profile.children.cycles-pp.___perf_sw_event
1.32 ± 5% -0.2 1.16 ± 15% +0.6 1.89 ± 3% perf-profile.children.cycles-pp.tick_sched_handle
0.64 ± 4% -0.1 0.49 ± 5% +0.1 0.71 ± 2% perf-profile.children.cycles-pp.xas_load
0.54 -0.1 0.39 ± 4% -0.0 0.53 ± 3% perf-profile.children.cycles-pp.__mod_node_page_state
2.00 ± 4% -0.1 1.86 ± 21% +1.4 3.40 ± 5% perf-profile.children.cycles-pp.vprintk_emit
1.99 ± 5% -0.1 1.85 ± 21% +1.4 3.40 ± 5% perf-profile.children.cycles-pp.write
0.49 ± 2% -0.1 0.35 ± 3% -0.0 0.46 ± 4% perf-profile.children.cycles-pp.lock_vma_under_rcu
0.54 ± 5% -0.1 0.40 ± 2% +0.0 0.56 ± 2% perf-profile.children.cycles-pp.xas_find
1.96 ± 5% -0.1 1.83 ± 22% +1.4 3.37 ± 5% perf-profile.children.cycles-pp.devkmsg_emit
1.96 ± 5% -0.1 1.83 ± 22% +1.4 3.37 ± 5% perf-profile.children.cycles-pp.devkmsg_write
1.98 ± 4% -0.1 1.85 ± 21% +1.4 3.40 ± 5% perf-profile.children.cycles-pp.console_flush_all
1.98 ± 4% -0.1 1.85 ± 21% +1.4 3.40 ± 5% perf-profile.children.cycles-pp.console_unlock
2.00 ± 4% -0.1 1.88 ± 22% +1.4 3.42 ± 5% perf-profile.children.cycles-pp.ksys_write
2.00 ± 4% -0.1 1.88 ± 22% +1.4 3.42 ± 5% perf-profile.children.cycles-pp.vfs_write
0.39 ± 4% -0.1 0.28 ± 3% -0.0 0.34 ± 6% perf-profile.children.cycles-pp.__pte_offset_map_lock
1.07 ± 5% -0.1 0.96 ± 14% +0.4 1.51 ± 2% perf-profile.children.cycles-pp.scheduler_tick
1.82 ± 5% -0.1 1.71 ± 21% +1.3 3.16 ± 6% perf-profile.children.cycles-pp.serial8250_console_write
0.39 ± 3% -0.1 0.29 ± 3% +0.0 0.42 ± 4% perf-profile.children.cycles-pp.xas_descend
0.32 ± 4% -0.1 0.22 ± 8% -0.0 0.27 ± 5% perf-profile.children.cycles-pp.__dquot_alloc_space
0.56 ± 8% -0.1 0.47 ± 15% +0.3 0.82 ± 7% perf-profile.children.cycles-pp.xas_store
1.64 ± 4% -0.1 1.55 ± 21% +1.2 2.85 ± 6% perf-profile.children.cycles-pp.wait_for_lsr
0.98 ± 17% -0.1 0.89 ± 30% +1.0 1.95 ± 12% perf-profile.children.cycles-pp.truncate_inode_folio
0.52 ± 7% -0.1 0.43 ± 29% +0.4 0.95 ± 5% perf-profile.children.cycles-pp.irq_exit_rcu
1.39 ± 5% -0.1 1.31 ± 22% +1.0 2.42 ± 6% perf-profile.children.cycles-pp.io_serial_in
0.30 ± 3% -0.1 0.22 ± 3% -0.0 0.28 ± 3% perf-profile.children.cycles-pp.mas_walk
1.06 ± 13% -0.1 0.98 ± 22% +0.8 1.87 ± 10% perf-profile.children.cycles-pp.release_pages
0.84 ± 17% -0.1 0.76 ± 29% +0.8 1.68 ± 12% perf-profile.children.cycles-pp.filemap_remove_folio
0.20 ± 13% -0.1 0.13 ± 5% -0.0 0.16 ± 13% perf-profile.children.cycles-pp.shmem_recalc_inode
0.44 ± 4% -0.1 0.36 ± 28% +0.4 0.79 ± 4% perf-profile.children.cycles-pp.__do_softirq
0.81 ± 17% -0.1 0.74 ± 30% +0.8 1.65 ± 12% perf-profile.children.cycles-pp.__folio_batch_release
0.26 ± 2% -0.1 0.19 ± 3% -0.0 0.23 ± 3% perf-profile.children.cycles-pp.filemap_get_entry
0.18 ± 5% -0.1 0.12 ± 5% -0.0 0.15 ± 6% perf-profile.children.cycles-pp.xas_find_conflict
0.55 ± 4% -0.1 0.49 ± 14% +0.3 0.82 ± 6% perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
0.55 ± 4% -0.1 0.49 ± 14% +0.3 0.82 ± 6% perf-profile.children.cycles-pp.perf_event_task_tick
0.28 ± 4% -0.1 0.22 ± 8% +0.0 0.28 ± 12% perf-profile.children.cycles-pp.execve
0.28 ± 4% -0.1 0.22 ± 8% +0.0 0.28 ± 12% perf-profile.children.cycles-pp.__x64_sys_execve
0.28 ± 4% -0.1 0.22 ± 8% +0.0 0.28 ± 12% perf-profile.children.cycles-pp.do_execveat_common
0.29 ± 3% -0.1 0.24 ± 8% -0.0 0.26 ± 11% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.63 ± 17% -0.1 0.58 ± 30% +0.6 1.27 ± 12% perf-profile.children.cycles-pp.__filemap_remove_folio
0.16 ± 5% -0.1 0.11 ± 8% -0.0 0.16 ± 3% perf-profile.children.cycles-pp.error_entry
0.25 ± 7% -0.0 0.20 ± 22% +0.1 0.37 ± 9% perf-profile.children.cycles-pp._raw_spin_trylock
0.15 ± 5% -0.0 0.10 ± 10% -0.0 0.12 ± 8% perf-profile.children.cycles-pp.inode_add_bytes
0.14 ± 5% -0.0 0.09 ± 8% -0.0 0.12 ± 13% perf-profile.children.cycles-pp.__percpu_counter_limited_add
0.07 ± 6% -0.0 0.02 ± 99% -0.0 0.07 ± 11% perf-profile.children.cycles-pp.__folio_throttle_swaprate
0.40 ± 4% -0.0 0.36 ± 13% +0.2 0.60 ± 8% perf-profile.children.cycles-pp.__intel_pmu_enable_all
0.06 ± 7% -0.0 0.02 ±142% +0.0 0.09 ± 4% perf-profile.children.cycles-pp.read
0.39 ± 17% -0.0 0.34 ± 30% +0.4 0.77 ± 12% perf-profile.children.cycles-pp.free_unref_page_list
0.10 -0.0 0.06 ± 13% -0.0 0.10 ± 11% perf-profile.children.cycles-pp.security_vm_enough_memory_mm
0.18 ± 7% -0.0 0.14 ± 13% +0.1 0.24 ± 11% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.16 ± 5% -0.0 0.12 -0.0 0.15 ± 4% perf-profile.children.cycles-pp.handle_pte_fault
0.17 ± 7% -0.0 0.12 ± 4% +0.0 0.17 ± 4% perf-profile.children.cycles-pp.xas_start
0.14 ± 6% -0.0 0.10 ± 3% -0.0 0.12 ± 6% perf-profile.children.cycles-pp.__pte_offset_map
0.26 ± 8% -0.0 0.22 ± 33% +0.2 0.48 ± 4% perf-profile.children.cycles-pp.rebalance_domains
0.31 ± 16% -0.0 0.27 ± 28% +0.3 0.62 ± 11% perf-profile.children.cycles-pp.find_lock_entries
0.07 ± 5% -0.0 0.03 ± 70% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.policy_nodemask
0.06 ± 9% -0.0 0.02 ±141% +0.0 0.09 ± 13% perf-profile.children.cycles-pp.lapic_next_deadline
0.16 ± 4% -0.0 0.13 ± 12% -0.0 0.13 ± 4% perf-profile.children.cycles-pp.folio_mark_accessed
0.19 ± 4% -0.0 0.16 ± 8% -0.0 0.19 ± 10% perf-profile.children.cycles-pp.bprm_execve
0.11 ± 9% -0.0 0.08 ± 6% -0.0 0.10 ± 12% perf-profile.children.cycles-pp.down_read_trylock
0.16 ± 6% -0.0 0.13 ± 5% -0.0 0.14 ± 3% perf-profile.children.cycles-pp.percpu_counter_add_batch
0.11 ± 6% -0.0 0.08 ± 6% -0.0 0.09 ± 7% perf-profile.children.cycles-pp.up_read
0.15 ± 7% -0.0 0.12 ± 13% +0.0 0.20 ± 8% perf-profile.children.cycles-pp.folio_unlock
0.10 ± 4% -0.0 0.07 ± 6% +0.0 0.11 ± 13% perf-profile.children.cycles-pp.__libc_fork
0.07 ± 6% -0.0 0.04 ± 45% +0.0 0.10 ± 7% perf-profile.children.cycles-pp.ksys_read
0.10 ± 3% -0.0 0.07 ± 11% +0.0 0.10 ± 11% perf-profile.children.cycles-pp.kernel_clone
0.09 ± 5% -0.0 0.06 ± 7% +0.0 0.10 ± 7% perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
0.07 ± 15% -0.0 0.05 ± 72% +0.0 0.11 ± 12% perf-profile.children.cycles-pp.rcu_pending
0.14 ± 17% -0.0 0.11 ± 32% +0.1 0.26 ± 11% perf-profile.children.cycles-pp.xas_clear_mark
0.25 ± 17% -0.0 0.22 ± 30% +0.2 0.50 ± 12% perf-profile.children.cycles-pp.free_unref_page_commit
0.17 ± 3% -0.0 0.14 ± 23% +0.1 0.29 ± 6% perf-profile.children.cycles-pp.load_balance
0.08 ± 8% -0.0 0.06 ± 11% -0.0 0.08 ± 12% perf-profile.children.cycles-pp.path_openat
0.09 ± 5% -0.0 0.06 ± 11% +0.0 0.09 ± 12% perf-profile.children.cycles-pp.__x64_sys_openat
0.08 ± 8% -0.0 0.06 ± 11% +0.0 0.08 ± 16% perf-profile.children.cycles-pp.do_filp_open
0.07 -0.0 0.04 ± 45% +0.0 0.10 ± 5% perf-profile.children.cycles-pp.vfs_read
0.10 ± 6% -0.0 0.08 ± 6% -0.0 0.09 ± 5% perf-profile.children.cycles-pp.pte_offset_map_nolock
0.09 ± 4% -0.0 0.06 ± 7% +0.0 0.09 ± 9% perf-profile.children.cycles-pp.__do_sys_clone
0.14 ± 10% -0.0 0.11 ± 15% +0.1 0.22 ± 14% perf-profile.children.cycles-pp.perf_rotate_context
0.13 ± 18% -0.0 0.10 ± 33% +0.1 0.23 ± 13% perf-profile.children.cycles-pp.irqtime_account_irq
0.08 ± 8% -0.0 0.06 ± 11% +0.0 0.09 ± 12% perf-profile.children.cycles-pp.do_sys_openat2
0.07 ± 5% -0.0 0.04 ± 45% +0.0 0.07 ± 10% perf-profile.children.cycles-pp.copy_process
0.07 ± 17% -0.0 0.04 ± 75% +0.1 0.13 ± 11% perf-profile.children.cycles-pp.filemap_free_folio
0.10 ± 20% -0.0 0.08 ± 29% +0.1 0.18 ± 15% perf-profile.children.cycles-pp.xas_init_marks
0.13 ± 5% -0.0 0.11 ± 23% +0.1 0.22 ± 8% perf-profile.children.cycles-pp.update_sd_lb_stats
0.16 ± 5% -0.0 0.14 ± 6% +0.0 0.16 ± 10% perf-profile.children.cycles-pp.exec_binprm
0.16 ± 4% -0.0 0.13 ± 21% +0.1 0.23 ± 8% perf-profile.children.cycles-pp.fbcon_redraw
0.16 ± 4% -0.0 0.13 ± 21% +0.1 0.23 ± 8% perf-profile.children.cycles-pp.con_scroll
0.16 ± 4% -0.0 0.13 ± 21% +0.1 0.23 ± 8% perf-profile.children.cycles-pp.fbcon_scroll
0.16 ± 4% -0.0 0.13 ± 21% +0.1 0.23 ± 8% perf-profile.children.cycles-pp.lf
0.10 ± 6% -0.0 0.08 ± 7% -0.0 0.09 ± 7% perf-profile.children.cycles-pp.__vm_enough_memory
0.16 ± 4% -0.0 0.14 ± 6% +0.0 0.16 ± 10% perf-profile.children.cycles-pp.search_binary_handler
0.09 ± 5% -0.0 0.07 ± 7% -0.0 0.07 ± 6% perf-profile.children.cycles-pp._compound_head
0.08 -0.0 0.06 ± 9% -0.0 0.07 ± 9% perf-profile.children.cycles-pp.__irqentry_text_end
0.09 ± 4% -0.0 0.07 ± 14% +0.0 0.12 ± 11% perf-profile.children.cycles-pp.__schedule
0.15 ± 5% -0.0 0.13 ± 7% -0.0 0.14 ± 8% perf-profile.children.cycles-pp.xas_create
0.06 ± 11% -0.0 0.04 ± 75% +0.0 0.09 ± 14% perf-profile.children.cycles-pp.trigger_load_balance
0.15 ± 5% -0.0 0.13 ± 20% +0.1 0.21 ± 8% perf-profile.children.cycles-pp.bit_putcs
0.16 ± 3% -0.0 0.14 ± 20% +0.1 0.24 ± 8% perf-profile.children.cycles-pp.vt_console_print
0.24 ± 6% -0.0 0.22 ± 34% +0.2 0.44 ± 5% perf-profile.children.cycles-pp.wait_for_xmitr
0.14 ± 3% -0.0 0.12 ± 24% +0.1 0.23 ± 8% perf-profile.children.cycles-pp.find_busiest_group
0.15 ± 4% -0.0 0.14 ± 8% +0.0 0.16 ± 9% perf-profile.children.cycles-pp.load_elf_binary
0.09 ± 11% -0.0 0.07 ± 26% +0.0 0.13 ± 11% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.15 ± 4% -0.0 0.13 ± 21% +0.1 0.22 ± 8% perf-profile.children.cycles-pp.fbcon_putcs
0.06 ± 8% -0.0 0.04 ± 72% +0.0 0.09 ± 6% perf-profile.children.cycles-pp.update_rq_clock_task
0.12 ± 4% -0.0 0.10 ± 22% +0.1 0.18 ± 9% perf-profile.children.cycles-pp.update_sg_lb_stats
0.18 ± 19% -0.0 0.17 ± 29% +0.2 0.36 ± 11% perf-profile.children.cycles-pp.free_pcppages_bulk
0.21 ± 7% -0.0 0.19 ± 5% +0.0 0.25 ± 6% perf-profile.children.cycles-pp.cgroup_rstat_updated
0.18 ± 9% -0.0 0.16 ± 19% +0.1 0.30 ± 5% perf-profile.children.cycles-pp.io_serial_out
0.58 ± 36% -0.0 0.56 ± 22% +0.4 1.00 ± 20% perf-profile.children.cycles-pp.ktime_get
0.12 ± 5% -0.0 0.10 ± 20% +0.1 0.18 ± 9% perf-profile.children.cycles-pp.fast_imageblit
0.12 ± 5% -0.0 0.10 ± 20% +0.1 0.18 ± 9% perf-profile.children.cycles-pp.sys_imageblit
0.06 ± 17% -0.0 0.05 ± 74% +0.1 0.13 ± 17% perf-profile.children.cycles-pp.truncate_cleanup_folio
0.08 ± 20% -0.0 0.06 ± 50% +0.1 0.16 ± 8% perf-profile.children.cycles-pp.free_unref_page_prepare
0.07 ± 5% -0.0 0.06 ± 11% +0.0 0.10 ± 13% perf-profile.children.cycles-pp.schedule
0.08 ± 16% -0.0 0.07 ± 16% +0.1 0.13 ± 23% perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.12 ± 5% -0.0 0.10 ± 23% +0.1 0.18 ± 9% perf-profile.children.cycles-pp.drm_fbdev_generic_defio_imageblit
0.16 ± 19% -0.0 0.14 ± 30% +0.1 0.30 ± 11% perf-profile.children.cycles-pp.__free_one_page
0.12 ± 4% -0.0 0.10 ± 3% -0.0 0.10 ± 7% perf-profile.children.cycles-pp.kmem_cache_alloc_lru
0.48 ± 12% -0.0 0.46 ± 15% +0.4 0.88 ± 11% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
6.18 ± 6% -0.0 6.17 ± 17% +3.5 9.70 ± 5% perf-profile.children.cycles-pp.do_syscall_64
0.08 ± 5% -0.0 0.07 ± 21% +0.1 0.15 ± 12% perf-profile.children.cycles-pp.rcu_core
0.11 ± 11% -0.0 0.10 ± 15% +0.0 0.14 ± 5% perf-profile.children.cycles-pp.memcpy_orig
0.25 ± 6% -0.0 0.24 ± 22% +0.2 0.43 ± 13% perf-profile.children.cycles-pp.delay_tsc
0.06 ± 7% -0.0 0.06 ± 9% +0.0 0.09 ± 9% perf-profile.children.cycles-pp.__mod_zone_page_state
6.18 ± 6% -0.0 6.18 ± 17% +3.5 9.70 ± 5% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.02 ± 99% -0.0 0.02 ±141% +0.0 0.06 ± 6% perf-profile.children.cycles-pp.update_irq_load_avg
0.02 ± 99% -0.0 0.02 ±142% +0.1 0.09 ± 12% perf-profile.children.cycles-pp.run_rebalance_domains
0.02 ± 99% -0.0 0.02 ±142% +0.1 0.09 ± 12% perf-profile.children.cycles-pp.update_blocked_averages
0.03 ±100% -0.0 0.02 ±141% +0.1 0.09 ± 18% perf-profile.children.cycles-pp.irq_enter_rcu
0.07 ± 11% -0.0 0.06 ± 19% +0.1 0.13 ± 17% perf-profile.children.cycles-pp.rcu_do_batch
0.03 ± 70% -0.0 0.03 ±100% +0.1 0.09 ± 17% perf-profile.children.cycles-pp.uncharge_folio
0.08 ± 17% -0.0 0.08 ± 27% +0.1 0.16 ± 14% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
0.01 ±223% +0.0 0.01 ±223% +0.1 0.06 ± 16% perf-profile.children.cycles-pp.uncharge_batch
0.00 +0.0 0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.sched_clock
0.00 +0.0 0.00 +0.1 0.06 ± 11% perf-profile.children.cycles-pp.native_sched_clock
0.00 +0.0 0.00 +0.1 0.06 ± 11% perf-profile.children.cycles-pp.sched_clock_cpu
0.00 +0.0 0.00 +0.1 0.06 ± 19% perf-profile.children.cycles-pp.__slab_free
0.00 +0.0 0.00 +0.1 0.06 ± 11% perf-profile.children.cycles-pp.read_tsc
0.00 +0.0 0.00 +0.1 0.07 ± 10% perf-profile.children.cycles-pp.get_pfnblock_flags_mask
0.20 ± 18% +0.0 0.21 ± 30% +0.2 0.44 ± 13% perf-profile.children.cycles-pp.filemap_unaccount_folio
0.01 ±223% +0.0 0.02 ±141% +0.1 0.06 ± 16% perf-profile.children.cycles-pp.kmem_cache_free
0.05 ± 8% +0.0 0.08 ± 8% -0.0 0.03 ±100% perf-profile.children.cycles-pp.propagate_protected_usage
0.54 ± 2% +0.0 0.57 ± 4% -0.1 0.45 ± 5% perf-profile.children.cycles-pp.try_charge_memcg
0.25 ± 2% +0.0 0.30 ± 4% -0.0 0.22 ± 9% perf-profile.children.cycles-pp.page_counter_try_charge
0.02 ±141% +0.0 0.06 ± 7% +0.0 0.06 ± 9% perf-profile.children.cycles-pp.mod_objcg_state
0.00 +0.1 0.07 ± 14% +0.0 0.01 ±223% perf-profile.children.cycles-pp.tlb_finish_mmu
1.25 +0.5 1.72 ± 5% -0.2 1.09 ± 5% perf-profile.children.cycles-pp.unmap_vmas
1.24 +0.5 1.71 ± 5% -0.2 1.08 ± 5% perf-profile.children.cycles-pp.zap_pte_range
1.24 +0.5 1.71 ± 5% -0.2 1.08 ± 5% perf-profile.children.cycles-pp.unmap_page_range
1.24 +0.5 1.71 ± 5% -0.2 1.08 ± 5% perf-profile.children.cycles-pp.zap_pmd_range
1.21 +0.5 1.69 ± 5% -0.2 1.04 ± 5% perf-profile.children.cycles-pp.__munmap
1.22 +0.5 1.71 ± 5% -0.2 1.06 ± 5% perf-profile.children.cycles-pp.__vm_munmap
1.21 +0.5 1.70 ± 5% -0.2 1.05 ± 5% perf-profile.children.cycles-pp.__x64_sys_munmap
1.25 +0.5 1.74 ± 5% -0.2 1.08 ± 4% perf-profile.children.cycles-pp.do_vmi_align_munmap
1.25 +0.5 1.74 ± 5% -0.2 1.09 ± 5% perf-profile.children.cycles-pp.do_vmi_munmap
1.22 +0.5 1.72 ± 5% -0.2 1.06 ± 6% perf-profile.children.cycles-pp.unmap_region
0.85 ± 2% +0.6 1.44 ± 5% -0.1 0.79 ± 2% perf-profile.children.cycles-pp.lru_add_fn
0.60 ± 3% +0.6 1.20 ± 4% -0.1 0.54 ± 7% perf-profile.children.cycles-pp.page_remove_rmap
3.30 ± 3% +1.9 5.20 -0.3 3.02 perf-profile.children.cycles-pp.finish_fault
3.04 ± 4% +2.0 5.01 -0.2 2.79 ± 2% perf-profile.children.cycles-pp.set_pte_range
2.85 ± 4% +2.0 4.87 -0.2 2.61 ± 2% perf-profile.children.cycles-pp.folio_add_file_rmap_range
1.97 ± 5% +2.9 4.88 ± 2% -0.3 1.66 ± 2% perf-profile.children.cycles-pp.mem_cgroup_commit_charge
3.69 ± 2% +3.1 6.80 ± 2% -0.4 3.32 ± 3% perf-profile.children.cycles-pp.shmem_add_to_page_cache
7.74 ± 6% +3.9 11.69 ± 2% -1.3 6.48 ± 2% perf-profile.children.cycles-pp.__mem_cgroup_charge
0.80 ± 4% +4.0 4.85 ± 3% -0.1 0.74 ± 5% perf-profile.children.cycles-pp.__count_memcg_events
6.12 ± 3% +6.1 12.18 -0.3 5.85 ± 4% perf-profile.children.cycles-pp.__mod_lruvec_page_state
2.99 ± 3% +6.6 9.56 ± 2% +0.0 3.03 ± 2% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
61.44 +6.7 68.11 ± 3% -7.1 54.29 perf-profile.children.cycles-pp.do_access
1.58 ± 9% +7.1 8.72 ± 16% +0.0 1.59 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
1.45 ± 9% +7.2 8.63 ± 16% -0.0 1.45 ± 10% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
1.53 ± 9% +7.2 8.72 ± 16% -0.0 1.51 ± 10% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
2.98 ± 5% +7.7 10.67 ± 14% -0.1 2.84 ± 5% perf-profile.children.cycles-pp.folio_add_lru
2.86 ± 6% +7.8 10.63 ± 14% -0.1 2.74 ± 5% perf-profile.children.cycles-pp.folio_batch_move_lru
49.12 +8.3 57.47 ± 3% -5.5 43.65 perf-profile.children.cycles-pp.asm_exc_page_fault
34.19 +8.5 42.68 ± 4% -4.1 30.13 perf-profile.children.cycles-pp.__do_fault
34.15 +8.5 42.65 ± 4% -4.1 30.09 perf-profile.children.cycles-pp.shmem_fault
33.99 +8.6 42.54 ± 4% -4.0 29.95 perf-profile.children.cycles-pp.shmem_get_folio_gfp
43.06 +8.8 51.84 ± 4% -4.9 38.17 perf-profile.children.cycles-pp.__handle_mm_fault
42.43 +8.9 51.37 ± 4% -4.8 37.59 perf-profile.children.cycles-pp.do_fault
42.38 +9.0 51.34 ± 4% -4.8 37.55 perf-profile.children.cycles-pp.do_read_fault
45.26 +9.5 54.78 ± 4% -5.1 40.16 perf-profile.children.cycles-pp.exc_page_fault
45.15 +9.5 54.69 ± 4% -5.1 40.05 perf-profile.children.cycles-pp.do_user_addr_fault
43.91 +9.9 53.80 ± 4% -5.0 38.95 perf-profile.children.cycles-pp.handle_mm_fault
17.31 ± 2% +13.8 31.07 ± 5% -2.1 15.22 ± 2% perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
12.24 -4.5 7.76 ± 3% -1.4 10.87 ± 2% perf-profile.self.cycles-pp.shmem_get_folio_gfp
17.96 -3.3 14.66 ± 4% +0.6 18.55 perf-profile.self.cycles-pp.acpi_safe_halt
10.95 -3.2 7.74 -1.1 9.82 ± 2% perf-profile.self.cycles-pp.do_rw_once
5.96 -1.4 4.58 ± 2% -0.7 5.29 perf-profile.self.cycles-pp.do_access
2.40 -0.8 1.64 -0.2 2.16 ± 3% perf-profile.self.cycles-pp.next_uptodate_folio
3.92 -0.6 3.36 ± 5% -0.5 3.39 ± 2% perf-profile.self.cycles-pp.clear_page_erms
4.40 ± 6% -0.5 3.95 ± 3% -0.8 3.65 ± 2% perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
1.52 ± 2% -0.4 1.10 ± 2% -0.2 1.36 ± 2% perf-profile.self.cycles-pp.filemap_map_pages
6.86 -0.4 6.47 ± 5% -0.8 6.07 ± 3% perf-profile.self.cycles-pp.native_irq_return_iret
1.02 ± 2% -0.3 0.70 ± 4% -0.1 0.92 ± 2% perf-profile.self.cycles-pp.sync_regs
0.50 ± 7% -0.2 0.27 ± 5% -0.1 0.41 ± 9% perf-profile.self.cycles-pp.shmem_inode_acct_blocks
1.78 ± 6% -0.2 1.58 ± 22% +1.4 3.14 ± 4% perf-profile.self.cycles-pp.memcpy_toio
0.63 ± 5% -0.2 0.46 ± 3% +0.0 0.67 ± 5% perf-profile.self.cycles-pp._raw_spin_lock
0.42 ± 2% -0.1 0.27 ± 2% -0.0 0.37 ± 4% perf-profile.self.cycles-pp.rmqueue_bulk
0.52 -0.1 0.38 ± 4% -0.0 0.50 ± 4% perf-profile.self.cycles-pp.__mod_node_page_state
0.56 ± 2% -0.1 0.42 -0.1 0.50 ± 4% perf-profile.self.cycles-pp.___perf_sw_event
0.31 ± 3% -0.1 0.20 ± 2% -0.1 0.24 ± 4% perf-profile.self.cycles-pp.shmem_add_to_page_cache
0.38 ± 4% -0.1 0.28 -0.0 0.35 ± 4% perf-profile.self.cycles-pp.__handle_mm_fault
0.36 ± 4% -0.1 0.26 ± 2% +0.0 0.39 ± 4% perf-profile.self.cycles-pp.xas_descend
0.30 ± 2% -0.1 0.22 ± 2% -0.0 0.27 ± 4% perf-profile.self.cycles-pp.mas_walk
1.39 ± 5% -0.1 1.31 ± 22% +1.0 2.42 ± 6% perf-profile.self.cycles-pp.io_serial_in
0.33 ± 3% -0.1 0.26 ± 10% -0.0 0.29 ± 5% perf-profile.self.cycles-pp.lru_add_fn
0.44 ± 9% -0.1 0.38 ± 17% +0.2 0.65 ± 6% perf-profile.self.cycles-pp.release_pages
0.20 ± 3% -0.1 0.14 ± 5% -0.0 0.18 ± 5% perf-profile.self.cycles-pp.asm_exc_page_fault
0.21 ± 5% -0.1 0.15 ± 6% -0.0 0.18 ± 2% perf-profile.self.cycles-pp.get_page_from_freelist
0.26 ± 9% -0.1 0.20 ± 15% +0.1 0.34 ± 8% perf-profile.self.cycles-pp.xas_store
0.16 ± 7% -0.1 0.11 ± 6% -0.0 0.14 ± 7% perf-profile.self.cycles-pp.__perf_sw_event
0.18 ± 2% -0.1 0.13 ± 5% -0.0 0.16 ± 6% perf-profile.self.cycles-pp.__alloc_pages
0.22 ± 4% -0.1 0.17 ± 4% -0.0 0.22 ± 3% perf-profile.self.cycles-pp.handle_mm_fault
0.20 ± 8% -0.1 0.14 ± 5% +0.0 0.22 ± 5% perf-profile.self.cycles-pp.xas_find
0.15 ± 6% -0.0 0.10 ± 7% -0.0 0.15 ± 3% perf-profile.self.cycles-pp.error_entry
0.25 ± 8% -0.0 0.20 ± 24% +0.1 0.37 ± 10% perf-profile.self.cycles-pp._raw_spin_trylock
0.40 ± 4% -0.0 0.36 ± 13% +0.2 0.60 ± 8% perf-profile.self.cycles-pp.__intel_pmu_enable_all
0.17 ± 2% -0.0 0.12 ± 6% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.__dquot_alloc_space
0.17 ± 6% -0.0 0.13 ± 10% +0.0 0.22 ± 10% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.22 ± 4% -0.0 0.18 ± 9% +0.0 0.25 ± 3% perf-profile.self.cycles-pp.xas_load
0.23 ± 4% -0.0 0.19 ± 10% -0.0 0.20 ± 5% perf-profile.self.cycles-pp.zap_pte_range
0.12 ± 7% -0.0 0.08 ± 10% -0.0 0.11 ± 15% perf-profile.self.cycles-pp.__percpu_counter_limited_add
0.14 ± 3% -0.0 0.09 ± 7% -0.0 0.13 ± 10% perf-profile.self.cycles-pp.rmqueue
0.15 ± 2% -0.0 0.11 ± 6% -0.0 0.13 ± 7% perf-profile.self.cycles-pp.do_user_addr_fault
0.12 ± 7% -0.0 0.08 ± 5% -0.0 0.11 ± 4% perf-profile.self.cycles-pp.folio_add_lru
0.15 ± 6% -0.0 0.10 ± 9% +0.0 0.16 ± 5% perf-profile.self.cycles-pp.__mod_lruvec_state
0.16 ± 7% -0.0 0.12 ± 4% +0.0 0.16 ± 3% perf-profile.self.cycles-pp.xas_start
0.30 ± 10% -0.0 0.26 ± 29% +0.2 0.52 ± 10% perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
0.06 ± 7% -0.0 0.02 ± 99% -0.0 0.06 perf-profile.self.cycles-pp.finish_fault
0.16 ± 4% -0.0 0.12 ± 12% -0.0 0.13 ± 7% perf-profile.self.cycles-pp.folio_mark_accessed
0.11 ± 8% -0.0 0.08 ± 6% -0.0 0.10 ± 14% perf-profile.self.cycles-pp.__pte_offset_map_lock
0.13 ± 6% -0.0 0.09 ± 4% -0.0 0.12 ± 4% perf-profile.self.cycles-pp.__pte_offset_map
0.06 ± 9% -0.0 0.02 ±141% +0.0 0.09 ± 13% perf-profile.self.cycles-pp.lapic_next_deadline
0.11 ± 9% -0.0 0.08 ± 6% -0.0 0.10 ± 12% perf-profile.self.cycles-pp.down_read_trylock
0.12 ± 3% -0.0 0.09 ± 5% -0.0 0.11 ± 6% perf-profile.self.cycles-pp.do_read_fault
0.14 ± 8% -0.0 0.12 ± 14% +0.0 0.19 ± 9% perf-profile.self.cycles-pp.folio_unlock
0.16 ± 4% -0.0 0.12 ± 4% -0.0 0.14 ± 4% perf-profile.self.cycles-pp.percpu_counter_add_batch
0.09 ± 5% -0.0 0.06 ± 7% +0.0 0.10 ± 9% perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
0.11 ± 6% -0.0 0.08 ± 5% +0.0 0.13 ± 6% perf-profile.self.cycles-pp.shmem_alloc_and_add_folio
0.06 ± 14% -0.0 0.03 ±101% +0.1 0.12 ± 15% perf-profile.self.cycles-pp.free_unref_page_commit
0.08 ± 8% -0.0 0.05 -0.0 0.06 ± 8% perf-profile.self.cycles-pp.xas_find_conflict
0.10 ± 6% -0.0 0.07 ± 6% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.up_read
0.12 ± 4% -0.0 0.09 ± 7% -0.0 0.11 ± 8% perf-profile.self.cycles-pp.folio_add_file_rmap_range
0.25 ± 15% -0.0 0.22 ± 29% +0.2 0.49 ± 12% perf-profile.self.cycles-pp.find_lock_entries
0.12 ± 4% -0.0 0.10 ± 6% +0.0 0.14 ± 5% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.13 ± 6% -0.0 0.10 ± 9% -0.0 0.11 ± 10% perf-profile.self.cycles-pp.page_remove_rmap
0.09 ± 4% -0.0 0.07 ± 7% -0.0 0.09 ± 5% perf-profile.self.cycles-pp.exc_page_fault
0.22 ± 6% -0.0 0.19 ± 17% +0.1 0.34 ± 5% perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
0.08 -0.0 0.06 ± 8% -0.0 0.07 ± 11% perf-profile.self.cycles-pp.__irqentry_text_end
0.13 ± 18% -0.0 0.10 ± 31% +0.1 0.24 ± 12% perf-profile.self.cycles-pp.xas_clear_mark
0.19 ± 5% -0.0 0.17 ± 5% +0.0 0.21 ± 6% perf-profile.self.cycles-pp.cgroup_rstat_updated
0.09 ± 6% -0.0 0.07 ± 5% -0.0 0.09 ± 12% perf-profile.self.cycles-pp.set_pte_range
0.05 ± 46% -0.0 0.03 ±102% +0.1 0.11 ± 15% perf-profile.self.cycles-pp.free_unref_page_list
0.07 ± 5% -0.0 0.05 ± 7% -0.0 0.06 ± 6% perf-profile.self.cycles-pp._compound_head
0.08 -0.0 0.06 ± 9% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.lock_vma_under_rcu
0.06 ± 17% -0.0 0.04 ± 75% +0.1 0.13 ± 13% perf-profile.self.cycles-pp.filemap_free_folio
0.06 ± 9% -0.0 0.04 ± 73% +0.0 0.09 ± 15% perf-profile.self.cycles-pp.trigger_load_balance
0.10 ± 23% -0.0 0.09 ± 35% +0.1 0.20 ± 16% perf-profile.self.cycles-pp.irqtime_account_irq
0.06 ± 9% -0.0 0.04 ± 45% +0.0 0.08 ± 8% perf-profile.self.cycles-pp.__mod_zone_page_state
0.08 ± 22% -0.0 0.06 ± 51% +0.1 0.14 ± 14% perf-profile.self.cycles-pp.filemap_remove_folio
0.55 ± 38% -0.0 0.53 ± 24% +0.4 0.95 ± 21% perf-profile.self.cycles-pp.ktime_get
0.28 -0.0 0.27 ± 6% -0.1 0.23 ± 6% perf-profile.self.cycles-pp.try_charge_memcg
0.18 ± 9% -0.0 0.16 ± 19% +0.1 0.30 ± 5% perf-profile.self.cycles-pp.io_serial_out
0.12 ± 5% -0.0 0.10 ± 20% +0.1 0.18 ± 9% perf-profile.self.cycles-pp.fast_imageblit
0.09 ± 5% -0.0 0.07 ± 21% +0.1 0.14 ± 9% perf-profile.self.cycles-pp.update_sg_lb_stats
0.06 ± 16% -0.0 0.05 ± 74% +0.1 0.12 ± 15% perf-profile.self.cycles-pp.truncate_cleanup_folio
0.14 ± 19% -0.0 0.13 ± 32% +0.1 0.28 ± 11% perf-profile.self.cycles-pp.__free_one_page
0.25 ± 6% -0.0 0.24 ± 22% +0.2 0.43 ± 13% perf-profile.self.cycles-pp.delay_tsc
0.07 ± 16% -0.0 0.06 ± 17% +0.0 0.12 ± 25% perf-profile.self.cycles-pp.ktime_get_update_offsets_now
0.03 ±102% -0.0 0.02 ±142% +0.1 0.09 ± 13% perf-profile.self.cycles-pp.__filemap_remove_folio
0.02 ± 99% -0.0 0.02 ±141% +0.0 0.06 ± 6% perf-profile.self.cycles-pp.update_irq_load_avg
0.02 ± 99% -0.0 0.02 ±141% +0.0 0.07 ± 8% perf-profile.self.cycles-pp.menu_select
0.02 ±142% -0.0 0.02 ±141% +0.1 0.08 ± 13% perf-profile.self.cycles-pp.free_unref_page_prepare
0.00 +0.0 0.00 +0.1 0.06 ± 9% perf-profile.self.cycles-pp.native_sched_clock
0.00 +0.0 0.00 +0.1 0.06 ± 19% perf-profile.self.cycles-pp.__slab_free
0.00 +0.0 0.00 +0.1 0.06 ± 11% perf-profile.self.cycles-pp.read_tsc
0.00 +0.0 0.00 +0.1 0.07 ± 10% perf-profile.self.cycles-pp.get_pfnblock_flags_mask
0.02 ±141% +0.0 0.02 ±142% +0.1 0.08 ± 16% perf-profile.self.cycles-pp.uncharge_folio
0.05 ± 8% +0.0 0.08 ± 8% -0.0 0.03 ±100% perf-profile.self.cycles-pp.propagate_protected_usage
1.31 ± 6% +0.1 1.43 ± 2% -0.2 1.07 ± 3% perf-profile.self.cycles-pp.mem_cgroup_commit_charge
2.93 ± 4% +0.4 3.35 ± 3% -0.2 2.71 ± 6% perf-profile.self.cycles-pp.__mod_lruvec_page_state
0.77 ± 7% +1.5 2.23 ± 3% -0.1 0.67 ± 3% perf-profile.self.cycles-pp.__mem_cgroup_charge
0.75 ± 4% +4.0 4.80 ± 3% -0.1 0.70 ± 5% perf-profile.self.cycles-pp.__count_memcg_events
2.83 ± 3% +6.6 9.40 ± 2% +0.0 2.84 ± 2% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
1.45 ± 9% +7.2 8.63 ± 16% -0.0 1.45 ± 10% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
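A note on reading these tables: in the perf-profile rows, the signed middle columns are absolute deltas in profile share (percentage points), while in the benchmark and perf-stat rows a signed percentage is a relative change. A worked example from the rows above and the will-it-scale tables below:

    perf-profile.self.cycles-pp.shmem_get_folio_gfp:
        12.24 -> 7.76    delta = 7.76 - 12.24  = -4.48, printed as -4.5 points
        12.24 -> 10.87   delta = 10.87 - 12.24 = -1.37, printed as -1.4 points

    will-it-scale.per_process_ops (below):
        41008 -> 27776   change = 27776 / 41008 - 1 ~= -32.3% (relative)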
[-- Attachment #3: will-it-scale-tlb_flush2 --]
[-- Type: text/plain, Size: 38804 bytes --]

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/tlb_flush2/will-it-scale

commit:
  e0bf1dc859fdd mm: memcg: move vmstats structs definition above flushing code
  8d59d2214c236 mm: memcg: make stats flushing threshold per-memcg
  0cba55e237ba6 mm: memcg: optimize parent iteration in memcg_rstat_updated()

e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6 0cba55e237ba61489c0a29f7d27
---------------- --------------------------- ---------------------------
     %stddev     %change         %stddev     %change         %stddev
         \          |                \          |                \
4.05 -1.2 2.81 +0.0 4.06 mpstat.cpu.all.usr%
118438 ± 14% -27.8% 85543 ± 57% -47.1% 62659 ± 72% numa-meminfo.node0.AnonHugePages
193.83 ± 6% +69.3% 328.17 ± 8% +0.5% 194.83 ± 7% perf-c2c.DRAM.local
1216 ± 8% +27.1% 1546 ± 6% +8.2% 1316 ± 8% perf-c2c.DRAM.remote
150.33 ± 13% -40.0% 90.17 ± 13% +10.9% 166.67 ± 8% perf-c2c.HITM.remote
0.04 -25.0% 0.03 +0.0% 0.04 turbostat.IPC
316.16 -1.5% 311.47 -0.3% 315.25 turbostat.PkgWatt
30.54 +4.9% 32.04 -0.5% 30.38 turbostat.RAMWatt
2132437 -32.3% 1444430 +0.9% 2151460 will-it-scale.52.processes
41008 -32.3% 27776 +0.9% 41373 will-it-scale.per_process_ops
2132437 -32.3% 1444430 +0.9% 2151460 will-it-scale.workload
3.113e+08 ± 3% -31.7% 2.125e+08 ± 4% +2.1% 3.18e+08 ± 2% numa-numastat.node0.local_node
3.114e+08 ± 3% -31.7% 2.126e+08 ± 4% +2.1% 3.18e+08 ± 2% numa-numastat.node0.numa_hit
3.322e+08 ± 2% -32.5% 2.243e+08 ± 3% -0.3% 3.312e+08 ± 3% numa-numastat.node1.local_node
3.323e+08 ± 2% -32.5% 2.243e+08 ± 3% -0.3% 3.312e+08 ± 3% numa-numastat.node1.numa_hit
3.114e+08 ± 3% -31.7% 2.126e+08 ± 4% +2.1% 3.18e+08 ± 2% numa-vmstat.node0.numa_hit
3.113e+08 ± 3% -31.7% 2.125e+08 ± 4% +2.1% 3.18e+08 ± 2% numa-vmstat.node0.numa_local
3.323e+08 ± 2% -32.5% 2.243e+08 ± 3% -0.3% 3.312e+08 ± 3% numa-vmstat.node1.numa_hit
3.322e+08 ± 2% -32.5% 2.243e+08 ± 3% -0.3% 3.312e+08 ± 3% numa-vmstat.node1.numa_local
0.00 ± 19% -61.1% 0.00 ± 31% +16.7% 0.00 ± 14% perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
217.07 ± 11% -46.4% 116.39 ± 23% -1.8% 213.18 ± 8% perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
218.50 ± 6% +19.1% 260.33 ± 4% +7.2% 234.17 ± 5% perf-sched.wait_and_delay.count.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
217.06 ± 11% -46.4% 116.38 ± 23% -1.8% 213.18 ± 8% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
7758 ± 24% +15.6% 8968 ± 43% +113.6% 16574 ± 24% proc-vmstat.numa_hint_faults_local
6.436e+08 -32.1% 4.369e+08 +0.9% 6.493e+08 proc-vmstat.numa_hit
6.435e+08 -32.1% 4.368e+08 +0.9% 6.492e+08 proc-vmstat.numa_local
6.432e+08 -32.1% 4.368e+08 +0.9% 6.489e+08 proc-vmstat.pgalloc_normal
1.286e+09 -32.1% 8.726e+08 +0.9% 1.297e+09 proc-vmstat.pgfault
6.432e+08 -32.1% 4.367e+08 +0.9% 6.488e+08 proc-vmstat.pgfree
170696 ± 8% +3.4% 176515 ± 8% +3.2% 176206 ± 8% sched_debug.cpu.clock.avg
170703 ± 8% +3.4% 176522 ± 8% +3.2% 176212 ± 8% sched_debug.cpu.clock.max
170689 ± 8% +3.4% 176508 ± 8% +3.2% 176198 ± 8% sched_debug.cpu.clock.min
169431 ± 8% +3.4% 175248 ± 8% +3.2% 174916 ± 8% sched_debug.cpu.clock_task.avg
169630 ± 8% +3.4% 175429 ± 8% +3.2% 175098 ± 8% sched_debug.cpu.clock_task.max
162542 ± 8% +3.5% 168260 ± 8% +3.4% 168099 ± 9% sched_debug.cpu.clock_task.min
170690 ± 8% +3.4% 176508 ± 8% +3.2% 176199 ± 8% sched_debug.cpu_clk
170117 ± 8% +3.4% 175938 ± 8% +3.2% 175626 ± 8% sched_debug.ktime
171259 ± 8% +3.4% 177078 ± 8% +3.2% 176768 ± 8% sched_debug.sched_clk
4.06 +80.8% 7.34 -4.8% 3.86 perf-stat.i.MPKI
4.066e+09 -23.3% 3.12e+09 +3.5% 4.207e+09 perf-stat.i.branch-instructions
0.57 -0.0 0.55 -0.0 0.57 perf-stat.i.branch-miss-rate%
23478297 -25.0% 17605102 +3.3% 24242314 perf-stat.i.branch-misses
17.25 +7.0 24.27 +0.7 17.95 perf-stat.i.cache-miss-rate%
82715093 ± 2% +35.9% 1.124e+08 -1.8% 81201463 perf-stat.i.cache-misses
4.795e+08 ± 2% -3.4% 4.63e+08 -5.6% 4.525e+08 perf-stat.i.cache-references
7.14 +32.9% 9.49 -3.0% 6.92 perf-stat.i.cpi
134.85 -1.2% 133.29 -0.2% 134.53 perf-stat.i.cpu-migrations
1760 ± 2% -26.5% 1294 +1.8% 1792 perf-stat.i.cycles-between-cache-misses
0.26 -0.0 0.24 -0.0 0.25 perf-stat.i.dTLB-load-miss-rate%
13461491 -31.7% 9190211 +0.9% 13582086 perf-stat.i.dTLB-load-misses
5.141e+09 -24.1% 3.902e+09 +3.6% 5.327e+09 perf-stat.i.dTLB-loads
0.45 -0.0 0.44 -0.0 0.45 perf-stat.i.dTLB-store-miss-rate%
12934403 -32.2% 8773143 +0.9% 13056838 perf-stat.i.dTLB-store-misses
2.841e+09 -29.9% 1.992e+09 +2.7% 2.917e+09 perf-stat.i.dTLB-stores
14.76 +1.4 16.18 ± 4% +2.2 16.92 perf-stat.i.iTLB-load-miss-rate%
7454399 ± 2% -22.7% 5760387 ± 4% +16.4% 8674584 perf-stat.i.iTLB-load-misses
43026423 -30.6% 29840650 -1.0% 42585377 perf-stat.i.iTLB-loads
2.042e+10 -24.7% 1.538e+10 +3.1% 2.104e+10 perf-stat.i.instructions
2745 -2.5% 2677 ± 4% -11.4% 2432 perf-stat.i.instructions-per-iTLB-miss
0.14 -24.6% 0.11 +3.1% 0.14 perf-stat.i.ipc
815.65 -20.2% 651.03 -1.1% 807.03 perf-stat.i.metric.K/sec
120.43 -24.3% 91.11 +3.0% 124.05 perf-stat.i.metric.M/sec
4264808 -32.2% 2892980 +0.9% 4302236 perf-stat.i.minor-faults
11007315 ± 2% +39.7% 15375516 -2.9% 10691798 ± 2% perf-stat.i.node-load-misses
1459152 ± 6% +45.1% 2116827 ± 5% -5.0% 1386160 ± 5% perf-stat.i.node-loads
7872989 ± 2% -26.2% 5812458 -3.4% 7608281 ± 2% perf-stat.i.node-store-misses
4264808 -32.2% 2892980 +0.9% 4302236 perf-stat.i.page-faults
4.05 +80.4% 7.31 -4.8% 3.86 perf-stat.overall.MPKI
0.58 -0.0 0.57 -0.0 0.58 perf-stat.overall.branch-miss-rate%
17.25 +7.0 24.27 +0.7 17.95 perf-stat.overall.cache-miss-rate%
7.13 +32.7% 9.46 -3.0% 6.91 perf-stat.overall.cpi
1759 ± 2% -26.5% 1294 +1.8% 1792 perf-stat.overall.cycles-between-cache-misses
0.26 -0.0 0.23 -0.0 0.25 perf-stat.overall.dTLB-load-miss-rate%
0.45 -0.0 0.44 -0.0 0.45 perf-stat.overall.dTLB-store-miss-rate%
14.77 +1.4 16.18 ± 4% +2.2 16.92 perf-stat.overall.iTLB-load-miss-rate%
2739 -2.4% 2674 ± 4% -11.4% 2426 perf-stat.overall.instructions-per-iTLB-miss
0.14 -24.7% 0.11 +3.1% 0.14 perf-stat.overall.ipc
2882666 +11.2% 3206246 +2.1% 2944234 perf-stat.overall.path-length
4.052e+09 -23.3% 3.11e+09 +3.5% 4.193e+09 perf-stat.ps.branch-instructions
23421504 -25.0% 17574476 +3.2% 24179002 perf-stat.ps.branch-misses
82419384 ± 2% +35.9% 1.12e+08 -1.8% 80913267 perf-stat.ps.cache-misses
4.778e+08 ± 2% -3.4% 4.614e+08 -5.6% 4.509e+08 perf-stat.ps.cache-references
134.44 -1.1% 132.98 -0.2% 134.17 perf-stat.ps.cpu-migrations
13415064 -31.7% 9160067 +0.9% 13535797 perf-stat.ps.dTLB-load-misses
5.124e+09 -24.1% 3.89e+09 +3.6% 5.31e+09 perf-stat.ps.dTLB-loads
12889609 -32.2% 8744145 +1.0% 13012111 perf-stat.ps.dTLB-store-misses
2.831e+09 -29.9% 1.986e+09 +2.7% 2.907e+09 perf-stat.ps.dTLB-stores
7428050 ± 2% -22.7% 5741276 ± 4% +16.4% 8644862 perf-stat.ps.iTLB-load-misses
42877049 -30.6% 29741122 -1.0% 42438686 perf-stat.ps.iTLB-loads
2.035e+10 -24.7% 1.533e+10 +3.1% 2.097e+10 perf-stat.ps.instructions
4250034 -32.2% 2883410 +0.9% 4287486 perf-stat.ps.minor-faults
10968228 ± 2% +39.7% 15322266 -2.9% 10654062 ± 2% perf-stat.ps.node-load-misses
1454274 ± 6% +45.1% 2109746 ± 5% -5.0% 1381519 ± 5% perf-stat.ps.node-loads
7845298 ± 2% -26.2% 5792864 -3.4% 7581789 ± 2% perf-stat.ps.node-store-misses
4250034 -32.2% 2883410 +0.9% 4287486 perf-stat.ps.page-faults
6.147e+12 -24.7% 4.631e+12 +3.0% 6.334e+12 perf-stat.total.instructions
26.77 -1.8 24.93 ± 3% +0.5 27.32 ± 5% perf-profile.calltrace.cycles-pp.intel_idle_ibrs.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
26.75 -1.8 24.92 ± 2% +0.4 27.17 ± 5% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
26.75 -1.8 24.92 ± 2% +0.4 27.18 ± 5% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
26.84 -1.8 25.00 ± 3% +0.6 27.39 ± 5% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
26.75 -1.8 24.92 ± 2% +0.4 27.18 ± 5% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
26.75 -1.8 24.92 ± 2% +0.4 27.18 ± 5% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
26.75 -1.8 24.92 ± 2% +0.4 27.18 ± 5% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
27.05 -1.8 25.29 ± 3% +0.4 27.48 ± 5% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
13.02 ± 2% -1.4 11.60 ± 4% -0.4 12.62 ± 2% perf-profile.calltrace.cycles-pp.testcase
5.54 ± 5% -1.0 4.52 ± 3% -0.5 5.06 ± 2% perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single
1.37 ± 2% -0.9 0.51 ± 58% +0.0 1.38 ± 3% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.__madvise
10.38 ± 3% -0.8 9.54 ± 2% -0.4 9.97 ± 2% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
2.38 ± 2% -0.8 1.63 ± 3% -0.1 2.29 ± 4% perf-profile.calltrace.cycles-pp.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush
4.02 ± 3% -0.7 3.32 ± 3% -0.3 3.76 ± 2% perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
1.92 ± 4% -0.4 1.49 ± 2% -0.0 1.88 ± 2% perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_page_fault.testcase
1.36 ± 2% -0.4 0.99 -0.0 1.36 ± 3% perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase
1.30 ± 10% -0.4 0.94 ± 6% -0.1 1.16 ± 5% perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault.handle_mm_fault
1.50 ± 11% -0.3 1.19 ± 5% -0.2 1.29 ± 8% perf-profile.calltrace.cycles-pp.uncharge_folio.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
1.13 ± 3% -0.3 0.83 -0.0 1.13 ± 2% perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
0.71 ± 3% -0.3 0.43 ± 44% +0.0 0.71 ± 3% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__madvise
1.02 ± 3% -0.3 0.75 -0.0 1.01 ± 2% perf-profile.calltrace.cycles-pp.flush_tlb_func.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
0.97 ± 3% -0.3 0.72 -0.0 0.96 ± 2% perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
0.77 ± 2% -0.2 0.58 ± 2% -0.0 0.75 ± 3% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
0.71 ± 2% -0.1 0.60 ± 3% -0.0 0.69 ± 4% perf-profile.calltrace.cycles-pp.propagate_protected_usage.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge_list.release_pages
1.20 +0.1 1.34 -0.1 1.12 perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
1.10 ± 2% +0.2 1.28 -0.1 1.03 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
1.04 ± 2% +0.2 1.24 -0.1 0.98 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
0.83 +0.2 1.07 ± 2% -0.0 0.82 perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
0.81 ± 2% +0.3 1.08 -0.0 0.76 ± 2% perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single
0.88 ± 10% +0.3 1.16 ± 4% -0.1 0.77 ± 5% perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault.handle_mm_fault
0.71 ± 2% +0.3 1.00 -0.0 0.68 perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range
0.76 ± 3% +0.3 1.09 ± 2% -0.0 0.75 ± 2% perf-profile.calltrace.cycles-pp.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.73 ± 3% +0.3 1.07 ± 2% -0.0 0.72 ± 2% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault.handle_mm_fault
0.00 +0.6 0.55 ± 2% +0.0 0.00 perf-profile.calltrace.cycles-pp.__count_memcg_events.mem_cgroup_commit_charge.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault
6.60 ± 4% +0.6 7.18 ± 3% -0.4 6.22 ± 2% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
6.54 ± 4% +0.6 7.13 ± 3% -0.4 6.17 ± 2% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
0.00 +0.7 0.74 ± 3% +0.0 0.00 perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single
0.00 +0.8 0.79 ± 2% +0.0 0.00 perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range
0.00 +0.8 0.79 ± 3% +0.0 0.00 perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.lru_add_fn.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
0.00 +0.8 0.80 ± 3% +0.0 0.00 perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault
5.80 ± 5% +0.8 6.60 ± 3% -0.4 5.41 ± 2% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
0.00 +0.8 0.82 +0.0 0.00 perf-profile.calltrace.cycles-pp.__count_memcg_events.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush
0.69 ± 4% +0.9 1.59 ± 2% -0.0 0.66 ± 3% perf-profile.calltrace.cycles-pp.__count_memcg_events.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
30.43 +1.1 31.57 -0.3 30.08 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
29.22 +1.5 30.69 -0.3 28.88 perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
29.05 +1.5 30.56 -0.4 28.69 perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
22.56 ± 2% +2.3 24.87 +0.1 22.70 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single
22.36 ± 2% +2.3 24.70 +0.1 22.51 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
22.11 ± 2% +2.4 24.55 +0.2 22.27 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush
22.70 +2.6 25.35 +0.4 23.12 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
22.38 +2.7 25.08 +0.4 22.80 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
24.10 +2.7 26.82 +0.4 24.51 ± 2% perf-profile.calltrace.cycles-pp.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
24.09 +2.7 26.82 +0.4 24.51 ± 2% perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise
24.07 +2.7 26.79 +0.4 24.48 ± 2% perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior
22.14 +2.8 24.93 +0.4 22.56 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
59.76 +2.9 62.64 -0.0 59.73 ± 2% perf-profile.calltrace.cycles-pp.__madvise
57.63 +3.5 61.10 -0.0 57.59 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
57.27 +3.6 60.85 -0.0 57.24 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
56.41 +3.8 60.20 -0.0 56.39 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
56.37 +3.8 60.17 -0.0 56.34 ± 2% perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
55.94 +3.9 59.88 -0.0 55.92 ± 2% perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
55.85 +4.0 59.82 -0.0 55.83 ± 2% perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
26.75 -1.8 24.92 ± 2% +0.4 27.18 ± 5% perf-profile.children.cycles-pp.start_secondary
26.98 -1.8 25.22 ± 3% +0.4 27.40 ± 5% perf-profile.children.cycles-pp.intel_idle_ibrs
27.05 -1.8 25.29 ± 3% +0.4 27.48 ± 5% perf-profile.children.cycles-pp.cpu_startup_entry
27.05 -1.8 25.29 ± 3% +0.4 27.48 ± 5% perf-profile.children.cycles-pp.do_idle
27.05 -1.8 25.29 ± 3% +0.4 27.48 ± 5% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
27.05 -1.8 25.29 ± 3% +0.4 27.48 ± 5% perf-profile.children.cycles-pp.cpuidle_enter
27.05 -1.8 25.29 ± 3% +0.4 27.48 ± 5% perf-profile.children.cycles-pp.cpuidle_enter_state
27.05 -1.8 25.29 ± 3% +0.4 27.48 ± 5% perf-profile.children.cycles-pp.cpuidle_idle_call
13.66 ± 2% -1.3 12.38 -0.4 13.26 ± 2% perf-profile.children.cycles-pp.testcase
5.55 ± 5% -1.0 4.52 ± 3% -0.5 5.06 ± 2% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
2.39 ± 2% -0.8 1.63 ± 3% -0.1 2.29 ± 4% perf-profile.children.cycles-pp.page_counter_uncharge
4.03 ± 3% -0.7 3.32 ± 3% -0.3 3.76 ± 2% perf-profile.children.cycles-pp.uncharge_batch
1.96 ± 4% -0.4 1.52 ± 2% -0.0 1.91 ± 2% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
1.30 -0.4 0.94 ± 2% +0.0 1.32 ± 2% perf-profile.children.cycles-pp.error_entry
1.36 ± 2% -0.4 0.99 -0.0 1.36 ± 3% perf-profile.children.cycles-pp.__irqentry_text_end
1.30 ± 10% -0.4 0.94 ± 6% -0.1 1.16 ± 5% perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
1.51 ± 11% -0.3 1.19 ± 5% -0.2 1.29 ± 8% perf-profile.children.cycles-pp.uncharge_folio
1.14 ± 3% -0.3 0.84 -0.0 1.14 ± 2% perf-profile.children.cycles-pp.flush_tlb_mm_range
1.02 ± 3% -0.3 0.75 -0.0 1.02 ± 2% perf-profile.children.cycles-pp.flush_tlb_func
0.98 ± 3% -0.3 0.72 -0.0 0.96 ± 2% perf-profile.children.cycles-pp.native_flush_tlb_one_user
0.73 ± 2% -0.2 0.52 ± 2% -0.0 0.72 ± 3% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.69 ± 2% -0.2 0.50 ± 2% +0.0 0.70 ± 4% perf-profile.children.cycles-pp.native_irq_return_iret
0.79 ± 2% -0.2 0.60 ± 2% -0.0 0.77 ± 3% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.51 ± 2% -0.1 0.38 ± 2% +0.0 0.52 ± 4% perf-profile.children.cycles-pp.sync_regs
0.41 ± 3% -0.1 0.29 ± 3% -0.0 0.41 ± 3% perf-profile.children.cycles-pp.__perf_sw_event
0.44 ± 2% -0.1 0.32 ± 2% +0.0 0.44 ± 4% perf-profile.children.cycles-pp.vma_alloc_folio
0.72 ± 2% -0.1 0.61 ± 3% -0.0 0.71 ± 3% perf-profile.children.cycles-pp.propagate_protected_usage
0.39 -0.1 0.28 ± 2% -0.0 0.39 ± 4% perf-profile.children.cycles-pp.alloc_pages_mpol
0.35 ± 3% -0.1 0.25 ± 3% -0.0 0.35 ± 4% perf-profile.children.cycles-pp.__alloc_pages
0.34 ± 2% -0.1 0.24 ± 4% +0.0 0.34 ± 4% perf-profile.children.cycles-pp.___perf_sw_event
0.30 ± 3% -0.1 0.21 ± 5% -0.0 0.30 perf-profile.children.cycles-pp.lock_vma_under_rcu
0.32 ± 2% -0.1 0.24 -0.0 0.32 ± 4% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.12 ± 4% -0.1 0.03 ± 70% -0.0 0.10 ± 6% perf-profile.children.cycles-pp.down_read
0.25 ± 3% -0.1 0.18 ± 4% +0.0 0.26 perf-profile.children.cycles-pp.mas_walk
0.25 ± 3% -0.1 0.18 ± 2% +0.0 0.25 ± 3% perf-profile.children.cycles-pp.get_page_from_freelist
0.17 ± 4% -0.1 0.11 ± 3% -0.0 0.17 ± 6% perf-profile.children.cycles-pp.__pte_offset_map_lock
0.14 ± 3% -0.0 0.10 ± 3% +0.0 0.14 ± 7% perf-profile.children.cycles-pp.clear_page_erms
0.17 ± 2% -0.0 0.12 ± 3% +0.0 0.17 ± 4% perf-profile.children.cycles-pp.find_vma_prev
0.13 ± 2% -0.0 0.09 -0.0 0.12 ± 4% perf-profile.children.cycles-pp.percpu_counter_add_batch
0.11 ± 4% -0.0 0.07 ± 10% -0.0 0.10 ± 3% perf-profile.children.cycles-pp.__cond_resched
0.13 ± 2% -0.0 0.10 ± 7% +0.0 0.15 ± 5% perf-profile.children.cycles-pp.free_pages_and_swap_cache
0.06 ± 7% -0.0 0.03 ± 70% +0.0 0.07 ± 7% perf-profile.children.cycles-pp.unmap_vmas
0.11 ± 3% -0.0 0.08 ± 6% +0.0 0.11 ± 6% perf-profile.children.cycles-pp.free_unref_page_list
0.06 -0.0 0.03 ± 70% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.09 ± 7% -0.0 0.06 ± 6% +0.0 0.10 ± 5% perf-profile.children.cycles-pp.free_swap_cache
0.09 ± 7% -0.0 0.07 ± 7% +0.0 0.09 ± 5% perf-profile.children.cycles-pp.__munmap
0.09 ± 8% -0.0 0.06 ± 6% +0.0 0.09 ± 8% perf-profile.children.cycles-pp._raw_spin_lock
0.09 ± 5% -0.0 0.06 ± 6% +0.0 0.09 ± 7% perf-profile.children.cycles-pp.handle_pte_fault
0.08 ± 8% -0.0 0.06 ± 6% +0.0 0.09 ± 5% perf-profile.children.cycles-pp.do_vmi_munmap
0.07 ± 6% -0.0 0.05 ± 8% +0.0 0.07 ± 6% perf-profile.children.cycles-pp.rmqueue
0.08 ± 4% -0.0 0.06 ± 6% +0.0 0.08 ± 8% perf-profile.children.cycles-pp.__mod_lruvec_state
0.07 ± 9% -0.0 0.05 ± 7% +0.0 0.07 ± 6% perf-profile.children.cycles-pp.unmap_region
0.08 ± 8% -0.0 0.06 ± 6% +0.0 0.08 ± 5% perf-profile.children.cycles-pp.do_vmi_align_munmap
0.08 ± 8% -0.0 0.06 ± 6% +0.0 0.08 ± 5% perf-profile.children.cycles-pp.__vm_munmap
0.08 ± 8% -0.0 0.06 ± 6% +0.0 0.08 ± 5% perf-profile.children.cycles-pp.__x64_sys_munmap
0.08 ± 5% -0.0 0.07 ± 7% +0.0 0.09 ± 8% perf-profile.children.cycles-pp.try_charge_memcg
1.27 +0.1 1.40 -0.1 1.19 perf-profile.children.cycles-pp.unmap_page_range
1.17 +0.2 1.32 -0.1 1.10 perf-profile.children.cycles-pp.zap_pmd_range
1.12 +0.2 1.29 -0.1 1.05 perf-profile.children.cycles-pp.zap_pte_range
0.84 +0.2 1.07 ± 2% -0.0 0.82 perf-profile.children.cycles-pp.lru_add_fn
0.81 ± 2% +0.3 1.08 -0.0 0.76 ± 2% perf-profile.children.cycles-pp.page_remove_rmap
0.89 ± 10% +0.3 1.16 ± 4% -0.1 0.78 ± 6% perf-profile.children.cycles-pp.mem_cgroup_commit_charge
0.77 ± 3% +0.3 1.09 ± 2% -0.0 0.75 perf-profile.children.cycles-pp.folio_add_new_anon_rmap
6.62 ± 4% +0.6 7.19 ± 3% -0.4 6.24 ± 2% perf-profile.children.cycles-pp.exc_page_fault
6.56 ± 4% +0.6 7.14 ± 3% -0.4 6.18 ± 2% perf-profile.children.cycles-pp.do_user_addr_fault
1.44 ± 2% +0.6 2.08 ± 2% -0.0 1.40 perf-profile.children.cycles-pp.__mod_lruvec_page_state
5.80 ± 5% +0.8 6.61 ± 3% -0.4 5.43 ± 2% perf-profile.children.cycles-pp.handle_mm_fault
30.44 +1.1 31.58 -0.3 30.09 perf-profile.children.cycles-pp.tlb_finish_mmu
29.23 +1.5 30.69 -0.3 28.88 perf-profile.children.cycles-pp.tlb_batch_pages_flush
29.19 +1.5 30.66 -0.4 28.84 perf-profile.children.cycles-pp.release_pages
1.63 ± 5% +1.5 3.13 ± 2% -0.1 1.56 ± 3% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
1.32 ± 4% +1.6 2.97 ± 2% -0.1 1.26 ± 2% perf-profile.children.cycles-pp.__count_memcg_events
24.12 +2.7 26.84 +0.4 24.54 ± 2% perf-profile.children.cycles-pp.lru_add_drain
24.12 +2.7 26.84 +0.4 24.53 ± 2% perf-profile.children.cycles-pp.lru_add_drain_cpu
24.09 +2.7 26.81 +0.4 24.50 ± 2% perf-profile.children.cycles-pp.folio_batch_move_lru
59.80 +2.9 62.68 -0.0 59.78 ± 2% perf-profile.children.cycles-pp.__madvise
57.82 +3.4 61.26 -0.0 57.78 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
57.44 +3.5 60.99 -0.0 57.40 ± 2% perf-profile.children.cycles-pp.do_syscall_64
56.41 +3.8 60.20 -0.0 56.39 ± 2% perf-profile.children.cycles-pp.__x64_sys_madvise
56.37 +3.8 60.17 -0.0 56.35 ± 2% perf-profile.children.cycles-pp.do_madvise
55.94 +3.9 59.88 -0.0 55.92 ± 2% perf-profile.children.cycles-pp.madvise_vma_behavior
55.85 +4.0 59.82 -0.0 55.84 ± 2% perf-profile.children.cycles-pp.zap_page_range_single
45.26 +5.0 50.23 +0.6 45.82 ± 2% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
44.75 +5.0 49.80 +0.6 45.32 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
44.26 +5.2 49.50 +0.6 44.84 ± 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
26.98 -1.8 25.22 ± 3% +0.4 27.40 ± 5% perf-profile.self.cycles-pp.intel_idle_ibrs
1.67 ± 3% -0.6 1.02 ± 3% -0.1 1.59 ± 5% perf-profile.self.cycles-pp.page_counter_uncharge
1.92 ± 5% -0.4 1.49 ± 2% -0.1 1.87 ± 2% perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
1.47 ± 2% -0.4 1.06 ± 2% +0.0 1.48 ± 4% perf-profile.self.cycles-pp.testcase
1.36 ± 2% -0.4 0.99 -0.0 1.36 ± 3% perf-profile.self.cycles-pp.__irqentry_text_end
1.30 -0.4 0.94 +0.0 1.32 ± 2% perf-profile.self.cycles-pp.error_entry
1.30 ± 10% -0.4 0.94 ± 6% -0.1 1.16 ± 5% perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
1.18 ± 8% -0.3 0.86 ± 6% -0.2 1.02 ± 6% perf-profile.self.cycles-pp.uncharge_batch
1.50 ± 11% -0.3 1.19 ± 5% -0.2 1.28 ± 8% perf-profile.self.cycles-pp.uncharge_folio
0.98 ± 3% -0.3 0.72 -0.0 0.96 ± 2% perf-profile.self.cycles-pp.native_flush_tlb_one_user
0.71 ± 2% -0.2 0.51 ± 2% -0.0 0.70 ± 3% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.69 ± 2% -0.2 0.50 ± 2% +0.0 0.70 ± 3% perf-profile.self.cycles-pp.native_irq_return_iret
0.50 ± 4% -0.2 0.30 ± 5% -0.0 0.49 ± 3% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.75 ± 2% -0.2 0.56 ± 2% -0.0 0.73 ± 2% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.51 ± 2% -0.1 0.38 ± 2% +0.0 0.52 ± 4% perf-profile.self.cycles-pp.sync_regs
0.35 ± 3% -0.1 0.23 ± 2% -0.0 0.35 ± 5% perf-profile.self.cycles-pp.folio_batch_move_lru
0.36 ± 5% -0.1 0.24 ± 2% +0.0 0.36 ± 3% perf-profile.self.cycles-pp.lru_add_fn
0.39 ± 2% -0.1 0.27 ± 2% -0.0 0.39 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.72 ± 2% -0.1 0.61 ± 3% -0.0 0.70 ± 3% perf-profile.self.cycles-pp.propagate_protected_usage
0.45 -0.1 0.34 ± 2% -0.0 0.44 ± 2% perf-profile.self.cycles-pp.release_pages
0.54 ± 4% -0.1 0.45 ± 4% -0.0 0.53 ± 5% perf-profile.self.cycles-pp.__mod_lruvec_page_state
0.30 ± 2% -0.1 0.21 ± 3% +0.0 0.30 ± 3% perf-profile.self.cycles-pp.___perf_sw_event
0.52 ± 5% -0.1 0.43 ± 5% -0.0 0.50 ± 4% perf-profile.self.cycles-pp.folio_lruvec_lock_irqsave
0.28 ± 3% -0.1 0.21 -0.0 0.28 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.25 ± 3% -0.1 0.18 ± 4% +0.0 0.25 perf-profile.self.cycles-pp.mas_walk
0.24 ± 2% -0.1 0.17 ± 4% +0.0 0.24 ± 2% perf-profile.self.cycles-pp.__handle_mm_fault
0.16 ± 4% -0.1 0.10 ± 9% +0.0 0.16 ± 3% perf-profile.self.cycles-pp.zap_pte_range
0.14 ± 4% -0.0 0.10 ± 4% +0.0 0.14 ± 7% perf-profile.self.cycles-pp.clear_page_erms
0.08 ± 6% -0.0 0.03 ± 70% -0.0 0.07 perf-profile.self.cycles-pp.__cond_resched
0.13 -0.0 0.09 -0.0 0.12 ± 4% perf-profile.self.cycles-pp.percpu_counter_add_batch
0.14 ± 5% -0.0 0.11 ± 3% +0.0 0.15 ± 6% perf-profile.self.cycles-pp.handle_mm_fault
0.11 ± 3% -0.0 0.08 ± 6% +0.0 0.11 ± 8% perf-profile.self.cycles-pp.do_user_addr_fault
0.08 ± 6% -0.0 0.04 ± 44% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.__perf_sw_event
0.07 ± 10% -0.0 0.04 ± 44% -0.0 0.07 ± 5% perf-profile.self.cycles-pp.tlb_finish_mmu
0.08 ± 7% -0.0 0.05 ± 8% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.lock_vma_under_rcu
0.09 ± 7% -0.0 0.06 ± 6% +0.0 0.09 ± 5% perf-profile.self.cycles-pp.free_swap_cache
0.09 ± 8% -0.0 0.06 ± 6% -0.0 0.08 ± 8% perf-profile.self.cycles-pp._raw_spin_lock
0.07 ± 7% -0.0 0.04 ± 44% +0.0 0.07 ± 5% perf-profile.self.cycles-pp.asm_exc_page_fault
0.10 ± 3% -0.0 0.08 ± 6% -0.0 0.08 ± 4% perf-profile.self.cycles-pp.page_remove_rmap
0.08 ± 6% -0.0 0.05 ± 8% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.flush_tlb_mm_range
0.08 ± 7% -0.0 0.06 ± 6% -0.0 0.07 ± 5% perf-profile.self.cycles-pp.unmap_page_range
0.08 ± 6% -0.0 0.06 ± 9% +0.0 0.08 ± 7% perf-profile.self.cycles-pp.do_anonymous_page
0.08 ± 5% -0.0 0.06 ± 6% +0.0 0.08 ± 8% perf-profile.self.cycles-pp.__alloc_pages
0.08 ± 6% -0.0 0.06 ± 8% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.do_madvise
0.07 ± 10% -0.0 0.05 ± 8% +0.0 0.08 ± 8% perf-profile.self.cycles-pp.up_read
1.58 ± 6% +1.5 3.09 ± 2% -0.1 1.51 ± 3% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
1.27 ± 5% +1.7 2.93 ± 2% -0.0 1.22 ± 2% perf-profile.self.cycles-pp.__count_memcg_events
44.25 +5.2 49.50 +0.6 44.84 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
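The dominant movers in the tlb_flush2 profile above are the memcg stat updaters (__count_memcg_events, __mod_memcg_lruvec_state) and the lruvec lock slowpath. As a rough illustration of the general effect a lower per-context batching threshold can have, here is a small userspace sketch (illustrative only, not the kernel code; the thread count, event count, and threshold are arbitrary) that contrasts how often a per-thread pending count spills into a shared, contended atomic:

  /* batch_sketch.c - illustrative only, NOT the kernel implementation.
   * Shows how a smaller flush threshold turns a cheap private update
   * into a frequent atomic RMW on a shared cacheline.
   * Build: gcc -O2 -pthread batch_sketch.c -o batch_sketch
   */
  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define NTHREADS 8                 /* arbitrary */
  #define NEVENTS  (1L << 24)        /* events per thread, arbitrary */

  static atomic_long shared_total;   /* stand-in for a shared stats counter */
  static long threshold;             /* stand-in for the flush threshold */

  static void *worker(void *arg)
  {
      long pending = 0;              /* stand-in for a per-CPU pending count */
      (void)arg;
      for (long i = 0; i < NEVENTS; i++) {
          pending++;                 /* cheap: private cacheline */
          if (pending >= threshold) {
              /* expensive when contended: shared-cacheline atomic RMW */
              atomic_fetch_add_explicit(&shared_total, pending,
                                        memory_order_relaxed);
              pending = 0;
          }
      }
      if (pending)
          atomic_fetch_add_explicit(&shared_total, pending,
                                    memory_order_relaxed);
      return NULL;
  }

  int main(int argc, char **argv)
  {
      pthread_t tid[NTHREADS];
      int i;

      threshold = argc > 1 ? atol(argv[1]) : 64;
      for (i = 0; i < NTHREADS; i++)
          pthread_create(&tid[i], NULL, worker, NULL);
      for (i = 0; i < NTHREADS; i++)
          pthread_join(tid[i], NULL);
      printf("threshold=%ld total=%ld\n", threshold,
             atomic_load(&shared_total));
      return 0;
  }

Running it under time(1) with a small versus a large threshold (e.g. ./batch_sketch 64 vs ./batch_sketch 65536) makes the cost of more frequent shared-cacheline updates visible on a multi-socket box. The kernel code is far more involved (per-CPU, hierarchical, rstat-based), so treat this purely as an illustration of the contention pattern, not as a model of the patch.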
[-- Attachment #4: will-it-scale-fallocate1 --]
[-- Type: text/plain, Size: 33789 bytes --]

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/fallocate1/will-it-scale

commit:
  e0bf1dc859fdd mm: memcg: move vmstats structs definition above flushing code
  8d59d2214c236 mm: memcg: make stats flushing threshold per-memcg
  0cba55e237ba6 mm: memcg: optimize parent iteration in memcg_rstat_updated()

e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6 0cba55e237ba61489c0a29f7d27
---------------- --------------------------- ---------------------------
     %stddev     %change         %stddev     %change         %stddev
         \          |                \          |                \
763.24 -1.2% 754.22 -2.0% 748.22 turbostat.PkgWatt
4560 ± 2% -19.4% 3673 +7.1% 4882 ± 2% vmstat.system.cs
0.03 ± 3% -0.0 0.02 -0.0 0.03 ± 2% mpstat.cpu.all.soft%
0.13 ± 2% -0.0 0.10 ± 2% -0.0 0.13 mpstat.cpu.all.usr%
293.00 ± 10% +54.7% 453.17 ± 12% +7.6% 315.33 ± 3% perf-c2c.DRAM.local
3720 ± 3% +41.1% 5251 ± 3% +0.8% 3752 ± 3% perf-c2c.DRAM.remote
325.67 ± 5% -21.5% 255.50 ± 7% +2.5% 333.83 ± 3% perf-c2c.HITM.remote
5426049 -33.8% 3590953 +3.3% 5605429 will-it-scale.224.processes
24222 -33.8% 16030 +3.3% 25023 will-it-scale.per_process_ops
5426049 -33.8% 3590953 +3.3% 5605429 will-it-scale.workload
148965 ± 9% +4.5% 155664 ± 20% -14.2% 127883 ± 10% numa-meminfo.node0.Slab
41751 ± 62% -31.7% 28502 ±122% -66.6% 13962 ±108% numa-meminfo.node1.Active
41727 ± 62% -31.7% 28502 ±122% -66.6% 13948 ±108% numa-meminfo.node1.Active(anon)
69062 ± 38% -19.5% 55596 ± 63% -39.1% 42090 ± 35% numa-meminfo.node1.Shmem
355193 ± 3% +16.1% 412516 ± 4% -3.3% 343648 ± 4% sched_debug.cfs_rq:/.avg_vruntime.stddev
355191 ± 3% +16.1% 412513 ± 4% -3.2% 343648 ± 4% sched_debug.cfs_rq:/.min_vruntime.stddev
89.04 ± 9% +15.9% 103.22 ± 9% +3.2% 91.93 ± 11% sched_debug.cfs_rq:/.runnable_avg.stddev
4289 -13.9% 3693 +4.9% 4498 sched_debug.cpu.nr_switches.avg
2259 ± 3% -25.1% 1693 ± 2% +5.7% 2388 ± 5% sched_debug.cpu.nr_switches.min
44536 -5.9% 41918 +1.5% 45191 proc-vmstat.nr_slab_reclaimable
3.257e+09 -33.9% 2.153e+09 +3.3% 3.366e+09 proc-vmstat.numa_hit
3.256e+09 -33.9% 2.152e+09 +3.3% 3.365e+09 proc-vmstat.numa_local
10269 ± 45% +87.3% 19237 ± 14% +83.8% 18876 ± 39% proc-vmstat.numa_pages_migrated
3.257e+09 -33.9% 2.153e+09 +3.3% 3.365e+09 proc-vmstat.pgalloc_normal
3.257e+09 -33.9% 2.153e+09 +3.3% 3.365e+09 proc-vmstat.pgfree
10269 ± 45% +87.3% 19237 ± 14% +83.8% 18876 ± 39% proc-vmstat.pgmigrate_success
7.906e+08 ± 4% -32.9% 5.303e+08 ± 2% +3.5% 8.181e+08 ± 4% numa-numastat.node0.local_node
7.909e+08 ± 4% -32.9% 5.305e+08 ± 2% +3.5% 8.184e+08 ± 4% numa-numastat.node0.numa_hit
8.069e+08 ± 3% -33.6% 5.361e+08 ± 2% +6.0% 8.552e+08 ± 2% numa-numastat.node1.local_node
8.072e+08 ± 3% -33.6% 5.363e+08 ± 2% +6.0% 8.556e+08 ± 2% numa-numastat.node1.numa_hit
101456 -21.4% 79695 ± 38% -33.4% 67613 ± 38% numa-numastat.node1.other_node
8.276e+08 -34.1% 5.457e+08 ± 2% +2.8% 8.508e+08 numa-numastat.node2.local_node
8.278e+08 -34.1% 5.459e+08 ± 2% +2.8% 8.511e+08 numa-numastat.node2.numa_hit
8.31e+08 -35.0% 5.403e+08 +1.1% 8.406e+08 ± 3% numa-numastat.node3.local_node
8.314e+08 -35.0% 5.404e+08 +1.2% 8.409e+08 ± 3% numa-numastat.node3.numa_hit
7.909e+08 ± 4% -32.9% 5.305e+08 ± 2% +3.5% 8.184e+08 ± 4% numa-vmstat.node0.numa_hit
7.906e+08 ± 4% -32.9% 5.303e+08 ± 2% +3.5% 8.181e+08 ± 4% numa-vmstat.node0.numa_local
10428 ± 62% -31.6% 7130 ±122% -66.6% 3486 ±108% numa-vmstat.node1.nr_active_anon
17331 ± 38% -19.0% 14042 ± 63% -37.6% 10816 ± 33% numa-vmstat.node1.nr_shmem
10428 ± 62% -31.6% 7130 ±122% -66.6% 3486 ±108% numa-vmstat.node1.nr_zone_active_anon
8.072e+08 ± 3% -33.6% 5.363e+08 ± 2% +6.0% 8.556e+08 ± 2% numa-vmstat.node1.numa_hit
8.069e+08 ± 3% -33.6% 5.361e+08 ± 2% +6.0% 8.552e+08 ± 2% numa-vmstat.node1.numa_local
101455 -21.4% 79693 ± 38% -33.4% 67613 ± 38% numa-vmstat.node1.numa_other
8.278e+08 -34.1% 5.459e+08 ± 2% +2.8% 8.511e+08 numa-vmstat.node2.numa_hit
8.276e+08 -34.1% 5.457e+08 ± 2% +2.8% 8.508e+08 numa-vmstat.node2.numa_local
8.314e+08 -35.0% 5.404e+08 +1.2% 8.409e+08 ± 3% numa-vmstat.node3.numa_hit
8.31e+08 -35.0% 5.403e+08 +1.1% 8.406e+08 ± 3% numa-vmstat.node3.numa_local
0.10 ± 8% +135.1% 0.24 ± 10% +32.0% 0.13 ± 23% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.04 ± 11% +42.4% 0.06 ± 16% +13.6% 0.04 ± 20% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.06 ± 33% +112.1% 0.14 ± 25% -32.0% 0.04 ± 47% perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
0.06 ± 46% +447.4% 0.31 ± 92% +8.2% 0.06 ± 38% perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.09 ± 33% +82.8% 0.16 ± 25% -21.4% 0.07 ± 53% perf-sched.sch_delay.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
0.03 ± 6% +32.9% 0.04 ± 7% -6.1% 0.03 ± 11% perf-sched.total_sch_delay.average.ms
139.63 ± 4% +21.7% 169.99 ± 3% -9.8% 125.97 ± 3% perf-sched.total_wait_and_delay.average.ms
31780 ± 8% -19.0% 25751 ± 14% -5.8% 29937 ± 14% perf-sched.total_wait_and_delay.count.ms
139.60 ± 4% +21.7% 169.95 ± 3% -9.8% 125.94 ± 3% perf-sched.total_wait_time.average.ms
0.18 ± 6% +19.2% 0.22 ± 21% -14.5% 0.16 ± 11% perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
3.52 ± 5% +13.4% 3.99 ± 2% -0.3% 3.51 ± 4% perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.45 ±223% +821.8% 4.15 ± 9% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
305.95 ± 7% +44.4% 441.73 ± 4% -14.7% 260.96 ± 4% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
6913 ± 6% -16.5% 5771 ± 13% -0.4% 6884 ± 13% perf-sched.wait_and_delay.count.__cond_resched.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate.do_syscall_64
1974 ± 11% -42.6% 1132 ± 16% -16.3% 1651 ± 17% perf-sched.wait_and_delay.count.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
1602 ± 7% +2.7% 1646 ± 13% -14.3% 1373 ± 12% perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
9474 ± 11% -33.5% 6303 ± 13% +0.5% 9524 ± 15% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
2.19 ±223% +770.9% 19.04 ± 63% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
1233 ± 30% +163.1% 3245 ± 26% +0.9% 1243 ± 30% perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.18 ± 6% +19.2% 0.22 ± 21% -14.5% 0.16 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
104.96 ± 11% +50.8% 158.31 ± 14% -22.0% 81.88 ± 6% perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.02 ±186% +985.0% 0.18 ± 33% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.exit_mmap.__mmput.exit_mm
3.41 ± 5% +9.8% 3.75 ± 3% -1.2% 3.37 ± 4% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
2.38 ± 6% +65.7% 3.95 ± 9% +1.4% 2.42 ± 10% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
305.93 ± 7% +44.4% 441.71 ± 4% -14.7% 260.94 ± 4% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.07 ± 12% +59.5% 0.11 ± 27% +28.5% 0.09 ± 44% perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
361.49 ± 10% +163.6% 952.71 ± 24% -10.2% 324.59 ± 12% perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.02 ±186% +1370.0% 0.24 ± 20% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.exit_mmap.__mmput.exit_mm
1233 ± 30% +163.1% 3245 ± 26% +0.9% 1243 ± 30% perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1.51 +70.8% 2.58 -5.4% 1.43 ± 2% perf-stat.i.MPKI
1.364e+10 -19.8% 1.094e+10 +6.4% 1.451e+10 perf-stat.i.branch-instructions
0.29 -0.0 0.25 -0.0 0.27 perf-stat.i.branch-miss-rate%
39037567 -29.6% 27478165 +1.8% 39751409 perf-stat.i.branch-misses
26.72 +7.0 33.67 +0.7 27.42 perf-stat.i.cache-miss-rate%
97210743 +34.0% 1.302e+08 ± 2% +0.3% 97544414 ± 2% perf-stat.i.cache-misses
3.641e+08 +6.3% 3.868e+08 ± 2% -2.2% 3.559e+08 ± 2% perf-stat.i.cache-references
4452 ± 2% -20.0% 3561 +7.5% 4784 ± 2% perf-stat.i.context-switches
13.13 +27.5% 16.74 -5.7% 12.38 perf-stat.i.cpi
270.31 -1.6% 265.85 +0.4% 271.43 perf-stat.i.cpu-migrations
8711 -25.3% 6504 -0.3% 8685 ± 2% perf-stat.i.cycles-between-cache-misses
1.66e+10 -21.1% 1.31e+10 +6.8% 1.774e+10 perf-stat.i.dTLB-loads
7.758e+09 -31.1% 5.343e+09 +6.3% 8.251e+09 perf-stat.i.dTLB-stores
12549015 -38.5% 7719822 +0.5% 12615766 perf-stat.i.iTLB-load-misses
6.454e+10 -21.6% 5.06e+10 +6.1% 6.846e+10 perf-stat.i.instructions
5208 +29.5% 6745 +5.4% 5491 perf-stat.i.instructions-per-iTLB-miss
0.08 -21.6% 0.06 +6.1% 0.08 perf-stat.i.ipc
0.36 ± 6% -24.6% 0.27 ± 25% -23.7% 0.27 ± 25% perf-stat.i.major-faults
86.32 ± 2% +27.6% 110.14 +2.1% 88.10 perf-stat.i.metric.K/sec
171.24 -22.4% 132.88 +6.5% 182.34 perf-stat.i.metric.M/sec
14793159 +36.3% 20167559 +0.9% 14924992 perf-stat.i.node-load-misses
1101912 ± 7% +48.5% 1636628 ± 4% +2.6% 1130608 ± 9% perf-stat.i.node-loads
3340101 ± 2% -19.8% 2679120 +6.9% 3571816 perf-stat.i.node-store-misses
84773 ± 5% -20.6% 67339 ± 6% +2.0% 86484 ± 5% perf-stat.i.node-stores
1.51 +70.8% 2.57 -5.4% 1.42 ± 2% perf-stat.overall.MPKI
0.29 -0.0 0.25 -0.0 0.27 perf-stat.overall.branch-miss-rate%
26.69 +6.9 33.63 +0.7 27.39 perf-stat.overall.cache-miss-rate%
13.12 +27.5% 16.73 -5.7% 12.37 perf-stat.overall.cpi
8709 -25.3% 6503 ± 2% -0.3% 8682 ± 2% perf-stat.overall.cycles-between-cache-misses
5146 +27.5% 6563 +5.5% 5430 perf-stat.overall.instructions-per-iTLB-miss
0.08 -21.6% 0.06 +6.1% 0.08 perf-stat.overall.ipc
3581676 +18.4% 4239733 +2.6% 3673713 perf-stat.overall.path-length
1.359e+10 -19.8% 1.091e+10 +6.4% 1.446e+10 perf-stat.ps.branch-instructions
38876130 -29.7% 27341584 +1.8% 39577054 perf-stat.ps.branch-misses
96879835 +34.0% 1.298e+08 ± 2% +0.3% 97215764 ± 2% perf-stat.ps.cache-misses
3.63e+08 +6.3% 3.859e+08 ± 2% -2.2% 3.549e+08 ± 2% perf-stat.ps.cache-references
4434 ± 2% -20.0% 3547 +7.4% 4764 ± 2% perf-stat.ps.context-switches
268.37 -1.9% 263.37 +0.3% 269.14 perf-stat.ps.cpu-migrations
1.655e+10 -21.1% 1.305e+10 +6.8% 1.768e+10 perf-stat.ps.dTLB-loads
7.733e+09 -31.1% 5.325e+09 +6.3% 8.223e+09 perf-stat.ps.dTLB-stores
12499097 -38.5% 7684522 +0.5% 12563331 perf-stat.ps.iTLB-load-misses
6.433e+10 -21.6% 5.044e+10 +6.1% 6.823e+10 perf-stat.ps.instructions
0.34 ± 6% -25.9% 0.25 ± 25% -23.8% 0.26 ± 25% perf-stat.ps.major-faults
14743590 +36.3% 20098836 +0.9% 14874764 perf-stat.ps.node-load-misses
1098750 ± 7% +48.7% 1633532 ± 4% +2.7% 1128235 ± 9% perf-stat.ps.node-loads
3328886 ± 2% -19.8% 2670192 +6.9% 3559593 perf-stat.ps.node-store-misses
84559 ± 5% -20.6% 67163 ± 6% +1.9% 86147 ± 5% perf-stat.ps.node-stores
1.943e+13 -21.7% 1.522e+13 +6.0% 2.059e+13 perf-stat.total.instructions
9.91 ± 10% -3.8 6.10 ± 4% -1.4 8.53 ± 11% perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate.vfs_fallocate
4.47 ± 10% -2.3 2.19 ± 4% -0.6 3.84 ± 11% perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
58.11 -2.1 56.01 -1.1 57.01 perf-profile.calltrace.cycles-pp.fallocate64
58.02 -2.1 55.95 -1.1 56.91 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fallocate64
58.00 -2.1 55.94 -1.1 56.90 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fallocate64
57.96 -2.1 55.91 -1.1 56.85 perf-profile.calltrace.cycles-pp.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe.fallocate64
57.92 -2.0 55.89 -1.1 56.82 perf-profile.calltrace.cycles-pp.vfs_fallocate.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe.fallocate64
57.82 -2.0 55.83 -1.1 56.72 perf-profile.calltrace.cycles-pp.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe
57.47 -1.8 55.62 -1.1 56.40 perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate.do_syscall_64
57.30 -1.8 55.53 -1.1 56.22 perf-profile.calltrace.cycles-pp.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate
2.17 ± 4% -1.0 1.14 ± 3% -0.1 2.06 ± 4% perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge_list.release_pages.__folio_batch_release.shmem_undo_range.shmem_setattr
3.08 ± 9% -0.9 2.19 ± 4% -0.4 2.64 ± 11% perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
1.29 ± 6% -0.7 0.54 ± 4% -0.1 1.16 ± 6% perf-profile.calltrace.cycles-pp.uncharge_folio.__mem_cgroup_uncharge_list.release_pages.__folio_batch_release.shmem_undo_range
0.88 ± 2% -0.3 0.59 ± 2% +0.0 0.90 ± 2% perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.__folio_batch_release.shmem_undo_range
1.66 -0.0 1.63 ± 3% +0.0 1.69 perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.__folio_batch_release.shmem_undo_range.shmem_setattr.notify_change
1.64 -0.0 1.62 ± 3% +0.0 1.68 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
1.66 -0.0 1.63 ± 3% +0.0 1.69 perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.__folio_batch_release.shmem_undo_range.shmem_setattr
0.80 +0.1 0.86 ± 2% -0.0 0.78 ± 2% perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp
0.61 ± 2% +0.1 0.74 ± 2% -0.0 0.58 ± 3% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.lru_add_fn.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio
1.65 ± 2% +0.2 1.85 +0.0 1.67 ± 2% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.filemap_unaccount_folio.__filemap_remove_folio.filemap_remove_folio
1.44 ± 3% +0.4 1.79 ± 3% -0.1 1.34 ± 4% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp
0.08 ±223% +0.5 0.60 ± 2% +0.1 0.17 ±141% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.release_pages.__folio_batch_release.shmem_undo_range.shmem_setattr
0.00 +0.9 0.86 ± 4% +0.0 0.00 perf-profile.calltrace.cycles-pp.__count_memcg_events.mem_cgroup_commit_charge.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp
41.70 +2.1 43.82 ± 2% +1.1 42.82 ± 2% perf-profile.calltrace.cycles-pp.ftruncate64
41.68 +2.1 43.81 ± 2% +1.1 42.80 ± 2% perf-profile.calltrace.cycles-pp.do_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
41.68 +2.1 43.81 ± 2% +1.1 42.80 ± 2% perf-profile.calltrace.cycles-pp.do_truncate.do_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
41.68 +2.1 43.80 ± 2% +1.1 42.80 ± 2% perf-profile.calltrace.cycles-pp.notify_change.do_truncate.do_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe
41.69 +2.1 43.82 ± 2% +1.1 42.81 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
41.69 +2.1 43.82 ± 2% +1.1 42.81 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ftruncate64
41.67 +2.1 43.80 ± 2% +1.1 42.80 ± 2% perf-profile.calltrace.cycles-pp.shmem_setattr.notify_change.do_truncate.do_sys_ftruncate.do_syscall_64
41.67 +2.1 43.80 ± 2% +1.1 42.79 ± 2% perf-profile.calltrace.cycles-pp.shmem_undo_range.shmem_setattr.notify_change.do_truncate.do_sys_ftruncate
38.67 +2.3 40.97 ± 2% +1.0 39.68 ± 2% perf-profile.calltrace.cycles-pp.__folio_batch_release.shmem_undo_range.shmem_setattr.notify_change.do_truncate
36.98 +2.3 39.32 ± 2% +1.0 37.96 ± 2% perf-profile.calltrace.cycles-pp.release_pages.__folio_batch_release.shmem_undo_range.shmem_setattr.notify_change
44.10 +2.4 46.47 ± 2% +0.4 44.48 perf-profile.calltrace.cycles-pp.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate.vfs_fallocate
44.04 +2.4 46.42 ± 2% +0.4 44.42 perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
42.89 +2.4 45.32 ± 2% +0.4 43.29 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp
42.87 +2.4 45.31 ± 2% +0.4 43.27 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio
42.84 +2.4 45.29 ± 2% +0.4 43.24 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru
33.96 +3.4 37.31 ± 2% +1.1 35.02 ± 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.__folio_batch_release.shmem_undo_range.shmem_setattr
33.94 +3.4 37.30 ± 2% +1.1 35.00 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.__folio_batch_release.shmem_undo_range
33.92 +3.4 37.28 ± 2% +1.1 34.98 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.__folio_batch_release
9.93 ± 10% -3.8 6.10 ± 4% -1.4 8.54 ± 11% perf-profile.children.cycles-pp.__mem_cgroup_charge
4.48 ± 10% -2.3 2.20 ± 4% -0.6 3.84 ± 11% perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
58.14 -2.1 56.03 -1.1 57.04 perf-profile.children.cycles-pp.fallocate64
57.96 -2.1 55.91 -1.1 56.85 perf-profile.children.cycles-pp.__x64_sys_fallocate
57.92 -2.0 55.89 -1.1 56.82 perf-profile.children.cycles-pp.vfs_fallocate
57.82 -2.0 55.83 -1.1 56.73 perf-profile.children.cycles-pp.shmem_fallocate
57.53 -1.8 55.69 -1.1 56.44 perf-profile.children.cycles-pp.shmem_get_folio_gfp
57.36 -1.8 55.60 -1.1 56.27 perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
2.18 ± 4% -1.0 1.14 ± 3% -0.1 2.07 ± 4% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
3.09 ± 9% -0.9 2.19 ± 4% -0.4 2.64 ± 11% perf-profile.children.cycles-pp.mem_cgroup_commit_charge
1.29 ± 6% -0.7 0.54 ± 4% -0.1 1.16 ± 6% perf-profile.children.cycles-pp.uncharge_folio
0.88 ± 2% -0.3 0.59 ± 2% +0.0 0.90 ± 2% perf-profile.children.cycles-pp.uncharge_batch
0.36 -0.1 0.22 ± 2% +0.0 0.36 perf-profile.children.cycles-pp.shmem_alloc_folio
0.36 ± 2% -0.1 0.23 ± 2% +0.0 0.37 perf-profile.children.cycles-pp.xas_store
0.32 ± 2% -0.1 0.20 ± 2% +0.0 0.32 perf-profile.children.cycles-pp.alloc_pages_mpol
0.27 ± 2% -0.1 0.16 ± 4% -0.0 0.27 ± 2% perf-profile.children.cycles-pp.shmem_inode_acct_blocks
0.27 -0.1 0.17 ± 3% +0.0 0.28 perf-profile.children.cycles-pp.__alloc_pages
0.37 ± 4% -0.1 0.29 +0.0 0.40
perf-profile.children.cycles-pp.page_counter_uncharge 0.18 -0.1 0.11 ± 4% +0.0 0.18 ± 3% perf-profile.children.cycles-pp.get_page_from_freelist 0.16 ± 3% -0.1 0.09 +0.0 0.16 ± 4% perf-profile.children.cycles-pp.xas_load 0.18 ± 2% -0.1 0.12 ± 4% +0.0 0.18 ± 2% perf-profile.children.cycles-pp.__mod_lruvec_state 0.14 ± 2% -0.1 0.09 +0.0 0.15 ± 3% perf-profile.children.cycles-pp._raw_spin_lock 0.18 ± 3% -0.1 0.13 ± 4% +0.0 0.19 ± 3% perf-profile.children.cycles-pp.try_charge_memcg 0.09 ± 10% -0.0 0.04 ± 73% +0.0 0.09 ± 12% perf-profile.children.cycles-pp._raw_spin_lock_irq 0.12 -0.0 0.07 +0.0 0.12 perf-profile.children.cycles-pp.__dquot_alloc_space 0.11 -0.0 0.06 ± 6% +0.0 0.11 perf-profile.children.cycles-pp.filemap_get_entry 0.13 ± 2% -0.0 0.09 ± 4% +0.0 0.14 perf-profile.children.cycles-pp.__mod_node_page_state 0.10 ± 3% -0.0 0.06 ± 9% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.xas_descend 0.12 -0.0 0.08 +0.0 0.12 ± 4% perf-profile.children.cycles-pp.free_unref_page_list 0.11 ± 3% -0.0 0.07 +0.0 0.12 ± 4% perf-profile.children.cycles-pp.rmqueue 0.10 ± 35% -0.0 0.06 -0.0 0.08 ± 8% perf-profile.children.cycles-pp.cgroup_rstat_updated 0.10 -0.0 0.06 ± 7% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.xas_clear_mark 0.18 -0.0 0.14 ± 3% +0.0 0.18 ± 2% perf-profile.children.cycles-pp.find_lock_entries 0.16 ± 4% -0.0 0.13 ± 2% +0.0 0.17 ± 6% perf-profile.children.cycles-pp.propagate_protected_usage 0.10 -0.0 0.07 ± 5% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.truncate_cleanup_folio 0.08 ± 4% -0.0 0.05 ± 7% +0.0 0.08 perf-profile.children.cycles-pp.xas_init_marks 0.09 ± 4% -0.0 0.06 ± 7% +0.0 0.09 ± 5% perf-profile.children.cycles-pp.page_counter_try_charge 0.18 ± 2% -0.0 0.16 -0.0 0.18 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 0.14 ± 2% -0.0 0.13 ± 2% -0.0 0.14 perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.14 ± 2% -0.0 0.13 ± 2% -0.0 0.14 perf-profile.children.cycles-pp.hrtimer_interrupt 0.09 -0.0 0.08 +0.0 0.09 perf-profile.children.cycles-pp.tick_sched_handle 0.00 +0.0 0.00 +0.1 0.06 ± 6% perf-profile.children.cycles-pp.mem_cgroup_update_lru_size 0.82 +0.1 0.87 -0.0 0.79 ± 2% perf-profile.children.cycles-pp.lru_add_fn 99.81 +0.1 99.89 +0.0 99.81 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 99.79 +0.1 99.88 +0.0 99.80 perf-profile.children.cycles-pp.do_syscall_64 0.51 +0.5 0.98 ± 3% +0.0 0.53 ± 5% perf-profile.children.cycles-pp.__count_memcg_events 4.21 +0.8 5.01 -0.1 4.12 perf-profile.children.cycles-pp.__mod_memcg_lruvec_state 41.68 +2.1 43.81 ± 2% +1.1 42.80 ± 2% perf-profile.children.cycles-pp.do_sys_ftruncate 41.70 +2.1 43.82 ± 2% +1.1 42.82 ± 2% perf-profile.children.cycles-pp.ftruncate64 41.68 +2.1 43.81 ± 2% +1.1 42.80 ± 2% perf-profile.children.cycles-pp.do_truncate 41.68 +2.1 43.80 ± 2% +1.1 42.80 ± 2% perf-profile.children.cycles-pp.notify_change 41.67 +2.1 43.80 ± 2% +1.1 42.80 ± 2% perf-profile.children.cycles-pp.shmem_setattr 41.67 +2.1 43.81 ± 2% +1.1 42.80 ± 2% perf-profile.children.cycles-pp.shmem_undo_range 38.67 +2.3 40.98 ± 2% +1.0 39.68 ± 2% perf-profile.children.cycles-pp.__folio_batch_release 37.07 +2.3 39.39 ± 2% +1.0 38.05 ± 2% perf-profile.children.cycles-pp.release_pages 45.77 +2.4 48.14 +0.4 46.17 perf-profile.children.cycles-pp.folio_batch_move_lru 44.14 +2.4 46.52 ± 2% +0.4 44.51 perf-profile.children.cycles-pp.folio_add_lru 78.55 +5.8 84.34 +1.5 80.04 perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave 78.52 +5.8 84.31 +1.5 80.00 perf-profile.children.cycles-pp._raw_spin_lock_irqsave 78.48 +5.8 
84.29 +1.5 79.96 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath 4.47 ± 10% -2.3 2.19 ± 4% -0.6 3.83 ± 11% perf-profile.self.cycles-pp.get_mem_cgroup_from_mm 2.67 ± 11% -1.3 1.32 ± 4% -0.5 2.22 ± 12% perf-profile.self.cycles-pp.mem_cgroup_commit_charge 1.28 ± 6% -0.7 0.54 ± 4% -0.1 1.16 ± 6% perf-profile.self.cycles-pp.uncharge_folio 2.18 ± 11% -0.6 1.58 ± 5% -0.3 1.86 ± 12% perf-profile.self.cycles-pp.__mem_cgroup_charge 1.16 ± 8% -0.3 0.84 ± 10% +0.1 1.24 ± 14% perf-profile.self.cycles-pp.__mod_lruvec_page_state 0.38 ± 7% -0.2 0.18 ± 5% -0.0 0.35 ± 6% perf-profile.self.cycles-pp.uncharge_batch 0.24 ± 4% -0.1 0.16 ± 2% +0.0 0.24 perf-profile.self.cycles-pp.folio_batch_move_lru 0.18 ± 3% -0.1 0.11 ± 3% -0.0 0.18 perf-profile.self.cycles-pp.xas_store 0.19 -0.1 0.12 ± 3% -0.0 0.18 ± 2% perf-profile.self.cycles-pp.release_pages 0.14 ± 5% -0.1 0.08 ± 5% -0.0 0.14 ± 2% perf-profile.self.cycles-pp.lru_add_fn 0.23 ± 5% -0.1 0.17 ± 2% +0.0 0.25 ± 2% perf-profile.self.cycles-pp.page_counter_uncharge 0.11 -0.1 0.06 -0.0 0.10 ± 4% perf-profile.self.cycles-pp.shmem_fallocate 0.13 -0.0 0.08 ± 5% +0.0 0.13 ± 2% perf-profile.self.cycles-pp.__mod_node_page_state 0.14 ± 3% -0.0 0.09 +0.0 0.14 perf-profile.self.cycles-pp._raw_spin_lock 0.09 ± 5% -0.0 0.05 ± 7% +0.0 0.09 ± 5% perf-profile.self.cycles-pp.xas_descend 0.10 ± 3% -0.0 0.06 +0.0 0.10 ± 4% perf-profile.self.cycles-pp.shmem_add_to_page_cache 0.09 ± 37% -0.0 0.05 ± 7% -0.0 0.07 ± 5% perf-profile.self.cycles-pp.cgroup_rstat_updated 0.16 ± 4% -0.0 0.13 ± 2% +0.0 0.17 ± 6% perf-profile.self.cycles-pp.propagate_protected_usage 0.09 -0.0 0.06 +0.0 0.09 ± 5% perf-profile.self.cycles-pp.xas_clear_mark 0.09 ± 5% -0.0 0.06 ± 7% +0.0 0.09 ± 5% perf-profile.self.cycles-pp.try_charge_memcg 0.15 -0.0 0.12 ± 3% +0.0 0.15 ± 3% perf-profile.self.cycles-pp.find_lock_entries 0.07 ± 7% -0.0 0.05 +0.0 0.07 ± 6% perf-profile.self.cycles-pp.page_counter_try_charge 0.00 +0.0 0.00 +0.1 0.06 ± 8% perf-profile.self.cycles-pp.mem_cgroup_update_lru_size 0.50 ± 3% +0.5 0.97 ± 3% +0.0 0.52 ± 5% perf-profile.self.cycles-pp.__count_memcg_events 4.14 ± 2% +0.8 4.97 -0.1 4.06 perf-profile.self.cycles-pp.__mod_memcg_lruvec_state 78.48 +5.8 84.29 +1.5 79.96 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath ^ permalink raw reply [flat|nested] 6+ messages in thread
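[Editorial aside, not part of the original report: the profile above is dominated by native_queued_spin_lock_slowpath reached through folio_lruvec_lock_irqsave (78.48% of cycles on the parent, 84.29% with the commit, 79.96% with the fix applied), i.e. CPUs serializing on the lruvec spinlock while batching folios onto and off the LRU. The standalone C program below is only a toy model of that contention pattern; thread count, batch size, and identifiers are illustrative stand-ins, not the kernel's code. Build with -lpthread.]

/*
 * Toy model of the contention pattern in the profile: many threads
 * funneling through one "lruvec" spinlock, with a folio_batch-style
 * batch amortizing lock acquisitions.  All names and constants are
 * illustrative, not the kernel's.
 */
#include <pthread.h>
#include <stdio.h>

#define NR_THREADS 8
#define NR_FOLIOS  999990	/* multiple of BATCH so the math is exact */
#define BATCH      15		/* one lock hold covers a whole batch */

static pthread_spinlock_t lruvec_lock;
static long lru_len;		/* protected by lruvec_lock */
static int use_batching = 1;	/* flip to 0 to model per-folio locking */

static void *producer(void *unused)
{
	int step = use_batching ? BATCH : 1;

	(void)unused;
	for (long i = 0; i < NR_FOLIOS; i += step) {
		pthread_spin_lock(&lruvec_lock);
		lru_len += step;	/* "move 'step' folios to the LRU" */
		pthread_spin_unlock(&lruvec_lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t tids[NR_THREADS];

	pthread_spin_init(&lruvec_lock, PTHREAD_PROCESS_PRIVATE);
	for (int i = 0; i < NR_THREADS; i++)
		pthread_create(&tids[i], NULL, producer, NULL);
	for (int i = 0; i < NR_THREADS; i++)
		pthread_join(tids[i], NULL);
	printf("lru length: %ld (lock acquisitions per thread: %d)\n",
	       lru_len, NR_FOLIOS / (use_batching ? BATCH : 1));
	return 0;
}

[One plausible reading of the numbers above, under that model: the extra per-update memcg accounting (__mod_memcg_lruvec_state, __count_memcg_events) lengthens work done while the lruvec lock is held, and the lock slowpath then amplifies that across the machine's 224 threads.]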
* Re: [linus:master] [mm] 8d59d2214c: vm-scalability.throughput -36.6% regression
  2024-01-24  8:26           ` Oliver Sang
@ 2024-01-24  9:11             ` Yosry Ahmed
  0 siblings, 0 replies; 6+ messages in thread
From: Yosry Ahmed @ 2024-01-24  9:11 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Johannes Weiner,
	Domenico Cerasuolo, Shakeel Butt, Chris Li, Greg Thelen,
	Ivan Babrou, Michal Hocko, Michal Koutny, Muchun Song,
	Roman Gushchin, Tejun Heo, Waiman Long, Wei Xu, cgroups,
	linux-mm, ying.huang, feng.tang, fengwei.yin

On Wed, Jan 24, 2024 at 12:26 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Yosry Ahmed,
>
> On Mon, Jan 22, 2024 at 11:42:04PM -0800, Yosry Ahmed wrote:
> > > > Oliver, would you be able to test if the attached patch helps? It's
> > > > based on 8d59d2214c236.
> > >
> > > the patch failed to compile:
> > >
> > > build_errors:
> > > - "mm/memcontrol.c:731:38: error: 'x' undeclared (first use in this function)"
> >
> > Apologies, apparently I sent the patch with some pending diff in my
> > tree that I hadn't committed. Please find a fixed patch attached.
>
> the regression disappears after applying the patch.
>
> Tested-by: kernel test robot <oliver.sang@intel.com>

Awesome! Thanks for testing. I will formalize the patch and send it
out for review.

^ permalink raw reply	[flat|nested] 6+ messages in thread
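[Editorial aside, not part of the original thread: for readers skimming the archive, the commit under test ("mm: memcg: make stats flushing threshold per-memcg") tracks pending-stats error per memcg rather than globally, so stat updates propagate up the cgroup hierarchy and readers flush only subtrees that have accumulated enough error. The userspace C sketch below models just that threshold idea under stated simplifications -- single-threaded, one plain counter per memcg instead of per-CPU counters, an arbitrary threshold -- and none of the identifiers or constants are the kernel's. It also makes no claim about the contents of the fix patch tested above, which is not included in this message.]

/*
 * Minimal userspace model of a per-memcg stats flushing threshold,
 * loosely after the idea named by the commit subject.  Everything
 * here is an illustrative simplification of the real kernel code.
 */
#include <stdio.h>
#include <stdlib.h>

#define FLUSH_THRESHOLD 64	/* stand-in for a batch-size * nr-cpus cutoff */

struct memcg {
	const char *name;
	struct memcg *parent;
	long pending;		/* magnitude of unflushed stat updates */
};

/* Every stat update charges its error to the memcg and all ancestors. */
static void memcg_stat_updated(struct memcg *memcg, long val)
{
	for (struct memcg *mc = memcg; mc; mc = mc->parent)
		mc->pending += labs(val);
}

/* A reader flushes only if this subtree accumulated enough error. */
static void memcg_flush_stats(struct memcg *memcg)
{
	if (memcg->pending < FLUSH_THRESHOLD) {
		printf("%s: flush skipped, only %ld pending\n",
		       memcg->name, memcg->pending);
		return;
	}
	printf("%s: flushing, %ld pending\n", memcg->name, memcg->pending);
	memcg->pending = 0;
}

int main(void)
{
	struct memcg root  = { "root",  NULL,  0 };
	struct memcg child = { "child", &root, 0 };

	for (int i = 0; i < 100; i++)
		memcg_stat_updated(&child, 1);	/* walks child -> root */

	memcg_flush_stats(&child);	/* 100 >= 64: flushes */
	memcg_flush_stats(&child);	/* 0 < 64: skipped */
	return 0;
}

[The per-ancestor work on every update is the new cost that charge-heavy microbenchmarks such as lru-shm and tlb_flush2 exercise, which is consistent with the memcg charge/uncharge paths moving in the profile above.]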
end of thread, other threads:[~2024-01-24  9:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-22  8:39 [linus:master] [mm] 8d59d2214c: vm-scalability.throughput -36.6% regression kernel test robot
2024-01-22 21:39 ` Yosry Ahmed
2024-01-23  7:21   ` Oliver Sang
2024-01-23  7:42     ` Yosry Ahmed
2024-01-24  8:26       ` Oliver Sang
2024-01-24  9:11         ` Yosry Ahmed