* [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
@ 2024-05-17 5:56 kernel test robot
2024-05-17 23:38 ` Yosry Ahmed
2024-05-18 6:28 ` Shakeel Butt
0 siblings, 2 replies; 15+ messages in thread
From: kernel test robot @ 2024-05-17 5:56 UTC (permalink / raw)
To: Shakeel Butt
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
fengwei.yin, oliver.sang
Hello,
kernel test robot noticed a -11.9% regression of will-it-scale.per_process_ops on:
commit: 70a64b7919cbd6c12306051ff2825839a9d65605 ("memcg: dynamically allocate lruvec_stats")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
testcase: will-it-scale
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:
nr_task: 100%
mode: process
test: page_fault2
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202405171353.b56b845-oliver.sang@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240517/202405171353.b56b845-oliver.sang@intel.com
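
As context for the detailed comparison below (an illustrative sketch only, not the
actual patch): the commit subject implies that the per-node lruvec stats move from
being embedded in the per-node memcg structure to a separately allocated object, so
every stat update chases one more pointer, roughly along these lines (type and field
names assumed):

	/* illustrative sketch only -- type/field names assumed */
	#define SKETCH_NR_LRUVEC_STATS 64	/* stand-in for the real item count */

	struct lruvec_stats {
		long state[SKETCH_NR_LRUVEC_STATS];
	};

	struct mem_cgroup_per_node {
		/* before: struct lruvec_stats lruvec_stats;  (embedded) */
		struct lruvec_stats *lruvec_stats;	/* after: allocated separately */
	};

	static void sketch_mod_lruvec_stat(struct mem_cgroup_per_node *pn,
					   int idx, long val)
	{
		/* the extra dereference can touch an extra cache line per update */
		pn->lruvec_stats->state[idx] += val;
	}

If that is what the change does, it would be consistent with the larger
__lruvec_stat_mod_folio self time seen in the profile below.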
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/page_fault2/will-it-scale
commit:
59142d87ab ("memcg: reduce memory size of mem_cgroup_events_index")
70a64b7919 ("memcg: dynamically allocate lruvec_stats")
59142d87ab03b8ff 70a64b7919cbd6c12306051ff28
---------------- ---------------------------
%stddev %change %stddev
\ | \
7.14 -0.8 6.32 mpstat.cpu.all.usr%
245257 ± 7% -13.8% 211354 ± 4% sched_debug.cfs_rq:/.avg_vruntime.stddev
245258 ± 7% -13.8% 211353 ± 4% sched_debug.cfs_rq:/.min_vruntime.stddev
21099 ± 5% -14.9% 17946 ± 5% perf-c2c.DRAM.local
4025 ± 2% +29.1% 5197 ± 3% perf-c2c.HITM.local
105.17 ± 8% -12.7% 91.83 ± 6% perf-c2c.HITM.remote
9538291 -11.9% 8402170 will-it-scale.104.processes
91713 -11.9% 80789 will-it-scale.per_process_ops
9538291 -11.9% 8402170 will-it-scale.workload
1.438e+09 -11.2% 1.276e+09 numa-numastat.node0.local_node
1.44e+09 -11.3% 1.278e+09 numa-numastat.node0.numa_hit
83001 ± 15% -68.9% 25774 ± 34% numa-numastat.node0.other_node
1.453e+09 -12.5% 1.271e+09 numa-numastat.node1.local_node
1.454e+09 -12.5% 1.272e+09 numa-numastat.node1.numa_hit
24752 ± 51% +230.9% 81910 ± 10% numa-numastat.node1.other_node
1.44e+09 -11.3% 1.278e+09 numa-vmstat.node0.numa_hit
1.438e+09 -11.3% 1.276e+09 numa-vmstat.node0.numa_local
83001 ± 15% -68.9% 25774 ± 34% numa-vmstat.node0.numa_other
1.454e+09 -12.5% 1.272e+09 numa-vmstat.node1.numa_hit
1.453e+09 -12.5% 1.271e+09 numa-vmstat.node1.numa_local
24752 ± 51% +230.9% 81910 ± 10% numa-vmstat.node1.numa_other
14952 -3.2% 14468 proc-vmstat.nr_mapped
2.894e+09 -11.9% 2.55e+09 proc-vmstat.numa_hit
2.891e+09 -11.9% 2.548e+09 proc-vmstat.numa_local
2.88e+09 -11.8% 2.539e+09 proc-vmstat.pgalloc_normal
2.869e+09 -11.9% 2.529e+09 proc-vmstat.pgfault
2.88e+09 -11.8% 2.539e+09 proc-vmstat.pgfree
17.51 -2.6% 17.05 perf-stat.i.MPKI
9.457e+09 -9.2% 8.585e+09 perf-stat.i.branch-instructions
45022022 -8.2% 41340795 perf-stat.i.branch-misses
84.38 -4.9 79.51 perf-stat.i.cache-miss-rate%
8.353e+08 -12.1% 7.345e+08 perf-stat.i.cache-misses
9.877e+08 -6.7% 9.216e+08 perf-stat.i.cache-references
6.06 +10.8% 6.72 perf-stat.i.cpi
136.25 -1.2% 134.59 perf-stat.i.cpu-migrations
348.56 +13.9% 396.93 perf-stat.i.cycles-between-cache-misses
4.763e+10 -9.7% 4.302e+10 perf-stat.i.instructions
0.17 -9.6% 0.15 perf-stat.i.ipc
182.56 -11.9% 160.88 perf-stat.i.metric.K/sec
9494393 -11.9% 8368012 perf-stat.i.minor-faults
9494393 -11.9% 8368012 perf-stat.i.page-faults
17.54 -2.6% 17.08 perf-stat.overall.MPKI
0.47 +0.0 0.48 perf-stat.overall.branch-miss-rate%
84.57 -4.9 79.71 perf-stat.overall.cache-miss-rate%
6.07 +10.8% 6.73 perf-stat.overall.cpi
346.33 +13.8% 393.97 perf-stat.overall.cycles-between-cache-misses
0.16 -9.7% 0.15 perf-stat.overall.ipc
1503802 +2.6% 1542599 perf-stat.overall.path-length
9.424e+09 -9.2% 8.553e+09 perf-stat.ps.branch-instructions
44739120 -8.3% 41034189 perf-stat.ps.branch-misses
8.326e+08 -12.1% 7.321e+08 perf-stat.ps.cache-misses
9.846e+08 -6.7% 9.185e+08 perf-stat.ps.cache-references
134.98 -1.3% 133.26 perf-stat.ps.cpu-migrations
4.747e+10 -9.7% 4.286e+10 perf-stat.ps.instructions
9463902 -11.9% 8339836 perf-stat.ps.minor-faults
9463902 -11.9% 8339836 perf-stat.ps.page-faults
1.434e+13 -9.6% 1.296e+13 perf-stat.total.instructions
64.15 -2.4 61.72 perf-profile.calltrace.cycles-pp.testcase
58.30 -1.9 56.41 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
52.64 -1.4 51.28 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
52.50 -1.3 51.16 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
50.81 -1.0 49.86 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
49.86 -0.8 49.02 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
9.27 -0.8 8.45 ± 3% perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
49.21 -0.8 48.43 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
5.15 -0.5 4.68 perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase
3.24 -0.5 2.77 perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.82 -0.3 0.51 perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
1.68 -0.3 1.42 perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault
2.52 -0.2 2.28 perf-profile.calltrace.cycles-pp.error_entry.testcase
1.50 ± 2% -0.2 1.30 perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault
1.85 -0.1 1.70 ± 3% perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
0.68 -0.1 0.55 ± 2% perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
1.55 -0.1 1.44 ± 3% perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault
0.55 -0.1 0.43 ± 44% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc
1.07 -0.1 0.98 perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault
0.90 -0.1 0.81 perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase
0.89 -0.0 0.86 perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault
1.00 +0.1 1.05 perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
3.85 +0.2 4.10 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
3.85 +0.2 4.10 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
3.85 +0.2 4.10 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
3.82 +0.3 4.07 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region
3.68 +0.3 3.94 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
0.83 +0.3 1.10 ± 2% perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.set_pte_range.finish_fault.do_fault.__handle_mm_fault
0.00 +0.5 0.54 perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range
0.00 +0.7 0.66 perf-profile.calltrace.cycles-pp.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range
32.87 +0.7 33.62 perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
29.54 +2.3 31.80 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range
29.54 +2.3 31.80 perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
29.53 +2.3 31.80 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
30.66 +2.3 32.93 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
30.66 +2.3 32.93 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
30.66 +2.3 32.93 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
30.66 +2.3 32.93 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
29.26 +2.3 31.60 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range
28.41 +2.4 30.78 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu
34.56 +2.5 37.08 perf-profile.calltrace.cycles-pp.__munmap
34.56 +2.5 37.08 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
34.56 +2.5 37.08 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
34.55 +2.5 37.07 perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
34.55 +2.5 37.08 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
34.55 +2.5 37.08 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
34.55 +2.5 37.08 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
34.55 +2.5 37.08 perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
31.41 +2.8 34.20 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
31.42 +2.8 34.23 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
31.38 +2.8 34.19 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
65.26 -2.5 62.73 perf-profile.children.cycles-pp.testcase
56.09 -1.7 54.41 perf-profile.children.cycles-pp.asm_exc_page_fault
52.66 -1.4 51.30 perf-profile.children.cycles-pp.exc_page_fault
52.52 -1.3 51.18 perf-profile.children.cycles-pp.do_user_addr_fault
50.83 -1.0 49.88 perf-profile.children.cycles-pp.handle_mm_fault
49.87 -0.8 49.02 perf-profile.children.cycles-pp.__handle_mm_fault
9.35 -0.8 8.53 ± 3% perf-profile.children.cycles-pp.copy_page
49.23 -0.8 48.45 perf-profile.children.cycles-pp.do_fault
5.15 -0.5 4.68 perf-profile.children.cycles-pp.__irqentry_text_end
3.27 -0.5 2.80 perf-profile.children.cycles-pp.folio_prealloc
0.82 -0.3 0.52 perf-profile.children.cycles-pp.lock_vma_under_rcu
0.57 -0.3 0.32 perf-profile.children.cycles-pp.mas_walk
1.69 -0.3 1.43 perf-profile.children.cycles-pp.vma_alloc_folio_noprof
2.54 -0.2 2.30 perf-profile.children.cycles-pp.error_entry
1.52 ± 2% -0.2 1.31 perf-profile.children.cycles-pp.__mem_cgroup_charge
0.95 -0.2 0.79 ± 4% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
1.87 -0.2 1.72 ± 3% perf-profile.children.cycles-pp.__pte_offset_map_lock
0.60 ± 4% -0.1 0.46 ± 6% perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
0.70 -0.1 0.56 ± 2% perf-profile.children.cycles-pp.lru_add_fn
1.57 -0.1 1.45 ± 3% perf-profile.children.cycles-pp._raw_spin_lock
1.16 -0.1 1.04 perf-profile.children.cycles-pp.native_irq_return_iret
1.12 -0.1 1.01 perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
0.44 -0.1 0.35 perf-profile.children.cycles-pp.get_vma_policy
0.94 -0.1 0.85 perf-profile.children.cycles-pp.sync_regs
0.96 -0.1 0.87 perf-profile.children.cycles-pp.__perf_sw_event
0.43 -0.1 0.34 ± 2% perf-profile.children.cycles-pp.free_unref_folios
0.21 ± 3% -0.1 0.13 ± 3% perf-profile.children.cycles-pp._compound_head
0.75 -0.1 0.68 perf-profile.children.cycles-pp.___perf_sw_event
0.31 -0.1 0.25 perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
0.94 -0.0 0.90 perf-profile.children.cycles-pp.__alloc_pages_noprof
0.41 ± 4% -0.0 0.37 ± 4% perf-profile.children.cycles-pp.mem_cgroup_commit_charge
0.44 ± 5% -0.0 0.40 ± 5% perf-profile.children.cycles-pp.__count_memcg_events
0.17 ± 2% -0.0 0.13 ± 4% perf-profile.children.cycles-pp.uncharge_batch
0.57 -0.0 0.53 ± 2% perf-profile.children.cycles-pp.get_page_from_freelist
0.13 ± 2% -0.0 0.09 ± 5% perf-profile.children.cycles-pp.__mod_zone_page_state
0.19 ± 3% -0.0 0.16 ± 6% perf-profile.children.cycles-pp.cgroup_rstat_updated
0.15 ± 2% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.free_unref_page_commit
0.10 ± 3% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
0.08 -0.0 0.05 perf-profile.children.cycles-pp.policy_nodemask
0.13 ± 3% -0.0 0.10 ± 3% perf-profile.children.cycles-pp.page_counter_uncharge
0.32 ± 3% -0.0 0.30 ± 2% perf-profile.children.cycles-pp.__mod_node_page_state
0.17 ± 2% -0.0 0.15 ± 3% perf-profile.children.cycles-pp.percpu_counter_add_batch
0.16 ± 2% -0.0 0.14 ± 2% perf-profile.children.cycles-pp.shmem_get_policy
0.16 -0.0 0.14 ± 2% perf-profile.children.cycles-pp.handle_pte_fault
0.16 ± 4% -0.0 0.14 ± 4% perf-profile.children.cycles-pp.__pte_offset_map
0.09 -0.0 0.07 ± 5% perf-profile.children.cycles-pp.get_pfnblock_flags_mask
0.12 ± 3% -0.0 0.10 ± 4% perf-profile.children.cycles-pp.uncharge_folio
0.36 -0.0 0.34 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.10 ± 3% -0.0 0.08 ± 5% perf-profile.children.cycles-pp.pte_offset_map_nolock
0.30 -0.0 0.28 ± 2% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.09 ± 4% -0.0 0.08 perf-profile.children.cycles-pp.down_read_trylock
0.08 -0.0 0.07 ± 5% perf-profile.children.cycles-pp.folio_unlock
0.40 +0.0 0.43 perf-profile.children.cycles-pp.__mod_lruvec_state
1.02 +0.0 1.06 perf-profile.children.cycles-pp.zap_present_ptes
0.47 +0.2 0.67 perf-profile.children.cycles-pp.folio_remove_rmap_ptes
3.87 +0.3 4.12 perf-profile.children.cycles-pp.tlb_finish_mmu
1.17 +0.5 1.71 ± 2% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
32.88 +0.8 33.63 perf-profile.children.cycles-pp.set_pte_range
29.54 +2.3 31.80 perf-profile.children.cycles-pp.tlb_flush_mmu
30.66 +2.3 32.93 perf-profile.children.cycles-pp.zap_pte_range
30.66 +2.3 32.94 perf-profile.children.cycles-pp.unmap_page_range
30.66 +2.3 32.94 perf-profile.children.cycles-pp.zap_pmd_range
30.66 +2.3 32.94 perf-profile.children.cycles-pp.unmap_vmas
33.41 +2.5 35.92 perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
33.40 +2.5 35.92 perf-profile.children.cycles-pp.free_pages_and_swap_cache
34.56 +2.5 37.08 perf-profile.children.cycles-pp.__munmap
34.56 +2.5 37.08 perf-profile.children.cycles-pp.__vm_munmap
34.56 +2.5 37.08 perf-profile.children.cycles-pp.__x64_sys_munmap
34.56 +2.5 37.09 perf-profile.children.cycles-pp.do_vmi_munmap
34.56 +2.5 37.09 perf-profile.children.cycles-pp.do_vmi_align_munmap
34.67 +2.5 37.20 perf-profile.children.cycles-pp.do_syscall_64
34.67 +2.5 37.20 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
34.56 +2.5 37.09 perf-profile.children.cycles-pp.unmap_region
33.22 +2.6 35.80 perf-profile.children.cycles-pp.folios_put_refs
32.12 +2.6 34.75 perf-profile.children.cycles-pp.__page_cache_release
61.97 +3.3 65.27 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
61.94 +3.3 65.26 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
61.98 +3.3 65.30 perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
9.32 -0.8 8.49 ± 3% perf-profile.self.cycles-pp.copy_page
5.15 -0.5 4.68 perf-profile.self.cycles-pp.__irqentry_text_end
0.56 -0.3 0.31 perf-profile.self.cycles-pp.mas_walk
2.58 -0.2 2.33 perf-profile.self.cycles-pp.testcase
2.53 -0.2 2.30 perf-profile.self.cycles-pp.error_entry
0.60 ± 4% -0.2 0.44 ± 6% perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
0.85 -0.1 0.71 ± 4% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
1.54 -0.1 1.43 ± 3% perf-profile.self.cycles-pp._raw_spin_lock
1.15 -0.1 1.04 perf-profile.self.cycles-pp.native_irq_return_iret
0.94 -0.1 0.85 perf-profile.self.cycles-pp.sync_regs
0.20 ± 3% -0.1 0.13 ± 3% perf-profile.self.cycles-pp._compound_head
0.27 ± 3% -0.1 0.20 ± 3% perf-profile.self.cycles-pp.free_pages_and_swap_cache
0.26 -0.1 0.18 ± 2% perf-profile.self.cycles-pp.get_vma_policy
0.26 -0.1 0.19 ± 2% perf-profile.self.cycles-pp.__page_cache_release
0.16 -0.1 0.09 ± 5% perf-profile.self.cycles-pp.vma_alloc_folio_noprof
0.28 ± 2% -0.1 0.22 ± 3% perf-profile.self.cycles-pp.zap_present_ptes
0.66 -0.1 0.60 perf-profile.self.cycles-pp.___perf_sw_event
0.32 -0.1 0.27 ± 5% perf-profile.self.cycles-pp.lru_add_fn
0.47 -0.0 0.43 ± 2% perf-profile.self.cycles-pp.__handle_mm_fault
0.16 ± 4% -0.0 0.12 perf-profile.self.cycles-pp.lock_vma_under_rcu
0.20 -0.0 0.16 ± 4% perf-profile.self.cycles-pp.free_unref_folios
0.30 -0.0 0.26 perf-profile.self.cycles-pp.handle_mm_fault
0.10 ± 4% -0.0 0.07 perf-profile.self.cycles-pp.zap_pte_range
0.09 ± 5% -0.0 0.06 ± 6% perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
0.14 ± 2% -0.0 0.11 ± 4% perf-profile.self.cycles-pp.mem_cgroup_commit_charge
0.14 ± 3% -0.0 0.12 ± 4% perf-profile.self.cycles-pp.folio_remove_rmap_ptes
0.12 ± 4% -0.0 0.09 ± 7% perf-profile.self.cycles-pp.__mod_zone_page_state
0.10 ± 4% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.alloc_pages_mpol_noprof
0.11 -0.0 0.08 ± 5% perf-profile.self.cycles-pp.free_unref_page_commit
0.22 ± 2% -0.0 0.19 perf-profile.self.cycles-pp.__pte_offset_map_lock
0.21 -0.0 0.18 ± 2% perf-profile.self.cycles-pp.__perf_sw_event
0.21 -0.0 0.18 ± 2% perf-profile.self.cycles-pp.do_user_addr_fault
0.31 ± 2% -0.0 0.29 perf-profile.self.cycles-pp.__mod_node_page_state
0.16 ± 2% -0.0 0.14 ± 5% perf-profile.self.cycles-pp.cgroup_rstat_updated
0.17 ± 2% -0.0 0.15 ± 2% perf-profile.self.cycles-pp.percpu_counter_add_batch
0.11 -0.0 0.09 ± 4% perf-profile.self.cycles-pp.page_counter_uncharge
0.09 -0.0 0.07 perf-profile.self.cycles-pp.get_pfnblock_flags_mask
0.28 ± 2% -0.0 0.26 ± 2% perf-profile.self.cycles-pp.xas_load
0.16 ± 2% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.get_page_from_freelist
0.12 -0.0 0.10 ± 3% perf-profile.self.cycles-pp.uncharge_folio
0.16 ± 4% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.__pte_offset_map
0.20 ± 2% -0.0 0.19 ± 2% perf-profile.self.cycles-pp.shmem_get_folio_gfp
0.16 ± 3% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.shmem_get_policy
0.14 ± 3% -0.0 0.12 ± 4% perf-profile.self.cycles-pp.do_fault
0.08 -0.0 0.07 ± 7% perf-profile.self.cycles-pp.folio_unlock
0.12 ± 3% -0.0 0.11 perf-profile.self.cycles-pp.folio_add_new_anon_rmap
0.09 -0.0 0.08 perf-profile.self.cycles-pp.down_read_trylock
0.07 -0.0 0.06 perf-profile.self.cycles-pp.folio_prealloc
0.38 ± 2% +0.0 0.42 ± 3% perf-profile.self.cycles-pp.filemap_get_entry
0.26 +0.1 0.36 perf-profile.self.cycles-pp.folios_put_refs
0.33 +0.1 0.44 ± 3% perf-profile.self.cycles-pp.folio_batch_move_lru
0.40 ± 5% +0.6 0.98 perf-profile.self.cycles-pp.__lruvec_stat_mod_folio
61.94 +3.3 65.26 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: Yosry Ahmed @ 2024-05-17 23:38 UTC (permalink / raw)
To: kernel test robot
Cc: Shakeel Butt, oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
    T.J. Mercier, Roman Gushchin, Johannes Weiner, Michal Hocko, Muchun Song,
    cgroups, ying.huang, feng.tang, fengwei.yin

On Thu, May 16, 2024 at 10:56 PM kernel test robot <oliver.sang@intel.com> wrote:
>
> Hello,
>
> kernel test robot noticed a -11.9% regression of will-it-scale.per_process_ops on:
>
> commit: 70a64b7919cbd6c12306051ff2825839a9d65605 ("memcg: dynamically allocate lruvec_stats")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

I think we may want to go back to the approach of reordering the
indices to separate memcg and non-memcg stats. If we really want to
conserve the order in which the stats are exported to userspace, we
can use a translation table on the read path instead of the update
path.
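
Roughly something like the following untested sketch, with made-up indices, just to
illustrate the idea:

	/* untested sketch, indices made up: keep the internal enum reordered so
	 * memcg-only items are contiguous, and translate only when exporting the
	 * stats to userspace */
	static const unsigned int memcg_stat_output_order[] = {
		/* userspace-visible position -> internal (reordered) index */
		2, 0, 3, 1,
	};

	static long memcg_read_stat_for_output(const long *internal_state,
					       unsigned int output_idx)
	{
		/* translation happens on the (cold) read path, not on updates */
		return internal_state[memcg_stat_output_order[output_idx]];
	}

The update path then only ever sees the reordered indices, so it pays no extra cost.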

* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: Shakeel Butt @ 2024-05-18 6:28 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Yosry Ahmed,
    T.J. Mercier, Roman Gushchin, Johannes Weiner, Michal Hocko, Muchun Song,
    cgroups, ying.huang, feng.tang, fengwei.yin

On Fri, May 17, 2024 at 01:56:30PM +0800, kernel test robot wrote:
>
> Hello,
>
> kernel test robot noticed a -11.9% regression of will-it-scale.per_process_ops on:
>
> commit: 70a64b7919cbd6c12306051ff2825839a9d65605 ("memcg: dynamically allocate lruvec_stats")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>

Thanks for the report. Can you please run the same benchmark but with
the full series (of 8 patches), or at least include ff48c71c26aa
("memcg: reduce memory for the lruvec and memcg stats")?

thanks,
Shakeel

* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: Oliver Sang @ 2024-05-19 9:14 UTC (permalink / raw)
To: Shakeel Butt
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Yosry Ahmed,
    T.J. Mercier, Roman Gushchin, Johannes Weiner, Michal Hocko, Muchun Song,
    cgroups, ying.huang, feng.tang, fengwei.yin, oliver.sang

hi, Shakeel,

On Fri, May 17, 2024 at 11:28:10PM -0700, Shakeel Butt wrote:
> Thanks for the report. Can you please run the same benchmark but with
> the full series (of 8 patches), or at least include ff48c71c26aa
> ("memcg: reduce memory for the lruvec and memcg stats")?

ff48c71c26aa was already checked during this bisect. It has similar data to
70a64b7919 (a little worse, actually):

59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803
---------------- --------------------------- ---------------------------
         %stddev     %change       %stddev      %change       %stddev
             \          |                \          |                \
     91713           -11.9%      80789           -13.2%      79612        will-it-scale.per_process_ops

ok, we will run tests on the tip of the series, which should be the commit
below if I understand it correctly:

* a94032b35e5f9 memcg: use proper type for mod_memcg_state

> thanks,
> Shakeel

* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: Shakeel Butt @ 2024-05-19 17:20 UTC (permalink / raw)
To: Oliver Sang
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Yosry Ahmed,
    T.J. Mercier, Roman Gushchin, Johannes Weiner, Michal Hocko, Muchun Song,
    cgroups, ying.huang, feng.tang, fengwei.yin

On Sun, May 19, 2024 at 05:14:39PM +0800, Oliver Sang wrote:
> ff48c71c26aa was already checked during this bisect. It has similar data to
> 70a64b7919 (a little worse, actually).
>
> ok, we will run tests on the tip of the series, which should be the commit
> below if I understand it correctly:
>
> * a94032b35e5f9 memcg: use proper type for mod_memcg_state

Thanks a lot Oliver. One question: what is the filesystem mounted at
/tmp on your test machine? I just wanted to make sure I run the test
with minimal changes from your setup.

* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: Oliver Sang @ 2024-05-20 2:43 UTC (permalink / raw)
To: Shakeel Butt
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Yosry Ahmed,
    T.J. Mercier, Roman Gushchin, Johannes Weiner, Michal Hocko, Muchun Song,
    cgroups, ying.huang, feng.tang, fengwei.yin, oliver.sang

hi, Shakeel,

On Sun, May 19, 2024 at 10:20:28AM -0700, Shakeel Butt wrote:
> Thanks a lot Oliver. One question: what is the filesystem mounted at
> /tmp on your test machine? I just wanted to make sure I run the test
> with minimal changes from your setup.

we don't have a specific partition for /tmp, we just use tmpfs:

  tmp on /tmp type tmpfs (rw,relatime)

BTW, the test on a94032b35e5f9 has finished; it still has a score similar to
70a64b7919:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/page_fault2/will-it-scale

59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803 a94032b35e5f97dc1023030d929
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change       %stddev      %change       %stddev      %change       %stddev
             \          |                \          |                \          |                \
     91713           -11.9%      80789           -13.2%      79612           -13.0%      79833        will-it-scale.per_process_ops

* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: Shakeel Butt @ 2024-05-20 3:49 UTC (permalink / raw)
To: Oliver Sang
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Yosry Ahmed,
    T.J. Mercier, Roman Gushchin, Johannes Weiner, Michal Hocko, Muchun Song,
    cgroups, ying.huang, feng.tang, fengwei.yin

On Mon, May 20, 2024 at 10:43:35AM +0800, Oliver Sang wrote:
> we don't have a specific partition for /tmp, we just use tmpfs:
>
>   tmp on /tmp type tmpfs (rw,relatime)
>
> BTW, the test on a94032b35e5f9 has finished; it still has a score similar to
> 70a64b7919:
>
>      91713           -11.9%      80789           -13.2%      79612           -13.0%      79833        will-it-scale.per_process_ops

Thanks again. I am not sure if you have a single node machine but if you
have, can you try to repro this issue on such machine. At the moment, I
don't have access to such machine but I will try to repro myself as
well.

Shakeel

* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: Oliver Sang @ 2024-05-21 2:43 UTC (permalink / raw)
To: Shakeel Butt
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Yosry Ahmed,
    T.J. Mercier, Roman Gushchin, Johannes Weiner, Michal Hocko, Muchun Song,
    cgroups, ying.huang, feng.tang, fengwei.yin, oliver.sang

hi, Shakeel,

On Sun, May 19, 2024 at 08:49:33PM -0700, Shakeel Butt wrote:
> Thanks again. I am not sure if you have a single node machine but if you
> have, can you try to repro this issue on such machine. At the moment, I
> don't have access to such machine but I will try to repro myself as
> well.

we reported the regression on a 2-node Skylake server, so I found a 1-node
Skylake desktop (we don't have a 1-node server) to check:

  model: Skylake
  nr_node: 1
  nr_cpu: 36
  memory: 32G
  brand: Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz

but cannot reproduce this regression:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-d08/page_fault2/will-it-scale

59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803 a94032b35e5f97dc1023030d929
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change       %stddev      %change       %stddev      %change       %stddev
             \          |                \          |                \          |                \
    136040            -0.2%     135718            -0.2%     135829            -0.1%     135881        will-it-scale.per_process_ops

then I tried it on 2-node servers with other models.

for
  model: Ice Lake
  nr_node: 2
  nr_cpu: 64
  memory: 256G
  brand: Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz

there is a similar regression:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/page_fault2/will-it-scale

59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803 a94032b35e5f97dc1023030d929
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change       %stddev      %change       %stddev      %change       %stddev
             \          |                \          |                \          |                \
    240373           -14.4%     205702           -14.1%     206368           -12.9%     209394        will-it-scale.per_process_ops

full data is as below [1]

for
  model: Sapphire Rapids
  nr_node: 2
  nr_cpu: 224
  memory: 512G
  brand: Intel(R) Xeon(R) Platinum 8480CTDX

the regression is smaller but still exists:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/page_fault2/will-it-scale

59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803 a94032b35e5f97dc1023030d929
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change       %stddev      %change       %stddev      %change       %stddev
             \          |                \          |                \          |                \
     78072            -3.4%      75386            -6.0%      73363            -5.6%      73683        will-it-scale.per_process_ops

full data is as below [2]

hope these data are useful.
[1] ========================================================================================= compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase: gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/page_fault2/will-it-scale 59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803 a94032b35e5f97dc1023030d929 ---------------- --------------------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev %change %stddev \ | \ | \ | \ 0.27 ± 3% -0.0 0.24 ± 3% -0.0 0.23 ± 3% -0.0 0.24 ± 2% mpstat.cpu.all.irq% 3.83 -0.7 3.17 ± 2% -0.6 3.23 ± 3% -0.6 3.21 mpstat.cpu.all.usr% 62547 -10.1% 56227 -10.8% 55807 -8.9% 56984 perf-c2c.DRAM.local 194.40 ± 9% -11.5% 172.00 ± 4% -11.5% 172.00 ± 5% -13.9% 167.40 ± 2% perf-c2c.HITM.remote 15383898 -14.4% 13164951 -14.1% 13207631 -12.9% 13401271 will-it-scale.64.processes 240373 -14.4% 205702 -14.1% 206368 -12.9% 209394 will-it-scale.per_process_ops 15383898 -14.4% 13164951 -14.1% 13207631 -12.9% 13401271 will-it-scale.workload 2.359e+09 -12.9% 2.055e+09 -14.2% 2.023e+09 -12.8% 2.057e+09 numa-numastat.node0.local_node 2.359e+09 -12.9% 2.055e+09 -14.2% 2.023e+09 -12.8% 2.057e+09 numa-numastat.node0.numa_hit 2.346e+09 -16.1% 1.967e+09 -14.2% 2.013e+09 -13.2% 2.035e+09 ± 2% numa-numastat.node1.local_node 2.345e+09 -16.1% 1.967e+09 -14.2% 2.013e+09 -13.2% 2.036e+09 ± 2% numa-numastat.node1.numa_hit 567382 ± 8% +2.1% 579061 ± 10% -9.5% 513215 ± 5% +1.2% 574201 ± 9% numa-vmstat.node0.nr_anon_pages 2.36e+09 -12.9% 2.055e+09 -14.3% 2.023e+09 -12.9% 2.056e+09 numa-vmstat.node0.numa_hit 2.36e+09 -12.9% 2.055e+09 -14.3% 2.023e+09 -12.9% 2.056e+09 numa-vmstat.node0.numa_local 2.346e+09 -16.2% 1.966e+09 -14.2% 2.012e+09 -13.3% 2.035e+09 ± 2% numa-vmstat.node1.numa_hit 2.347e+09 -16.2% 1.967e+09 -14.2% 2.013e+09 -13.3% 2.034e+09 ± 2% numa-vmstat.node1.numa_local 1137116 -1.9% 1115597 -1.5% 1119624 -1.8% 1116759 proc-vmstat.nr_anon_pages 4575 +2.1% 4673 +2.1% 4671 +1.7% 4654 proc-vmstat.nr_page_table_pages 4.705e+09 -14.5% 4.022e+09 -14.2% 4.036e+09 -13.0% 4.093e+09 proc-vmstat.numa_hit 4.706e+09 -14.5% 4.023e+09 -14.2% 4.037e+09 -13.0% 4.092e+09 proc-vmstat.numa_local 4.645e+09 -14.3% 3.979e+09 -14.1% 3.991e+09 -12.8% 4.05e+09 proc-vmstat.pgalloc_normal 4.631e+09 -14.3% 3.967e+09 -14.1% 3.979e+09 -12.8% 4.038e+09 proc-vmstat.pgfault 4.643e+09 -14.3% 3.978e+09 -14.1% 3.99e+09 -12.8% 4.049e+09 proc-vmstat.pgfree 29780 ± 54% -49.0% 15173 ± 50% -87.2% 3818 ±199% -33.2% 19878 ±112% sched_debug.cfs_rq:/.left_deadline.avg 1905931 ± 54% -49.1% 971033 ± 50% -87.2% 244356 ±199% -33.2% 1272254 ±112% sched_debug.cfs_rq:/.left_deadline.max 236372 ± 54% -49.1% 120428 ± 50% -87.2% 30306 ±199% -33.2% 157784 ±112% sched_debug.cfs_rq:/.left_deadline.stddev 29779 ± 54% -49.0% 15172 ± 50% -87.2% 3818 ±199% -33.2% 19878 ±112% sched_debug.cfs_rq:/.left_vruntime.avg 1905916 ± 54% -49.1% 971025 ± 50% -87.2% 244349 ±199% -33.2% 1272236 ±112% sched_debug.cfs_rq:/.left_vruntime.max 236371 ± 54% -49.1% 120427 ± 50% -87.2% 30304 ±199% -33.2% 157782 ±112% sched_debug.cfs_rq:/.left_vruntime.stddev 12745 ± 8% +2.4% 13045 -9.7% 11510 ± 11% -6.0% 11984 ± 10% sched_debug.cfs_rq:/.load.min 253.83 ± 24% +56.9% 398.30 ± 27% +58.4% 402.13 ± 56% +23.8% 314.20 ± 23% sched_debug.cfs_rq:/.load_avg.max 22.93 ± 4% -12.2% 20.14 ± 17% -12.0% 20.17 ± 17% -18.5% 18.68 ± 15% sched_debug.cfs_rq:/.removed.runnable_avg.stddev 22.93 ± 4% -13.0% 19.94 ± 16% -12.1% 20.16 ± 17% -19.9% 18.35 ± 14% 
sched_debug.cfs_rq:/.removed.util_avg.stddev 29779 ± 54% -49.0% 15172 ± 50% -87.2% 3818 ±199% -33.2% 19878 ±112% sched_debug.cfs_rq:/.right_vruntime.avg 1905916 ± 54% -49.1% 971025 ± 50% -87.2% 244349 ±199% -33.2% 1272236 ±112% sched_debug.cfs_rq:/.right_vruntime.max 236371 ± 54% -49.1% 120427 ± 50% -87.2% 30304 ±199% -33.2% 157782 ±112% sched_debug.cfs_rq:/.right_vruntime.stddev 149.50 ± 33% -81.3% 28.00 ±180% -71.2% 43.03 ±120% -70.9% 43.57 ±125% sched_debug.cfs_rq:/.util_est.min 1930 ± 4% -15.5% 1631 ± 7% -18.1% 1581 ± 5% -10.5% 1729 ± 16% sched_debug.cpu.nr_switches.min 0.79 ± 98% +89.1% 1.49 ± 48% +147.8% 1.96 ± 16% -12.4% 0.69 ± 91% sched_debug.rt_rq:.rt_time.avg 50.52 ± 98% +89.2% 95.60 ± 48% +147.8% 125.19 ± 17% -12.3% 44.29 ± 91% sched_debug.rt_rq:.rt_time.max 6.27 ± 98% +89.2% 11.86 ± 48% +147.8% 15.53 ± 17% -12.3% 5.49 ± 91% sched_debug.rt_rq:.rt_time.stddev 21.14 -10.1% 19.00 -10.1% 19.01 ± 2% -9.9% 19.05 perf-stat.i.MPKI 1.468e+10 -9.4% 1.33e+10 -9.0% 1.336e+10 -7.9% 1.351e+10 perf-stat.i.branch-instructions 14349180 -7.8% 13236560 -6.6% 13407521 -6.2% 13464962 perf-stat.i.branch-misses 69.58 -5.1 64.51 -4.8 64.81 -4.6 64.96 perf-stat.i.cache-miss-rate% 1.57e+09 -19.5% 1.263e+09 ± 2% -18.9% 1.273e+09 ± 3% -17.8% 1.291e+09 perf-stat.i.cache-misses 2.252e+09 -13.2% 1.955e+09 -12.9% 1.961e+09 -11.9% 1.985e+09 perf-stat.i.cache-references 3.00 +12.8% 3.39 +12.0% 3.36 +10.6% 3.32 perf-stat.i.cpi 99.00 -0.9% 98.11 -1.1% 97.90 -0.9% 98.13 perf-stat.i.cpu-migrations 143.06 +25.2% 179.10 ± 2% +24.5% 178.15 ± 3% +22.4% 175.18 perf-stat.i.cycles-between-cache-misses 7.403e+10 -10.4% 6.634e+10 -9.8% 6.679e+10 -8.7% 6.76e+10 perf-stat.i.instructions 0.34 -11.4% 0.30 -10.7% 0.30 -9.7% 0.30 perf-stat.i.ipc 478.41 -14.3% 410.14 -14.0% 411.31 -12.7% 417.50 perf-stat.i.metric.K/sec 15310132 -14.3% 13125768 -14.0% 13162999 -12.7% 13361235 perf-stat.i.minor-faults 15310132 -14.3% 13125768 -14.0% 13163000 -12.7% 13361235 perf-stat.i.page-faults 21.21 -28.4% 15.17 ± 50% -10.2% 19.05 ± 2% -28.3% 15.20 ± 50% perf-stat.overall.MPKI 0.10 -0.0 0.08 ± 50% +0.0 0.10 -0.0 0.08 ± 50% perf-stat.overall.branch-miss-rate% 69.71 -18.2 51.52 ± 50% -4.8 64.89 -17.9 51.83 ± 50% perf-stat.overall.cache-miss-rate% 3.01 -9.7% 2.72 ± 50% +11.9% 3.37 -11.4% 2.67 ± 50% perf-stat.overall.cpi 141.98 +1.0% 143.41 ± 50% +24.6% 176.94 ± 3% -1.2% 140.33 ± 50% perf-stat.overall.cycles-between-cache-misses 0.33 -29.1% 0.24 ± 50% -10.6% 0.30 -27.7% 0.24 ± 50% perf-stat.overall.ipc 1453908 -16.2% 1217875 ± 50% +4.9% 1524841 -16.2% 1218410 ± 50% perf-stat.overall.path-length 1.463e+10 -27.6% 1.059e+10 ± 50% -9.0% 1.332e+10 -26.4% 1.077e+10 ± 50% perf-stat.ps.branch-instructions 14253731 -25.8% 10569701 ± 50% -6.6% 13307817 -25.1% 10681742 ± 50% perf-stat.ps.branch-misses 1.565e+09 -36.0% 1.002e+09 ± 50% -18.9% 1.269e+09 ± 3% -34.6% 1.023e+09 ± 50% perf-stat.ps.cache-misses 2.245e+09 -30.7% 1.556e+09 ± 50% -12.9% 1.954e+09 -29.6% 1.579e+09 ± 50% perf-stat.ps.cache-references 98.42 -20.7% 78.08 ± 50% -1.0% 97.40 -20.6% 78.12 ± 50% perf-stat.ps.cpu-migrations 7.378e+10 -28.4% 5.281e+10 ± 50% -9.8% 6.656e+10 -27.0% 5.385e+10 ± 50% perf-stat.ps.instructions 15260342 -31.6% 10437993 ± 50% -14.0% 13119215 -30.3% 10633461 ± 50% perf-stat.ps.minor-faults 15260342 -31.6% 10437993 ± 50% -14.0% 13119215 -30.3% 10633461 ± 50% perf-stat.ps.page-faults 2.237e+13 -28.5% 1.599e+13 ± 50% -10.0% 2.014e+13 -27.2% 1.629e+13 ± 50% perf-stat.total.instructions 75.68 -6.2 69.50 -6.1 69.63 -5.4 70.26 
perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase 72.31 -5.8 66.56 -5.6 66.68 -5.1 67.25 perf-profile.calltrace.cycles-pp.testcase 63.50 -4.4 59.13 -4.4 59.13 -3.9 59.64 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase 63.32 -4.4 58.97 -4.4 58.97 -3.8 59.48 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 61.04 -4.1 56.99 -4.1 56.98 -3.6 57.49 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 21.29 -3.9 17.43 ± 3% -3.6 17.67 ± 3% -3.5 17.77 ± 2% perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 59.53 -3.8 55.69 -3.9 55.68 -3.3 56.21 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 58.35 -3.7 54.65 -3.7 54.65 -3.2 55.17 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault 5.31 -0.9 4.40 ± 2% -0.9 4.44 ± 2% -0.8 4.50 perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 4.97 -0.8 4.13 ± 2% -0.8 4.15 ± 2% -0.8 4.21 perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault 4.40 -0.7 3.72 ± 3% -0.6 3.79 ± 3% -0.6 3.78 ± 2% perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 2.63 -0.4 2.23 ± 2% -0.4 2.26 ± 2% -0.3 2.29 perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase 1.82 -0.4 1.44 ± 2% -0.4 1.47 ± 2% -0.3 1.49 perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas 2.21 -0.3 1.89 ± 2% -0.3 1.88 ± 2% -0.3 1.90 perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault 2.01 -0.3 1.69 ± 4% -0.2 1.76 ± 5% -0.3 1.73 ± 2% perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault 1.80 -0.3 1.52 ± 2% -0.3 1.52 ± 2% -0.3 1.54 perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault 1.74 -0.2 1.50 ± 3% -0.2 1.51 ± 3% -0.2 1.52 ± 2% perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 1.55 -0.2 1.31 ± 2% -0.2 1.30 ± 2% -0.2 1.33 perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault 1.60 -0.2 1.37 ± 3% -0.2 1.39 ± 3% -0.2 1.39 ± 2% perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault 1.29 -0.2 1.08 ± 3% -0.2 1.14 ± 4% -0.2 1.11 ± 3% perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault 1.42 -0.2 1.21 ± 3% -0.2 1.23 ± 3% -0.2 1.24 ± 2% perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault 1.50 -0.2 1.31 ± 2% -0.1 1.41 ± 2% -0.1 1.36 ± 3% perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.set_pte_range.finish_fault.do_fault.__handle_mm_fault 1.12 -0.2 0.93 ± 3% -0.2 0.93 ± 2% -0.2 0.95 ± 2% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc 0.92 -0.1 0.78 ± 4% -0.1 0.80 ± 3% -0.1 0.81 ± 3% perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault 0.74 -0.1 0.61 ± 2% 
-0.1 0.65 ± 2% -0.1 0.64 perf-profile.calltrace.cycles-pp.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range 0.98 -0.1 0.86 ± 2% -0.1 0.87 ± 2% -0.1 0.87 ± 2% perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 0.72 ± 2% -0.1 0.61 ± 2% -0.1 0.61 ± 2% -0.1 0.60 ± 3% perf-profile.calltrace.cycles-pp.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 0.63 ± 2% -0.1 0.53 -0.1 0.53 ± 2% -0.2 0.41 ± 50% perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault 1.15 -0.1 1.05 -0.1 1.08 -0.1 1.07 perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault 0.66 -0.1 0.56 ± 2% -0.1 0.56 ± 2% -0.1 0.56 ± 2% perf-profile.calltrace.cycles-pp.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 0.64 -0.1 0.55 ± 4% -0.1 0.54 ± 3% -0.1 0.56 ± 2% perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof 0.66 -0.1 0.58 ± 2% -0.1 0.59 ± 3% -0.1 0.58 ± 2% perf-profile.calltrace.cycles-pp.mas_walk.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 2.71 +0.7 3.39 +0.7 3.36 +0.6 3.31 ± 2% perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap 2.71 +0.7 3.39 +0.7 3.36 +0.6 3.31 ± 2% perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap 2.71 +0.7 3.39 +0.7 3.37 +0.6 3.31 ± 2% perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 2.65 +0.7 3.34 +0.7 3.32 +0.6 3.26 ± 2% perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region 2.44 +0.7 3.15 +0.7 3.13 +0.6 3.07 ± 2% perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu 24.39 +2.2 26.56 ± 5% +1.8 26.19 ± 4% +2.1 26.54 ± 3% perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 22.46 +2.4 24.88 ± 5% +2.0 24.41 ± 5% +2.3 24.81 ± 4% perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_fault.__handle_mm_fault 22.25 +2.5 24.70 ± 5% +2.0 24.24 ± 5% +2.4 24.63 ± 4% perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_fault 20.38 +2.5 22.90 ± 6% +2.0 22.42 ± 5% +2.5 22.84 ± 4% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault 20.37 +2.5 22.89 ± 6% +2.0 22.41 ± 5% +2.5 22.83 ± 4% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range 20.30 +2.5 22.83 ± 6% +2.0 22.35 ± 5% +2.5 22.77 ± 4% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma 22.59 +5.3 27.93 +5.3 27.84 +4.7 27.29 ± 2% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap 22.59 +5.3 27.93 +5.3 27.84 +4.7 27.29 ± 2% perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 22.59 +5.3 27.93 +5.3 27.84 +4.7 27.29 ± 2% 
perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap 22.58 +5.3 27.92 +5.3 27.83 +4.7 27.28 ± 2% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region 20.59 +5.8 26.34 +5.6 26.22 +5.1 25.64 ± 2% perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas 20.59 +5.8 26.34 +5.6 26.22 +5.1 25.64 ± 2% perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range 20.56 +5.8 26.32 +5.6 26.20 +5.1 25.62 ± 2% perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range 20.07 +5.9 25.95 +5.8 25.83 +5.2 25.23 ± 3% perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range 18.73 +6.0 24.73 +5.9 24.63 +5.3 24.01 ± 3% perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu 25.34 +6.0 31.36 +5.9 31.24 +5.3 30.64 ± 2% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 25.34 +6.0 31.36 +5.9 31.24 +5.3 30.64 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 25.34 +6.0 31.36 +5.9 31.24 +5.3 30.64 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 25.34 +6.0 31.36 +5.9 31.24 +5.3 30.64 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap 25.34 +6.0 31.36 +5.9 31.24 +5.3 30.64 ± 2% perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe 25.33 +6.0 31.36 +5.9 31.24 +5.3 30.64 ± 2% perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap 25.34 +6.0 31.37 +5.9 31.25 +5.3 30.65 ± 2% perf-profile.calltrace.cycles-pp.__munmap 25.33 +6.0 31.36 +5.9 31.24 +5.3 30.64 ± 2% perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64 20.35 +6.7 27.09 +6.6 26.96 +5.9 26.29 ± 3% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache 20.36 +6.7 27.11 +6.6 26.98 +5.9 26.30 ± 3% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages 20.28 +6.8 27.04 +6.6 26.91 +6.0 26.24 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs 74.49 -6.0 68.46 -5.9 68.59 -5.3 69.18 perf-profile.children.cycles-pp.testcase 71.15 -5.5 65.63 -5.4 65.72 -4.8 66.30 perf-profile.children.cycles-pp.asm_exc_page_fault 63.55 -4.4 59.16 -4.4 59.17 -3.9 59.68 perf-profile.children.cycles-pp.exc_page_fault 63.38 -4.4 59.03 -4.4 59.03 -3.8 59.54 perf-profile.children.cycles-pp.do_user_addr_fault 61.10 -4.1 57.04 -4.1 57.03 -3.6 57.54 perf-profile.children.cycles-pp.handle_mm_fault 21.32 -3.9 17.45 ± 3% -3.6 17.70 ± 3% -3.5 17.80 ± 2% perf-profile.children.cycles-pp.copy_page 59.57 -3.9 55.72 -3.9 55.72 -3.3 56.24 perf-profile.children.cycles-pp.__handle_mm_fault 58.44 -3.7 54.74 -3.7 54.74 -3.2 55.25 perf-profile.children.cycles-pp.do_fault 5.36 -0.9 4.44 ± 2% -0.9 4.48 ± 2% -0.8 4.54 
perf-profile.children.cycles-pp.__pte_offset_map_lock 5.02 -0.9 4.16 ± 2% -0.8 4.19 ± 2% -0.8 4.25 perf-profile.children.cycles-pp._raw_spin_lock 4.45 -0.7 3.76 ± 3% -0.6 3.83 ± 3% -0.6 3.82 ± 2% perf-profile.children.cycles-pp.folio_prealloc 2.64 -0.4 2.24 ± 2% -0.4 2.27 ± 2% -0.3 2.30 perf-profile.children.cycles-pp.sync_regs 1.89 -0.4 1.49 ± 2% -0.4 1.52 ± 2% -0.3 1.55 perf-profile.children.cycles-pp.zap_present_ptes 2.42 -0.4 2.04 ± 2% -0.3 2.08 ± 3% -0.3 2.09 ± 2% perf-profile.children.cycles-pp.native_irq_return_iret 2.24 -0.3 1.91 ± 2% -0.3 1.91 ± 2% -0.3 1.93 perf-profile.children.cycles-pp.vma_alloc_folio_noprof 2.07 -0.3 1.74 ± 3% -0.3 1.80 ± 5% -0.3 1.77 ± 2% perf-profile.children.cycles-pp.__mem_cgroup_charge 1.89 -0.3 1.61 ± 2% -0.3 1.60 ± 2% -0.3 1.62 perf-profile.children.cycles-pp.alloc_pages_mpol_noprof 2.04 -0.3 1.77 ± 2% -0.1 1.90 ± 2% -0.2 1.83 ± 3% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio 1.64 -0.3 1.39 ± 2% -0.3 1.39 ± 2% -0.2 1.41 perf-profile.children.cycles-pp.__alloc_pages_noprof 1.77 -0.2 1.52 ± 3% -0.2 1.53 ± 3% -0.2 1.54 ± 2% perf-profile.children.cycles-pp.__do_fault 1.62 -0.2 1.39 ± 3% -0.2 1.41 ± 3% -0.2 1.41 ± 2% perf-profile.children.cycles-pp.shmem_fault 1.32 -0.2 1.10 ± 3% -0.2 1.16 ± 4% -0.2 1.13 ± 2% perf-profile.children.cycles-pp.mem_cgroup_commit_charge 1.42 -0.2 1.21 ± 2% -0.2 1.20 ± 2% -0.2 1.19 ± 2% perf-profile.children.cycles-pp.__perf_sw_event 1.47 -0.2 1.27 ± 3% -0.2 1.28 ± 3% -0.2 1.29 ± 2% perf-profile.children.cycles-pp.shmem_get_folio_gfp 1.13 ± 2% -0.2 0.93 ± 4% -0.1 1.06 ± 2% -0.1 1.03 ± 3% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state 1.17 -0.2 0.98 ± 2% -0.2 0.98 ± 2% -0.2 1.00 ± 2% perf-profile.children.cycles-pp.get_page_from_freelist 1.25 -0.2 1.06 ± 2% -0.2 1.06 ± 2% -0.2 1.05 ± 2% perf-profile.children.cycles-pp.___perf_sw_event 0.84 -0.2 0.67 ± 3% -0.2 0.68 ± 4% -0.2 0.69 ± 2% perf-profile.children.cycles-pp.__mod_lruvec_state 0.61 -0.2 0.44 ± 3% -0.2 0.43 ± 3% -0.2 0.46 ± 2% perf-profile.children.cycles-pp._compound_head 0.65 -0.1 0.51 ± 2% -0.1 0.53 ± 4% -0.1 0.53 ± 2% perf-profile.children.cycles-pp.__mod_node_page_state 0.94 -0.1 0.80 ± 4% -0.1 0.82 ± 4% -0.1 0.82 ± 3% perf-profile.children.cycles-pp.filemap_get_entry 1.02 -0.1 0.89 ± 2% -0.1 0.90 ± 3% -0.1 0.90 ± 2% perf-profile.children.cycles-pp.lock_vma_under_rcu 0.76 -0.1 0.63 ± 2% -0.1 0.67 ± 2% -0.1 0.66 perf-profile.children.cycles-pp.folio_remove_rmap_ptes 1.20 -0.1 1.10 -0.1 1.13 -0.1 1.11 perf-profile.children.cycles-pp.lru_add_fn 0.69 -0.1 0.59 ± 4% -0.1 0.58 ± 2% -0.1 0.60 ± 2% perf-profile.children.cycles-pp.rmqueue 0.47 -0.1 0.38 ± 2% -0.1 0.37 ± 2% -0.1 0.38 perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios 0.59 -0.1 0.49 ± 2% -0.1 0.49 -0.1 0.50 perf-profile.children.cycles-pp.free_unref_folios 0.54 -0.1 0.45 ± 4% -0.1 0.46 ± 3% -0.1 0.47 ± 3% perf-profile.children.cycles-pp.xas_load 0.67 -0.1 0.58 ± 3% -0.1 0.60 ± 3% -0.1 0.59 ± 2% perf-profile.children.cycles-pp.mas_walk 0.63 ± 3% -0.1 0.55 ± 3% -0.0 0.61 ± 4% -0.1 0.55 ± 3% perf-profile.children.cycles-pp.__count_memcg_events 0.27 ± 3% -0.1 0.21 ± 3% -0.1 0.21 ± 3% -0.1 0.21 perf-profile.children.cycles-pp.uncharge_batch 0.38 -0.1 0.32 ± 5% -0.0 0.33 -0.0 0.33 perf-profile.children.cycles-pp.try_charge_memcg 0.22 ± 3% -0.1 0.17 ± 4% -0.1 0.17 ± 4% -0.1 0.17 ± 2% perf-profile.children.cycles-pp.page_counter_uncharge 0.32 -0.1 0.27 -0.0 0.28 -0.1 0.26 perf-profile.children.cycles-pp.cgroup_rstat_updated 0.26 ± 3% -0.0 0.21 ± 4% -0.0 0.22 ± 2% -0.0 0.23 ± 5% 
perf-profile.children.cycles-pp.__pte_offset_map 0.30 -0.0 0.26 ± 2% -0.0 0.26 -0.0 0.26 ± 3% perf-profile.children.cycles-pp.handle_pte_fault 0.28 -0.0 0.24 ± 2% -0.0 0.25 ± 3% -0.0 0.25 perf-profile.children.cycles-pp.error_entry 0.31 -0.0 0.27 -0.0 0.26 ± 5% -0.0 0.26 perf-profile.children.cycles-pp.percpu_counter_add_batch 0.31 ± 2% -0.0 0.27 ± 6% -0.0 0.27 ± 4% -0.0 0.27 ± 3% perf-profile.children.cycles-pp.get_vma_policy 0.22 -0.0 0.19 ± 2% -0.0 0.19 ± 2% -0.0 0.19 ± 2% perf-profile.children.cycles-pp.free_unref_page_commit 0.22 ± 2% -0.0 0.19 ± 3% -0.0 0.19 -0.0 0.20 ± 2% perf-profile.children.cycles-pp.folio_add_new_anon_rmap 0.26 ± 2% -0.0 0.22 ± 9% -0.0 0.22 ± 4% -0.0 0.23 ± 5% perf-profile.children.cycles-pp._raw_spin_trylock 0.28 ± 2% -0.0 0.25 ± 3% -0.0 0.25 -0.0 0.25 ± 2% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 0.32 ± 2% -0.0 0.29 ± 4% -0.0 0.28 ± 2% -0.0 0.29 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 0.26 ± 3% -0.0 0.22 ± 4% -0.0 0.22 ± 3% -0.0 0.23 perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.25 ± 3% -0.0 0.21 ± 4% -0.0 0.21 ± 2% -0.0 0.22 perf-profile.children.cycles-pp.hrtimer_interrupt 0.22 ± 2% -0.0 0.19 ± 3% -0.0 0.19 ± 2% -0.0 0.19 ± 3% perf-profile.children.cycles-pp.pte_offset_map_nolock 0.17 ± 2% -0.0 0.14 ± 4% -0.0 0.14 ± 5% -0.0 0.15 ± 3% perf-profile.children.cycles-pp.folio_unlock 0.14 ± 2% -0.0 0.11 -0.0 0.11 ± 3% -0.0 0.11 perf-profile.children.cycles-pp.__mod_zone_page_state 0.19 ± 2% -0.0 0.16 ± 2% -0.0 0.17 ± 2% -0.0 0.17 ± 3% perf-profile.children.cycles-pp.down_read_trylock 0.18 -0.0 0.15 ± 3% -0.0 0.15 ± 4% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.__rmqueue_pcplist 0.14 ± 2% -0.0 0.11 ± 6% -0.0 0.11 ± 8% -0.0 0.12 ± 6% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode 0.14 ± 3% -0.0 0.12 ± 3% -0.0 0.12 ± 5% -0.0 0.12 ± 4% perf-profile.children.cycles-pp.perf_exclude_event 0.19 ± 2% -0.0 0.17 ± 4% -0.0 0.17 ± 2% -0.0 0.17 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues 0.16 ± 2% -0.0 0.14 -0.0 0.13 ± 3% -0.0 0.14 ± 2% perf-profile.children.cycles-pp.uncharge_folio 0.12 ± 3% -0.0 0.10 ± 5% -0.0 0.09 ± 5% -0.0 0.10 ± 4% perf-profile.children.cycles-pp.get_pfnblock_flags_mask 0.18 ± 3% -0.0 0.16 ± 4% -0.0 0.16 ± 3% -0.0 0.16 ± 3% perf-profile.children.cycles-pp.tick_nohz_handler 0.13 ± 3% -0.0 0.10 ± 4% -0.0 0.10 ± 4% -0.0 0.11 ± 4% perf-profile.children.cycles-pp.page_counter_try_charge 0.16 -0.0 0.14 ± 2% -0.0 0.14 ± 4% -0.0 0.14 ± 2% perf-profile.children.cycles-pp.folio_put 0.18 ± 2% -0.0 0.16 ± 3% -0.0 0.16 ± 3% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.__cond_resched 0.18 ± 2% -0.0 0.16 ± 5% -0.0 0.16 ± 4% -0.0 0.16 ± 2% perf-profile.children.cycles-pp.up_read 0.14 -0.0 0.12 -0.0 0.12 -0.0 0.12 ± 3% perf-profile.children.cycles-pp.policy_nodemask 0.16 ± 2% -0.0 0.14 ± 3% -0.0 0.14 ± 2% -0.0 0.14 ± 3% perf-profile.children.cycles-pp.update_process_times 0.11 ± 3% -0.0 0.09 ± 8% -0.0 0.09 ± 4% -0.0 0.09 ± 4% perf-profile.children.cycles-pp.xas_start 0.13 ± 3% -0.0 0.11 -0.0 0.11 ± 3% -0.0 0.11 perf-profile.children.cycles-pp.access_error 0.09 ± 4% -0.0 0.08 ± 5% -0.0 0.08 ± 5% -0.0 0.08 perf-profile.children.cycles-pp.__irqentry_text_end 0.07 ± 5% -0.0 0.05 ± 9% -0.0 0.06 ± 6% -0.0 0.06 perf-profile.children.cycles-pp.vm_normal_page 0.06 ± 7% -0.0 0.05 ± 7% -0.0 0.05 -0.0 0.05 ± 7% perf-profile.children.cycles-pp.__tlb_remove_folio_pages_size 0.08 -0.0 0.07 ± 5% -0.0 0.07 ± 5% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.memcg_check_events 0.12 ± 3% -0.0 
0.11 ± 6% -0.0 0.11 ± 4% -0.0 0.11 ± 3% perf-profile.children.cycles-pp.perf_swevent_event 0.06 -0.0 0.05 ± 7% -0.0 0.05 -0.0 0.05 perf-profile.children.cycles-pp.pte_alloc_one 0.06 -0.0 0.05 ± 7% -0.0 0.05 -0.0 0.05 ± 7% perf-profile.children.cycles-pp.irqentry_enter 0.06 -0.0 0.05 ± 7% -0.0 0.05 -0.0 0.05 ± 7% perf-profile.children.cycles-pp.vmf_anon_prepare 0.05 +0.0 0.06 ± 8% +0.0 0.06 +0.0 0.06 ± 8% perf-profile.children.cycles-pp.write 0.05 +0.0 0.06 +0.0 0.06 +0.0 0.06 perf-profile.children.cycles-pp.perf_mmap__push 0.19 ± 2% +0.2 0.40 ± 6% +0.2 0.37 ± 7% +0.2 0.35 ± 4% perf-profile.children.cycles-pp.mem_cgroup_update_lru_size 2.72 +0.7 3.40 +0.7 3.38 +0.6 3.32 ± 2% perf-profile.children.cycles-pp.tlb_finish_mmu 24.44 +2.2 26.60 ± 5% +1.8 26.23 ± 4% +2.1 26.58 ± 3% perf-profile.children.cycles-pp.set_pte_range 22.47 +2.4 24.89 ± 5% +2.0 24.42 ± 5% +2.3 24.81 ± 4% perf-profile.children.cycles-pp.folio_add_lru_vma 22.31 +2.5 24.77 ± 5% +2.0 24.30 ± 5% +2.4 24.70 ± 4% perf-profile.children.cycles-pp.folio_batch_move_lru 22.59 +5.3 27.93 +5.2 27.84 +4.7 27.29 ± 2% perf-profile.children.cycles-pp.zap_pmd_range 22.59 +5.3 27.93 +5.3 27.84 +4.7 27.29 ± 2% perf-profile.children.cycles-pp.unmap_page_range 22.59 +5.3 27.93 +5.3 27.84 +4.7 27.29 ± 2% perf-profile.children.cycles-pp.zap_pte_range 22.59 +5.3 27.93 +5.3 27.84 +4.7 27.29 ± 2% perf-profile.children.cycles-pp.unmap_vmas 20.59 +5.8 26.34 +5.6 26.22 +5.1 25.64 ± 2% perf-profile.children.cycles-pp.tlb_flush_mmu 25.34 +6.0 31.36 +5.9 31.24 +5.3 30.64 ± 2% perf-profile.children.cycles-pp.__x64_sys_munmap 25.34 +6.0 31.36 +5.9 31.24 +5.3 30.64 ± 2% perf-profile.children.cycles-pp.__vm_munmap 25.34 +6.0 31.37 +5.9 31.25 +5.3 30.65 ± 2% perf-profile.children.cycles-pp.__munmap 25.33 +6.0 31.36 +5.9 31.24 +5.3 30.64 ± 2% perf-profile.children.cycles-pp.unmap_region 25.34 +6.0 31.37 +5.9 31.25 +5.3 30.65 ± 2% perf-profile.children.cycles-pp.do_vmi_align_munmap 25.34 +6.0 31.37 +5.9 31.25 +5.3 30.65 ± 2% perf-profile.children.cycles-pp.do_vmi_munmap 25.46 +6.0 31.49 +5.9 31.37 +5.3 30.77 ± 2% perf-profile.children.cycles-pp.do_syscall_64 25.46 +6.0 31.49 +5.9 31.37 +5.3 30.77 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 23.30 +6.4 29.74 +6.3 29.59 +5.7 28.96 ± 2% perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages 23.29 +6.4 29.73 +6.3 29.58 +5.7 28.95 ± 2% perf-profile.children.cycles-pp.free_pages_and_swap_cache 23.00 +6.5 29.52 +6.4 29.38 +5.7 28.73 ± 2% perf-profile.children.cycles-pp.folios_put_refs 21.22 +6.7 27.93 +6.6 27.81 +5.9 27.13 ± 3% perf-profile.children.cycles-pp.__page_cache_release 40.79 +9.3 50.07 ± 2% +8.7 49.46 ± 2% +8.4 49.20 perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave 40.78 +9.3 50.06 ± 2% +8.7 49.44 ± 2% +8.4 49.19 perf-profile.children.cycles-pp._raw_spin_lock_irqsave 40.64 +9.3 49.96 ± 2% +8.7 49.34 ± 2% +8.4 49.09 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath 21.23 -3.9 17.38 ± 3% -3.6 17.63 ± 3% -3.5 17.73 ± 2% perf-profile.self.cycles-pp.copy_page 4.99 -0.8 4.14 ± 2% -0.8 4.17 ± 2% -0.8 4.22 perf-profile.self.cycles-pp._raw_spin_lock 5.21 -0.8 4.45 ± 2% -0.7 4.49 ± 2% -0.7 4.53 perf-profile.self.cycles-pp.testcase 2.63 -0.4 2.24 ± 2% -0.4 2.26 ± 2% -0.3 2.29 perf-profile.self.cycles-pp.sync_regs 2.42 -0.4 2.04 ± 2% -0.3 2.08 ± 3% -0.3 2.09 ± 2% perf-profile.self.cycles-pp.native_irq_return_iret 0.58 ± 2% -0.2 0.42 ± 3% -0.2 0.40 ± 2% -0.1 0.43 ± 3% perf-profile.self.cycles-pp._compound_head 0.93 ± 2% -0.2 0.77 ± 5% -0.0 0.89 ± 2% -0.1 0.86 ± 3% 
perf-profile.self.cycles-pp.__mod_memcg_lruvec_state 1.00 -0.1 0.85 -0.2 0.85 ± 2% -0.2 0.83 ± 2% perf-profile.self.cycles-pp.___perf_sw_event 0.93 ± 2% -0.1 0.78 ± 3% -0.1 0.79 ± 4% -0.1 0.80 ± 3% perf-profile.self.cycles-pp.mem_cgroup_commit_charge 0.61 -0.1 0.48 ± 3% -0.1 0.50 ± 4% -0.1 0.50 ± 3% perf-profile.self.cycles-pp.__mod_node_page_state 0.51 -0.1 0.38 -0.1 0.38 ± 2% -0.1 0.40 ± 2% perf-profile.self.cycles-pp.free_pages_and_swap_cache 0.80 -0.1 0.70 ± 2% -0.1 0.69 ± 3% -0.1 0.70 ± 2% perf-profile.self.cycles-pp.__handle_mm_fault 0.61 ± 2% -0.1 0.51 -0.1 0.51 ± 2% -0.1 0.51 perf-profile.self.cycles-pp.lru_add_fn 0.47 -0.1 0.38 -0.1 0.38 -0.1 0.39 ± 2% perf-profile.self.cycles-pp.get_page_from_freelist 0.45 -0.1 0.37 ± 2% -0.1 0.37 ± 2% -0.1 0.38 perf-profile.self.cycles-pp.zap_present_ptes 0.44 -0.1 0.36 ± 4% -0.1 0.37 ± 4% -0.1 0.38 ± 3% perf-profile.self.cycles-pp.xas_load 0.65 -0.1 0.57 ± 2% -0.1 0.58 ± 2% -0.1 0.58 ± 2% perf-profile.self.cycles-pp.mas_walk 0.46 -0.1 0.39 ± 2% -0.1 0.40 ± 2% -0.1 0.41 ± 3% perf-profile.self.cycles-pp.handle_mm_fault 0.44 -0.1 0.38 ± 2% -0.1 0.38 ± 2% -0.1 0.39 perf-profile.self.cycles-pp.shmem_get_folio_gfp 0.52 ± 3% -0.1 0.46 ± 3% -0.0 0.51 ± 6% -0.1 0.46 ± 5% perf-profile.self.cycles-pp.__count_memcg_events 0.89 ± 2% -0.1 0.84 -0.0 0.88 ± 3% -0.1 0.83 ± 3% perf-profile.self.cycles-pp.__lruvec_stat_mod_folio 0.32 -0.1 0.26 -0.1 0.26 -0.0 0.27 perf-profile.self.cycles-pp.__page_cache_release 0.39 -0.1 0.34 ± 4% -0.0 0.35 ± 4% -0.0 0.35 ± 3% perf-profile.self.cycles-pp.filemap_get_entry 0.20 ± 4% -0.1 0.15 ± 5% -0.1 0.15 ± 3% -0.0 0.15 ± 2% perf-profile.self.cycles-pp.page_counter_uncharge 0.24 -0.0 0.19 -0.0 0.20 ± 2% -0.0 0.20 perf-profile.self.cycles-pp.folio_remove_rmap_ptes 0.34 ± 3% -0.0 0.29 ± 2% -0.0 0.29 ± 2% -0.0 0.29 ± 3% perf-profile.self.cycles-pp.__alloc_pages_noprof 0.27 -0.0 0.23 ± 3% -0.0 0.23 ± 3% -0.0 0.23 ± 2% perf-profile.self.cycles-pp.free_unref_folios 0.27 ± 3% -0.0 0.23 ± 2% -0.0 0.23 ± 2% -0.0 0.22 ± 2% perf-profile.self.cycles-pp.rmqueue 0.30 -0.0 0.26 -0.0 0.26 -0.0 0.26 perf-profile.self.cycles-pp.do_user_addr_fault 0.26 -0.0 0.22 ± 2% -0.0 0.22 ± 2% -0.0 0.22 ± 4% perf-profile.self.cycles-pp.cgroup_rstat_updated 0.23 ± 3% -0.0 0.19 ± 4% -0.0 0.20 ± 5% -0.0 0.20 ± 3% perf-profile.self.cycles-pp.__pte_offset_map_lock 0.22 ± 3% -0.0 0.19 ± 2% -0.0 0.19 ± 3% -0.0 0.20 ± 4% perf-profile.self.cycles-pp.__pte_offset_map 0.29 -0.0 0.26 -0.0 0.25 ± 5% -0.0 0.25 perf-profile.self.cycles-pp.percpu_counter_add_batch 0.19 ± 2% -0.0 0.16 ± 2% -0.0 0.16 ± 4% -0.0 0.16 ± 4% perf-profile.self.cycles-pp.__mod_lruvec_state 0.21 ± 3% -0.0 0.17 ± 2% -0.0 0.18 ± 2% -0.0 0.19 ± 4% perf-profile.self.cycles-pp.finish_fault 0.25 -0.0 0.21 -0.0 0.21 ± 3% -0.0 0.22 ± 2% perf-profile.self.cycles-pp.error_entry 0.24 -0.0 0.21 ± 3% -0.0 0.22 -0.0 0.22 perf-profile.self.cycles-pp.try_charge_memcg 0.21 ± 2% -0.0 0.18 ± 4% -0.0 0.18 ± 2% -0.0 0.19 ± 2% perf-profile.self.cycles-pp.folio_add_new_anon_rmap 0.22 -0.0 0.19 ± 2% -0.0 0.19 ± 2% -0.0 0.19 ± 2% perf-profile.self.cycles-pp.set_pte_range 0.24 ± 3% -0.0 0.21 ± 7% -0.0 0.20 ± 4% -0.0 0.21 ± 6% perf-profile.self.cycles-pp._raw_spin_trylock 0.06 -0.0 0.03 ± 81% -0.0 0.04 ± 50% -0.0 0.05 perf-profile.self.cycles-pp.vm_normal_page 0.23 ± 2% -0.0 0.20 ± 2% -0.0 0.20 ± 2% -0.0 0.20 ± 2% perf-profile.self.cycles-pp.do_fault 0.18 -0.0 0.15 ± 2% -0.0 0.15 ± 2% -0.0 0.15 ± 2% perf-profile.self.cycles-pp.free_unref_page_commit 0.15 ± 2% -0.0 0.12 -0.0 0.12 ± 6% -0.0 0.13 ± 3% 
perf-profile.self.cycles-pp._raw_spin_lock_irqsave 0.13 ± 3% -0.0 0.10 ± 4% -0.0 0.11 ± 4% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.__mem_cgroup_charge 0.18 -0.0 0.15 ± 2% -0.0 0.16 ± 3% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.down_read_trylock 0.11 ± 3% -0.0 0.08 ± 4% -0.0 0.09 -0.0 0.09 ± 5% perf-profile.self.cycles-pp.__mod_zone_page_state 0.19 ± 2% -0.0 0.17 ± 2% -0.0 0.16 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.folio_add_lru_vma 0.19 ± 2% -0.0 0.17 ± 8% -0.0 0.17 ± 3% -0.0 0.17 ± 3% perf-profile.self.cycles-pp.get_vma_policy 0.16 ± 2% -0.0 0.13 ± 3% -0.0 0.13 ± 5% -0.0 0.14 ± 2% perf-profile.self.cycles-pp.folio_unlock 0.12 ± 3% -0.0 0.10 ± 6% -0.0 0.10 ± 6% -0.0 0.10 perf-profile.self.cycles-pp.perf_exclude_event 0.19 ± 2% -0.0 0.17 -0.0 0.17 ± 2% -0.0 0.17 ± 2% perf-profile.self.cycles-pp.asm_exc_page_fault 0.15 ± 2% -0.0 0.13 ± 3% -0.0 0.13 ± 3% -0.0 0.13 perf-profile.self.cycles-pp.folio_put 0.14 ± 2% -0.0 0.12 -0.0 0.12 ± 3% -0.0 0.12 perf-profile.self.cycles-pp.__rmqueue_pcplist 0.17 ± 2% -0.0 0.14 ± 5% -0.0 0.14 ± 2% -0.0 0.15 ± 3% perf-profile.self.cycles-pp.__perf_sw_event 0.10 ± 3% -0.0 0.08 ± 7% -0.0 0.08 ± 11% -0.0 0.09 ± 5% perf-profile.self.cycles-pp.irqentry_exit_to_user_mode 0.15 ± 2% -0.0 0.13 -0.0 0.13 ± 3% -0.0 0.13 ± 3% perf-profile.self.cycles-pp.uncharge_folio 0.12 ± 3% -0.0 0.10 -0.0 0.10 ± 3% -0.0 0.11 ± 4% perf-profile.self.cycles-pp.alloc_pages_mpol_noprof 0.11 ± 3% -0.0 0.09 ± 8% -0.0 0.09 ± 4% -0.0 0.09 perf-profile.self.cycles-pp.page_counter_try_charge 0.17 ± 4% -0.0 0.15 ± 4% -0.0 0.15 ± 2% -0.0 0.15 perf-profile.self.cycles-pp.lock_vma_under_rcu 0.17 ± 2% -0.0 0.15 ± 3% -0.0 0.16 ± 3% -0.0 0.15 ± 3% perf-profile.self.cycles-pp.up_read 0.11 -0.0 0.09 ± 4% -0.0 0.09 ± 5% -0.0 0.09 ± 5% perf-profile.self.cycles-pp.zap_pte_range 0.10 -0.0 0.08 ± 4% -0.0 0.08 ± 5% -0.0 0.08 ± 4% perf-profile.self.cycles-pp.get_pfnblock_flags_mask 0.16 ± 2% -0.0 0.15 ± 5% -0.0 0.15 ± 3% -0.0 0.15 ± 5% perf-profile.self.cycles-pp.shmem_fault 0.10 ± 4% -0.0 0.08 ± 4% -0.0 0.08 ± 4% -0.0 0.08 ± 4% perf-profile.self.cycles-pp.__do_fault 0.12 ± 3% -0.0 0.10 ± 7% -0.0 0.10 ± 7% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.exc_page_fault 0.12 ± 3% -0.0 0.10 ± 3% -0.0 0.10 ± 3% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.access_error 0.12 ± 4% -0.0 0.10 -0.0 0.10 -0.0 0.10 ± 3% perf-profile.self.cycles-pp.vma_alloc_folio_noprof 0.11 -0.0 0.10 ± 5% -0.0 0.09 ± 4% -0.0 0.09 ± 5% perf-profile.self.cycles-pp.perf_swevent_event 0.09 ± 5% -0.0 0.08 -0.0 0.08 -0.0 0.08 perf-profile.self.cycles-pp.policy_nodemask 0.09 -0.0 0.08 ± 13% -0.0 0.08 ± 5% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.xas_start 0.10 ± 4% -0.0 0.09 ± 4% -0.0 0.09 -0.0 0.09 ± 4% perf-profile.self.cycles-pp.pte_offset_map_nolock 0.08 ± 4% -0.0 0.07 -0.0 0.07 ± 5% -0.0 0.07 ± 5% perf-profile.self.cycles-pp.__irqentry_text_end 0.10 -0.0 0.09 -0.0 0.09 ± 5% -0.0 0.09 perf-profile.self.cycles-pp.folio_prealloc 0.09 -0.0 0.08 -0.0 0.08 -0.0 0.08 perf-profile.self.cycles-pp.__cond_resched 0.38 ± 2% +0.1 0.47 ± 2% +0.1 0.46 +0.1 0.44 perf-profile.self.cycles-pp.folio_batch_move_lru 0.18 ± 2% +0.2 0.38 ± 6% +0.2 0.35 ± 7% +0.2 0.34 ± 4% perf-profile.self.cycles-pp.mem_cgroup_update_lru_size 40.64 +9.3 49.96 ± 2% +8.7 49.34 ± 2% +8.4 49.08 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath [2] ========================================================================================= compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase: 
gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/page_fault2/will-it-scale 59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803 a94032b35e5f97dc1023030d929 ---------------- --------------------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev %change %stddev \ | \ | \ | \ 17488267 -3.4% 16886777 -6.0% 16433590 -5.6% 16505101 will-it-scale.224.processes 78072 -3.4% 75386 -6.0% 73363 -5.6% 73683 will-it-scale.per_process_ops 17488267 -3.4% 16886777 -6.0% 16433590 -5.6% 16505101 will-it-scale.workload 5.296e+09 -3.4% 5.116e+09 -6.0% 4.977e+09 -5.6% 4.998e+09 proc-vmstat.numa_hit 5.291e+09 -3.4% 5.111e+09 -6.0% 4.973e+09 -5.6% 4.995e+09 proc-vmstat.numa_local 5.285e+09 -3.4% 5.105e+09 -6.0% 4.968e+09 -5.6% 4.989e+09 proc-vmstat.pgalloc_normal 5.264e+09 -3.4% 5.084e+09 -6.0% 4.947e+09 -5.6% 4.969e+09 proc-vmstat.pgfault 5.283e+09 -3.4% 5.104e+09 -6.0% 4.967e+09 -5.6% 4.989e+09 proc-vmstat.pgfree 3067 +20.1% 3685 ± 8% +19.5% 3665 ± 8% -0.4% 3056 sched_debug.cfs_rq:/.load.min 0.07 ± 19% -12.8% 0.06 ± 14% -31.1% 0.05 ± 14% -8.8% 0.06 ± 14% sched_debug.cfs_rq:/.nr_running.stddev 1727628 ± 22% +2.3% 1767491 ± 32% +8.6% 1876362 ± 25% -24.1% 1310525 ± 7% sched_debug.cpu.avg_idle.max 6058 ± 41% +71.5% 10389 ±118% +96.1% 11878 ± 66% -47.9% 3156 ± 43% sched_debug.cpu.max_idle_balance_cost.stddev 17928 ± 11% +133.0% 41768 ± 36% +39.4% 24992 ± 57% +6.3% 19052 ± 15% sched_debug.cpu.nr_switches.max 2270 ± 6% +70.6% 3874 ± 28% +21.4% 2756 ± 37% +0.5% 2282 ± 4% sched_debug.cpu.nr_switches.stddev 4369255 -9.9% 3934784 ± 8% -3.0% 4238563 ± 6% -3.0% 4239325 ± 7% numa-vmstat.node0.nr_file_pages 20526 ± 3% -25.8% 15236 ± 22% -11.5% 18161 ± 16% -6.4% 19205 ± 16% numa-vmstat.node0.nr_mapped 35617 ± 5% -27.8% 25727 ± 20% -12.1% 31303 ± 13% -9.1% 32375 ± 21% numa-vmstat.node0.nr_slab_reclaimable 65089 ± 16% -8.1% 59820 ± 19% -19.8% 52215 ± 3% -18.3% 53200 ± 3% numa-vmstat.node0.nr_slab_unreclaimable 738801 ± 3% -59.2% 301176 ±113% -17.7% 608173 ± 48% -18.0% 605778 ± 49% numa-vmstat.node0.nr_unevictable 738801 ± 3% -59.2% 301176 ±113% -17.7% 608173 ± 48% -18.0% 605778 ± 49% numa-vmstat.node0.nr_zone_unevictable 4024866 +10.9% 4465333 ± 7% +3.2% 4152344 ± 7% +3.4% 4163009 ± 7% numa-vmstat.node1.nr_file_pages 19132 ± 10% +51.8% 29044 ± 18% +22.2% 23371 ± 18% +17.3% 22446 ± 30% numa-vmstat.node1.nr_slab_reclaimable 45845 ± 24% +12.0% 51337 ± 23% +28.7% 58982 ± 2% +26.8% 58122 ± 3% numa-vmstat.node1.nr_slab_unreclaimable 30816 ± 81% +1420.1% 468441 ± 72% +423.9% 161444 ±184% +431.7% 163839 ±184% numa-vmstat.node1.nr_unevictable 30816 ± 81% +1420.1% 468441 ± 72% +423.9% 161444 ±184% +431.7% 163839 ±184% numa-vmstat.node1.nr_zone_unevictable 142458 ± 5% -27.7% 102968 ± 20% -12.1% 125181 ± 13% -9.1% 129506 ± 21% numa-meminfo.node0.KReclaimable 81201 ± 3% -25.4% 60607 ± 21% -11.8% 71622 ± 16% -6.6% 75868 ± 16% numa-meminfo.node0.Mapped 142458 ± 5% -27.7% 102968 ± 20% -12.1% 125181 ± 13% -9.1% 129506 ± 21% numa-meminfo.node0.SReclaimable 260359 ± 16% -8.1% 239286 ± 19% -19.8% 208866 ± 3% -18.3% 212806 ± 3% numa-meminfo.node0.SUnreclaim 402817 ± 12% -15.0% 342254 ± 18% -17.1% 334047 ± 6% -15.0% 342313 ± 9% numa-meminfo.node0.Slab 2955204 ± 3% -59.2% 1204704 ±113% -17.7% 2432692 ± 48% -18.0% 2423114 ± 49% numa-meminfo.node0.Unevictable 16107004 +11.0% 17872044 ± 7% +3.0% 16587232 ± 7% +3.3% 16635393 ± 7% numa-meminfo.node1.FilePages 76509 ± 10% +51.9% 116237 ± 18% +22.1% 93450 ± 18% +17.4% 89791 ± 30% 
numa-meminfo.node1.KReclaimable 76509 ± 10% +51.9% 116237 ± 18% +22.1% 93450 ± 18% +17.4% 89791 ± 30% numa-meminfo.node1.SReclaimable 183385 ± 24% +12.0% 205353 ± 23% +28.7% 235933 ± 2% +26.8% 232488 ± 3% numa-meminfo.node1.SUnreclaim 259894 ± 20% +23.7% 321590 ± 19% +26.7% 329384 ± 6% +24.0% 322280 ± 10% numa-meminfo.node1.Slab 123266 ± 81% +1420.1% 1873767 ± 72% +423.9% 645778 ±184% +431.7% 655357 ±184% numa-meminfo.node1.Unevictable 20.16 -1.4% 19.89 -2.9% 19.57 -2.9% 19.58 perf-stat.i.MPKI 2.501e+10 -1.7% 2.46e+10 -2.6% 2.436e+10 -2.4% 2.44e+10 perf-stat.i.branch-instructions 18042153 -0.3% 17981852 -1.9% 17692517 -2.8% 17539874 perf-stat.i.branch-misses 2.382e+09 -3.3% 2.304e+09 -5.8% 2.244e+09 -5.6% 2.249e+09 perf-stat.i.cache-misses 2.561e+09 -3.2% 2.479e+09 -5.5% 2.42e+09 -5.3% 2.424e+09 perf-stat.i.cache-references 5.49 +1.9% 5.59 +3.1% 5.66 +2.8% 5.64 perf-stat.i.cpi 274.25 +2.9% 282.07 +5.7% 289.98 +5.4% 289.07 perf-stat.i.cycles-between-cache-misses 1.177e+11 -1.9% 1.155e+11 -2.9% 1.143e+11 -2.7% 1.145e+11 perf-stat.i.instructions 0.19 -1.9% 0.18 -3.0% 0.18 -2.7% 0.18 perf-stat.i.ipc 155.11 -3.3% 150.03 -5.9% 145.89 -5.5% 146.59 perf-stat.i.metric.K/sec 17405977 -3.4% 16819060 -5.9% 16378605 -5.5% 16441964 perf-stat.i.minor-faults 17405978 -3.4% 16819060 -5.9% 16378606 -5.5% 16441964 perf-stat.i.page-faults 4.41 ± 50% +27.3% 5.61 +3.1% 4.54 ± 50% +28.5% 5.66 perf-stat.overall.cpi 217.50 ± 50% +29.2% 280.93 +6.3% 231.09 ± 50% +32.4% 287.87 perf-stat.overall.cycles-between-cache-misses 1623235 ± 50% +26.9% 2060668 +3.4% 1677714 ± 50% +29.0% 2093187 perf-stat.overall.path-length 5.48 -0.3 5.15 -0.4 5.10 -0.4 5.11 perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 57.55 -0.3 57.24 -0.4 57.15 -0.3 57.20 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase 56.14 -0.2 55.94 -0.3 55.86 -0.2 55.90 perf-profile.calltrace.cycles-pp.testcase 1.86 -0.1 1.73 ± 2% -0.1 1.72 -0.2 1.71 perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 1.77 -0.1 1.64 ± 2% -0.1 1.63 -0.1 1.63 perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault 1.17 -0.1 1.10 -0.1 1.09 -0.1 1.10 perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 52.55 -0.1 52.49 -0.1 52.42 -0.1 52.47 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 52.62 -0.1 52.56 -0.1 52.48 -0.1 52.54 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase 0.96 -0.0 0.91 -0.0 0.91 -0.0 0.91 perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase 0.71 -0.0 0.68 -0.0 0.67 -0.0 0.67 perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault 51.87 -0.0 51.84 -0.1 51.76 -0.0 51.82 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 0.60 -0.0 0.57 -0.0 0.56 -0.0 0.57 perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault 4.87 +0.0 4.90 +0.0 4.91 +0.0 4.91 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 4.85 +0.0 4.88 +0.0 4.90 +0.0 4.90 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region 4.86 +0.0 4.90 +0.0 4.91 +0.0 4.91 
perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap 4.86 +0.0 4.89 +0.1 4.91 +0.0 4.91 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap 4.77 +0.0 4.80 +0.1 4.83 +0.1 4.82 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu 37.74 +0.2 37.98 +0.3 38.04 +0.3 38.01 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap 37.74 +0.2 37.98 +0.3 38.04 +0.3 38.01 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 37.74 +0.2 37.98 +0.3 38.04 +0.3 38.01 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap 37.73 +0.2 37.97 +0.3 38.04 +0.3 38.01 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region 37.27 +0.3 37.53 +0.3 37.60 +0.3 37.57 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range 37.28 +0.3 37.54 +0.3 37.61 +0.3 37.58 perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas 37.28 +0.3 37.54 +0.3 37.61 +0.3 37.58 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range 36.72 +0.3 36.98 +0.4 37.08 +0.3 37.04 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu 37.15 +0.3 37.41 +0.3 37.49 +0.3 37.46 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range 42.65 +0.3 42.92 +0.4 43.00 +0.3 42.97 perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap 42.65 +0.3 42.92 +0.4 43.00 +0.3 42.97 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 42.65 +0.3 42.92 +0.4 43.00 +0.3 42.97 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 42.65 +0.3 42.92 +0.4 43.00 +0.3 42.97 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64 42.65 +0.3 42.92 +0.4 43.00 +0.3 42.97 perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe 42.65 +0.3 42.92 +0.4 43.00 +0.3 42.97 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 42.65 +0.3 42.92 +0.4 43.00 +0.3 42.97 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap 42.65 +0.3 42.92 +0.4 43.00 +0.3 42.97 perf-profile.calltrace.cycles-pp.__munmap 41.26 +0.3 41.56 +0.4 41.68 +0.4 41.64 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages 41.26 +0.3 41.56 +0.4 41.68 +0.4 41.63 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache 41.23 +0.3 41.53 +0.4 41.66 +0.4 41.61 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs 43.64 +0.5 44.09 +0.4 44.05 +0.5 
44.12 perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 41.57 +0.6 42.17 +0.6 42.14 +0.6 42.22 perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 40.93 +0.6 41.56 +0.6 41.53 +0.7 41.59 perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_fault.__handle_mm_fault 40.84 +0.6 41.48 +0.6 41.44 +0.7 41.50 perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_fault 40.16 +0.7 40.83 +0.6 40.80 +0.7 40.87 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma 40.19 +0.7 40.85 +0.6 40.83 +0.7 40.89 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range 40.19 +0.7 40.85 +0.6 40.83 +0.7 40.89 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault 5.49 -0.3 5.16 -0.4 5.12 -0.4 5.12 perf-profile.children.cycles-pp.copy_page 57.05 -0.3 56.79 -0.4 56.70 -0.3 56.75 perf-profile.children.cycles-pp.testcase 55.66 -0.2 55.44 -0.3 55.36 -0.2 55.41 perf-profile.children.cycles-pp.asm_exc_page_fault 1.88 -0.1 1.75 ± 2% -0.1 1.74 -0.2 1.73 perf-profile.children.cycles-pp.__pte_offset_map_lock 1.79 -0.1 1.66 ± 2% -0.1 1.64 -0.1 1.64 perf-profile.children.cycles-pp._raw_spin_lock 1.19 -0.1 1.11 -0.1 1.11 -0.1 1.11 perf-profile.children.cycles-pp.folio_prealloc 52.64 -0.1 52.57 -0.1 52.49 -0.1 52.55 perf-profile.children.cycles-pp.exc_page_fault 0.96 -0.1 0.91 -0.1 0.91 -0.1 0.91 perf-profile.children.cycles-pp.sync_regs 52.57 -0.0 52.52 -0.1 52.44 -0.1 52.50 perf-profile.children.cycles-pp.do_user_addr_fault 0.73 -0.0 0.69 -0.0 0.68 -0.0 0.68 perf-profile.children.cycles-pp.vma_alloc_folio_noprof 0.63 -0.0 0.60 -0.0 0.59 -0.0 0.59 perf-profile.children.cycles-pp.alloc_pages_mpol_noprof 0.55 -0.0 0.52 -0.0 0.51 -0.0 0.51 perf-profile.children.cycles-pp.__alloc_pages_noprof 51.89 -0.0 51.86 -0.1 51.78 -0.0 51.84 perf-profile.children.cycles-pp.handle_mm_fault 1.02 -0.0 0.99 -0.0 0.99 -0.0 0.98 perf-profile.children.cycles-pp.native_irq_return_iret 0.46 -0.0 0.43 ± 2% -0.0 0.44 -0.0 0.43 perf-profile.children.cycles-pp.shmem_fault 0.39 -0.0 0.36 ± 2% -0.0 0.36 -0.0 0.38 perf-profile.children.cycles-pp.__mem_cgroup_charge 0.51 -0.0 0.48 ± 2% -0.0 0.49 -0.0 0.48 perf-profile.children.cycles-pp.__do_fault 0.38 -0.0 0.36 -0.0 0.35 -0.0 0.36 perf-profile.children.cycles-pp.lru_add_fn 0.51 -0.0 0.49 -0.0 0.50 -0.0 0.48 perf-profile.children.cycles-pp.shmem_get_folio_gfp 0.36 -0.0 0.34 -0.0 0.34 -0.0 0.34 perf-profile.children.cycles-pp.___perf_sw_event 0.42 -0.0 0.40 ± 2% -0.0 0.40 -0.0 0.39 perf-profile.children.cycles-pp.__perf_sw_event 0.41 -0.0 0.39 -0.0 0.39 -0.0 0.39 perf-profile.children.cycles-pp.get_page_from_freelist 0.25 ± 2% -0.0 0.23 -0.0 0.24 ± 2% -0.0 0.23 perf-profile.children.cycles-pp.filemap_get_entry 0.42 -0.0 0.41 -0.0 0.40 -0.0 0.40 perf-profile.children.cycles-pp.zap_present_ptes 0.14 ± 2% -0.0 0.12 ± 3% -0.0 0.12 ± 3% -0.0 0.13 perf-profile.children.cycles-pp.xas_load 0.21 ± 2% -0.0 0.20 -0.0 0.19 ± 2% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.__mod_node_page_state 0.26 -0.0 0.25 -0.0 0.24 -0.0 0.24 perf-profile.children.cycles-pp.__mod_lruvec_state 0.27 -0.0 0.26 ± 2% -0.0 0.26 -0.0 0.26 perf-profile.children.cycles-pp.lock_vma_under_rcu 0.11 -0.0 0.10 -0.0 0.09 ± 5% 
-0.0 0.10 perf-profile.children.cycles-pp._compound_head 0.23 ± 2% -0.0 0.22 ± 2% -0.0 0.22 -0.0 0.21 perf-profile.children.cycles-pp.rmqueue 0.09 -0.0 0.08 -0.0 0.08 -0.0 0.08 perf-profile.children.cycles-pp.scheduler_tick 0.12 -0.0 0.11 -0.0 0.11 -0.0 0.11 perf-profile.children.cycles-pp.tick_nohz_handler 0.21 -0.0 0.20 -0.0 0.20 -0.0 0.19 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 0.16 -0.0 0.15 ± 2% -0.0 0.15 -0.0 0.15 perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.11 -0.0 0.10 ± 3% -0.0 0.10 -0.0 0.10 perf-profile.children.cycles-pp.update_process_times 0.14 ± 3% -0.0 0.14 ± 3% -0.0 0.13 -0.0 0.13 ± 3% perf-profile.children.cycles-pp.try_charge_memcg 0.15 -0.0 0.14 ± 2% -0.0 0.14 ± 2% -0.0 0.14 perf-profile.children.cycles-pp.hrtimer_interrupt 0.06 -0.0 0.06 ± 8% -0.0 0.05 ± 7% -0.0 0.05 perf-profile.children.cycles-pp.task_tick_fair 0.16 ± 2% -0.0 0.16 -0.0 0.15 -0.0 0.15 ± 2% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios 0.07 +0.0 0.08 ± 6% +0.0 0.08 +0.0 0.08 perf-profile.children.cycles-pp.folio_add_lru 4.88 +0.0 4.91 +0.0 4.93 +0.0 4.93 perf-profile.children.cycles-pp.tlb_finish_mmu 37.74 +0.2 37.98 +0.3 38.04 +0.3 38.01 perf-profile.children.cycles-pp.unmap_page_range 37.74 +0.2 37.98 +0.3 38.04 +0.3 38.01 perf-profile.children.cycles-pp.unmap_vmas 37.74 +0.2 37.98 +0.3 38.04 +0.3 38.01 perf-profile.children.cycles-pp.zap_pmd_range 37.74 +0.2 37.98 +0.3 38.04 +0.3 38.01 perf-profile.children.cycles-pp.zap_pte_range 37.28 +0.3 37.54 +0.3 37.61 +0.3 37.58 perf-profile.children.cycles-pp.tlb_flush_mmu 42.65 +0.3 42.92 +0.3 43.00 +0.3 42.97 perf-profile.children.cycles-pp.__vm_munmap 42.65 +0.3 42.92 +0.3 43.00 +0.3 42.97 perf-profile.children.cycles-pp.__x64_sys_munmap 42.65 +0.3 42.92 +0.4 43.00 +0.3 42.97 perf-profile.children.cycles-pp.__munmap 42.65 +0.3 42.92 +0.4 43.00 +0.3 42.97 perf-profile.children.cycles-pp.unmap_region 42.65 +0.3 42.93 +0.4 43.01 +0.3 42.98 perf-profile.children.cycles-pp.do_vmi_align_munmap 42.65 +0.3 42.93 +0.4 43.01 +0.3 42.98 perf-profile.children.cycles-pp.do_vmi_munmap 42.86 +0.3 43.14 +0.4 43.22 +0.3 43.18 perf-profile.children.cycles-pp.do_syscall_64 42.86 +0.3 43.14 +0.4 43.22 +0.3 43.19 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 42.15 +0.3 42.44 +0.4 42.54 +0.3 42.50 perf-profile.children.cycles-pp.free_pages_and_swap_cache 42.12 +0.3 42.41 +0.4 42.50 +0.3 42.46 perf-profile.children.cycles-pp.folios_put_refs 42.15 +0.3 42.45 +0.4 42.54 +0.3 42.50 perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages 41.51 +0.3 41.80 +0.4 41.93 +0.4 41.89 perf-profile.children.cycles-pp.__page_cache_release 43.66 +0.5 44.12 +0.4 44.08 +0.5 44.15 perf-profile.children.cycles-pp.finish_fault 41.59 +0.6 42.19 +0.6 42.16 +0.6 42.24 perf-profile.children.cycles-pp.set_pte_range 40.94 +0.6 41.57 +0.6 41.53 +0.7 41.59 perf-profile.children.cycles-pp.folio_add_lru_vma 40.99 +0.6 41.63 +0.6 41.60 +0.7 41.66 perf-profile.children.cycles-pp.folio_batch_move_lru 81.57 +1.0 82.53 +1.1 82.62 +1.1 82.65 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath 81.60 +1.0 82.56 +1.1 82.66 +1.1 82.68 perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave 81.59 +1.0 82.56 +1.1 82.66 +1.1 82.68 perf-profile.children.cycles-pp._raw_spin_lock_irqsave 5.47 -0.3 5.14 -0.4 5.10 -0.4 5.10 perf-profile.self.cycles-pp.copy_page 1.77 -0.1 1.65 ± 2% -0.1 1.63 -0.1 1.63 perf-profile.self.cycles-pp._raw_spin_lock 2.19 -0.1 2.08 -0.1 2.08 -0.1 2.07 perf-profile.self.cycles-pp.testcase 0.96 -0.0 
0.91 -0.0 0.91 -0.0 0.91 perf-profile.self.cycles-pp.sync_regs 1.02 -0.0 0.99 -0.0 0.99 -0.0 0.98 perf-profile.self.cycles-pp.native_irq_return_iret 0.28 ± 2% -0.0 0.26 ± 2% +0.0 0.29 ± 2% +0.0 0.30 ± 2% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state 0.19 ± 2% -0.0 0.17 ± 2% -0.0 0.17 ± 2% -0.0 0.17 perf-profile.self.cycles-pp.get_page_from_freelist 0.20 -0.0 0.19 -0.0 0.18 ± 2% -0.0 0.19 ± 2% perf-profile.self.cycles-pp.__mod_node_page_state 0.28 -0.0 0.27 ± 3% -0.0 0.27 -0.0 0.26 perf-profile.self.cycles-pp.___perf_sw_event 0.16 ± 2% -0.0 0.15 ± 2% -0.0 0.15 -0.0 0.15 ± 2% perf-profile.self.cycles-pp.handle_mm_fault 0.06 -0.0 0.05 -0.0 0.05 -0.0 0.05 perf-profile.self.cycles-pp.down_read_trylock 0.09 -0.0 0.08 -0.0 0.08 -0.0 0.08 perf-profile.self.cycles-pp.folio_add_new_anon_rmap 0.11 -0.0 0.10 ± 3% -0.0 0.10 -0.0 0.11 ± 3% perf-profile.self.cycles-pp.xas_load 0.16 -0.0 0.15 ± 2% -0.0 0.15 ± 2% -0.0 0.15 perf-profile.self.cycles-pp.mas_walk 0.12 ± 4% -0.0 0.11 ± 3% +0.0 0.12 -0.0 0.10 perf-profile.self.cycles-pp.filemap_get_entry 0.11 ± 3% -0.0 0.11 ± 4% -0.0 0.10 ± 4% -0.0 0.10 perf-profile.self.cycles-pp.free_pages_and_swap_cache 0.11 -0.0 0.11 ± 4% -0.0 0.10 -0.0 0.10 ± 4% perf-profile.self.cycles-pp.error_entry 0.09 ± 4% -0.0 0.09 -0.0 0.08 -0.0 0.09 ± 4% perf-profile.self.cycles-pp._compound_head 0.21 +0.0 0.21 -0.0 0.20 -0.0 0.20 perf-profile.self.cycles-pp.folios_put_refs 0.12 +0.0 0.12 -0.0 0.11 +0.0 0.12 perf-profile.self.cycles-pp.do_fault 0.00 +0.0 0.00 +0.1 0.05 +0.0 0.00 perf-profile.self.cycles-pp.folio_unlock 81.57 +1.0 82.53 +1.1 82.62 +1.1 82.65 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath > > Shakeel ^ permalink raw reply [flat|nested] 15+ messages in thread
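For context, the profiles above are dominated by two call chains that both end up
in folio_lruvec_lock_irqsave / native_queued_spin_lock_slowpath: the write-fault
side (finish_fault -> folio_add_lru_vma -> folio_batch_move_lru) and the munmap
side (zap_pte_range -> folios_put_refs -> __page_cache_release). That matches
what will-it-scale's page_fault2 workload does: each process repeatedly faults in
a private, file-backed mapping and then unmaps it. A simplified sketch of that
per-process loop is below; it is not the actual will-it-scale source, and the
128MB region size and tmpfs-backed temp file are assumptions of the sketch:

/*
 * Rough page_fault2-style loop: map a file MAP_PRIVATE, write one byte
 * per page (one cow fault per page: folio allocation, memcg charge, LRU
 * insertion), then munmap it (LRU removal, uncharge), and repeat.
 */
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define MEMSIZE (128UL * 1024 * 1024)	/* assumed region size */

int main(void)
{
	unsigned long pgsize = (unsigned long)sysconf(_SC_PAGESIZE);
	char path[] = "/tmp/page_fault2.XXXXXX";	/* tmpfs-backed on the test boxes */
	int fd = mkstemp(path);

	if (fd < 0)
		return 1;
	unlink(path);
	if (ftruncate(fd, MEMSIZE) != 0)
		return 1;

	/* the real benchmark loops until the harness stops it and counts faults */
	for (int round = 0; round < 100; round++) {
		char *c = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE, fd, 0);

		if (c == MAP_FAILED)
			return 1;

		for (unsigned long off = 0; off < MEMSIZE; off += pgsize)
			c[off] = 0;		/* one write (cow) fault per page */

		munmap(c, MEMSIZE);		/* LRU removal + uncharge on this side */
	}
	return 0;
}

With around a hundred processes running this loop in parallel, the fault side and
the unmap side contend on the same per-node lruvec spinlock, which is where the
native_queued_spin_lock_slowpath cycles in the profiles above come from.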
* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
  2024-05-21  2:43         ` Oliver Sang
@ 2024-05-22  4:18           ` Shakeel Butt
  2024-05-23  7:48             ` Oliver Sang
  0 siblings, 1 reply; 15+ messages in thread
From: Shakeel Butt @ 2024-05-22  4:18 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin

On Tue, May 21, 2024 at 10:43:16AM +0800, Oliver Sang wrote:
> hi, Shakeel,
>
[...]
>
> we reported regression on a 2-node Skylake server. so I found a 1-node Skylake
> desktop (we don't have 1 node server) to check.
>

Please try the following patch on both single node and dual node
machines:


From 00a84b489b9e18abd1b8ec575ea31afacaf0734b Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeel.butt@linux.dev>
Date: Tue, 21 May 2024 20:27:11 -0700
Subject: [PATCH] memcg: rearrange fields of mem_cgroup_per_node

At the moment the fields of mem_cgroup_per_node which get read on the
performance critical path share the cacheline with the fields which
might get updated. This causes contention on that cacheline for
concurrent readers. Let's move all the read-only pointers to the start
of the struct, followed by the memcg-v1 only fields, and put the fields
which get updated often at the end.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 include/linux/memcontrol.h | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 030d34e9d117..16efd9737be9 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -96,23 +96,25 @@ struct mem_cgroup_reclaim_iter {
  * per-node information in memory controller.
  */
 struct mem_cgroup_per_node {
-	struct lruvec		lruvec;
+	/* Keep the read-only fields at the start */
+	struct mem_cgroup	*memcg;		/* Back pointer, we cannot */
+						/* use container_of	   */
 
 	struct lruvec_stats_percpu __percpu	*lruvec_stats_percpu;
 	struct lruvec_stats			*lruvec_stats;
-
-	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
-
-	struct mem_cgroup_reclaim_iter	iter;
-
 	struct shrinker_info __rcu	*shrinker_info;
 
+	/* memcg-v1 only stuff in middle */
+
 	struct rb_node		tree_node;	/* RB tree node */
 	unsigned long		usage_in_excess;/* Set to the value by which */
 						/* the soft limit is exceeded*/
 	bool			on_tree;
-	struct mem_cgroup	*memcg;		/* Back pointer, we cannot */
-						/* use container_of	   */
+
+	/* Fields which get updated often at the end. */
+	struct lruvec		lruvec;
+	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
+	struct mem_cgroup_reclaim_iter	iter;
 };
 
 struct mem_cgroup_threshold {
-- 
2.43.0

^ permalink raw reply	[flat|nested] 15+ messages in thread
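The layout rule the patch applies (read-mostly pointers grouped at the front,
frequently-updated fields pushed to the end, so readers of the stats pointers do
not bounce on the cacheline the LRU code keeps dirtying) can be checked
mechanically. The toy below is only an illustration with stand-in field types
and sizes, not the kernel struct; it assumes 64-byte cachelines and 8-byte
pointers:

/*
 * Stand-in for the reordered mem_cgroup_per_node: the compile-time
 * checks document which fields are expected to share (or not share)
 * the first cacheline.
 */
#include <assert.h>
#include <stddef.h>

#define CACHELINE_BYTES 64

struct toy_per_node {
	/* read-mostly on the fault path */
	void *memcg;
	void *lruvec_stats_percpu;
	void *lruvec_stats;
	void *shrinker_info;

	/* rarely touched (memcg-v1 soft-limit tree bookkeeping) */
	unsigned long tree_node[3];
	unsigned long usage_in_excess;
	_Bool on_tree;

	/* frequently written fields go last */
	unsigned long lru_zone_size[5][5];	/* stand-in sizes */
	unsigned long lru_lock_word;		/* stand-in for the lruvec lock */
};

/* all four read-mostly pointers fit in the first cacheline... */
static_assert(offsetof(struct toy_per_node, shrinker_info) + sizeof(void *)
	      <= CACHELINE_BYTES, "read-mostly pointers span cachelines");

/* ...and the hottest write target starts on a later cacheline */
static_assert(offsetof(struct toy_per_node, lru_lock_word) >= CACHELINE_BYTES,
	      "hot field shares the read-mostly cacheline");

int main(void) { return 0; }

For the real struct the same check is easier to do by comparing pahole output
for mem_cgroup_per_node before and after the patch; the point is only that the
pointers read in __mod_memcg_lruvec_state and friends no longer sit next to the
fields the LRU and reclaim code keep writing.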
* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
  2024-05-22  4:18           ` Shakeel Butt
@ 2024-05-23  7:48             ` Oliver Sang
  2024-05-23 16:47               ` Shakeel Butt
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Sang @ 2024-05-23  7:48 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin, oliver.sang

[-- Attachment #1: Type: text/plain, Size: 7554 bytes --]

hi, Shakeel,

On Tue, May 21, 2024 at 09:18:19PM -0700, Shakeel Butt wrote:
> On Tue, May 21, 2024 at 10:43:16AM +0800, Oliver Sang wrote:
> > hi, Shakeel,
> >
> [...]
> >
> > we reported regression on a 2-node Skylake server. so I found a 1-node Skylake
> > desktop (we don't have 1 node server) to check.
> >
>
> Please try the following patch on both single node and dual node
> machines:

the regression is partially recovered by applying your patch (but one case
regresses even more, as shown below).

details:

since you mentioned the whole patch-set behavior last time, I applied your
patch on top of
  a94032b35e5f9 ("memcg: use proper type for mod_memcg_state")
so below,
  fd2296741e2686ed6ecd05187e4 = a94032b35e5f9 + patch

for the regression in our original report, the test machine is:
  model: Skylake
  nr_node: 2
  nr_cpu: 104
  memory: 192G

the regression is partially recovered:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     91713           -13.0%      79833            -4.5%      87614        will-it-scale.per_process_ops

detail data is in part [1] of the attachment.

in later threads, we also reported similar regressions on other platforms.

on:
  model: Ice Lake
  nr_node: 2
  nr_cpu: 64
  memory: 256G
  brand: Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz

the regression is partially recovered, but not as clearly as above:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    240373           -12.9%     209394           -10.1%     215996        will-it-scale.per_process_ops

detail data is in part [2] of the attachment.
on:
  model: Sapphire Rapids
  nr_node: 2
  nr_cpu: 224
  memory: 512G
  brand: Intel(R) Xeon(R) Platinum 8480CTDX

the regression is NOT recovered, and is even a little worse:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     78072            -5.6%      73683            -6.5%      72975        will-it-scale.per_process_ops

detail data is in part [3] of the attachment.

for single node machines, we reported last time that there is no regression on:
  model: Skylake
  nr_node: 1
  nr_cpu: 36
  memory: 32G
  brand: Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz

we confirmed it's not impacted by this new patch, either:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-d08/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    136040            -0.1%     135881            -0.1%     135953        will-it-scale.per_process_ops

if you need detail data for this comparison, please let us know.

BTW, after the last update, we found another single node machine which can
reproduce the regression in our original report:
  model: Cascade Lake
  nr_node: 1
  nr_cpu: 36
  memory: 128G
  brand: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz

the regression is also partially recovered now:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-csl-d02/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    187483           -19.4%     151162           -12.1%     164714        will-it-scale.per_process_ops

detail data is in part [4] of the attachment.

>
>
> From 00a84b489b9e18abd1b8ec575ea31afacaf0734b Mon Sep 17 00:00:00 2001
> From: Shakeel Butt <shakeel.butt@linux.dev>
> Date: Tue, 21 May 2024 20:27:11 -0700
> Subject: [PATCH] memcg: rearrange fields of mem_cgroup_per_node
>
> At the moment the fields of mem_cgroup_per_node which get read on the
> performance critical path share the cacheline with the fields which
> might get updated. This causes contention on that cacheline for
> concurrent readers. Let's move all the read-only pointers to the start
> of the struct, followed by the memcg-v1 only fields, and put the fields
> which get updated often at the end.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
>  include/linux/memcontrol.h | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 030d34e9d117..16efd9737be9 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -96,23 +96,25 @@ struct mem_cgroup_reclaim_iter {
>   * per-node information in memory controller.
> */ > struct mem_cgroup_per_node { > - struct lruvec lruvec; > + /* Keep the read-only fields at the start */ > + struct mem_cgroup *memcg; /* Back pointer, we cannot */ > + /* use container_of */ > > struct lruvec_stats_percpu __percpu *lruvec_stats_percpu; > struct lruvec_stats *lruvec_stats; > - > - unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS]; > - > - struct mem_cgroup_reclaim_iter iter; > - > struct shrinker_info __rcu *shrinker_info; > > + /* memcg-v1 only stuff in middle */ > + > struct rb_node tree_node; /* RB tree node */ > unsigned long usage_in_excess;/* Set to the value by which */ > /* the soft limit is exceeded*/ > bool on_tree; > - struct mem_cgroup *memcg; /* Back pointer, we cannot */ > - /* use container_of */ > + > + /* Fields which get updated often at the end. */ > + struct lruvec lruvec; > + unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS]; > + struct mem_cgroup_reclaim_iter iter; > }; > > struct mem_cgroup_threshold { > -- > 2.43.0 > > [-- Attachment #2: detail-comparison --] [-- Type: text/plain, Size: 136381 bytes --] [1] ========================================================================================= compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase: gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/page_fault2/will-it-scale 59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 1.646e+08 +7.6% 1.772e+08 ± 14% +34.5% 2.215e+08 ± 20% cpuidle..time 41.99 ± 16% -24.4% 31.73 ± 16% -25.2% 31.39 ± 12% sched_debug.cfs_rq:/.removed.load_avg.stddev 34.17 -0.9% 33.87 -0.2% 34.12 boot-time.boot 3182 -1.0% 3151 -0.2% 3176 boot-time.idle 21099 ± 5% -16.5% 17627 ± 2% -7.4% 19540 ± 3% perf-c2c.DRAM.local 4025 ± 2% +31.3% 5285 ± 4% -14.7% 3432 ± 2% perf-c2c.HITM.local 0.44 ± 24% +0.1 0.58 +0.2 0.65 ± 20% mpstat.cpu.all.idle% 0.01 ± 23% +0.0 0.01 ± 9% +0.0 0.02 ± 6% mpstat.cpu.all.soft% 7.14 -0.9 6.23 -0.3 6.79 mpstat.cpu.all.usr% 9538291 -13.0% 8302761 -4.5% 9111939 will-it-scale.104.processes 91713 -13.0% 79833 -4.5% 87614 will-it-scale.per_process_ops 9538291 -13.0% 8302761 -4.5% 9111939 will-it-scale.workload 1.438e+09 -12.9% 1.253e+09 -4.2% 1.378e+09 numa-numastat.node0.local_node 1.44e+09 -12.9% 1.254e+09 -4.2% 1.38e+09 numa-numastat.node0.numa_hit 1.453e+09 -13.1% 1.263e+09 -4.9% 1.382e+09 numa-numastat.node1.local_node 1.454e+09 -12.9% 1.265e+09 -4.8% 1.384e+09 numa-numastat.node1.numa_hit 1.44e+09 -12.9% 1.254e+09 -4.2% 1.38e+09 numa-vmstat.node0.numa_hit 1.438e+09 -12.9% 1.253e+09 -4.2% 1.378e+09 numa-vmstat.node0.numa_local 1.454e+09 -12.9% 1.265e+09 -4.8% 1.384e+09 numa-vmstat.node1.numa_hit 1.453e+09 -13.1% 1.263e+09 -4.9% 1.382e+09 numa-vmstat.node1.numa_local 2.894e+09 -12.9% 2.52e+09 -4.5% 2.764e+09 proc-vmstat.numa_hit 2.891e+09 -13.0% 2.516e+09 -4.5% 2.76e+09 proc-vmstat.numa_local 2.88e+09 -12.9% 2.509e+09 -4.5% 2.752e+09 proc-vmstat.pgalloc_normal 2.869e+09 -12.9% 2.499e+09 -4.5% 2.741e+09 proc-vmstat.pgfault 2.88e+09 -12.9% 2.509e+09 -4.5% 2.751e+09 proc-vmstat.pgfree 17.51 -3.2% 16.95 -1.5% 17.23 perf-stat.i.MPKI 9.457e+09 -9.7% 8.542e+09 -3.1% 9.165e+09 perf-stat.i.branch-instructions 45022022 -9.0% 40951240 -2.6% 43850606 perf-stat.i.branch-misses 84.38 -5.7 78.65 -3.2 81.15 perf-stat.i.cache-miss-rate% 8.353e+08 -12.9% 7.271e+08 -4.6% 7.969e+08 perf-stat.i.cache-misses 9.877e+08 -6.6% 9.224e+08 -0.8% 9.799e+08 
perf-stat.i.cache-references 6.06 +11.3% 6.75 +3.2% 6.26 perf-stat.i.cpi 136.25 -1.1% 134.73 -0.1% 136.12 perf-stat.i.cpu-migrations 348.56 +14.9% 400.65 +4.9% 365.77 perf-stat.i.cycles-between-cache-misses 4.763e+10 -10.1% 4.285e+10 -3.1% 4.617e+10 perf-stat.i.instructions 0.17 -9.9% 0.15 -3.2% 0.16 perf-stat.i.ipc 182.56 -12.9% 158.99 -4.5% 174.33 perf-stat.i.metric.K/sec 9494393 -12.9% 8270117 -4.5% 9066901 perf-stat.i.minor-faults 9494393 -12.9% 8270117 -4.5% 9066902 perf-stat.i.page-faults 17.54 -3.2% 16.98 -1.6% 17.27 perf-stat.overall.MPKI 84.57 -5.7 78.84 -3.2 81.34 perf-stat.overall.cache-miss-rate% 6.07 +11.2% 6.76 +3.2% 6.27 perf-stat.overall.cpi 346.33 +14.9% 397.97 +4.8% 362.97 perf-stat.overall.cycles-between-cache-misses 0.16 -10.1% 0.15 -3.1% 0.16 perf-stat.overall.ipc 1503802 +3.5% 1555989 +1.7% 1528933 perf-stat.overall.path-length 9.424e+09 -9.7% 8.509e+09 -3.1% 9.133e+09 perf-stat.ps.branch-instructions 44739120 -9.2% 40645392 -2.6% 43568159 perf-stat.ps.branch-misses 8.326e+08 -13.0% 7.247e+08 -4.6% 7.945e+08 perf-stat.ps.cache-misses 9.846e+08 -6.6% 9.193e+08 -0.8% 9.768e+08 perf-stat.ps.cache-references 134.98 -1.1% 133.49 -0.1% 134.89 perf-stat.ps.cpu-migrations 4.747e+10 -10.1% 4.268e+10 -3.1% 4.601e+10 perf-stat.ps.instructions 9463902 -12.9% 8241837 -4.5% 9037920 perf-stat.ps.minor-faults 9463902 -12.9% 8241837 -4.5% 9037920 perf-stat.ps.page-faults 1.434e+13 -9.9% 1.292e+13 -2.9% 1.393e+13 perf-stat.total.instructions 64.15 -2.5 61.69 -0.9 63.21 perf-profile.calltrace.cycles-pp.testcase 58.30 -1.9 56.36 -0.7 57.58 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase 52.64 -1.3 51.29 -0.5 52.17 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase 52.50 -1.3 51.18 -0.5 52.05 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 50.81 -1.0 49.86 -0.2 50.64 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 9.27 -0.9 8.36 -0.4 8.83 ± 2% perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 49.86 -0.8 49.02 -0.1 49.76 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 49.21 -0.8 48.45 -0.1 49.14 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault 0.60 ± 4% -0.6 0.00 -0.2 0.35 ± 70% perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault 3.24 -0.5 2.73 -0.3 2.98 perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 5.15 -0.5 4.65 -0.2 4.94 perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase 0.82 -0.3 0.53 -0.3 0.56 perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 1.68 -0.3 1.43 -0.2 1.51 perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault 1.50 ± 2% -0.2 1.26 ± 3% -0.1 1.42 perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault 2.52 -0.2 2.27 -0.1 2.40 perf-profile.calltrace.cycles-pp.error_entry.testcase 1.85 -0.2 1.68 -0.1 1.78 ± 2% perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 1.55 -0.1 1.42 -0.1 1.49 ± 3% 
perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault 1.07 -0.1 0.95 -0.1 1.00 perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault 0.68 -0.1 0.56 ± 2% -0.1 0.61 perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault 0.55 -0.1 0.42 ± 44% -0.0 0.53 ± 2% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc 0.90 -0.1 0.80 -0.0 0.86 perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase 0.89 -0.1 0.84 -0.0 0.88 perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault 1.23 -0.0 1.21 +0.0 1.27 perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 1.15 -0.0 1.13 +0.0 1.19 perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault 0.96 +0.0 0.96 +0.1 1.01 perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault 0.73 ± 2% +0.0 0.75 +0.1 0.79 perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault 1.00 +0.1 1.06 +0.1 1.08 perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas 3.85 +0.2 4.09 +0.1 3.95 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap 3.85 +0.2 4.09 +0.1 3.95 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap 3.85 +0.2 4.09 +0.1 3.96 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 3.82 +0.2 4.07 +0.1 3.92 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region 3.68 +0.3 3.93 +0.1 3.80 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu 0.83 +0.3 1.12 ± 2% +0.3 1.14 perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.set_pte_range.finish_fault.do_fault.__handle_mm_fault 0.00 +0.6 0.56 ± 3% +0.3 0.34 ± 70% perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range 31.81 +0.6 32.44 +0.4 32.22 perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_fault.__handle_mm_fault 31.69 +0.6 32.33 +0.4 32.11 perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_fault 30.47 +0.6 31.11 +0.4 30.90 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range 30.48 +0.6 31.13 +0.4 30.91 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault 30.44 +0.7 31.09 +0.4 30.88 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma 0.00 +0.7 0.68 ± 2% +0.6 0.63 perf-profile.calltrace.cycles-pp.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range 35.03 +0.7 35.76 +0.6 35.66 
perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 32.87 +0.9 33.79 +0.7 33.58 perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 29.54 +2.3 31.84 +0.9 30.39 perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas 29.54 +2.3 31.84 +0.9 30.39 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range 29.53 +2.3 31.83 +0.9 30.39 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range 30.66 +2.3 32.98 +0.9 31.57 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap 30.66 +2.3 32.98 +0.9 31.57 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 30.66 +2.3 32.98 +0.9 31.57 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap 30.66 +2.3 32.98 +0.9 31.57 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region 29.26 +2.4 31.64 +0.9 30.16 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range 28.41 +2.4 30.83 +1.0 29.39 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu 34.56 +2.6 37.12 +1.0 35.57 perf-profile.calltrace.cycles-pp.__munmap 34.55 +2.6 37.12 +1.0 35.57 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 34.55 +2.6 37.12 +1.0 35.57 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 34.55 +2.6 37.12 +1.0 35.57 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64 34.55 +2.6 37.12 +1.0 35.57 perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe 34.56 +2.6 37.12 +1.0 35.57 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 34.56 +2.6 37.12 +1.0 35.57 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap 34.55 +2.6 37.11 +1.0 35.56 perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap 31.41 +2.8 34.25 +1.1 32.55 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache 31.38 +2.9 34.24 +1.1 32.53 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs 31.42 +2.9 34.28 +1.1 32.56 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages 65.26 -2.6 62.67 -1.0 64.26 perf-profile.children.cycles-pp.testcase 56.09 -1.7 54.39 -0.6 55.47 perf-profile.children.cycles-pp.asm_exc_page_fault 52.66 -1.3 51.31 -0.5 52.19 perf-profile.children.cycles-pp.exc_page_fault 52.52 -1.3 51.20 -0.5 52.07 perf-profile.children.cycles-pp.do_user_addr_fault 50.83 -1.0 49.88 -0.2 50.66 perf-profile.children.cycles-pp.handle_mm_fault 9.35 -0.9 8.44 -0.4 8.91 ± 2% perf-profile.children.cycles-pp.copy_page 49.87 -0.8 49.03 -0.1 49.77 
perf-profile.children.cycles-pp.__handle_mm_fault 49.23 -0.8 48.47 -0.1 49.16 perf-profile.children.cycles-pp.do_fault 3.27 -0.5 2.76 -0.3 3.01 perf-profile.children.cycles-pp.folio_prealloc 5.15 -0.5 4.65 -0.2 4.94 perf-profile.children.cycles-pp.__irqentry_text_end 0.82 -0.3 0.53 -0.3 0.57 perf-profile.children.cycles-pp.lock_vma_under_rcu 1.52 ± 2% -0.3 1.26 ± 3% -0.1 1.43 perf-profile.children.cycles-pp.__mem_cgroup_charge 1.69 -0.2 1.44 -0.2 1.52 perf-profile.children.cycles-pp.vma_alloc_folio_noprof 2.54 -0.2 2.29 -0.1 2.43 perf-profile.children.cycles-pp.error_entry 0.57 -0.2 0.33 -0.2 0.34 perf-profile.children.cycles-pp.mas_walk 1.87 -0.2 1.70 -0.1 1.80 ± 2% perf-profile.children.cycles-pp.__pte_offset_map_lock 0.60 ± 4% -0.2 0.44 ± 6% -0.1 0.52 ± 5% perf-profile.children.cycles-pp.get_mem_cgroup_from_mm 1.57 -0.1 1.43 -0.1 1.51 ± 3% perf-profile.children.cycles-pp._raw_spin_lock 1.12 -0.1 0.99 -0.1 1.04 perf-profile.children.cycles-pp.alloc_pages_mpol_noprof 0.70 -0.1 0.57 ± 2% -0.1 0.62 perf-profile.children.cycles-pp.lru_add_fn 0.95 -0.1 0.82 ± 5% +0.3 1.22 ± 2% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state 1.16 -0.1 1.04 -0.0 1.11 perf-profile.children.cycles-pp.native_irq_return_iret 0.94 -0.1 0.84 -0.0 0.90 perf-profile.children.cycles-pp.sync_regs 0.43 -0.1 0.34 ± 2% -0.0 0.39 perf-profile.children.cycles-pp.free_unref_folios 0.96 -0.1 0.87 -0.0 0.92 perf-profile.children.cycles-pp.__perf_sw_event 0.44 -0.1 0.36 -0.1 0.39 perf-profile.children.cycles-pp.get_vma_policy 0.21 ± 3% -0.1 0.13 ± 2% -0.0 0.16 ± 2% perf-profile.children.cycles-pp._compound_head 0.75 -0.1 0.68 -0.0 0.72 perf-profile.children.cycles-pp.___perf_sw_event 0.94 -0.1 0.88 -0.0 0.92 perf-profile.children.cycles-pp.__alloc_pages_noprof 0.44 ± 5% -0.1 0.37 ± 7% -0.0 0.42 ± 6% perf-profile.children.cycles-pp.__count_memcg_events 0.31 -0.1 0.24 ± 2% -0.0 0.28 ± 3% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios 0.41 ± 4% -0.1 0.35 ± 7% -0.0 0.40 ± 5% perf-profile.children.cycles-pp.mem_cgroup_commit_charge 0.57 -0.0 0.52 -0.0 0.55 ± 2% perf-profile.children.cycles-pp.get_page_from_freelist 0.17 ± 2% -0.0 0.12 ± 4% -0.0 0.15 ± 3% perf-profile.children.cycles-pp.uncharge_batch 0.19 ± 3% -0.0 0.15 ± 8% -0.0 0.18 ± 2% perf-profile.children.cycles-pp.cgroup_rstat_updated 0.15 ± 2% -0.0 0.12 ± 4% -0.0 0.13 ± 3% perf-profile.children.cycles-pp.free_unref_page_commit 0.32 ± 3% -0.0 0.29 ± 2% -0.0 0.30 ± 2% perf-profile.children.cycles-pp.__mod_node_page_state 0.13 ± 3% -0.0 0.10 ± 5% -0.0 0.11 ± 3% perf-profile.children.cycles-pp.page_counter_uncharge 0.13 ± 2% -0.0 0.10 ± 4% -0.0 0.12 ± 6% perf-profile.children.cycles-pp.__mod_zone_page_state 0.10 ± 3% -0.0 0.07 ± 5% -0.0 0.09 ± 5% perf-profile.children.cycles-pp.mem_cgroup_update_lru_size 0.08 -0.0 0.05 -0.0 0.05 ± 8% perf-profile.children.cycles-pp.policy_nodemask 1.24 -0.0 1.21 +0.0 1.28 perf-profile.children.cycles-pp.__do_fault 0.36 -0.0 0.33 -0.0 0.34 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 0.39 -0.0 0.37 -0.0 0.38 ± 2% perf-profile.children.cycles-pp.rmqueue 0.17 ± 2% -0.0 0.15 -0.0 0.16 ± 3% perf-profile.children.cycles-pp.percpu_counter_add_batch 0.32 -0.0 0.30 -0.0 0.31 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 1.15 -0.0 1.13 +0.0 1.19 perf-profile.children.cycles-pp.shmem_fault 0.09 -0.0 0.07 -0.0 0.08 perf-profile.children.cycles-pp.get_pfnblock_flags_mask 0.16 -0.0 0.14 -0.0 0.15 ± 3% perf-profile.children.cycles-pp.handle_pte_fault 0.12 ± 3% -0.0 0.10 ± 3% -0.0 0.11 ± 4% 
perf-profile.children.cycles-pp.uncharge_folio 0.16 ± 2% -0.0 0.14 ± 2% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.shmem_get_policy 0.29 -0.0 0.27 -0.0 0.28 ± 2% perf-profile.children.cycles-pp.hrtimer_interrupt 0.08 -0.0 0.06 ± 6% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.folio_unlock 0.16 ± 4% -0.0 0.14 ± 3% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.__pte_offset_map 0.25 -0.0 0.24 -0.0 0.24 perf-profile.children.cycles-pp.__hrtimer_run_queues 0.30 -0.0 0.28 ± 2% -0.0 0.28 perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.20 ± 2% -0.0 0.18 ± 3% -0.0 0.19 ± 2% perf-profile.children.cycles-pp.tick_nohz_handler 0.09 ± 4% -0.0 0.08 -0.0 0.09 perf-profile.children.cycles-pp.down_read_trylock 0.12 ± 3% -0.0 0.11 -0.0 0.12 ± 3% perf-profile.children.cycles-pp.folio_add_new_anon_rmap 0.99 -0.0 0.99 +0.1 1.04 ± 2% perf-profile.children.cycles-pp.shmem_get_folio_gfp 0.04 ± 44% +0.0 0.06 ± 7% -0.0 0.02 ±142% perf-profile.children.cycles-pp.kthread 0.04 ± 44% +0.0 0.06 ± 7% -0.0 0.02 ±142% perf-profile.children.cycles-pp.ret_from_fork 0.04 ± 44% +0.0 0.06 ± 7% -0.0 0.02 ±142% perf-profile.children.cycles-pp.ret_from_fork_asm 0.73 +0.0 0.75 +0.1 0.79 perf-profile.children.cycles-pp.filemap_get_entry 0.00 +0.1 0.05 +0.0 0.01 ±223% perf-profile.children.cycles-pp._raw_spin_lock_irq 1.02 +0.1 1.07 +0.1 1.10 perf-profile.children.cycles-pp.zap_present_ptes 0.47 +0.2 0.68 ± 2% +0.2 0.64 perf-profile.children.cycles-pp.folio_remove_rmap_ptes 3.87 +0.2 4.11 +0.1 3.97 perf-profile.children.cycles-pp.tlb_finish_mmu 1.17 +0.6 1.75 ± 2% +0.5 1.67 perf-profile.children.cycles-pp.__lruvec_stat_mod_folio 31.81 +0.6 32.44 +0.4 32.22 perf-profile.children.cycles-pp.folio_add_lru_vma 31.77 +0.6 32.42 +0.4 32.19 perf-profile.children.cycles-pp.folio_batch_move_lru 35.04 +0.7 35.77 +0.6 35.67 perf-profile.children.cycles-pp.finish_fault 32.88 +0.9 33.80 +0.7 33.59 perf-profile.children.cycles-pp.set_pte_range 29.54 +2.3 31.84 +0.9 30.39 perf-profile.children.cycles-pp.tlb_flush_mmu 30.66 +2.3 32.98 +0.9 31.57 perf-profile.children.cycles-pp.zap_pte_range 30.66 +2.3 32.98 +0.9 31.58 perf-profile.children.cycles-pp.unmap_page_range 30.66 +2.3 32.98 +0.9 31.58 perf-profile.children.cycles-pp.unmap_vmas 30.66 +2.3 32.98 +0.9 31.58 perf-profile.children.cycles-pp.zap_pmd_range 33.41 +2.5 35.95 +1.0 34.36 perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages 33.40 +2.5 35.94 +1.0 34.36 perf-profile.children.cycles-pp.free_pages_and_swap_cache 34.56 +2.6 37.12 +1.0 35.57 perf-profile.children.cycles-pp.__x64_sys_munmap 34.56 +2.6 37.12 +1.0 35.57 perf-profile.children.cycles-pp.__vm_munmap 34.56 +2.6 37.12 +1.0 35.58 perf-profile.children.cycles-pp.do_vmi_munmap 34.56 +2.6 37.12 +1.0 35.57 perf-profile.children.cycles-pp.__munmap 34.56 +2.6 37.12 +1.0 35.58 perf-profile.children.cycles-pp.do_vmi_align_munmap 34.56 +2.6 37.12 +1.0 35.58 perf-profile.children.cycles-pp.unmap_region 34.67 +2.6 37.24 +1.0 35.68 perf-profile.children.cycles-pp.do_syscall_64 34.67 +2.6 37.24 +1.0 35.69 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 33.22 +2.6 35.83 +1.0 34.21 perf-profile.children.cycles-pp.folios_put_refs 32.12 +2.7 34.80 +1.1 33.22 perf-profile.children.cycles-pp.__page_cache_release 61.97 +3.5 65.47 +1.6 63.54 perf-profile.children.cycles-pp._raw_spin_lock_irqsave 61.98 +3.5 65.50 +1.6 63.56 perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave 61.94 +3.5 65.48 +1.6 63.51 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath 9.32 -0.9 8.41 -0.4 8.88 ± 2% 
perf-profile.self.cycles-pp.copy_page 5.15 -0.5 4.65 -0.2 4.94 perf-profile.self.cycles-pp.__irqentry_text_end 2.58 -0.3 2.30 -0.1 2.46 perf-profile.self.cycles-pp.testcase 2.53 -0.2 2.28 -0.1 2.42 perf-profile.self.cycles-pp.error_entry 0.56 -0.2 0.32 ± 2% -0.2 0.34 perf-profile.self.cycles-pp.mas_walk 0.60 ± 4% -0.2 0.43 ± 5% -0.1 0.51 ± 5% perf-profile.self.cycles-pp.get_mem_cgroup_from_mm 1.54 -0.1 1.42 -0.1 1.49 ± 3% perf-profile.self.cycles-pp._raw_spin_lock 1.15 -0.1 1.04 -0.0 1.11 perf-profile.self.cycles-pp.native_irq_return_iret 0.94 -0.1 0.84 -0.0 0.90 perf-profile.self.cycles-pp.sync_regs 0.85 -0.1 0.75 ± 5% +0.3 1.13 ± 2% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state 0.20 ± 3% -0.1 0.12 ± 3% -0.1 0.15 ± 2% perf-profile.self.cycles-pp._compound_head 0.27 ± 3% -0.1 0.19 ± 2% -0.0 0.23 ± 3% perf-profile.self.cycles-pp.free_pages_and_swap_cache 0.26 -0.1 0.19 ± 3% -0.0 0.25 ± 2% perf-profile.self.cycles-pp.__page_cache_release 0.66 -0.1 0.59 -0.0 0.63 perf-profile.self.cycles-pp.___perf_sw_event 0.28 ± 2% -0.1 0.22 ± 3% -0.0 0.25 perf-profile.self.cycles-pp.zap_present_ptes 0.32 -0.1 0.27 ± 4% -0.0 0.28 perf-profile.self.cycles-pp.lru_add_fn 0.37 ± 5% -0.1 0.32 ± 6% -0.0 0.36 ± 6% perf-profile.self.cycles-pp.__count_memcg_events 0.26 -0.1 0.20 -0.0 0.21 perf-profile.self.cycles-pp.get_vma_policy 0.47 -0.1 0.42 -0.0 0.44 ± 2% perf-profile.self.cycles-pp.__handle_mm_fault 0.16 -0.0 0.12 ± 4% -0.0 0.12 ± 3% perf-profile.self.cycles-pp.vma_alloc_folio_noprof 0.20 -0.0 0.16 ± 3% -0.0 0.18 ± 2% perf-profile.self.cycles-pp.free_unref_folios 0.30 -0.0 0.25 -0.0 0.26 perf-profile.self.cycles-pp.handle_mm_fault 0.16 ± 4% -0.0 0.12 ± 3% -0.0 0.13 ± 3% perf-profile.self.cycles-pp.lock_vma_under_rcu 0.14 ± 3% -0.0 0.11 ± 3% -0.0 0.13 perf-profile.self.cycles-pp.folio_remove_rmap_ptes 0.10 ± 4% -0.0 0.07 -0.0 0.09 ± 4% perf-profile.self.cycles-pp.zap_pte_range 0.16 ± 2% -0.0 0.12 ± 7% -0.0 0.16 ± 3% perf-profile.self.cycles-pp.cgroup_rstat_updated 0.10 ± 4% -0.0 0.07 ± 5% -0.0 0.08 ± 4% perf-profile.self.cycles-pp.alloc_pages_mpol_noprof 0.11 -0.0 0.08 -0.0 0.10 ± 4% perf-profile.self.cycles-pp.free_unref_page_commit 0.09 ± 5% -0.0 0.06 ± 7% -0.0 0.08 perf-profile.self.cycles-pp.mem_cgroup_update_lru_size 0.11 -0.0 0.08 ± 4% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.page_counter_uncharge 0.12 ± 4% -0.0 0.09 -0.0 0.11 ± 5% perf-profile.self.cycles-pp.__mod_zone_page_state 0.31 ± 2% -0.0 0.29 ± 2% -0.0 0.30 ± 2% perf-profile.self.cycles-pp.__mod_node_page_state 0.14 ± 2% -0.0 0.12 ± 4% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.mem_cgroup_commit_charge 0.21 -0.0 0.19 ± 2% -0.0 0.20 ± 2% perf-profile.self.cycles-pp.do_user_addr_fault 0.09 -0.0 0.07 ± 5% -0.0 0.08 perf-profile.self.cycles-pp.get_pfnblock_flags_mask 0.21 -0.0 0.19 ± 2% -0.0 0.20 ± 2% perf-profile.self.cycles-pp.__perf_sw_event 0.17 ± 2% -0.0 0.15 -0.0 0.16 ± 2% perf-profile.self.cycles-pp.percpu_counter_add_batch 0.28 -0.0 0.26 ± 2% -0.0 0.27 perf-profile.self.cycles-pp.__alloc_pages_noprof 0.22 ± 2% -0.0 0.19 ± 2% -0.0 0.20 ± 2% perf-profile.self.cycles-pp.__pte_offset_map_lock 0.20 ± 2% -0.0 0.18 ± 2% -0.0 0.20 ± 3% perf-profile.self.cycles-pp.shmem_get_folio_gfp 0.12 -0.0 0.10 -0.0 0.11 ± 4% perf-profile.self.cycles-pp.uncharge_folio 0.11 ± 4% -0.0 0.09 ± 4% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.__mem_cgroup_charge 0.08 -0.0 0.06 ± 6% -0.0 0.07 ± 5% perf-profile.self.cycles-pp.folio_unlock 0.14 ± 3% -0.0 0.12 ± 3% -0.0 0.12 ± 3% perf-profile.self.cycles-pp.do_fault 0.16 ± 3% -0.0 0.14 ± 2% -0.0 0.15 ± 3% 
perf-profile.self.cycles-pp.shmem_get_policy
0.10 ± 3% -0.0 0.08 ± 5% -0.0 0.09 perf-profile.self.cycles-pp.set_pte_range
0.16 ± 2% -0.0 0.15 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.get_page_from_freelist
0.10 ± 3% -0.0 0.09 -0.0 0.10 ± 5% perf-profile.self.cycles-pp.exc_page_fault
0.12 ± 3% -0.0 0.11 -0.0 0.12 ± 3% perf-profile.self.cycles-pp.folio_add_new_anon_rmap
0.09 -0.0 0.08 +0.0 0.09 perf-profile.self.cycles-pp.down_read_trylock
0.38 ± 2% +0.0 0.42 +0.1 0.44 ± 2% perf-profile.self.cycles-pp.filemap_get_entry
0.26 +0.1 0.36 -0.0 0.23 perf-profile.self.cycles-pp.folios_put_refs
0.33 +0.1 0.45 ± 4% +0.1 0.40 perf-profile.self.cycles-pp.folio_batch_move_lru
0.40 ± 5% +0.6 0.99 +0.2 0.59 perf-profile.self.cycles-pp.__lruvec_stat_mod_folio
61.94 +3.5 65.48 +1.6 63.51 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath

[2]
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
194.40 ± 9% -13.9% 167.40 ± 2% -10.0% 175.00 ± 4% perf-c2c.HITM.remote
0.27 ± 3% -0.0 0.24 ± 2% -0.0 0.25 ± 2% mpstat.cpu.all.irq%
3.83 -0.6 3.21 -0.5 3.37 ± 2% mpstat.cpu.all.usr%
15383898 -12.9% 13401271 -10.1% 13823802 will-it-scale.64.processes
240373 -12.9% 209394 -10.1% 215996 will-it-scale.per_process_ops
15383898 -12.9% 13401271 -10.1% 13823802 will-it-scale.workload
2.359e+09 -12.8% 2.057e+09 -10.2% 2.118e+09 ± 2% numa-numastat.node0.local_node
2.359e+09 -12.8% 2.057e+09 -10.2% 2.118e+09 ± 2% numa-numastat.node0.numa_hit
2.346e+09 -13.2% 2.035e+09 ± 2% -10.3% 2.105e+09 numa-numastat.node1.local_node
2.345e+09 -13.2% 2.036e+09 ± 2% -10.2% 2.105e+09 numa-numastat.node1.numa_hit
2.36e+09 -12.9% 2.056e+09 -10.2% 2.118e+09 ± 2% numa-vmstat.node0.numa_hit
2.36e+09 -12.9% 2.056e+09 -10.3% 2.118e+09 ± 2% numa-vmstat.node0.numa_local
2.346e+09 -13.3% 2.035e+09 ± 2% -10.3% 2.105e+09 numa-vmstat.node1.numa_hit
2.347e+09 -13.3% 2.034e+09 ± 2% -10.3% 2.105e+09 numa-vmstat.node1.numa_local
7.86 ± 5% -29.5% 5.54 ± 34% -37.0% 4.95 ± 30% sched_debug.cfs_rq:/.removed.runnable_avg.avg
22.93 ± 4% -18.5% 18.68 ± 15% -21.7% 17.96 ± 20% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
7.86 ± 5% -30.0% 5.50 ± 34% -37.0% 4.95 ± 30% sched_debug.cfs_rq:/.removed.util_avg.avg
22.93 ± 4% -19.9% 18.35 ± 14% -21.7% 17.96 ± 20% sched_debug.cfs_rq:/.removed.util_avg.stddev
149.50 ± 33% -70.9% 43.57 ±125% -58.2% 62.42 ± 67% sched_debug.cfs_rq:/.util_est.min
1930 ± 4% -10.5% 1729 ± 16% -14.9% 1643 ± 6% sched_debug.cpu.nr_switches.min
1137116 -1.8% 1116759 -1.8% 1116590 proc-vmstat.nr_anon_pages
4575 +1.7% 4654 +1.7% 4652 proc-vmstat.nr_page_table_pages
4.705e+09 -13.0% 4.093e+09 -10.2% 4.224e+09 proc-vmstat.numa_hit
4.706e+09 -13.0% 4.092e+09 -10.3% 4.223e+09 proc-vmstat.numa_local
4.645e+09 -12.8% 4.05e+09 -10.1% 4.177e+09 proc-vmstat.pgalloc_normal
4.631e+09 -12.8% 4.038e+09 -10.1% 4.164e+09 proc-vmstat.pgfault
4.643e+09 -12.8% 4.049e+09 -10.1% 4.176e+09 proc-vmstat.pgfree
21.14 -9.9% 19.05 -7.4% 19.58 perf-stat.i.MPKI
1.468e+10 -7.9% 1.351e+10 -6.2% 1.378e+10 perf-stat.i.branch-instructions
14349180 -6.2% 13464962 -5.2% 13596701 perf-stat.i.branch-misses
69.58 -4.6 64.96 -3.2 66.40
perf-stat.i.cache-miss-rate% 1.57e+09 -17.8% 1.291e+09 -13.6% 1.356e+09 ± 2% perf-stat.i.cache-misses 2.252e+09 -11.9% 1.985e+09 -9.4% 2.039e+09 perf-stat.i.cache-references 3.00 +10.6% 3.32 +8.1% 3.25 perf-stat.i.cpi 99.00 -0.9% 98.13 -1.1% 97.87 perf-stat.i.cpu-migrations 143.06 +22.4% 175.18 +16.4% 166.58 ± 2% perf-stat.i.cycles-between-cache-misses 7.403e+10 -8.7% 6.76e+10 -6.7% 6.91e+10 perf-stat.i.instructions 0.34 -9.7% 0.30 -7.6% 0.31 perf-stat.i.ipc 478.41 -12.7% 417.50 -10.0% 430.74 perf-stat.i.metric.K/sec 15310132 -12.7% 13361235 -10.0% 13784853 perf-stat.i.minor-faults 15310132 -12.7% 13361235 -10.0% 13784853 perf-stat.i.page-faults 21.21 -28.3% 15.20 ± 50% -7.5% 19.62 perf-stat.overall.MPKI 0.10 -0.0 0.08 ± 50% +0.0 0.10 perf-stat.overall.branch-miss-rate% 69.71 -17.9 51.83 ± 50% -3.2 66.46 perf-stat.overall.cache-miss-rate% 3.01 -11.4% 2.67 ± 50% +8.0% 3.25 perf-stat.overall.cpi 141.98 -1.2% 140.33 ± 50% +16.8% 165.83 ± 2% perf-stat.overall.cycles-between-cache-misses 0.33 -27.7% 0.24 ± 50% -7.4% 0.31 perf-stat.overall.ipc 1453908 -16.2% 1218410 ± 50% +3.6% 1506867 perf-stat.overall.path-length 1.463e+10 -26.4% 1.077e+10 ± 50% -6.2% 1.373e+10 perf-stat.ps.branch-instructions 14253731 -25.1% 10681742 ± 50% -5.2% 13506212 perf-stat.ps.branch-misses 1.565e+09 -34.6% 1.023e+09 ± 50% -13.6% 1.351e+09 ± 2% perf-stat.ps.cache-misses 2.245e+09 -29.6% 1.579e+09 ± 50% -9.4% 2.032e+09 perf-stat.ps.cache-references 7.378e+10 -27.0% 5.385e+10 ± 50% -6.7% 6.886e+10 perf-stat.ps.instructions 15260342 -30.3% 10633461 ± 50% -10.0% 13738637 perf-stat.ps.minor-faults 15260342 -30.3% 10633461 ± 50% -10.0% 13738637 perf-stat.ps.page-faults 2.237e+13 -27.2% 1.629e+13 ± 50% -6.9% 2.083e+13 perf-stat.total.instructions 75.68 -5.4 70.26 -5.0 70.73 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase 72.31 -5.1 67.25 -4.7 67.66 perf-profile.calltrace.cycles-pp.testcase 63.50 -3.9 59.64 -3.7 59.78 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase 63.32 -3.8 59.48 -3.7 59.63 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 61.04 -3.6 57.49 -3.5 57.55 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 21.29 -3.5 17.77 ± 2% -2.8 18.48 ± 2% perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 59.53 -3.3 56.21 -3.3 56.24 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 58.35 -3.2 55.17 -3.2 55.16 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault 5.31 -0.8 4.50 -0.7 4.64 perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 4.97 -0.8 4.21 -0.6 4.35 perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault 4.40 -0.6 3.78 ± 2% -0.4 3.96 ± 3% perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 0.57 -0.6 0.00 -0.3 0.26 ±100% perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 2.63 -0.3 2.29 -0.3 2.36 perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase 1.82 -0.3 1.49 -0.3 1.55 perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas 2.21 -0.3 1.90 -0.2 1.97 
perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault 2.01 -0.3 1.73 ± 2% -0.2 1.84 ± 5% perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault 1.80 -0.3 1.54 -0.2 1.59 perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault 1.55 -0.2 1.33 -0.2 1.36 perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault 1.74 -0.2 1.52 ± 2% -0.2 1.57 perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 0.63 ± 2% -0.2 0.41 ± 50% -0.1 0.53 ± 2% perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault 1.60 -0.2 1.39 ± 2% -0.2 1.44 perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault 1.29 -0.2 1.11 ± 3% -0.1 1.19 ± 6% perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault 1.42 -0.2 1.24 ± 2% -0.1 1.28 ± 2% perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault 1.12 -0.2 0.95 ± 2% -0.1 0.98 perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc 1.50 -0.1 1.36 ± 3% -0.2 1.33 ± 2% perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.set_pte_range.finish_fault.do_fault.__handle_mm_fault 0.72 ± 2% -0.1 0.60 ± 3% -0.1 0.62 ± 2% perf-profile.calltrace.cycles-pp.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 0.98 -0.1 0.87 ± 2% -0.1 0.90 ± 2% perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 0.92 -0.1 0.81 ± 3% -0.1 0.84 ± 3% perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault 0.74 -0.1 0.64 -0.1 0.66 perf-profile.calltrace.cycles-pp.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range 0.66 -0.1 0.56 ± 2% -0.1 0.59 perf-profile.calltrace.cycles-pp.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 0.64 -0.1 0.56 ± 2% -0.1 0.57 ± 2% perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof 1.15 -0.1 1.07 -0.1 1.08 ± 2% perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault 0.66 -0.1 0.58 ± 2% -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.mas_walk.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 2.71 +0.6 3.31 ± 2% +0.5 3.23 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 2.71 +0.6 3.31 ± 2% +0.5 3.23 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap 2.71 +0.6 3.31 ± 2% +0.5 3.22 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap 2.65 +0.6 3.26 ± 2% +0.5 3.17 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region 2.44 +0.6 3.07 ± 2% +0.5 2.98 
perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu 24.39 +2.1 26.54 ± 3% +1.0 25.41 ± 4% perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 22.46 +2.3 24.81 ± 4% +1.2 23.70 ± 4% perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_fault.__handle_mm_fault 22.25 +2.4 24.63 ± 4% +1.3 23.52 ± 5% perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_fault 20.38 +2.5 22.84 ± 4% +1.3 21.71 ± 5% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault 20.37 +2.5 22.83 ± 4% +1.3 21.70 ± 5% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range 20.30 +2.5 22.77 ± 4% +1.3 21.63 ± 5% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma 22.59 +4.7 27.29 ± 2% +4.3 26.92 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap 22.59 +4.7 27.29 ± 2% +4.3 26.92 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 22.59 +4.7 27.29 ± 2% +4.3 26.92 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap 22.58 +4.7 27.28 ± 2% +4.3 26.91 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region 20.59 +5.1 25.64 ± 2% +4.6 25.21 perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas 20.59 +5.1 25.64 ± 2% +4.6 25.20 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range 20.56 +5.1 25.62 ± 2% +4.6 25.18 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range 20.07 +5.2 25.23 ± 3% +4.7 24.78 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range 18.73 +5.3 24.01 ± 3% +4.8 23.55 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu 25.34 +5.3 30.64 ± 2% +4.8 30.19 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 25.34 +5.3 30.64 ± 2% +4.8 30.19 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap 25.34 +5.3 30.64 ± 2% +4.8 30.19 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 25.34 +5.3 30.64 ± 2% +4.8 30.19 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 25.34 +5.3 30.65 ± 2% +4.9 30.19 perf-profile.calltrace.cycles-pp.__munmap 25.34 +5.3 30.64 ± 2% +4.9 30.19 perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe 25.33 +5.3 30.64 ± 2% +4.9 30.18 perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap 25.33 +5.3 30.64 ± 2% +4.9 30.19 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64 20.36 +5.9 26.30 ± 3% +5.4 25.74 
perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages 20.35 +5.9 26.29 ± 3% +5.4 25.73 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache 20.28 +6.0 26.24 ± 3% +5.4 25.67 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs 74.49 -5.3 69.18 -4.9 69.64 perf-profile.children.cycles-pp.testcase 71.15 -4.8 66.30 -4.5 66.66 perf-profile.children.cycles-pp.asm_exc_page_fault 63.55 -3.9 59.68 -3.7 59.82 perf-profile.children.cycles-pp.exc_page_fault 63.38 -3.8 59.54 -3.7 59.68 perf-profile.children.cycles-pp.do_user_addr_fault 61.10 -3.6 57.54 -3.5 57.61 perf-profile.children.cycles-pp.handle_mm_fault 21.32 -3.5 17.80 ± 2% -2.8 18.51 ± 2% perf-profile.children.cycles-pp.copy_page 59.57 -3.3 56.24 -3.3 56.27 perf-profile.children.cycles-pp.__handle_mm_fault 58.44 -3.2 55.25 -3.2 55.25 perf-profile.children.cycles-pp.do_fault 5.36 -0.8 4.54 -0.7 4.69 perf-profile.children.cycles-pp.__pte_offset_map_lock 5.02 -0.8 4.25 -0.6 4.38 perf-profile.children.cycles-pp._raw_spin_lock 4.45 -0.6 3.82 ± 2% -0.4 4.00 ± 3% perf-profile.children.cycles-pp.folio_prealloc 2.64 -0.3 2.30 -0.3 2.37 perf-profile.children.cycles-pp.sync_regs 1.89 -0.3 1.55 -0.3 1.62 perf-profile.children.cycles-pp.zap_present_ptes 2.42 -0.3 2.09 ± 2% -0.3 2.16 ± 2% perf-profile.children.cycles-pp.native_irq_return_iret 2.24 -0.3 1.93 -0.2 2.00 perf-profile.children.cycles-pp.vma_alloc_folio_noprof 2.07 -0.3 1.77 ± 2% -0.2 1.88 ± 5% perf-profile.children.cycles-pp.__mem_cgroup_charge 1.89 -0.3 1.62 -0.2 1.67 perf-profile.children.cycles-pp.alloc_pages_mpol_noprof 1.64 -0.2 1.41 -0.2 1.45 perf-profile.children.cycles-pp.__alloc_pages_noprof 1.42 -0.2 1.19 ± 2% -0.2 1.23 ± 2% perf-profile.children.cycles-pp.__perf_sw_event 1.77 -0.2 1.54 ± 2% -0.2 1.60 perf-profile.children.cycles-pp.__do_fault 1.62 -0.2 1.41 ± 2% -0.2 1.46 ± 2% perf-profile.children.cycles-pp.shmem_fault 1.25 -0.2 1.05 ± 2% -0.2 1.08 ± 2% perf-profile.children.cycles-pp.___perf_sw_event 2.04 -0.2 1.83 ± 3% -0.2 1.82 ± 2% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio 1.32 -0.2 1.13 ± 2% -0.1 1.21 ± 6% perf-profile.children.cycles-pp.mem_cgroup_commit_charge 1.47 -0.2 1.29 ± 2% -0.1 1.34 ± 2% perf-profile.children.cycles-pp.shmem_get_folio_gfp 1.17 -0.2 1.00 ± 2% -0.1 1.03 perf-profile.children.cycles-pp.get_page_from_freelist 0.84 -0.2 0.69 ± 2% -0.1 0.71 ± 3% perf-profile.children.cycles-pp.__mod_lruvec_state 0.61 -0.2 0.46 ± 2% -0.1 0.48 perf-profile.children.cycles-pp._compound_head 0.65 -0.1 0.53 ± 2% -0.1 0.54 ± 3% perf-profile.children.cycles-pp.__mod_node_page_state 1.02 -0.1 0.90 ± 2% -0.1 0.93 ± 2% perf-profile.children.cycles-pp.lock_vma_under_rcu 0.94 -0.1 0.82 ± 3% -0.1 0.85 ± 2% perf-profile.children.cycles-pp.filemap_get_entry 1.13 ± 2% -0.1 1.03 ± 3% -0.1 1.02 ± 3% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state 0.76 -0.1 0.66 -0.1 0.68 perf-profile.children.cycles-pp.folio_remove_rmap_ptes 1.20 -0.1 1.11 -0.1 1.12 perf-profile.children.cycles-pp.lru_add_fn 0.69 -0.1 0.60 ± 2% -0.1 0.61 ± 2% perf-profile.children.cycles-pp.rmqueue 0.47 -0.1 0.38 -0.1 0.40 perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios 0.59 -0.1 0.50 -0.1 0.52 perf-profile.children.cycles-pp.free_unref_folios 0.63 ± 3% -0.1 0.55 ± 3% -0.0 0.59 ± 7% 
perf-profile.children.cycles-pp.__count_memcg_events 0.67 -0.1 0.59 ± 2% -0.1 0.61 ± 2% perf-profile.children.cycles-pp.mas_walk 0.54 -0.1 0.47 ± 3% -0.1 0.49 ± 3% perf-profile.children.cycles-pp.xas_load 0.27 ± 3% -0.1 0.21 -0.1 0.22 ± 3% perf-profile.children.cycles-pp.uncharge_batch 0.32 -0.1 0.26 -0.0 0.28 ± 3% perf-profile.children.cycles-pp.cgroup_rstat_updated 0.22 ± 3% -0.1 0.17 ± 2% -0.0 0.18 ± 2% perf-profile.children.cycles-pp.page_counter_uncharge 0.38 -0.0 0.33 -0.0 0.34 ± 2% perf-profile.children.cycles-pp.try_charge_memcg 0.31 -0.0 0.26 -0.0 0.28 ± 2% perf-profile.children.cycles-pp.percpu_counter_add_batch 0.31 ± 2% -0.0 0.27 ± 3% -0.0 0.28 ± 5% perf-profile.children.cycles-pp.get_vma_policy 0.30 -0.0 0.26 ± 3% -0.0 0.27 perf-profile.children.cycles-pp.handle_pte_fault 0.28 -0.0 0.25 -0.0 0.26 perf-profile.children.cycles-pp.error_entry 0.22 -0.0 0.19 ± 2% -0.0 0.20 perf-profile.children.cycles-pp.free_unref_page_commit 0.28 ± 2% -0.0 0.25 ± 2% -0.0 0.26 ± 2% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 0.32 ± 2% -0.0 0.29 ± 2% -0.0 0.29 ± 3% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 0.26 ± 2% -0.0 0.23 ± 5% -0.0 0.23 ± 5% perf-profile.children.cycles-pp._raw_spin_trylock 0.22 ± 2% -0.0 0.20 ± 2% -0.0 0.20 ± 3% perf-profile.children.cycles-pp.folio_add_new_anon_rmap 0.22 ± 2% -0.0 0.19 ± 3% -0.0 0.19 perf-profile.children.cycles-pp.pte_offset_map_nolock 0.14 ± 2% -0.0 0.11 -0.0 0.12 ± 4% perf-profile.children.cycles-pp.__mod_zone_page_state 0.14 ± 3% -0.0 0.12 ± 4% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.perf_exclude_event 0.18 -0.0 0.15 ± 2% -0.0 0.16 perf-profile.children.cycles-pp.__rmqueue_pcplist 0.26 ± 3% -0.0 0.23 ± 5% -0.0 0.22 ± 3% perf-profile.children.cycles-pp.__pte_offset_map 0.26 ± 3% -0.0 0.23 -0.0 0.23 ± 4% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.25 ± 3% -0.0 0.22 -0.0 0.22 ± 4% perf-profile.children.cycles-pp.hrtimer_interrupt 0.18 ± 2% -0.0 0.15 ± 2% -0.0 0.16 ± 2% perf-profile.children.cycles-pp.__cond_resched 0.16 ± 2% -0.0 0.14 ± 2% -0.0 0.14 perf-profile.children.cycles-pp.uncharge_folio 0.19 ± 2% -0.0 0.17 ± 4% -0.0 0.18 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues 0.17 ± 2% -0.0 0.15 ± 3% -0.0 0.15 ± 4% perf-profile.children.cycles-pp.folio_unlock 0.19 ± 2% -0.0 0.17 ± 3% -0.0 0.18 ± 2% perf-profile.children.cycles-pp.down_read_trylock 0.16 -0.0 0.14 ± 2% -0.0 0.14 ± 3% perf-profile.children.cycles-pp.folio_put 0.14 ± 2% -0.0 0.12 ± 6% -0.0 0.12 ± 6% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode 0.11 ± 3% -0.0 0.09 ± 4% -0.0 0.09 ± 5% perf-profile.children.cycles-pp.xas_start 0.13 ± 3% -0.0 0.11 ± 4% -0.0 0.11 ± 4% perf-profile.children.cycles-pp.page_counter_try_charge 0.18 ± 3% -0.0 0.16 ± 3% -0.0 0.16 ± 5% perf-profile.children.cycles-pp.tick_nohz_handler 0.12 ± 3% -0.0 0.10 ± 4% -0.0 0.10 perf-profile.children.cycles-pp.get_pfnblock_flags_mask 0.18 ± 2% -0.0 0.16 ± 2% -0.0 0.17 perf-profile.children.cycles-pp.up_read 0.16 ± 2% -0.0 0.14 ± 3% -0.0 0.14 ± 5% perf-profile.children.cycles-pp.update_process_times 0.14 -0.0 0.12 ± 3% -0.0 0.13 perf-profile.children.cycles-pp.policy_nodemask 0.08 -0.0 0.06 ± 6% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.memcg_check_events 0.13 ± 3% -0.0 0.11 -0.0 0.12 ± 4% perf-profile.children.cycles-pp.access_error 0.12 ± 3% -0.0 0.11 ± 3% -0.0 0.11 perf-profile.children.cycles-pp.perf_swevent_event 0.09 ± 4% -0.0 0.08 -0.0 0.08 perf-profile.children.cycles-pp.__irqentry_text_end 0.06 -0.0 0.05 -0.0 0.05 ± 7% 
perf-profile.children.cycles-pp.pte_alloc_one 0.05 +0.0 0.06 +0.0 0.06 ± 8% perf-profile.children.cycles-pp.perf_mmap__push 0.19 ± 2% +0.2 0.35 ± 4% +0.1 0.30 ± 3% perf-profile.children.cycles-pp.mem_cgroup_update_lru_size 2.72 +0.6 3.32 ± 2% +0.5 3.24 perf-profile.children.cycles-pp.tlb_finish_mmu 24.44 +2.1 26.58 ± 3% +1.0 25.45 ± 4% perf-profile.children.cycles-pp.set_pte_range 22.47 +2.3 24.81 ± 4% +1.2 23.71 ± 4% perf-profile.children.cycles-pp.folio_add_lru_vma 22.31 +2.4 24.70 ± 4% +1.3 23.58 ± 4% perf-profile.children.cycles-pp.folio_batch_move_lru 22.59 +4.7 27.29 ± 2% +4.3 26.92 perf-profile.children.cycles-pp.unmap_page_range 22.59 +4.7 27.29 ± 2% +4.3 26.92 perf-profile.children.cycles-pp.unmap_vmas 22.59 +4.7 27.29 ± 2% +4.3 26.92 perf-profile.children.cycles-pp.zap_pmd_range 22.59 +4.7 27.29 ± 2% +4.3 26.92 perf-profile.children.cycles-pp.zap_pte_range 20.59 +5.1 25.64 ± 2% +4.6 25.21 perf-profile.children.cycles-pp.tlb_flush_mmu 25.34 +5.3 30.64 ± 2% +4.9 30.19 perf-profile.children.cycles-pp.__vm_munmap 25.34 +5.3 30.64 ± 2% +4.9 30.19 perf-profile.children.cycles-pp.__x64_sys_munmap 25.34 +5.3 30.65 ± 2% +4.9 30.19 perf-profile.children.cycles-pp.__munmap 25.34 +5.3 30.65 ± 2% +4.9 30.20 perf-profile.children.cycles-pp.do_vmi_align_munmap 25.34 +5.3 30.65 ± 2% +4.9 30.20 perf-profile.children.cycles-pp.do_vmi_munmap 25.46 +5.3 30.77 ± 2% +4.9 30.32 perf-profile.children.cycles-pp.do_syscall_64 25.33 +5.3 30.64 ± 2% +4.9 30.19 perf-profile.children.cycles-pp.unmap_region 25.46 +5.3 30.77 ± 2% +4.9 30.32 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 23.30 +5.7 28.96 ± 2% +5.1 28.44 perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages 23.29 +5.7 28.95 ± 2% +5.1 28.43 perf-profile.children.cycles-pp.free_pages_and_swap_cache 23.00 +5.7 28.73 ± 2% +5.2 28.20 perf-profile.children.cycles-pp.folios_put_refs 21.22 +5.9 27.13 ± 3% +5.4 26.57 perf-profile.children.cycles-pp.__page_cache_release 40.79 +8.4 49.20 +6.7 47.50 ± 2% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave 40.78 +8.4 49.19 +6.7 47.49 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave 40.64 +8.4 49.09 +6.7 47.38 ± 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath 21.23 -3.5 17.73 ± 2% -2.8 18.43 ± 2% perf-profile.self.cycles-pp.copy_page 4.99 -0.8 4.22 -0.6 4.36 perf-profile.self.cycles-pp._raw_spin_lock 5.21 -0.7 4.53 -0.5 4.68 perf-profile.self.cycles-pp.testcase 2.63 -0.3 2.29 -0.3 2.37 ± 2% perf-profile.self.cycles-pp.sync_regs 2.42 -0.3 2.09 ± 2% -0.3 2.16 ± 2% perf-profile.self.cycles-pp.native_irq_return_iret 1.00 -0.2 0.83 ± 2% -0.1 0.87 ± 2% perf-profile.self.cycles-pp.___perf_sw_event 0.58 ± 2% -0.1 0.43 ± 3% -0.1 0.46 perf-profile.self.cycles-pp._compound_head 0.93 ± 2% -0.1 0.80 ± 3% -0.1 0.84 ± 6% perf-profile.self.cycles-pp.mem_cgroup_commit_charge 0.61 -0.1 0.50 ± 3% -0.1 0.51 ± 3% perf-profile.self.cycles-pp.__mod_node_page_state 0.51 -0.1 0.40 ± 2% -0.1 0.42 perf-profile.self.cycles-pp.free_pages_and_swap_cache 0.80 -0.1 0.70 ± 2% -0.1 0.72 perf-profile.self.cycles-pp.__handle_mm_fault 0.61 ± 2% -0.1 0.51 -0.1 0.54 perf-profile.self.cycles-pp.lru_add_fn 0.47 -0.1 0.39 ± 2% -0.1 0.41 perf-profile.self.cycles-pp.get_page_from_freelist 0.93 ± 2% -0.1 0.86 ± 3% -0.1 0.85 ± 3% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state 0.45 -0.1 0.38 -0.1 0.40 perf-profile.self.cycles-pp.zap_present_ptes 0.65 -0.1 0.58 ± 2% -0.1 0.60 ± 2% perf-profile.self.cycles-pp.mas_walk 0.89 ± 2% -0.1 0.83 ± 3% -0.1 0.83 ± 2% 
perf-profile.self.cycles-pp.__lruvec_stat_mod_folio 0.44 -0.1 0.39 -0.0 0.40 ± 2% perf-profile.self.cycles-pp.shmem_get_folio_gfp 0.52 ± 3% -0.1 0.46 ± 5% -0.0 0.49 ± 8% perf-profile.self.cycles-pp.__count_memcg_events 0.46 -0.1 0.41 ± 3% -0.0 0.41 ± 3% perf-profile.self.cycles-pp.handle_mm_fault 0.44 -0.1 0.38 ± 3% -0.0 0.40 ± 3% perf-profile.self.cycles-pp.xas_load 0.32 -0.0 0.27 -0.0 0.28 ± 2% perf-profile.self.cycles-pp.__page_cache_release 0.34 ± 3% -0.0 0.29 ± 3% -0.0 0.29 ± 2% perf-profile.self.cycles-pp.__alloc_pages_noprof 0.39 -0.0 0.35 ± 3% -0.0 0.36 ± 3% perf-profile.self.cycles-pp.filemap_get_entry 0.20 ± 4% -0.0 0.15 ± 2% -0.0 0.16 ± 3% perf-profile.self.cycles-pp.page_counter_uncharge 0.27 ± 3% -0.0 0.22 ± 2% -0.0 0.23 ± 2% perf-profile.self.cycles-pp.rmqueue 0.29 -0.0 0.25 -0.0 0.27 ± 2% perf-profile.self.cycles-pp.percpu_counter_add_batch 0.27 -0.0 0.23 ± 2% -0.0 0.24 perf-profile.self.cycles-pp.free_unref_folios 0.24 -0.0 0.20 -0.0 0.21 ± 2% perf-profile.self.cycles-pp.folio_remove_rmap_ptes 0.26 -0.0 0.22 ± 4% -0.0 0.23 ± 3% perf-profile.self.cycles-pp.cgroup_rstat_updated 0.30 -0.0 0.26 -0.0 0.27 ± 2% perf-profile.self.cycles-pp.do_user_addr_fault 0.23 ± 3% -0.0 0.20 ± 3% -0.0 0.21 ± 4% perf-profile.self.cycles-pp.__pte_offset_map_lock 0.22 -0.0 0.19 ± 2% -0.0 0.19 ± 2% perf-profile.self.cycles-pp.set_pte_range 0.19 ± 2% -0.0 0.16 ± 4% -0.0 0.16 ± 3% perf-profile.self.cycles-pp.__mod_lruvec_state 0.13 ± 3% -0.0 0.10 ± 3% -0.0 0.11 perf-profile.self.cycles-pp.__mem_cgroup_charge 0.25 -0.0 0.22 ± 2% -0.0 0.22 ± 2% perf-profile.self.cycles-pp.error_entry 0.23 ± 2% -0.0 0.20 ± 2% -0.0 0.21 perf-profile.self.cycles-pp.do_fault 0.21 ± 2% -0.0 0.19 ± 2% -0.0 0.19 perf-profile.self.cycles-pp.folio_add_new_anon_rmap 0.19 ± 2% -0.0 0.16 ± 2% -0.0 0.17 ± 2% perf-profile.self.cycles-pp.folio_add_lru_vma 0.18 -0.0 0.15 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.free_unref_page_commit 0.15 ± 2% -0.0 0.13 ± 3% -0.0 0.13 perf-profile.self.cycles-pp.uncharge_folio 0.12 ± 3% -0.0 0.10 -0.0 0.10 ± 4% perf-profile.self.cycles-pp.perf_exclude_event 0.19 ± 2% -0.0 0.17 ± 3% -0.0 0.18 ± 6% perf-profile.self.cycles-pp.get_vma_policy 0.24 -0.0 0.22 -0.0 0.22 ± 3% perf-profile.self.cycles-pp.try_charge_memcg 0.14 ± 2% -0.0 0.12 -0.0 0.12 ± 3% perf-profile.self.cycles-pp.__rmqueue_pcplist 0.11 ± 3% -0.0 0.09 ± 5% -0.0 0.09 ± 4% perf-profile.self.cycles-pp.__mod_zone_page_state 0.11 ± 3% -0.0 0.09 -0.0 0.10 ± 4% perf-profile.self.cycles-pp.page_counter_try_charge 0.15 ± 2% -0.0 0.13 ± 3% -0.0 0.13 ± 2% perf-profile.self.cycles-pp._raw_spin_lock_irqsave 0.17 ± 4% -0.0 0.15 -0.0 0.15 ± 3% perf-profile.self.cycles-pp.lock_vma_under_rcu 0.15 ± 2% -0.0 0.13 -0.0 0.14 ± 3% perf-profile.self.cycles-pp.folio_put 0.18 -0.0 0.16 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.down_read_trylock 0.21 ± 3% -0.0 0.19 ± 4% -0.0 0.18 ± 4% perf-profile.self.cycles-pp.finish_fault 0.17 ± 2% -0.0 0.15 ± 3% -0.0 0.15 perf-profile.self.cycles-pp.__perf_sw_event 0.19 ± 2% -0.0 0.17 ± 2% -0.0 0.18 perf-profile.self.cycles-pp.asm_exc_page_fault 0.16 ± 2% -0.0 0.14 ± 2% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.folio_unlock 0.22 ± 3% -0.0 0.20 ± 4% -0.0 0.20 ± 2% perf-profile.self.cycles-pp.__pte_offset_map 0.16 ± 2% -0.0 0.15 ± 5% -0.0 0.15 ± 2% perf-profile.self.cycles-pp.shmem_fault 0.17 ± 2% -0.0 0.15 ± 3% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.up_read 0.10 -0.0 0.08 ± 4% -0.0 0.09 ± 5% perf-profile.self.cycles-pp.get_pfnblock_flags_mask 0.11 -0.0 0.09 ± 5% -0.0 0.09 ± 4% 
perf-profile.self.cycles-pp.perf_swevent_event
0.10 ± 3% -0.0 0.09 ± 5% -0.0 0.09 ± 6% perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
0.11 -0.0 0.09 ± 5% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.zap_pte_range
0.10 ± 4% -0.0 0.09 ± 4% -0.0 0.09 ± 5% perf-profile.self.cycles-pp.pte_offset_map_nolock
0.10 ± 4% -0.0 0.08 ± 4% -0.0 0.09 ± 4% perf-profile.self.cycles-pp.__do_fault
0.12 ± 3% -0.0 0.10 ± 3% -0.0 0.10 perf-profile.self.cycles-pp.exc_page_fault
0.12 ± 3% -0.0 0.11 ± 4% -0.0 0.11 ± 4% perf-profile.self.cycles-pp.alloc_pages_mpol_noprof
0.12 ± 3% -0.0 0.10 ± 3% -0.0 0.11 ± 4% perf-profile.self.cycles-pp.access_error
0.09 ± 5% -0.0 0.08 -0.0 0.08 ± 5% perf-profile.self.cycles-pp.policy_nodemask
0.12 ± 4% -0.0 0.10 ± 3% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.vma_alloc_folio_noprof
0.09 -0.0 0.08 ± 5% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.xas_start
0.10 -0.0 0.09 -0.0 0.09 perf-profile.self.cycles-pp.folio_prealloc
0.09 -0.0 0.08 -0.0 0.08 perf-profile.self.cycles-pp.__cond_resched
0.06 -0.0 0.05 -0.0 0.05 perf-profile.self.cycles-pp.vm_normal_page
0.38 ± 2% +0.1 0.44 +0.1 0.44 ± 3% perf-profile.self.cycles-pp.folio_batch_move_lru
0.18 ± 2% +0.2 0.34 ± 4% +0.1 0.29 ± 4% perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
40.64 +8.4 49.08 +6.7 47.38 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath

[3]
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
1727628 ± 22% -24.1% 1310525 ± 7% -5.3% 1636459 ± 30% sched_debug.cpu.avg_idle.max
6058 ± 41% -47.9% 3156 ± 43% +1.0% 6121 ± 61% sched_debug.cpu.max_idle_balance_cost.stddev
35617 ± 5% -9.1% 32375 ± 21% -26.2% 26270 ± 25% numa-vmstat.node0.nr_slab_reclaimable
4024866 +3.4% 4163009 ± 7% +8.7% 4374953 ± 7% numa-vmstat.node1.nr_file_pages
19132 ± 10% +17.3% 22446 ± 30% +49.4% 28587 ± 23% numa-vmstat.node1.nr_slab_reclaimable
17488267 -5.6% 16505101 -6.5% 16346741 will-it-scale.224.processes
78072 -5.6% 73683 -6.5% 72975 will-it-scale.per_process_ops
17488267 -5.6% 16505101 -6.5% 16346741 will-it-scale.workload
142458 ± 5% -9.1% 129506 ± 21% -26.2% 105066 ± 25% numa-meminfo.node0.KReclaimable
142458 ± 5% -9.1% 129506 ± 21% -26.2% 105066 ± 25% numa-meminfo.node0.SReclaimable
16107004 +3.3% 16635393 ± 7% +8.6% 17491995 ± 7% numa-meminfo.node1.FilePages
76509 ± 10% +17.4% 89791 ± 30% +49.4% 114321 ± 23% numa-meminfo.node1.KReclaimable
76509 ± 10% +17.4% 89791 ± 30% +49.4% 114321 ± 23% numa-meminfo.node1.SReclaimable
5.296e+09 -5.6% 4.998e+09 -6.5% 4.949e+09 proc-vmstat.numa_hit
5.291e+09 -5.6% 4.995e+09 -6.5% 4.947e+09 proc-vmstat.numa_local
5.285e+09 -5.6% 4.989e+09 -6.5% 4.941e+09 proc-vmstat.pgalloc_normal
5.264e+09 -5.6% 4.969e+09 -6.5% 4.921e+09 proc-vmstat.pgfault
5.283e+09 -5.6% 4.989e+09 -6.5% 4.941e+09 proc-vmstat.pgfree
20.16 -2.9% 19.58 -3.3% 19.50 perf-stat.i.MPKI
2.501e+10 -2.4% 2.44e+10 -2.9% 2.428e+10 perf-stat.i.branch-instructions
18042153 -2.8% 17539874 -3.8% 17362741 perf-stat.i.branch-misses
2.382e+09 -5.6% 2.249e+09 -6.5% 2.228e+09 perf-stat.i.cache-misses
2.561e+09 -5.3% 2.424e+09 -6.5% 2.394e+09 perf-stat.i.cache-references
5.49 +2.8% 5.64 +3.3% 5.67
perf-stat.i.cpi 274.25 +5.4% 289.07 +6.4% 291.86 perf-stat.i.cycles-between-cache-misses 1.177e+11 -2.7% 1.145e+11 -3.2% 1.139e+11 perf-stat.i.instructions 0.19 -2.7% 0.18 -3.2% 0.18 perf-stat.i.ipc 155.11 -5.5% 146.59 -6.5% 145.09 perf-stat.i.metric.K/sec 17405977 -5.5% 16441964 -6.5% 16274188 perf-stat.i.minor-faults 17405978 -5.5% 16441964 -6.5% 16274188 perf-stat.i.page-faults 4.41 ± 50% +28.5% 5.66 +29.1% 5.69 perf-stat.overall.cpi 217.50 ± 50% +32.4% 287.87 +33.6% 290.48 perf-stat.overall.cycles-between-cache-misses 1623235 ± 50% +29.0% 2093187 +29.6% 2103156 perf-stat.overall.path-length 5.48 -0.4 5.11 -0.4 5.11 perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 57.55 -0.3 57.20 -0.1 57.48 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase 56.14 -0.2 55.90 +0.0 56.16 perf-profile.calltrace.cycles-pp.testcase 1.86 -0.2 1.71 -0.1 1.73 perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 1.77 -0.1 1.63 -0.1 1.64 perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault 1.17 -0.1 1.10 -0.1 1.08 perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 51.87 -0.0 51.82 +0.2 52.11 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 0.96 -0.0 0.91 -0.1 0.91 perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase 0.71 -0.0 0.67 -0.0 0.66 perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault 0.60 -0.0 0.57 -0.0 0.56 perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault 51.39 -0.0 51.37 +0.3 51.67 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 51.03 +0.0 51.03 +0.3 51.33 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault 4.86 +0.0 4.91 +0.0 4.90 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap 4.87 +0.0 4.91 +0.0 4.90 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 4.86 +0.0 4.91 +0.0 4.90 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap 4.85 +0.0 4.90 +0.0 4.88 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region 4.77 +0.1 4.82 +0.0 4.81 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu 37.74 +0.3 38.01 -0.0 37.74 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap 37.74 +0.3 38.01 -0.0 37.74 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 37.74 +0.3 38.01 -0.0 37.74 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap 37.73 +0.3 38.01 +0.0 37.74 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region 37.27 +0.3 37.57 +0.0 37.30 
perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range 37.28 +0.3 37.58 +0.0 37.31 perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas 37.28 +0.3 37.58 +0.0 37.31 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range 37.15 +0.3 37.46 +0.0 37.20 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range 42.65 +0.3 42.97 +0.0 42.68 perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap 42.65 +0.3 42.97 +0.0 42.68 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 42.65 +0.3 42.97 +0.0 42.68 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 42.65 +0.3 42.97 +0.0 42.68 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64 42.65 +0.3 42.97 +0.0 42.68 perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe 36.72 +0.3 37.04 +0.1 36.79 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu 42.65 +0.3 42.97 +0.0 42.69 perf-profile.calltrace.cycles-pp.__munmap 42.65 +0.3 42.97 +0.0 42.69 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 42.65 +0.3 42.97 +0.0 42.69 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap 41.26 +0.4 41.63 +0.1 41.38 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache 41.26 +0.4 41.64 +0.1 41.38 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages 41.23 +0.4 41.61 +0.1 41.36 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs 43.64 +0.5 44.12 +0.8 44.42 perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 41.57 +0.6 42.22 +0.9 42.50 perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 40.93 +0.7 41.59 +1.0 41.90 perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_fault.__handle_mm_fault 40.84 +0.7 41.50 +1.0 41.81 perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_fault 40.19 +0.7 40.89 +1.0 41.19 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range 40.19 +0.7 40.89 +1.0 41.19 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault 40.16 +0.7 40.87 +1.0 41.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma 5.49 -0.4 5.12 -0.4 5.12 perf-profile.children.cycles-pp.copy_page 57.05 -0.3 56.75 -0.0 57.02 perf-profile.children.cycles-pp.testcase 55.66 -0.2 55.41 +0.0 55.70 perf-profile.children.cycles-pp.asm_exc_page_fault 1.88 -0.2 1.73 -0.1 1.75 
perf-profile.children.cycles-pp.__pte_offset_map_lock 1.79 -0.1 1.64 -0.1 1.66 perf-profile.children.cycles-pp._raw_spin_lock 1.19 -0.1 1.11 -0.1 1.10 perf-profile.children.cycles-pp.folio_prealloc 0.96 -0.1 0.91 -0.1 0.91 perf-profile.children.cycles-pp.sync_regs 51.89 -0.0 51.84 +0.2 52.13 perf-profile.children.cycles-pp.handle_mm_fault 0.73 -0.0 0.68 -0.0 0.68 perf-profile.children.cycles-pp.vma_alloc_folio_noprof 1.02 -0.0 0.98 -0.1 0.96 perf-profile.children.cycles-pp.native_irq_return_iret 0.63 -0.0 0.59 -0.0 0.59 perf-profile.children.cycles-pp.alloc_pages_mpol_noprof 0.55 -0.0 0.51 -0.0 0.51 perf-profile.children.cycles-pp.__alloc_pages_noprof 0.51 -0.0 0.48 -0.0 0.48 perf-profile.children.cycles-pp.__do_fault 0.46 -0.0 0.43 -0.0 0.44 perf-profile.children.cycles-pp.shmem_fault 0.41 -0.0 0.39 -0.0 0.38 perf-profile.children.cycles-pp.get_page_from_freelist 0.51 -0.0 0.48 -0.0 0.50 perf-profile.children.cycles-pp.shmem_get_folio_gfp 0.36 -0.0 0.34 -0.0 0.34 perf-profile.children.cycles-pp.___perf_sw_event 0.42 -0.0 0.39 -0.0 0.39 perf-profile.children.cycles-pp.__perf_sw_event 0.42 -0.0 0.40 -0.0 0.40 perf-profile.children.cycles-pp.zap_present_ptes 0.26 -0.0 0.24 -0.0 0.24 perf-profile.children.cycles-pp.__mod_lruvec_state 0.38 -0.0 0.36 -0.0 0.36 perf-profile.children.cycles-pp.lru_add_fn 0.25 ± 2% -0.0 0.23 -0.0 0.24 perf-profile.children.cycles-pp.filemap_get_entry 0.21 ± 2% -0.0 0.20 ± 2% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.__mod_node_page_state 0.21 -0.0 0.19 ± 2% -0.0 0.20 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 51.40 -0.0 51.39 +0.3 51.68 perf-profile.children.cycles-pp.__handle_mm_fault 0.23 ± 2% -0.0 0.21 -0.0 0.21 ± 2% perf-profile.children.cycles-pp.rmqueue 0.39 -0.0 0.38 -0.0 0.36 ± 2% perf-profile.children.cycles-pp.__mem_cgroup_charge 0.16 ± 2% -0.0 0.15 ± 2% -0.0 0.14 ± 3% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios 0.11 -0.0 0.10 -0.0 0.10 ± 5% perf-profile.children.cycles-pp._compound_head 0.17 ± 2% -0.0 0.16 ± 2% -0.0 0.16 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 0.16 -0.0 0.15 -0.0 0.15 ± 2% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.27 -0.0 0.26 -0.0 0.26 perf-profile.children.cycles-pp.lock_vma_under_rcu 0.11 -0.0 0.10 -0.0 0.10 perf-profile.children.cycles-pp.update_process_times 0.09 -0.0 0.08 -0.0 0.08 perf-profile.children.cycles-pp.scheduler_tick 0.06 -0.0 0.05 -0.0 0.05 perf-profile.children.cycles-pp.task_tick_fair 0.12 -0.0 0.11 -0.0 0.11 ± 3% perf-profile.children.cycles-pp.tick_nohz_handler 0.15 -0.0 0.14 -0.0 0.14 perf-profile.children.cycles-pp.hrtimer_interrupt 0.11 ± 4% -0.0 0.10 -0.0 0.09 ± 5% perf-profile.children.cycles-pp.uncharge_batch 51.07 -0.0 51.06 +0.3 51.36 perf-profile.children.cycles-pp.do_fault 0.08 -0.0 0.08 ± 6% -0.0 0.07 perf-profile.children.cycles-pp.page_counter_uncharge 0.06 -0.0 0.06 ± 6% +0.0 0.07 perf-profile.children.cycles-pp.mem_cgroup_update_lru_size 0.15 ± 2% +0.0 0.16 ± 6% +0.0 0.17 ± 4% perf-profile.children.cycles-pp.generic_perform_write 0.07 +0.0 0.08 +0.0 0.08 ± 4% perf-profile.children.cycles-pp.folio_add_lru 0.09 ± 4% +0.0 0.10 ± 3% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.shmem_write_begin 4.88 +0.0 4.93 +0.0 4.91 perf-profile.children.cycles-pp.tlb_finish_mmu 37.74 +0.3 38.01 -0.0 37.74 perf-profile.children.cycles-pp.unmap_page_range 37.74 +0.3 38.01 -0.0 37.74 perf-profile.children.cycles-pp.unmap_vmas 37.74 +0.3 38.01 -0.0 37.74 perf-profile.children.cycles-pp.zap_pmd_range 37.74 +0.3 38.01 -0.0 37.74 
perf-profile.children.cycles-pp.zap_pte_range
37.28 +0.3 37.58 +0.0 37.31 perf-profile.children.cycles-pp.tlb_flush_mmu
42.65 +0.3 42.97 +0.0 42.68 perf-profile.children.cycles-pp.__x64_sys_munmap
42.65 +0.3 42.97 +0.0 42.68 perf-profile.children.cycles-pp.__vm_munmap
42.65 +0.3 42.97 +0.0 42.69 perf-profile.children.cycles-pp.__munmap
42.65 +0.3 42.98 +0.0 42.69 perf-profile.children.cycles-pp.do_vmi_align_munmap
42.65 +0.3 42.98 +0.0 42.69 perf-profile.children.cycles-pp.do_vmi_munmap
42.86 +0.3 43.18 +0.1 42.91 perf-profile.children.cycles-pp.do_syscall_64
42.65 +0.3 42.97 +0.0 42.69 perf-profile.children.cycles-pp.unmap_region
42.86 +0.3 43.19 +0.1 42.91 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
42.15 +0.3 42.50 +0.1 42.22 perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
42.12 +0.3 42.46 +0.1 42.19 perf-profile.children.cycles-pp.folios_put_refs
42.15 +0.3 42.50 +0.1 42.22 perf-profile.children.cycles-pp.free_pages_and_swap_cache
41.51 +0.4 41.89 +0.1 41.63 perf-profile.children.cycles-pp.__page_cache_release
43.66 +0.5 44.15 +0.8 44.45 perf-profile.children.cycles-pp.finish_fault
41.59 +0.6 42.24 +0.9 42.52 perf-profile.children.cycles-pp.set_pte_range
40.94 +0.7 41.59 +1.0 41.90 perf-profile.children.cycles-pp.folio_add_lru_vma
40.99 +0.7 41.66 +1.0 41.97 perf-profile.children.cycles-pp.folio_batch_move_lru
81.57 +1.1 82.65 +1.1 82.68 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
81.59 +1.1 82.68 +1.1 82.72 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
81.60 +1.1 82.68 +1.1 82.72 perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
5.47 -0.4 5.10 -0.4 5.11 perf-profile.self.cycles-pp.copy_page
1.77 -0.1 1.63 -0.1 1.64 perf-profile.self.cycles-pp._raw_spin_lock
2.19 -0.1 2.07 -0.1 2.06 perf-profile.self.cycles-pp.testcase
0.96 -0.0 0.91 -0.1 0.90 perf-profile.self.cycles-pp.sync_regs
1.02 -0.0 0.98 -0.1 0.96 perf-profile.self.cycles-pp.native_irq_return_iret
0.28 -0.0 0.26 -0.0 0.26 ± 2% perf-profile.self.cycles-pp.___perf_sw_event
0.19 ± 2% -0.0 0.17 -0.0 0.17 ± 2% perf-profile.self.cycles-pp.get_page_from_freelist
0.20 -0.0 0.19 ± 2% -0.0 0.19 ± 2% perf-profile.self.cycles-pp.__mod_node_page_state
0.12 ± 4% -0.0 0.10 -0.0 0.11 ± 4% perf-profile.self.cycles-pp.filemap_get_entry
0.11 ± 3% -0.0 0.10 -0.0 0.10 ± 3% perf-profile.self.cycles-pp.free_pages_and_swap_cache
0.21 -0.0 0.20 -0.0 0.20 ± 2% perf-profile.self.cycles-pp.folios_put_refs
0.16 -0.0 0.15 -0.0 0.15 ± 3% perf-profile.self.cycles-pp.mas_walk
0.09 -0.0 0.08 -0.0 0.08 perf-profile.self.cycles-pp.folio_add_new_anon_rmap
0.06 -0.0 0.05 -0.0 0.05 ± 8% perf-profile.self.cycles-pp.down_read_trylock
0.18 -0.0 0.17 ± 2% -0.0 0.17 perf-profile.self.cycles-pp.lru_add_fn
0.09 ± 4% -0.0 0.09 ± 4% -0.0 0.08 perf-profile.self.cycles-pp._compound_head
81.57 +1.1 82.65 +1.1 82.68 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath

[4]
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-csl-d02/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
13383 -14.7% 11416 -10.2% 12023 perf-c2c.DRAM.local
878.00 ± 4% +39.1% 1221 ± 6% +11.3% 977.00 ± 4% perf-c2c.HITM.local
0.54 ± 3% -0.1 0.43 ± 2% -0.1 0.47 ±
2% mpstat.cpu.all.irq% 0.04 ± 6% -0.0 0.03 +0.0 0.04 ± 11% mpstat.cpu.all.soft% 8.44 ± 2% -1.1 7.32 -0.9 7.53 mpstat.cpu.all.usr% 59743 ± 11% -22.9% 46054 ± 7% -15.0% 50754 ± 8% sched_debug.cfs_rq:/.avg_vruntime.stddev 59744 ± 11% -22.9% 46054 ± 7% -15.0% 50754 ± 8% sched_debug.cfs_rq:/.min_vruntime.stddev 3843 ± 4% -28.8% 2737 ± 8% -14.2% 3296 ± 10% sched_debug.cpu.nr_switches.min 6749425 -19.4% 5441878 -12.1% 5929733 will-it-scale.36.processes 187483 -19.4% 151162 -12.1% 164714 will-it-scale.per_process_ops 6749425 -19.4% 5441878 -12.1% 5929733 will-it-scale.workload 734606 -2.1% 718878 -1.8% 721386 proc-vmstat.nr_anon_pages 9660 -4.0% 9278 -2.9% 9383 proc-vmstat.nr_mapped 2999 +3.2% 3095 +2.3% 3069 proc-vmstat.nr_page_table_pages 2.043e+09 -19.3% 1.649e+09 -12.0% 1.799e+09 proc-vmstat.numa_hit 2.049e+09 -19.3% 1.653e+09 -12.0% 1.803e+09 proc-vmstat.numa_local 2.036e+09 -19.2% 1.644e+09 -12.0% 1.791e+09 proc-vmstat.pgalloc_normal 2.029e+09 -19.3% 1.639e+09 -12.0% 1.785e+09 proc-vmstat.pgfault 2.035e+09 -19.2% 1.644e+09 -12.0% 1.791e+09 proc-vmstat.pgfree 21123 ± 2% +3.4% 21833 +3.9% 21942 proc-vmstat.pgreuse 17.45 -8.6% 15.96 -6.0% 16.41 perf-stat.i.MPKI 6.199e+09 -10.2% 5.567e+09 -5.5% 5.856e+09 perf-stat.i.branch-instructions 0.26 -0.0 0.25 -0.0 0.25 perf-stat.i.branch-miss-rate% 16660671 -10.6% 14902193 -7.3% 15444974 perf-stat.i.branch-misses 87.85 -2.9 84.90 -2.8 85.02 perf-stat.i.cache-miss-rate% 5.476e+08 -19.5% 4.407e+08 -12.3% 4.805e+08 perf-stat.i.cache-misses 6.227e+08 -16.7% 5.186e+08 -9.3% 5.647e+08 perf-stat.i.cache-references 4.35 +14.1% 4.96 +7.6% 4.68 perf-stat.i.cpi 61.84 ± 2% -16.2% 51.79 -14.1% 53.13 perf-stat.i.cpu-migrations 251.09 +24.4% 312.35 +14.2% 286.75 perf-stat.i.cycles-between-cache-misses 3.137e+10 -11.8% 2.768e+10 -6.6% 2.931e+10 perf-stat.i.instructions 0.23 -11.7% 0.21 -6.5% 0.22 perf-stat.i.ipc 373.37 -19.3% 301.36 -12.0% 328.39 perf-stat.i.metric.K/sec 6720929 -19.3% 5424836 -12.0% 5911373 perf-stat.i.minor-faults 6720929 -19.3% 5424836 -12.0% 5911373 perf-stat.i.page-faults 17.45 -8.8% 15.92 -6.1% 16.39 perf-stat.overall.MPKI 0.27 -0.0 0.27 -0.0 0.26 perf-stat.overall.branch-miss-rate% 87.94 -3.0 84.96 -2.9 85.08 perf-stat.overall.cache-miss-rate% 4.35 +13.4% 4.93 +7.1% 4.65 perf-stat.overall.cpi 249.03 +24.3% 309.56 +14.0% 283.85 perf-stat.overall.cycles-between-cache-misses 0.23 -11.8% 0.20 -6.6% 0.21 perf-stat.overall.ipc 1400364 +9.4% 1532615 +6.5% 1491568 perf-stat.overall.path-length 6.178e+09 -10.2% 5.548e+09 -5.5% 5.835e+09 perf-stat.ps.branch-instructions 16578081 -10.7% 14811244 -7.4% 15346617 perf-stat.ps.branch-misses 5.458e+08 -19.5% 4.392e+08 -12.3% 4.788e+08 perf-stat.ps.cache-misses 6.206e+08 -16.7% 5.169e+08 -9.3% 5.628e+08 perf-stat.ps.cache-references 61.60 ± 2% -16.3% 51.58 -14.2% 52.85 perf-stat.ps.cpu-migrations 3.127e+10 -11.8% 2.758e+10 -6.6% 2.921e+10 perf-stat.ps.instructions 6698560 -19.3% 5406176 -12.1% 5890997 perf-stat.ps.minor-faults 6698560 -19.3% 5406177 -12.1% 5890998 perf-stat.ps.page-faults 9.451e+12 -11.8% 8.34e+12 -6.4% 8.845e+12 perf-stat.total.instructions 78.09 -11.0 67.12 -7.4 70.68 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase 84.87 ± 2% -10.3 74.55 -6.9 77.97 perf-profile.calltrace.cycles-pp.testcase 68.48 ± 2% -9.3 59.13 -6.2 62.28 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase 68.26 ± 2% -9.3 58.94 -6.2 62.08 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 65.58 ± 2% -8.7 56.90 -5.7 59.92 
perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 64.14 ± 2% -8.5 55.61 -5.6 58.59 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 63.24 ± 2% -8.4 54.84 -5.5 57.78 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault 40.12 ± 4% -4.1 36.02 -2.9 37.23 perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 15.19 ± 3% -3.5 11.73 -1.9 13.28 perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 9.10 ± 8% -3.1 6.01 ± 2% -1.9 7.16 ± 3% perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_fault.__handle_mm_fault 8.89 ± 8% -3.1 5.83 ± 3% -1.9 6.96 ± 3% perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_fault 10.98 ± 6% -3.0 7.97 ± 2% -1.6 9.38 ± 2% perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 7.41 ± 10% -2.9 4.49 ± 4% -1.9 5.50 ± 4% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range 7.42 ± 10% -2.9 4.51 ± 4% -1.9 5.52 ± 4% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault 7.35 ± 10% -2.9 4.44 ± 4% -1.9 5.45 ± 4% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma 2.14 ± 15% -1.4 0.70 ± 6% -1.2 0.93 ± 3% perf-profile.calltrace.cycles-pp._compound_head.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range 3.15 ± 11% -1.3 1.84 -1.2 1.96 perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas 3.60 ± 3% -0.4 3.16 -0.3 3.28 perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault 3.88 -0.4 3.46 -0.4 3.50 perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 1.29 -0.4 0.87 -0.4 0.92 perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 3.09 ± 3% -0.4 2.68 -0.3 2.81 perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault 0.96 -0.3 0.62 ± 2% -0.3 0.65 perf-profile.calltrace.cycles-pp.mas_walk.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 3.45 ± 3% -0.3 3.12 -0.2 3.24 perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 3.31 ± 3% -0.3 3.00 -0.2 3.11 perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault 3.09 ± 3% -0.3 2.80 -0.2 2.90 perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault 2.42 -0.3 2.16 -0.3 2.14 perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault 2.72 ± 4% -0.2 2.50 -0.1 2.58 perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault 1.55 ± 2% -0.2 1.33 -0.2 1.38 perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase 0.87 -0.2 0.72 -0.1 0.79 perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault 
1.39 ± 3% -0.1 1.25 ± 3% -0.1 1.30 ± 2% perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault 0.81 -0.1 0.70 ± 2% -0.1 0.73 ± 2% perf-profile.calltrace.cycles-pp.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 1.74 -0.1 1.63 -0.1 1.62 perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault 0.85 ± 2% -0.1 0.74 ± 3% -0.1 0.78 perf-profile.calltrace.cycles-pp.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 0.71 -0.1 0.62 -0.1 0.64 ± 3% perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault 1.01 ± 4% -0.1 0.93 ± 2% -0.1 0.94 perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault 0.72 ± 2% -0.1 0.64 ± 3% -0.0 0.67 perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 1.56 -0.1 1.50 -0.1 1.48 perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault 0.35 ± 81% +0.1 0.44 ± 50% +0.3 0.68 ± 7% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__lruvec_stat_mod_folio.set_pte_range.finish_fault.do_fault 0.77 ± 2% +0.1 0.87 ± 2% +0.0 0.80 ± 2% perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof 1.47 ± 2% +0.2 1.63 ± 6% +0.4 1.90 ± 2% perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.set_pte_range.finish_fault.do_fault.__handle_mm_fault 0.62 ± 5% +0.2 0.84 ± 2% +0.1 0.69 ± 2% perf-profile.calltrace.cycles-pp.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range 0.00 +0.7 0.68 ± 3% +0.4 0.35 ± 70% perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range 1.66 ± 12% +1.2 2.86 +0.8 2.50 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 1.66 ± 12% +1.2 2.86 +0.8 2.49 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap 1.66 ± 12% +1.2 2.86 +0.8 2.49 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap 1.51 ± 15% +1.3 2.80 +0.9 2.41 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region 1.31 ± 18% +1.3 2.64 ± 2% +0.9 2.25 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu 16.10 ± 9% +9.5 25.63 ± 2% +6.4 22.50 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 16.10 ± 9% +9.5 25.63 ± 2% +6.4 22.50 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap 16.10 ± 9% +9.5 25.63 ± 2% +6.4 22.50 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap 16.09 ± 9% +9.5 25.62 ± 2% +6.4 22.49 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region 17.82 ± 10% +10.7 28.54 ± 2% +7.2 25.03 perf-profile.calltrace.cycles-pp.__munmap 17.81 ± 10% +10.7 28.53 ± 2% +7.2 25.02 
perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 17.81 ± 10% +10.7 28.53 ± 2% +7.2 25.02 perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe 17.81 ± 10% +10.7 28.53 +7.2 25.02 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 17.82 ± 10% +10.7 28.54 ± 2% +7.2 25.03 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 17.82 ± 10% +10.7 28.54 ± 2% +7.2 25.03 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap 17.81 ± 10% +10.7 28.53 ± 2% +7.2 25.02 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64 17.79 ± 10% +10.7 28.53 ± 2% +7.2 25.02 perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap 12.80 ± 15% +10.9 23.68 ± 2% +7.6 20.42 perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas 12.78 ± 15% +10.9 23.68 ± 2% +7.6 20.41 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range 12.77 ± 15% +10.9 23.67 ± 2% +7.6 20.40 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range 11.49 ± 18% +11.7 23.22 ± 2% +8.3 19.79 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range 10.49 ± 20% +11.9 22.36 ± 2% +8.4 18.90 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu 11.02 ± 22% +13.4 24.43 ± 2% +9.4 20.44 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache 11.03 ± 22% +13.4 24.46 ± 2% +9.4 20.46 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages 10.97 ± 22% +13.4 24.41 ± 2% +9.4 20.40 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs 81.97 ± 2% -10.7 71.28 -7.2 74.78 perf-profile.children.cycles-pp.testcase 74.32 ± 2% -10.3 64.01 -6.9 67.40 perf-profile.children.cycles-pp.asm_exc_page_fault 68.51 ± 2% -9.4 59.15 -6.2 62.30 perf-profile.children.cycles-pp.exc_page_fault 68.29 ± 2% -9.3 58.97 -6.2 62.11 perf-profile.children.cycles-pp.do_user_addr_fault 65.61 ± 2% -8.7 56.92 -5.7 59.95 perf-profile.children.cycles-pp.handle_mm_fault 64.16 ± 2% -8.5 55.63 -5.6 58.60 perf-profile.children.cycles-pp.__handle_mm_fault 63.27 ± 2% -8.4 54.87 -5.5 57.82 perf-profile.children.cycles-pp.do_fault 40.21 ± 4% -4.1 36.11 -2.9 37.33 perf-profile.children.cycles-pp.copy_page 15.21 ± 3% -3.5 11.75 -1.9 13.30 perf-profile.children.cycles-pp.finish_fault 9.10 ± 8% -3.1 6.02 ± 2% -1.9 7.16 ± 3% perf-profile.children.cycles-pp.folio_add_lru_vma 8.91 ± 8% -3.0 5.87 ± 3% -1.9 6.99 ± 3% perf-profile.children.cycles-pp.folio_batch_move_lru 10.99 ± 6% -3.0 7.98 ± 2% -1.6 9.40 ± 2% perf-profile.children.cycles-pp.set_pte_range 2.16 ± 15% -1.4 0.71 ± 6% -1.2 0.94 ± 4% perf-profile.children.cycles-pp._compound_head 3.17 ± 11% -1.3 1.85 -1.2 1.98 perf-profile.children.cycles-pp.zap_present_ptes 3.63 ± 3% -0.5 3.17 -0.3 3.30 
perf-profile.children.cycles-pp.__pte_offset_map_lock 3.14 ± 3% -0.4 2.71 -0.3 2.85 perf-profile.children.cycles-pp._raw_spin_lock 1.30 -0.4 0.88 -0.4 0.93 perf-profile.children.cycles-pp.lock_vma_under_rcu 3.90 -0.4 3.49 -0.4 3.53 perf-profile.children.cycles-pp.folio_prealloc 0.97 -0.3 0.62 ± 2% -0.3 0.66 perf-profile.children.cycles-pp.mas_walk 3.46 ± 3% -0.3 3.13 -0.2 3.25 perf-profile.children.cycles-pp.__do_fault 3.31 ± 3% -0.3 3.00 -0.2 3.12 perf-profile.children.cycles-pp.shmem_fault 6.74 ± 4% -0.3 6.44 -0.2 6.53 perf-profile.children.cycles-pp.native_irq_return_iret 3.10 ± 3% -0.3 2.82 -0.2 2.92 perf-profile.children.cycles-pp.shmem_get_folio_gfp 2.43 -0.3 2.17 -0.3 2.15 perf-profile.children.cycles-pp.vma_alloc_folio_noprof 1.60 ± 2% -0.2 1.37 -0.2 1.42 perf-profile.children.cycles-pp.sync_regs 2.73 ± 4% -0.2 2.51 -0.1 2.58 perf-profile.children.cycles-pp.filemap_get_entry 1.66 -0.2 1.44 ± 2% -0.1 1.51 perf-profile.children.cycles-pp.__perf_sw_event 0.64 ± 4% -0.2 0.44 ± 2% -0.1 0.53 perf-profile.children.cycles-pp.free_unref_folios 1.45 -0.2 1.28 ± 2% -0.1 1.33 perf-profile.children.cycles-pp.___perf_sw_event 0.88 -0.2 0.73 -0.1 0.80 perf-profile.children.cycles-pp.lru_add_fn 1.40 ± 3% -0.1 1.26 ± 3% -0.1 1.31 ± 2% perf-profile.children.cycles-pp.__mem_cgroup_charge 1.23 ± 9% -0.1 1.09 ± 8% +0.2 1.41 ± 4% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state 1.83 -0.1 1.71 -0.1 1.69 perf-profile.children.cycles-pp.alloc_pages_mpol_noprof 0.58 ± 7% -0.1 0.47 ± 5% -0.1 0.51 ± 3% perf-profile.children.cycles-pp.__count_memcg_events 0.69 ± 3% -0.1 0.59 -0.1 0.63 ± 4% perf-profile.children.cycles-pp.__mod_lruvec_state 0.33 ± 5% -0.1 0.22 ± 2% -0.1 0.27 perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios 0.51 ± 5% -0.1 0.42 ± 7% -0.1 0.46 ± 3% perf-profile.children.cycles-pp.mem_cgroup_commit_charge 0.53 -0.1 0.44 ± 3% -0.1 0.43 ± 2% perf-profile.children.cycles-pp.get_vma_policy 1.02 ± 4% -0.1 0.93 ± 3% -0.1 0.94 perf-profile.children.cycles-pp.xas_load 0.58 ± 3% -0.1 0.50 ± 2% -0.0 0.54 ± 5% perf-profile.children.cycles-pp.__mod_node_page_state 0.57 ± 6% -0.1 0.50 ± 3% -0.0 0.53 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 0.48 ± 7% -0.1 0.40 ± 4% -0.0 0.44 ± 2% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 0.23 ± 5% -0.1 0.16 ± 4% -0.0 0.19 perf-profile.children.cycles-pp.free_unref_page_commit 0.43 ± 7% -0.1 0.36 ± 3% -0.0 0.39 ± 2% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.43 ± 6% -0.1 0.36 ± 3% -0.0 0.39 ± 2% perf-profile.children.cycles-pp.hrtimer_interrupt 0.16 ± 4% -0.1 0.10 -0.0 0.12 perf-profile.children.cycles-pp.get_pfnblock_flags_mask 0.15 ± 9% -0.1 0.09 ± 4% -0.0 0.11 ± 6% perf-profile.children.cycles-pp.uncharge_batch 0.30 ± 6% -0.1 0.24 ± 4% -0.0 0.27 ± 3% perf-profile.children.cycles-pp.tick_nohz_handler 0.37 ± 7% -0.1 0.31 ± 3% -0.0 0.34 ± 2% perf-profile.children.cycles-pp.__hrtimer_run_queues 0.27 ± 8% -0.1 0.22 ± 4% -0.0 0.24 ± 3% perf-profile.children.cycles-pp.update_process_times 1.64 -0.1 1.59 -0.1 1.56 perf-profile.children.cycles-pp.__alloc_pages_noprof 0.30 ± 9% -0.0 0.25 ± 28% -0.0 0.25 ± 5% perf-profile.children.cycles-pp.cgroup_rstat_updated 0.16 ± 6% -0.0 0.12 ± 3% -0.0 0.15 ± 4% perf-profile.children.cycles-pp.mem_cgroup_update_lru_size 0.25 -0.0 0.20 -0.0 0.21 ± 2% perf-profile.children.cycles-pp.handle_pte_fault 0.11 ± 11% -0.0 0.07 ± 5% -0.0 0.08 perf-profile.children.cycles-pp.page_counter_uncharge 0.20 ± 5% -0.0 0.16 ± 2% -0.0 0.16 ± 4% 
perf-profile.children.cycles-pp.__pte_offset_map 0.22 ± 3% -0.0 0.19 ± 4% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.error_entry 0.10 ± 3% -0.0 0.07 ± 7% -0.0 0.07 ± 8% perf-profile.children.cycles-pp.policy_nodemask 0.16 ± 3% -0.0 0.12 ± 3% -0.0 0.13 ± 4% perf-profile.children.cycles-pp.pte_offset_map_nolock 0.15 ± 5% -0.0 0.12 ± 3% -0.0 0.14 ± 4% perf-profile.children.cycles-pp.uncharge_folio 0.22 ± 3% -0.0 0.19 ± 3% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.folio_add_new_anon_rmap 0.17 ± 9% -0.0 0.14 ± 5% -0.0 0.16 ± 3% perf-profile.children.cycles-pp.scheduler_tick 0.18 ± 5% -0.0 0.16 ± 5% -0.0 0.17 ± 2% perf-profile.children.cycles-pp.up_read 0.19 ± 2% -0.0 0.16 ± 7% -0.0 0.15 ± 7% perf-profile.children.cycles-pp.shmem_get_policy 0.14 ± 4% -0.0 0.12 ± 4% -0.0 0.12 ± 6% perf-profile.children.cycles-pp.down_read_trylock 0.13 ± 6% -0.0 0.10 ± 3% -0.0 0.11 perf-profile.children.cycles-pp.folio_put 0.08 ± 11% -0.0 0.06 ± 14% -0.0 0.06 ± 9% perf-profile.children.cycles-pp.free_pcppages_bulk 0.12 ± 6% -0.0 0.09 ± 5% -0.0 0.10 ± 4% perf-profile.children.cycles-pp.task_tick_fair 0.29 ± 3% -0.0 0.27 ± 4% -0.0 0.28 ± 3% perf-profile.children.cycles-pp._raw_spin_trylock 0.12 ± 6% -0.0 0.10 ± 4% -0.0 0.10 ± 4% perf-profile.children.cycles-pp.folio_unlock 0.79 ± 2% +0.1 0.90 ± 2% +0.0 0.83 ± 2% perf-profile.children.cycles-pp.rmqueue 0.24 ± 3% +0.2 0.41 ± 6% +0.1 0.33 ± 4% perf-profile.children.cycles-pp.__rmqueue_pcplist 0.10 ± 5% +0.2 0.29 ± 9% +0.1 0.20 ± 9% perf-profile.children.cycles-pp.rmqueue_bulk 0.62 ± 4% +0.2 0.84 ± 2% +0.1 0.70 ± 3% perf-profile.children.cycles-pp.folio_remove_rmap_ptes 1.89 ± 2% +0.4 2.34 ± 5% +0.5 2.43 ± 2% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio 1.67 ± 12% +1.2 2.87 +0.8 2.50 perf-profile.children.cycles-pp.tlb_finish_mmu 16.10 ± 9% +9.5 25.63 ± 2% +6.4 22.50 perf-profile.children.cycles-pp.unmap_vmas 16.10 ± 9% +9.5 25.63 ± 2% +6.4 22.50 perf-profile.children.cycles-pp.unmap_page_range 16.10 ± 9% +9.5 25.63 ± 2% +6.4 22.50 perf-profile.children.cycles-pp.zap_pmd_range 16.10 ± 9% +9.5 25.63 ± 2% +6.4 22.50 perf-profile.children.cycles-pp.zap_pte_range 18.48 ± 17% +10.5 29.01 +7.5 26.01 perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave 17.97 ± 9% +10.7 28.66 +7.2 25.16 perf-profile.children.cycles-pp.do_syscall_64 17.97 ± 9% +10.7 28.66 +7.2 25.16 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 18.49 ± 17% +10.7 29.20 +7.6 26.12 perf-profile.children.cycles-pp._raw_spin_lock_irqsave 17.82 ± 10% +10.7 28.54 ± 2% +7.2 25.03 perf-profile.children.cycles-pp.__munmap 17.81 ± 10% +10.7 28.53 ± 2% +7.2 25.02 perf-profile.children.cycles-pp.__vm_munmap 17.81 ± 10% +10.7 28.53 ± 2% +7.2 25.02 perf-profile.children.cycles-pp.__x64_sys_munmap 17.82 ± 10% +10.7 28.54 ± 2% +7.2 25.03 perf-profile.children.cycles-pp.do_vmi_munmap 17.81 ± 10% +10.7 28.54 ± 2% +7.2 25.03 perf-profile.children.cycles-pp.do_vmi_align_munmap 17.80 ± 10% +10.7 28.53 ± 2% +7.2 25.02 perf-profile.children.cycles-pp.unmap_region 18.38 ± 17% +10.8 29.13 +7.7 26.03 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath 12.80 ± 15% +10.9 23.68 ± 2% +7.6 20.42 perf-profile.children.cycles-pp.tlb_flush_mmu 14.44 ± 15% +12.1 26.54 ± 2% +8.5 22.91 perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages 14.43 ± 15% +12.1 26.54 ± 2% +8.5 22.90 perf-profile.children.cycles-pp.free_pages_and_swap_cache 13.19 ± 17% +13.0 26.17 ± 2% +9.2 22.36 perf-profile.children.cycles-pp.folios_put_refs 11.81 ± 20% +13.2 25.01 ± 2% +9.4 21.17 
perf-profile.children.cycles-pp.__page_cache_release 39.99 ± 4% -4.1 35.92 -2.9 37.12 perf-profile.self.cycles-pp.copy_page 2.14 ± 15% -1.4 0.70 ± 5% -1.2 0.93 ± 4% perf-profile.self.cycles-pp._compound_head 1.39 ± 13% -0.9 0.48 ± 3% -0.7 0.67 ± 4% perf-profile.self.cycles-pp.free_pages_and_swap_cache 4.45 -0.7 3.74 -0.5 3.92 perf-profile.self.cycles-pp.testcase 3.12 ± 3% -0.4 2.69 -0.3 2.83 perf-profile.self.cycles-pp._raw_spin_lock 0.96 -0.3 0.61 ± 2% -0.3 0.64 perf-profile.self.cycles-pp.mas_walk 6.74 ± 4% -0.3 6.44 -0.2 6.53 perf-profile.self.cycles-pp.native_irq_return_iret 1.59 ± 2% -0.2 1.36 -0.2 1.42 perf-profile.self.cycles-pp.sync_regs 1.22 ± 2% -0.2 1.04 ± 2% -0.1 1.10 perf-profile.self.cycles-pp.___perf_sw_event 1.71 ± 4% -0.1 1.57 -0.1 1.64 perf-profile.self.cycles-pp.filemap_get_entry 1.06 ± 10% -0.1 0.94 ± 10% +0.2 1.25 ± 4% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state 0.48 ± 9% -0.1 0.38 ± 6% -0.0 0.44 ± 5% perf-profile.self.cycles-pp.__count_memcg_events 0.63 -0.1 0.54 -0.1 0.56 perf-profile.self.cycles-pp.__handle_mm_fault 0.44 -0.1 0.35 -0.1 0.38 ± 2% perf-profile.self.cycles-pp.lru_add_fn 0.29 -0.1 0.21 ± 3% -0.0 0.28 ± 3% perf-profile.self.cycles-pp.__page_cache_release 0.36 ± 3% -0.1 0.28 ± 2% -0.1 0.31 ± 4% perf-profile.self.cycles-pp.get_page_from_freelist 0.57 ± 3% -0.1 0.49 ± 2% -0.0 0.53 ± 5% perf-profile.self.cycles-pp.__mod_node_page_state 0.25 ± 3% -0.1 0.18 ± 2% -0.0 0.22 ± 2% perf-profile.self.cycles-pp.free_unref_folios 0.23 ± 2% -0.1 0.16 ± 2% -0.0 0.18 ± 4% perf-profile.self.cycles-pp.folio_remove_rmap_ptes 0.85 ± 4% -0.1 0.78 ± 2% -0.1 0.80 perf-profile.self.cycles-pp.xas_load 0.28 ± 3% -0.1 0.21 ± 3% -0.0 0.23 ± 3% perf-profile.self.cycles-pp.do_user_addr_fault 0.19 ± 2% -0.1 0.13 ± 8% -0.1 0.13 ± 3% perf-profile.self.cycles-pp.set_pte_range 0.30 ± 2% -0.1 0.24 ± 3% -0.0 0.26 ± 2% perf-profile.self.cycles-pp.zap_present_ptes 0.29 ± 2% -0.1 0.23 ± 6% -0.1 0.24 ± 7% perf-profile.self.cycles-pp.get_vma_policy 0.15 ± 2% -0.1 0.09 -0.1 0.09 ± 4% perf-profile.self.cycles-pp.vma_alloc_folio_noprof 0.16 ± 5% -0.1 0.10 ± 4% -0.0 0.12 perf-profile.self.cycles-pp.get_pfnblock_flags_mask 0.32 ± 5% -0.1 0.26 ± 4% -0.0 0.28 ± 3% perf-profile.self.cycles-pp.__alloc_pages_noprof 0.28 ± 4% -0.1 0.23 ± 6% -0.0 0.23 ± 4% perf-profile.self.cycles-pp.rmqueue 0.26 -0.0 0.21 ± 3% -0.0 0.23 ± 3% perf-profile.self.cycles-pp.asm_exc_page_fault 0.07 ± 5% -0.0 0.02 ±122% -0.0 0.04 ± 45% perf-profile.self.cycles-pp.policy_nodemask 0.19 ± 6% -0.0 0.14 -0.0 0.15 ± 6% perf-profile.self.cycles-pp.lock_vma_under_rcu 0.16 ± 5% -0.0 0.11 ± 3% -0.0 0.14 ± 4% perf-profile.self.cycles-pp.mem_cgroup_update_lru_size 0.32 ± 3% -0.0 0.28 -0.0 0.29 ± 2% perf-profile.self.cycles-pp.shmem_get_folio_gfp 0.15 ± 2% -0.0 0.10 ± 3% -0.0 0.13 ± 3% perf-profile.self.cycles-pp.free_unref_page_commit 0.20 ± 6% -0.0 0.16 ± 5% -0.0 0.18 ± 3% perf-profile.self.cycles-pp.__perf_sw_event 0.19 ± 2% -0.0 0.14 ± 3% -0.0 0.16 ± 3% perf-profile.self.cycles-pp.do_fault 0.12 -0.0 0.08 ± 5% -0.0 0.10 ± 4% perf-profile.self.cycles-pp._raw_spin_lock_irqsave 0.27 ± 9% -0.0 0.23 ± 32% -0.0 0.22 ± 6% perf-profile.self.cycles-pp.cgroup_rstat_updated 0.19 ± 5% -0.0 0.15 ± 3% -0.0 0.15 ± 4% perf-profile.self.cycles-pp.__pte_offset_map 0.10 ± 11% -0.0 0.06 ± 10% -0.0 0.07 ± 5% perf-profile.self.cycles-pp.page_counter_uncharge 0.15 ± 6% -0.0 0.12 ± 3% -0.0 0.14 ± 4% perf-profile.self.cycles-pp.uncharge_folio 0.21 ± 4% -0.0 0.17 ± 2% -0.0 0.18 ± 2% perf-profile.self.cycles-pp.error_entry 0.09 ± 4% -0.0 0.06 ± 6% -0.0 0.07 
± 5% perf-profile.self.cycles-pp.alloc_pages_mpol_noprof 0.18 -0.0 0.15 ± 4% -0.0 0.16 ± 2% perf-profile.self.cycles-pp.exc_page_fault 0.22 ± 3% -0.0 0.19 ± 3% -0.0 0.20 ± 2% perf-profile.self.cycles-pp.folio_add_new_anon_rmap 0.22 ± 4% -0.0 0.19 ± 4% -0.0 0.20 ± 3% perf-profile.self.cycles-pp.shmem_fault 0.18 ± 6% -0.0 0.15 ± 3% -0.0 0.16 ± 3% perf-profile.self.cycles-pp.up_read 0.11 ± 4% -0.0 0.09 -0.0 0.10 ± 4% perf-profile.self.cycles-pp.zap_pte_range 0.13 ± 6% -0.0 0.10 ± 3% -0.0 0.11 perf-profile.self.cycles-pp.folio_put 0.29 -0.0 0.26 ± 4% -0.0 0.27 ± 3% perf-profile.self.cycles-pp._raw_spin_trylock 0.14 ± 2% -0.0 0.12 ± 4% -0.0 0.12 ± 6% perf-profile.self.cycles-pp.down_read_trylock 0.11 ± 4% -0.0 0.09 ± 4% -0.0 0.10 ± 5% perf-profile.self.cycles-pp.__mod_lruvec_state 0.09 ± 4% -0.0 0.07 ± 5% -0.0 0.07 perf-profile.self.cycles-pp.pte_offset_map_nolock 0.12 ± 6% -0.0 0.10 ± 4% -0.0 0.10 ± 4% perf-profile.self.cycles-pp.folio_unlock 0.18 ± 4% -0.0 0.16 ± 7% -0.0 0.14 ± 8% perf-profile.self.cycles-pp.shmem_get_policy 0.07 ± 7% -0.0 0.05 ± 7% -0.0 0.06 ± 6% perf-profile.self.cycles-pp.__do_fault 0.08 ± 5% -0.0 0.07 -0.0 0.08 ± 6% perf-profile.self.cycles-pp.handle_pte_fault 0.08 -0.0 0.07 ± 5% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.__mem_cgroup_charge 0.40 -0.0 0.39 ± 4% -0.0 0.38 ± 2% perf-profile.self.cycles-pp.__pte_offset_map_lock 0.38 ± 3% +0.1 0.44 ± 3% +0.1 0.47 ± 2% perf-profile.self.cycles-pp.folio_batch_move_lru 0.39 ± 3% +0.1 0.46 -0.0 0.35 ± 2% perf-profile.self.cycles-pp.folios_put_refs 0.61 ± 13% +0.5 1.15 ± 3% +0.4 0.97 ± 3% perf-profile.self.cycles-pp.__lruvec_stat_mod_folio 18.38 ± 17% +10.8 29.13 +7.7 26.03 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
  2024-05-23  7:48 ` Oliver Sang
@ 2024-05-23 16:47 ` Shakeel Butt
  2024-05-24  7:45 ` Oliver Sang
  0 siblings, 1 reply; 15+ messages in thread
From: Shakeel Butt @ 2024-05-23 16:47 UTC (permalink / raw)
To: Oliver Sang
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
    Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
    Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
    fengwei.yin

On Thu, May 23, 2024 at 03:48:40PM +0800, Oliver Sang wrote:
> hi, Shakeel,
>
> On Tue, May 21, 2024 at 09:18:19PM -0700, Shakeel Butt wrote:
> > On Tue, May 21, 2024 at 10:43:16AM +0800, Oliver Sang wrote:
> > > hi, Shakeel,
> > >
> > [...]
> > >
> > > we reported regression on a 2-node Skylake server. so I found a 1-node Skylake
> > > desktop (we don't have 1 node server) to check.
> > >
> >
> > Please try the following patch on both single node and dual node
> > machines:
>
> the regression is partially recovered by applying your patch.
> (but one even more regression case as below)
>
> details:
>
> since you mentioned the whole patch-set behavior last time, I applied the
> patch upon
> a94032b35e5f9 memcg: use proper type for mod_memcg_state
>
> below fd2296741e2686ed6ecd05187e4 = a94032b35e5f9 + patch
>

Thanks a lot Oliver. I have a couple of questions and requests:

1. What is the baseline kernel you are using? Is it linux-next or linus?
If linux-next, which one specifically?

2. What is the cgroup hierarchy where the workload is running? Is it
running in the root cgroup?

3. For the followup experiments when needed, can you please remove the
whole series (including 59142d87ab03b8ff) for the base numbers.

4. My experiment [1] on Cooper Lake (2 node) and Skylake (1 node) shows
significant improvement, but I noticed that I am directly running
page_fault2_processes with -t equal to nr_cpus while you are running through
runtest.py. Also it seems like lkp has modified runtest.py. I will try
to run the same setup as yours to repro.

[1] https://lore.kernel.org/all/20240523034824.1255719-1-shakeel.butt@linux.dev

thanks,
Shakeel
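For reference, the "direct" invocation mentioned in point 4 above (as opposed to going
through runtest.py) would look roughly like the sketch below; only the -t task-count
flag is taken from the thread, and any other options of the will-it-scale binary are
deliberately omitted here.

      # illustrative only: run the page_fault2 process-mode binary directly,
      # with one task per CPU as described above
      $ ./page_fault2_processes -t "$(nproc)"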
* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
  2024-05-23 16:47 ` Shakeel Butt
@ 2024-05-24  7:45 ` Oliver Sang
  2024-05-24 18:06 ` Shakeel Butt
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Sang @ 2024-05-24 7:45 UTC (permalink / raw)
To: Shakeel Butt
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
    Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
    Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
    fengwei.yin, oliver.sang

hi, Shakeel,

On Thu, May 23, 2024 at 09:47:30AM -0700, Shakeel Butt wrote:
> On Thu, May 23, 2024 at 03:48:40PM +0800, Oliver Sang wrote:
> > hi, Shakeel,
> >
> > On Tue, May 21, 2024 at 09:18:19PM -0700, Shakeel Butt wrote:
> > > On Tue, May 21, 2024 at 10:43:16AM +0800, Oliver Sang wrote:
> > > > hi, Shakeel,
> > > >
> > > [...]
> > > >
> > > > we reported regression on a 2-node Skylake server. so I found a 1-node Skylake
> > > > desktop (we don't have 1 node server) to check.
> > > >
> > >
> > > Please try the following patch on both single node and dual node
> > > machines:
> >
> > the regression is partially recovered by applying your patch.
> > (but one even more regression case as below)
> >
> > details:
> >
> > since you mentioned the whole patch-set behavior last time, I applied the
> > patch upon
> > a94032b35e5f9 memcg: use proper type for mod_memcg_state
> >
> > below fd2296741e2686ed6ecd05187e4 = a94032b35e5f9 + patch
> >
>
> Thanks a lot Oliver. I have a couple of questions and requests:

you are welcome!

>
> 1. What is the baseline kernel you are using? Is it linux-next or linus?
> If linux-next, which one specifically?

base is just 59142d87ab03b, which is in current linux-next/master,
and is already merged into linus/master now.

linux$ git rev-list linux-next/master | grep 59142d87ab03b
59142d87ab03b8ff969074348f65730d465f42ee

linux$ git rev-list linus/master | grep 59142d87ab03b
59142d87ab03b8ff969074348f65730d465f42ee

the data for it is the first column in the tables we supplied.

I just applied your patch upon a94032b35e5f9, so:

linux$ git log --oneline --graph fd2296741e2686ed6ecd05187e4
* fd2296741e268 fix for 70a64b7919 from Shakeel                                      <----- your fix patch
* a94032b35e5f9 memcg: use proper type for mod_memcg_state                           <--- patch-set tip, I believe
* acb5fe2f1aff0 memcg: warn for unexpected events and stats
* 4715c6a753dcc mm: cleanup WORKINGSET_NODES in workingset
* 0667c7870a186 memcg: cleanup __mod_memcg_lruvec_state
* ff48c71c26aae memcg: reduce memory for the lruvec and memcg stats
* aab6103b97f1c mm: memcg: account memory used for memcg vmstats and lruvec stats
* 70a64b7919cbd memcg: dynamically allocate lruvec_stats                             <--- we reported this as 'fbc' in original report
* 59142d87ab03b memcg: reduce memory size of mem_cgroup_events_index                 <--- base

>
> 2. What is the cgroup hierarchy where the workload is running? Is it
> running in the root cgroup?

Our test system uses systemd from the distribution (debian-12). The workload is
automatically assigned to a specific cgroup by systemd which is in the
sub-hierarchy of root, so it is not directly running in the root cgroup.

>
> 3. For the followup experiments when needed, can you please remove the
> whole series (including 59142d87ab03b8ff) for the base numbers.

I cannot understand this very well: if the patch is meant to fix the regression
caused by this series, it seems to me the best way is to apply the patch on top
of the series. Did I misunderstand anything here?

anyway, I could do that. do you mean something like v6.9, which doesn't include
this series yet? I could use it as the base, apply your patch onto it, and then
check the diff between v6.9 and v6.9+patch.

but I still have some concern: even if that comparison shows a big improvement,
it cannot guarantee there will be the same improvement when comparing the series
against the series+patch.

>
> 4. My experiment [1] on Cooper Lake (2 node) and Skylake (1 node) shows
> significant improvement, but I noticed that I am directly running
> page_fault2_processes with -t equal to nr_cpus while you are running through
> runtest.py. Also it seems like lkp has modified runtest.py. I will try
> to run the same setup as yours to repro.
>
>
> [1] https://lore.kernel.org/all/20240523034824.1255719-1-shakeel.butt@linux.dev
>
> thanks,
> Shakeel
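As a rough sketch, the fd2296741e268 tree Oliver describes above could be reproduced
along these lines (the patch file name below is made up purely for illustration):

      $ git checkout a94032b35e5f9                # tip of the lruvec_stats series
      $ git am shakeel-lruvec-stats-fix.patch     # the proposed fix; file name is hypothetical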
* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
  2024-05-24  7:45 ` Oliver Sang
@ 2024-05-24 18:06 ` Shakeel Butt
  2024-05-28  6:30 ` Shakeel Butt
  0 siblings, 1 reply; 15+ messages in thread
From: Shakeel Butt @ 2024-05-24 18:06 UTC (permalink / raw)
To: Oliver Sang
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
    Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
    Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
    fengwei.yin

On Fri, May 24, 2024 at 03:45:54PM +0800, Oliver Sang wrote:
> hi, Shakeel,
>
[...]
> >
> > 1. What is the baseline kernel you are using? Is it linux-next or linus?
> > If linux-next, which one specifically?
>
> base is just 59142d87ab03b, which is in current linux-next/master,
> and is already merged into linus/master now.
>
> linux$ git rev-list linux-next/master | grep 59142d87ab03b
> 59142d87ab03b8ff969074348f65730d465f42ee
>
> linux$ git rev-list linus/master | grep 59142d87ab03b
> 59142d87ab03b8ff969074348f65730d465f42ee
>
> the data for it is the first column in the tables we supplied.
>
> I just applied your patch upon a94032b35e5f9, so:
>
> linux$ git log --oneline --graph fd2296741e2686ed6ecd05187e4
> * fd2296741e268 fix for 70a64b7919 from Shakeel                                      <----- your fix patch
> * a94032b35e5f9 memcg: use proper type for mod_memcg_state                           <--- patch-set tip, I believe
> * acb5fe2f1aff0 memcg: warn for unexpected events and stats
> * 4715c6a753dcc mm: cleanup WORKINGSET_NODES in workingset
> * 0667c7870a186 memcg: cleanup __mod_memcg_lruvec_state
> * ff48c71c26aae memcg: reduce memory for the lruvec and memcg stats
> * aab6103b97f1c mm: memcg: account memory used for memcg vmstats and lruvec stats
> * 70a64b7919cbd memcg: dynamically allocate lruvec_stats                             <--- we reported this as 'fbc' in original report
> * 59142d87ab03b memcg: reduce memory size of mem_cgroup_events_index                 <--- base
>

Cool, let's stick to the linus tree. I was actually taking next-20240521
and reverting all the patches in the series to treat as the base. One
request I have would be to make the base the patch previous to
59142d87ab03b, i.e. not 59142d87ab03b itself.

> >
> > 2. What is the cgroup hierarchy where the workload is running? Is it
> > running in the root cgroup?
>
> Our test system uses systemd from the distribution (debian-12). The workload is
> automatically assigned to a specific cgroup by systemd which is in the
> sub-hierarchy of root, so it is not directly running in the root cgroup.
>
> >
> > 3. For the followup experiments when needed, can you please remove the
> > whole series (including 59142d87ab03b8ff) for the base numbers.
>
> I cannot understand this very well: if the patch is meant to fix the regression
> caused by this series, it seems to me the best way is to apply the patch on top
> of the series. Did I misunderstand anything here?
>

Sorry, I just meant to make the 'base' case compare against the commit
previous to 59142d87ab03b, as I said above.

I will re-run my experiments on linus tree and report back.
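In git terms, the base Shakeel asks for here is simply the parent of the first patch
in the series, i.e. something like:

      $ git checkout 59142d87ab03b~1   # commit just before "memcg: reduce memory size of mem_cgroup_events_index"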
* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
  2024-05-24 18:06 ` Shakeel Butt
@ 2024-05-28  6:30 ` Shakeel Butt
  2024-05-30  6:17 ` Oliver Sang
  0 siblings, 1 reply; 15+ messages in thread
From: Shakeel Butt @ 2024-05-28 6:30 UTC (permalink / raw)
To: Oliver Sang
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
    Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
    Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
    fengwei.yin

On Fri, May 24, 2024 at 11:06:54AM GMT, Shakeel Butt wrote:
> On Fri, May 24, 2024 at 03:45:54PM +0800, Oliver Sang wrote:
[...]
> I will re-run my experiments on linus tree and report back.

I am not able to reproduce the regression with the fix I have proposed,
at least on my 1 node 52 CPUs (Cooper Lake) and 2 node 80 CPUs (Skylake)
machines. Let me give more details below:

Setup instructions:
-------------------
mount -t tmpfs tmpfs /tmp
mkdir -p /sys/fs/cgroup/A
mkdir -p /sys/fs/cgroup/A/B
mkdir -p /sys/fs/cgroup/A/B/C
echo +memory > /sys/fs/cgroup/A/cgroup.subtree_control
echo +memory > /sys/fs/cgroup/A/B/cgroup.subtree_control
echo $$ > /sys/fs/cgroup/A/B/C/cgroup.procs

The base case (commit a4c43b8a0980):
------------------------------------
$ python3 ./runtest.py page_fault2 295 process 0 0 52
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
52,2796769,0.03,0,0.00,0

$ python3 ./runtest.py page_fault2 295 process 0 0 80
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
80,6755010,0.04,0,0.00,0

The regressing series (last commit a94032b35e5f):
-------------------------------------------------
$ python3 ./runtest.py page_fault2 295 process 0 0 52
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
52,2684859,0.03,0,0.00,0

$ python3 ./runtest.py page_fault2 295 process 0 0 80
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
80,6010438,0.13,0,0.00,0

The fix on top of regressing series:
------------------------------------
$ python3 ./runtest.py page_fault2 295 process 0 0 52
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
52,3812133,0.02,0,0.00,0

$ python3 ./runtest.py page_fault2 295 process 0 0 80
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
80,7979893,0.15,0,0.00,0

As you can see, the fix is improving the performance over the base, at
least for me. I can only speculate that either the difference of
hardware is giving us different results (you have newer CPUs) or there
is still some disparity in the experiment setup/environment between us.

Are you disabling hyperthreading? Are the prefetching heuristics
different on your systems?

Regarding the test environment, can you check my setup instructions above
and see if I am doing something wrong or different?

At the moment, I am inclined towards asking Andrew to include my fix in
a following 6.10-rc* but keep this report open, so we can continue to
improve. Let me know if you have concerns.

thanks,
Shakeel
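Since the cgroup placement is one of the suspected differences (a hand-made A/B/C
hierarchy above vs. a systemd-managed slice on the lkp side), a quick way to confirm
where the benchmark actually runs is a check along these lines (assuming cgroup v2,
as implied by the cgroup.subtree_control files above):

      # with the A/B/C setup above this is expected to print "0::/A/B/C";
      # on a systemd-managed system it would show the slice/scope instead
      $ cat /proc/self/cgroup
      # the benchmark pids should also show up in the leaf cgroup
      $ cat /sys/fs/cgroup/A/B/C/cgroup.procs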
* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
  2024-05-28  6:30 ` Shakeel Butt
@ 2024-05-30  6:17 ` Oliver Sang
  0 siblings, 0 replies; 15+ messages in thread
From: Oliver Sang @ 2024-05-30 6:17 UTC (permalink / raw)
To: Shakeel Butt
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
    Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
    Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
    fengwei.yin, oliver.sang

hi, Shakeel,

On Mon, May 27, 2024 at 11:30:38PM -0700, Shakeel Butt wrote:
> On Fri, May 24, 2024 at 11:06:54AM GMT, Shakeel Butt wrote:
> > On Fri, May 24, 2024 at 03:45:54PM +0800, Oliver Sang wrote:
> [...]
> > I will re-run my experiments on linus tree and report back.
>
> I am not able to reproduce the regression with the fix I have proposed,
> at least on my 1 node 52 CPUs (Cooper Lake) and 2 node 80 CPUs (Skylake)
> machines. Let me give more details below:
>
> Setup instructions:
> -------------------
> mount -t tmpfs tmpfs /tmp
> mkdir -p /sys/fs/cgroup/A
> mkdir -p /sys/fs/cgroup/A/B
> mkdir -p /sys/fs/cgroup/A/B/C
> echo +memory > /sys/fs/cgroup/A/cgroup.subtree_control
> echo +memory > /sys/fs/cgroup/A/B/cgroup.subtree_control
> echo $$ > /sys/fs/cgroup/A/B/C/cgroup.procs
>
> The base case (commit a4c43b8a0980):
> ------------------------------------
> $ python3 ./runtest.py page_fault2 295 process 0 0 52
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 52,2796769,0.03,0,0.00,0
>
> $ python3 ./runtest.py page_fault2 295 process 0 0 80
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 80,6755010,0.04,0,0.00,0
>
> The regressing series (last commit a94032b35e5f):
> -------------------------------------------------
> $ python3 ./runtest.py page_fault2 295 process 0 0 52
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 52,2684859,0.03,0,0.00,0
>
> $ python3 ./runtest.py page_fault2 295 process 0 0 80
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 80,6010438,0.13,0,0.00,0
>
> The fix on top of regressing series:
> ------------------------------------
> $ python3 ./runtest.py page_fault2 295 process 0 0 52
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 52,3812133,0.02,0,0.00,0
>
> $ python3 ./runtest.py page_fault2 295 process 0 0 80
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 80,7979893,0.15,0,0.00,0
>
> As you can see, the fix is improving the performance over the base, at
> least for me. I can only speculate that either the difference of
> hardware is giving us different results (you have newer CPUs) or there
> is still some disparity in the experiment setup/environment between us.
>
> Are you disabling hyperthreading? Are the prefetching heuristics
> different on your systems?

we don't disable hyperthreading.

for prefetching, we don't change the BIOS default settings.
for the skl server in our original report:
  MLC Spatial Prefetcher - enabled
  DCU Data Prefetcher - enabled
  DCU Instruction Prefetcher - enabled
  LLC Prefetch - disabled

but these settings are not uniform across all our servers. for example, on the
Ice Lake server mentioned in a previous mail, "LLC Prefetch" is enabled by
default, so we keep it enabled there.

>
> Regarding the test environment, can you check my setup instructions above
> and see if I am doing something wrong or different?
>
> At the moment, I am inclined towards asking Andrew to include my fix in
> a following 6.10-rc* but keep this report open, so we can continue to
> improve. Let me know if you have concerns.

yeah, a different setup/environment could cause the difference. anyway, once your
fix is merged, we should be able to capture it as a performance improvement. or
if you want us to do a manual check, just let us know. Thanks!

>
> thanks,
> Shakeel
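For completeness, the SMT and prefetcher settings discussed above can also be compared
from software rather than BIOS menus. The sketch below assumes msr-tools is installed
and that the CPU exposes the commonly documented Intel MISC_FEATURE_CONTROL MSR (0x1a4);
treat the register address and bit layout as assumptions to be checked against the
platform documentation.

      # SMT state (1 = hyperthreading active)
      $ cat /sys/devices/system/cpu/smt/active
      # prefetcher disable bits on CPU 0 (1 = disabled):
      # bit 0 L2 HW prefetcher, bit 1 L2 adjacent-line, bit 2 DCU, bit 3 DCU IP
      $ sudo modprobe msr
      $ sudo rdmsr -p 0 0x1a4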