* [linus:master] [mm, mmap] d4148aeab4: will-it-scale.per_process_ops 3888.9% improvement
@ 2024-11-07 14:10 kernel test robot
2024-11-17 16:42 ` Vlastimil Babka
0 siblings, 1 reply; 2+ messages in thread
From: kernel test robot @ 2024-11-07 14:10 UTC (permalink / raw)
To: Vlastimil Babka
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Michael Matz,
Gabriel Krisman Bertazi, Matthias Bodenbinder, Lorenzo Stoakes,
Yang Shi, Rik van Riel, Jann Horn, Liam R. Howlett, Petr Tesarik,
Thorsten Leemhuis, linux-mm, ying.huang, feng.tang, fengwei.yin,
oliver.sang
Hello,
kernel test robot noticed a 3888.9% improvement of will-it-scale.per_process_ops on:
commit: d4148aeab412432bf928f311eca8a2ba52bb05df ("mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linux-next/master 5b913f5d7d7fe0f567dea8605f21da6eaa1735fb]
testcase: will-it-scale
config: x86_64-rhel-8.3
compiler: gcc-12
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:
nr_task: 100%
mode: process
test: malloc1
cpufreq_governor: performance
In addition to that, the commit also has a significant impact on the following tests:
+------------------+---------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.radixsort.ops_per_sec 9.2% regression |
| test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
| test parameters | cpufreq_governor=performance |
| | nr_threads=100% |
| | test=radixsort |
| | testtime=60s |
+------------------+---------------------------------------------------------------------------------------------+
Details are as follows:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241107/202411072132.a8d2cf0f-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-cpl-4sp2/malloc1/will-it-scale
commit:
15e8156713 ("mm: shrinker: avoid memleak in alloc_shrinker_info")
d4148aeab4 ("mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes")
15e8156713cc3803 d4148aeab412432bf928f311eca
---------------- ---------------------------
%stddev %change %stddev
\ | \
255351 +4.5% 266817 vmstat.system.in
1.00 -0.8 0.20 ± 7% mpstat.cpu.all.irq%
0.13 +1.1 1.28 ± 32% mpstat.cpu.all.soft%
0.49 ± 15% +1.3 1.77 ± 27% mpstat.cpu.all.usr%
4143 ± 5% +181.3% 11654 ± 7% perf-c2c.DRAM.remote
2500 ± 3% +2209.9% 57748 ± 15% perf-c2c.HITM.local
3810 ± 2% +1461.0% 59473 ± 15% perf-c2c.HITM.total
78888 +3883.3% 3142322 ± 22% will-it-scale.224.processes
351.67 +3888.9% 14027 ± 22% will-it-scale.per_process_ops
78888 +3883.3% 3142322 ± 22% will-it-scale.workload
702940 ± 15% +123.4% 1570646 ± 11% meminfo.Active
702881 ± 15% +123.5% 1570598 ± 11% meminfo.Active(anon)
3953778 +28.4% 5075703 ± 4% meminfo.Cached
382618 +117.7% 832966 ± 10% meminfo.SUnreclaim
913559 ± 5% +122.8% 2035523 ± 11% meminfo.Shmem
528864 +85.5% 980938 ± 8% meminfo.Slab
12139918 +3685.1% 4.595e+08 ± 22% numa-numastat.node0.local_node
12236476 +3655.6% 4.595e+08 ± 22% numa-numastat.node0.numa_hit
12182775 +3990.8% 4.984e+08 ± 23% numa-numastat.node1.local_node
12264802 +3963.8% 4.984e+08 ± 23% numa-numastat.node1.numa_hit
12222359 +3921.0% 4.915e+08 ± 23% numa-numastat.node2.local_node
12320084 +3889.7% 4.915e+08 ± 23% numa-numastat.node2.numa_hit
12317018 +3904.7% 4.933e+08 ± 22% numa-numastat.node3.local_node
12388608 +3881.9% 4.933e+08 ± 22% numa-numastat.node3.numa_hit
27453504 ± 11% +20.0% 32957345 ± 2% sched_debug.cfs_rq:/.avg_vruntime.min
1.78 ± 2% +12.5% 2.00 ± 4% sched_debug.cfs_rq:/.h_nr_running.max
27453504 ± 11% +20.0% 32957377 ± 2% sched_debug.cfs_rq:/.min_vruntime.min
384.34 ± 6% +28.2% 492.57 ± 4% sched_debug.cfs_rq:/.util_est.avg
113.34 ± 7% +87.5% 212.55 ± 11% sched_debug.cfs_rq:/.util_est.stddev
395.67 ± 6% -91.8% 32.63 ± 7% sched_debug.cpu.clock.stddev
0.00 ± 5% -85.8% 0.00 ± 26% sched_debug.cpu.next_balance.stddev
1.78 ± 2% +12.5% 2.00 ± 8% sched_debug.cpu.nr_running.max
5.99 ± 9% -16.9% 4.98 ± 8% sched_debug.cpu.nr_uninterruptible.stddev
369561 ± 5% -48.5% 190196 ± 65% numa-meminfo.node0.AnonPages.max
108019 ± 4% +102.2% 218399 ± 12% numa-meminfo.node0.SUnreclaim
148129 ± 16% +86.2% 275784 ± 14% numa-meminfo.node0.Slab
73527 ± 61% -53.5% 34174 ± 80% numa-meminfo.node1.Mapped
94870 ± 4% +122.7% 211285 ± 9% numa-meminfo.node1.SUnreclaim
136858 ± 15% +83.3% 250864 ± 11% numa-meminfo.node1.Slab
89356 ± 2% +124.1% 200203 ± 10% numa-meminfo.node2.SUnreclaim
118047 ± 16% +89.0% 223166 ± 15% numa-meminfo.node2.Slab
698747 ± 15% +117.2% 1517832 ± 12% numa-meminfo.node3.Active
698731 ± 15% +117.2% 1517816 ± 12% numa-meminfo.node3.Active(anon)
90353 ± 7% +122.1% 200636 ± 10% numa-meminfo.node3.SUnreclaim
902916 ± 5% +118.4% 1972406 ± 12% numa-meminfo.node3.Shmem
125802 ± 17% +81.8% 228694 ± 8% numa-meminfo.node3.Slab
175727 ± 15% +123.5% 392704 ± 11% proc-vmstat.nr_active_anon
988408 +28.4% 1268872 ± 4% proc-vmstat.nr_file_pages
228353 ± 5% +122.8% 508826 ± 11% proc-vmstat.nr_shmem
36558 +1.2% 36996 proc-vmstat.nr_slab_reclaimable
95650 +117.2% 207775 ± 10% proc-vmstat.nr_slab_unreclaimable
175727 ± 15% +123.5% 392704 ± 11% proc-vmstat.nr_zone_active_anon
61863 ± 20% +50.4% 93011 ± 17% proc-vmstat.numa_hint_faults
27849 ± 47% +108.0% 57912 ± 22% proc-vmstat.numa_hint_faults_local
49211744 +3847.8% 1.943e+09 ± 22% proc-vmstat.numa_hit
48863846 +3875.5% 1.943e+09 ± 22% proc-vmstat.numa_local
102439 ± 38% +399.0% 511155 ± 5% proc-vmstat.pgactivate
1.218e+10 -83.3% 2.036e+09 ± 23% proc-vmstat.pgalloc_normal
25163379 +3665.3% 9.475e+08 ± 22% proc-vmstat.pgfault
1.218e+10 -83.3% 2.035e+09 ± 23% proc-vmstat.pgfree
23741008 -100.0% 0.00 proc-vmstat.thp_fault_alloc
27004 ± 4% +102.3% 54625 ± 12% numa-vmstat.node0.nr_slab_unreclaimable
12236166 +3655.7% 4.595e+08 ± 22% numa-vmstat.node0.numa_hit
12139608 +3685.2% 4.595e+08 ± 22% numa-vmstat.node0.numa_local
18419 ± 61% -53.5% 8573 ± 80% numa-vmstat.node1.nr_mapped
23716 ± 4% +122.9% 52871 ± 9% numa-vmstat.node1.nr_slab_unreclaimable
12263982 +3964.1% 4.984e+08 ± 23% numa-vmstat.node1.numa_hit
12181955 +3991.1% 4.984e+08 ± 23% numa-vmstat.node1.numa_local
22339 ± 2% +124.8% 50226 ± 10% numa-vmstat.node2.nr_slab_unreclaimable
12319433 +3889.9% 4.915e+08 ± 23% numa-vmstat.node2.numa_hit
12221708 +3921.2% 4.915e+08 ± 23% numa-vmstat.node2.numa_local
174568 ± 15% +117.3% 379260 ± 12% numa-vmstat.node3.nr_active_anon
225581 ± 5% +118.6% 493053 ± 12% numa-vmstat.node3.nr_shmem
22588 ± 7% +122.8% 50325 ± 11% numa-vmstat.node3.nr_slab_unreclaimable
174566 ± 15% +117.3% 379259 ± 12% numa-vmstat.node3.nr_zone_active_anon
12386775 +3882.4% 4.933e+08 ± 22% numa-vmstat.node3.numa_hit
12315185 +3905.2% 4.933e+08 ± 22% numa-vmstat.node3.numa_local
20.80 -88.0% 2.51 ± 4% perf-stat.i.MPKI
1.314e+09 +1555.3% 2.175e+10 ± 18% perf-stat.i.branch-instructions
0.79 ± 3% -0.4 0.35 perf-stat.i.branch-miss-rate%
12157447 ± 2% +510.1% 74172493 ± 18% perf-stat.i.branch-misses
61.19 -29.2 31.97 ± 2% perf-stat.i.cache-miss-rate%
1.255e+08 +104.5% 2.567e+08 ± 19% perf-stat.i.cache-misses
2.043e+08 +292.8% 8.024e+08 ± 18% perf-stat.i.cache-references
139.74 -93.8% 8.69 ± 27% perf-stat.i.cpi
264.51 +8.5% 286.93 perf-stat.i.cpu-migrations
6664 -47.9% 3471 ± 28% perf-stat.i.cycles-between-cache-misses
6.183e+09 +1555.5% 1.024e+11 ± 18% perf-stat.i.instructions
0.01 ± 23% +808.8% 0.12 ± 18% perf-stat.i.ipc
82374 +3701.0% 3131041 ± 22% perf-stat.i.minor-faults
82375 +3701.0% 3131042 ± 22% perf-stat.i.page-faults
20.32 -91.8% 1.67 ± 70% perf-stat.overall.MPKI
0.92 ± 2% -0.7 0.23 ± 70% perf-stat.overall.branch-miss-rate%
61.44 -39.9 21.51 ± 70% perf-stat.overall.cache-miss-rate%
136.21 -95.6% 6.05 ± 80% perf-stat.overall.cpi
6703 -63.8% 2428 ± 80% perf-stat.overall.cycles-between-cache-misses
23815471 -71.7% 6746184 ± 71% perf-stat.overall.path-length
2.41 ± 6% -93.2% 0.16 ± 78% perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
0.60 ± 21% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.pte_alloc_one.__do_huge_pmd_anonymous_page
0.74 ± 36% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__anon_vma_prepare.__vmf_anon_prepare.do_huge_pmd_anonymous_page.__handle_mm_fault
2.00 ± 27% -76.6% 0.47 ±149% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.mmap_region
0.35 ± 4% -78.5% 0.07 ± 10% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
1.99 ± 20% -77.4% 0.45 ± 93% perf-sched.sch_delay.avg.ms.__cond_resched.dput.step_into.link_path_walk.part
2.15 ± 14% -98.2% 0.04 ±196% perf-sched.sch_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
0.57 ± 7% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.folio_zero_user.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
0.84 ± 96% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.__anon_vma_prepare.__vmf_anon_prepare.do_huge_pmd_anonymous_page
1.41 ± 39% -93.0% 0.10 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
1.52 ± 31% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_file_alloc.init_file.alloc_empty_file
1.31 ± 17% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.fifo_open.do_dentry_open.vfs_open
0.05 ± 4% -76.9% 0.01 ± 27% perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1.13 ± 33% -99.6% 0.00 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
2.20 ± 19% -93.4% 0.14 ±223% perf-sched.sch_delay.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
1.65 ± 30% -70.6% 0.48 ± 65% perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
0.50 ± 10% -89.7% 0.05 ± 50% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.03 ± 21% -63.7% 0.01 ± 53% perf-sched.sch_delay.avg.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
0.93 ± 11% -54.4% 0.42 ± 35% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
2.62 ± 33% -51.4% 1.27 ± 32% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
0.81 ±133% -94.4% 0.05 ± 13% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
1.11 ± 12% -51.4% 0.54 ± 20% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
0.49 ± 37% -77.7% 0.11 ± 52% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.08 ± 3% -71.3% 0.02 ±107% perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
0.05 ± 33% -80.9% 0.01 ± 11% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.05 ± 8% -81.4% 0.01 ± 41% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
2.15 ± 13% -77.3% 0.49 ± 28% perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
4.04 -57.7% 1.71 ± 97% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
4.09 ± 2% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.pte_alloc_one.__do_huge_pmd_anonymous_page
1.19 ± 74% +593.4% 8.27 ± 62% perf-sched.sch_delay.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.pte_alloc_one.__pte_alloc
0.84 ±173% +695.6% 6.69 ± 78% perf-sched.sch_delay.max.ms.__cond_resched.__anon_vma_prepare.__vmf_anon_prepare.do_anonymous_page.__handle_mm_fault
4.00 -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.__anon_vma_prepare.__vmf_anon_prepare.do_huge_pmd_anonymous_page.__handle_mm_fault
2.71 ± 21% -82.7% 0.47 ±149% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.mmap_region
7.66 ± 93% -91.2% 0.67 ±223% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.vmstat_start.seq_read_iter.proc_reg_read_iter
0.01 ±223% +9242.3% 0.81 ±179% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
8.12 ±103% +186.4% 23.26 ± 73% perf-sched.sch_delay.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.vms_complete_munmap_vmas
2.34 ± 11% -75.7% 0.57 ± 75% perf-sched.sch_delay.max.ms.__cond_resched.dput.step_into.link_path_walk.part
2.77 ± 21% -96.7% 0.09 ±212% perf-sched.sch_delay.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
36.48 ± 69% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.folio_zero_user.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
0.37 ±223% +998.1% 4.04 ± 35% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.__anon_vma_prepare.__vmf_anon_prepare.do_anonymous_page
2.26 ± 78% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.__anon_vma_prepare.__vmf_anon_prepare.do_huge_pmd_anonymous_page
1.83 ± 40% -94.6% 0.10 ±223% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
1.75 ± 25% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.security_file_alloc.init_file.alloc_empty_file
1.51 ± 30% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.fifo_open.do_dentry_open.vfs_open
0.13 ± 50% -86.1% 0.02 ± 46% perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
4.17 ± 5% -99.8% 0.01 ±223% perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
4.02 +510.8% 24.57 ± 38% perf-sched.sch_delay.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap
3.99 -92.2% 0.31 ±223% perf-sched.sch_delay.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
4.11 ± 5% +33.0% 5.47 ± 27% perf-sched.sch_delay.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
4.54 ± 12% -54.1% 2.08 ± 65% perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
4.00 ± 25% -70.1% 1.20 ± 30% perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.03 ± 17% -60.8% 0.01 ± 64% perf-sched.sch_delay.max.ms.ipmi_thread.kthread.ret_from_fork.ret_from_fork_asm
4.08 ± 2% -61.1% 1.59 ±108% perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
389.98 ±121% -96.7% 13.00 ± 33% perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
9.71 ±119% -78.8% 2.06 ± 58% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
8.60 ±210% -99.1% 0.08 ± 36% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
4.05 ± 2% -34.0% 2.67 ± 42% perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
8.02 ± 5% -55.7% 3.55 ± 11% perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.47 ± 5% -13.9% 0.40 ± 13% perf-sched.total_sch_delay.average.ms
396.80 ± 9% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.vmstat_start.seq_read_iter.proc_reg_read_iter
0.17 ±223% +596.6% 1.17 ± 14% perf-sched.wait_and_delay.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.vms_complete_munmap_vmas
1.15 ± 7% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.folio_zero_user.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
417.15 ± 5% +75.4% 731.48 ± 12% perf-sched.wait_and_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
11.50 ± 25% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.__kmalloc_cache_noprof.vmstat_start.seq_read_iter.proc_reg_read_iter
26.17 ±223% +15367.5% 4047 ± 49% perf-sched.wait_and_delay.count.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.vms_complete_munmap_vmas
15379 ± 15% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.folio_zero_user.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
30.33 ±223% +836.8% 284.17 ± 53% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
1000 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.vmstat_start.seq_read_iter.proc_reg_read_iter
1.34 ±223% +3359.9% 46.51 ± 73% perf-sched.wait_and_delay.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.vms_complete_munmap_vmas
72.95 ± 69% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.folio_zero_user.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
13.11 ±122% -61.3% 5.08 perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
2.41 ± 6% -93.5% 0.16 ± 76% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
0.60 ± 21% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.pte_alloc_one.__do_huge_pmd_anonymous_page
0.74 ± 36% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.__anon_vma_prepare.__vmf_anon_prepare.do_huge_pmd_anonymous_page.__handle_mm_fault
2.00 ± 27% -76.6% 0.47 ±149% perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.mmap_region
394.43 ± 10% -99.8% 0.67 ±223% perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_cache_noprof.vmstat_start.seq_read_iter.proc_reg_read_iter
1.99 ± 20% -77.8% 0.44 ± 95% perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk.part
2.15 ± 14% -98.4% 0.03 ±193% perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
0.57 ± 7% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.folio_zero_user.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
0.84 ± 96% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.__anon_vma_prepare.__vmf_anon_prepare.do_huge_pmd_anonymous_page
1.41 ± 40% -93.0% 0.10 ±223% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
84.95 ±219% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_file_alloc.init_file.alloc_empty_file
1.31 ± 17% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.fifo_open.do_dentry_open.vfs_open
1.13 ± 33% -99.6% 0.00 ±223% perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
2.11 ± 23% -93.1% 0.14 ±223% perf-sched.wait_time.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
415.50 ± 5% +75.9% 731.00 ± 13% perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
5.81 ± 41% -74.2% 1.50 ± 9% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.93 ± 12% -54.5% 0.42 ± 35% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
9.89 ±101% -87.1% 1.27 ± 32% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
13.22 ± 5% -77.3% 3.01 ± 14% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
2.13 ± 15% -77.6% 0.48 ± 28% perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
4.04 -57.7% 1.71 ± 97% perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
4.09 ± 2% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.pte_alloc_one.__do_huge_pmd_anonymous_page
1.19 ± 74% +593.4% 8.27 ± 62% perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.pte_alloc_one.__pte_alloc
0.84 ±173% +695.6% 6.69 ± 78% perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.__vmf_anon_prepare.do_anonymous_page.__handle_mm_fault
4.00 -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.__vmf_anon_prepare.do_huge_pmd_anonymous_page.__handle_mm_fault
2.71 ± 21% -82.7% 0.47 ±149% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.mmap_region
1000 -99.9% 0.67 ±223% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.vmstat_start.seq_read_iter.proc_reg_read_iter
8.12 ±103% +186.4% 23.26 ± 73% perf-sched.wait_time.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.vms_complete_munmap_vmas
2.34 ± 11% -75.7% 0.57 ± 75% perf-sched.wait_time.max.ms.__cond_resched.dput.step_into.link_path_walk.part
2.77 ± 21% -96.7% 0.09 ±212% perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
36.48 ± 69% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.folio_zero_user.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
0.37 ±223% +998.1% 4.04 ± 35% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.__anon_vma_prepare.__vmf_anon_prepare.do_anonymous_page
2.26 ± 78% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.__anon_vma_prepare.__vmf_anon_prepare.do_huge_pmd_anonymous_page
1.83 ± 40% -94.6% 0.10 ±223% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
168.46 ±221% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.security_file_alloc.init_file.alloc_empty_file
1.51 ± 30% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.fifo_open.do_dentry_open.vfs_open
4.17 ± 5% -99.8% 0.01 ±223% perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
4.02 +510.8% 24.57 ± 38% perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap
3.99 -92.2% 0.31 ±223% perf-sched.wait_time.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
4.11 ± 5% +33.0% 5.47 ± 27% perf-sched.wait_time.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
250.47 ±134% -89.0% 27.62 ± 59% perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
4.08 ± 2% -61.1% 1.59 ±108% perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
499.41 +278.7% 1891 ± 64% perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
54.24 ± 46% -59.6% 21.93 ± 68% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
4.05 ± 2% -34.0% 2.67 ± 42% perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
96.58 -96.6 0.00 perf-profile.calltrace.cycles-pp.folio_zero_user.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
96.37 -96.4 0.00 perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
96.69 -90.4 6.27 ± 11% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
96.73 -90.3 6.45 ± 11% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
97.00 -90.2 6.79 ± 10% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
96.77 -90.1 6.68 ± 10% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
96.77 -90.1 6.69 ± 10% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
87.45 -87.4 0.00 perf-profile.calltrace.cycles-pp.clear_page_erms.folio_zero_user.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
0.00 +0.6 0.61 ± 11% perf-profile.calltrace.cycles-pp.lru_add.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.vms_clear_ptes
0.00 +0.6 0.62 ± 12% perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_noprof.__anon_vma_prepare.__vmf_anon_prepare.do_anonymous_page
0.00 +0.7 0.67 ± 10% perf-profile.calltrace.cycles-pp.kmem_cache_alloc_noprof.__anon_vma_prepare.__vmf_anon_prepare.do_anonymous_page.__handle_mm_fault
0.00 +0.7 0.69 ± 27% perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.alloc_anon_folio.do_anonymous_page.__handle_mm_fault
0.00 +0.8 0.82 ± 10% perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range
0.00 +0.8 0.84 ± 13% perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.pte_alloc_one.__pte_alloc.do_anonymous_page.__handle_mm_fault
0.00 +0.8 0.84 ± 10% perf-profile.calltrace.cycles-pp.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range
0.00 +0.8 0.85 ± 8% perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.___pte_free_tlb.free_pud_range.free_p4d_range.free_pgd_range
0.00 +0.9 0.86 ± 9% perf-profile.calltrace.cycles-pp.__anon_vma_prepare.__vmf_anon_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
0.00 +0.9 0.87 ± 8% perf-profile.calltrace.cycles-pp.___pte_free_tlb.free_pud_range.free_p4d_range.free_pgd_range.free_pgtables
0.00 +0.9 0.88 ± 9% perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.__folio_mod_stat.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault
0.00 +0.9 0.88 ± 10% perf-profile.calltrace.cycles-pp.__folio_mod_stat.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault.handle_mm_fault
0.00 +0.9 0.90 ± 9% perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
0.00 +0.9 0.90 ± 9% perf-profile.calltrace.cycles-pp.__vmf_anon_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.00 +0.9 0.91 ± 10% perf-profile.calltrace.cycles-pp.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.00 +1.0 1.03 ± 9% perf-profile.calltrace.cycles-pp.free_pud_range.free_p4d_range.free_pgd_range.free_pgtables.vms_clear_ptes
0.00 +1.0 1.05 ± 9% perf-profile.calltrace.cycles-pp.free_p4d_range.free_pgd_range.free_pgtables.vms_clear_ptes.vms_complete_munmap_vmas
0.00 +1.1 1.07 ± 9% perf-profile.calltrace.cycles-pp.free_pgd_range.free_pgtables.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap
0.00 +1.1 1.11 ± 75% perf-profile.calltrace.cycles-pp.uncharge_folio.__mem_cgroup_uncharge_folios.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
0.00 +1.3 1.29 ± 46% perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
0.00 +1.4 1.38 ± 9% perf-profile.calltrace.cycles-pp.free_pgtables.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
0.00 +1.5 1.52 ± 28% perf-profile.calltrace.cycles-pp.__memcg_kmem_charge_page.__alloc_pages_noprof.alloc_pages_mpol_noprof.pte_alloc_one.__pte_alloc
0.00 +1.5 1.54 ± 34% perf-profile.calltrace.cycles-pp.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.00 +1.7 1.73 ± 22% perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.pte_alloc_one.__pte_alloc.do_anonymous_page
0.00 +1.7 1.74 ± 18% perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +1.8 1.75 ± 21% perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.pte_alloc_one.__pte_alloc.do_anonymous_page.__handle_mm_fault
0.00 +2.0 2.02 ± 14% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.vms_clear_ptes
0.00 +2.1 2.10 ± 19% perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
0.00 +2.2 2.22 ± 19% perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
0.00 +2.3 2.28 ± 19% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
0.00 +2.3 2.29 ± 19% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
0.00 +2.5 2.46 ± 19% perf-profile.calltrace.cycles-pp.__mmap
0.00 +2.5 2.47 ± 16% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.vms_clear_ptes.vms_complete_munmap_vmas
0.00 +2.5 2.54 ± 16% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap
0.00 +2.6 2.58 ± 16% perf-profile.calltrace.cycles-pp.unmap_vmas.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
0.00 +2.6 2.63 ± 12% perf-profile.calltrace.cycles-pp.pte_alloc_one.__pte_alloc.do_anonymous_page.__handle_mm_fault.handle_mm_fault
0.00 +2.7 2.69 ± 12% perf-profile.calltrace.cycles-pp.__pte_alloc.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.00 +3.2 3.21 ± 38% perf-profile.calltrace.cycles-pp.page_counter_cancel.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge_folios.folios_put_refs
0.00 +4.0 4.02 ± 37% perf-profile.calltrace.cycles-pp.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge_folios.folios_put_refs.free_pages_and_swap_cache
0.00 +5.0 5.04 ± 23% perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge_folios.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
0.00 +6.2 6.16 ± 7% perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge_folios.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
0.00 +6.2 6.16 ± 12% perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.00 +36.4 36.44 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
0.00 +36.6 36.57 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
0.00 +36.8 36.80 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
0.00 +36.9 36.93 ± 3% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
0.00 +37.0 37.00 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.vms_clear_ptes
0.00 +37.3 37.33 ± 3% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
0.00 +37.9 37.87 ± 3% perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
0.00 +38.4 38.42 perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.vms_clear_ptes.vms_complete_munmap_vmas
0.00 +38.4 38.45 perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap
0.00 +38.5 38.45 perf-profile.calltrace.cycles-pp.lru_add_drain.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
0.78 ± 2% +44.1 44.87 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes
0.79 ± 2% +44.2 44.99 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.vms_complete_munmap_vmas
0.79 ± 2% +44.2 45.00 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap
0.86 ± 2% +44.4 45.24 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
1.23 ± 2% +86.5 87.70 perf-profile.calltrace.cycles-pp.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
1.25 +86.6 87.86 perf-profile.calltrace.cycles-pp.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
1.44 +87.1 88.55 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
1.48 +87.1 88.60 perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.49 +87.2 88.66 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
1.50 +87.2 88.66 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
1.52 +87.2 88.72 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
1.52 +87.2 88.73 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
1.53 +87.4 88.90 perf-profile.calltrace.cycles-pp.__munmap
96.37 -96.4 0.00 perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
95.62 -95.6 0.00 perf-profile.children.cycles-pp.folio_zero_user
96.71 -90.4 6.28 ± 11% perf-profile.children.cycles-pp.__handle_mm_fault
96.75 -90.3 6.46 ± 11% perf-profile.children.cycles-pp.handle_mm_fault
97.02 -90.2 6.83 ± 10% perf-profile.children.cycles-pp.asm_exc_page_fault
96.79 -90.1 6.68 ± 10% perf-profile.children.cycles-pp.do_user_addr_fault
96.80 -90.1 6.70 ± 10% perf-profile.children.cycles-pp.exc_page_fault
89.46 -89.3 0.20 ± 22% perf-profile.children.cycles-pp.clear_page_erms
4.15 -3.8 0.35 ± 23% perf-profile.children.cycles-pp.__cond_resched
0.81 ± 3% -0.6 0.23 ± 7% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.80 ± 3% -0.6 0.22 ± 7% perf-profile.children.cycles-pp.hrtimer_interrupt
0.73 ± 3% -0.5 0.18 ± 8% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.60 ± 4% -0.5 0.14 ± 5% perf-profile.children.cycles-pp.tick_nohz_handler
0.83 ± 16% -0.4 0.39 ± 3% perf-profile.children.cycles-pp.__cmd_record
0.53 ± 3% -0.4 0.12 ± 6% perf-profile.children.cycles-pp.update_process_times
0.52 -0.4 0.12 ± 28% perf-profile.children.cycles-pp.free_unref_folios
0.44 ± 3% -0.3 0.12 ± 6% perf-profile.children.cycles-pp.cmd_record
0.44 ± 3% -0.3 0.12 ± 6% perf-profile.children.cycles-pp.record__mmap_read_evlist
0.44 ± 3% -0.3 0.12 ± 6% perf-profile.children.cycles-pp.handle_internal_command
0.44 ± 3% -0.3 0.12 ± 6% perf-profile.children.cycles-pp.main
0.44 ± 3% -0.3 0.12 ± 6% perf-profile.children.cycles-pp.run_builtin
0.43 ± 3% -0.3 0.11 ± 6% perf-profile.children.cycles-pp.perf_mmap__push
0.34 ± 3% -0.3 0.07 ± 17% perf-profile.children.cycles-pp.record__pushfn
0.34 ± 4% -0.3 0.07 ± 17% perf-profile.children.cycles-pp.write
0.34 ± 4% -0.3 0.07 ± 17% perf-profile.children.cycles-pp.writen
0.32 ± 3% -0.3 0.07 ± 15% perf-profile.children.cycles-pp.ksys_write
0.32 ± 3% -0.2 0.07 ± 16% perf-profile.children.cycles-pp.vfs_write
0.30 ± 3% -0.2 0.07 ± 16% perf-profile.children.cycles-pp.shmem_file_write_iter
0.30 ± 3% -0.2 0.06 ± 19% perf-profile.children.cycles-pp.generic_perform_write
0.29 ± 4% -0.2 0.09 ± 4% perf-profile.children.cycles-pp.sched_tick
0.38 ± 33% -0.1 0.24 ± 8% perf-profile.children.cycles-pp.process_simple
0.20 ± 5% -0.1 0.06 ± 8% perf-profile.children.cycles-pp.task_tick_fair
0.38 ± 33% -0.1 0.24 ± 8% perf-profile.children.cycles-pp.ordered_events__queue
0.38 ± 33% -0.1 0.24 ± 8% perf-profile.children.cycles-pp.queue_event
0.00 +0.1 0.10 ± 22% perf-profile.children.cycles-pp.sync_regs
0.00 +0.1 0.10 ± 59% perf-profile.children.cycles-pp.__count_memcg_events
0.00 +0.1 0.10 ± 24% perf-profile.children.cycles-pp.___perf_sw_event
0.00 +0.1 0.10 ± 26% perf-profile.children.cycles-pp.perf_event_mmap_output
0.00 +0.1 0.11 ± 24% perf-profile.children.cycles-pp.native_flush_tlb_local
0.00 +0.1 0.11 ± 8% perf-profile.children.cycles-pp.__put_anon_vma
0.00 +0.1 0.11 ± 17% perf-profile.children.cycles-pp.rmqueue
0.00 +0.1 0.12 ± 26% perf-profile.children.cycles-pp.find_mergeable_anon_vma
0.00 +0.1 0.12 ± 23% perf-profile.children.cycles-pp.flush_tlb_func
0.00 +0.1 0.13 ± 28% perf-profile.children.cycles-pp.__call_rcu_common
0.00 +0.1 0.13 ± 24% perf-profile.children.cycles-pp.__perf_sw_event
0.00 +0.1 0.13 ± 38% perf-profile.children.cycles-pp.obj_cgroup_uncharge_pages
0.05 +0.1 0.18 ± 23% perf-profile.children.cycles-pp.flush_tlb_mm_range
0.00 +0.1 0.14 ± 27% perf-profile.children.cycles-pp.___slab_alloc
0.00 +0.1 0.14 ± 23% perf-profile.children.cycles-pp.mas_preallocate
0.00 +0.1 0.15 ± 18% perf-profile.children.cycles-pp.mas_empty_area_rev
0.00 +0.2 0.16 ± 22% perf-profile.children.cycles-pp.clear_bhb_loop
0.02 ±141% +0.2 0.17 ± 25% perf-profile.children.cycles-pp.perf_iterate_sb
0.00 +0.2 0.16 ± 26% perf-profile.children.cycles-pp.mas_alloc_nodes
0.07 ± 5% +0.2 0.24 ± 25% perf-profile.children.cycles-pp.vms_gather_munmap_vmas
0.00 +0.2 0.17 ± 20% perf-profile.children.cycles-pp.obj_cgroup_charge
0.00 +0.2 0.18 ± 25% perf-profile.children.cycles-pp.mas_walk
0.05 +0.2 0.24 ± 24% perf-profile.children.cycles-pp.mas_find
0.06 ± 6% +0.2 0.27 ± 11% perf-profile.children.cycles-pp.unlink_anon_vmas
0.05 ± 7% +0.2 0.26 ± 7% perf-profile.children.cycles-pp.free_unref_page
0.07 ± 11% +0.2 0.28 ± 21% perf-profile.children.cycles-pp.perf_event_mmap_event
0.07 ± 9% +0.2 0.31 ± 22% perf-profile.children.cycles-pp.perf_event_mmap
0.00 +0.2 0.24 ± 20% perf-profile.children.cycles-pp.vm_unmapped_area
0.02 ± 99% +0.2 0.27 ± 20% perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
0.06 ± 6% +0.2 0.31 ± 20% perf-profile.children.cycles-pp.__get_unmapped_area
0.00 +0.3 0.26 ± 29% perf-profile.children.cycles-pp.rcu_cblist_dequeue
0.00 +0.3 0.26 ± 6% perf-profile.children.cycles-pp.free_pcppages_bulk
0.00 +0.3 0.26 ± 23% perf-profile.children.cycles-pp.mas_store_prealloc
0.00 +0.3 0.27 ± 7% perf-profile.children.cycles-pp.__put_partials
0.00 +0.3 0.28 ± 7% perf-profile.children.cycles-pp.free_unref_page_commit
0.00 +0.3 0.28 ± 19% perf-profile.children.cycles-pp.vm_area_free_rcu_cb
0.14 ± 2% +0.3 0.48 ± 25% perf-profile.children.cycles-pp.mas_store_gfp
0.05 ± 7% +0.3 0.40 ± 4% perf-profile.children.cycles-pp.__memcg_slab_free_hook
0.10 +0.4 0.47 ± 24% perf-profile.children.cycles-pp.mas_wr_node_store
0.00 +0.5 0.46 ± 27% perf-profile.children.cycles-pp.__slab_free
0.07 +0.5 0.55 ± 10% perf-profile.children.cycles-pp.vm_area_alloc
0.06 +0.5 0.61 ± 12% perf-profile.children.cycles-pp.lru_add
0.00 +0.7 0.70 ± 26% perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
0.00 +0.7 0.72 ± 24% perf-profile.children.cycles-pp.__mod_memcg_state
0.12 ± 6% +0.8 0.88 ± 10% perf-profile.children.cycles-pp.__folio_mod_stat
0.13 ± 5% +0.8 0.91 ± 10% perf-profile.children.cycles-pp.folio_add_new_anon_rmap
0.04 ± 71% +0.8 0.83 ± 13% perf-profile.children.cycles-pp.mod_objcg_state
0.00 +0.8 0.83 ± 35% perf-profile.children.cycles-pp.propagate_protected_usage
0.00 +0.8 0.84 ± 10% perf-profile.children.cycles-pp.folio_remove_rmap_ptes
0.00 +0.9 0.86 ± 9% perf-profile.children.cycles-pp.__anon_vma_prepare
0.00 +0.9 0.88 ± 8% perf-profile.children.cycles-pp.___pte_free_tlb
0.01 ±223% +0.9 0.90 ± 9% perf-profile.children.cycles-pp.__vmf_anon_prepare
0.00 +0.9 0.90 ± 9% perf-profile.children.cycles-pp.zap_present_ptes
0.06 +1.0 1.02 ± 10% perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
0.05 ± 7% +1.0 1.02 ± 33% perf-profile.children.cycles-pp.native_irq_return_iret
0.00 +1.0 1.04 ± 9% perf-profile.children.cycles-pp.free_pud_range
0.00 +1.1 1.05 ± 9% perf-profile.children.cycles-pp.free_p4d_range
0.00 +1.1 1.07 ± 9% perf-profile.children.cycles-pp.free_pgd_range
0.00 +1.1 1.11 ± 75% perf-profile.children.cycles-pp.uncharge_folio
0.12 ± 4% +1.1 1.24 ± 11% perf-profile.children.cycles-pp.kmem_cache_free
0.10 ± 4% +1.2 1.30 ± 46% perf-profile.children.cycles-pp.__mem_cgroup_charge
0.11 ± 4% +1.2 1.34 ± 9% perf-profile.children.cycles-pp.kmem_cache_alloc_noprof
0.16 ± 3% +1.3 1.41 ± 16% perf-profile.children.cycles-pp.handle_softirqs
0.13 +1.3 1.41 ± 16% perf-profile.children.cycles-pp.rcu_core
0.12 ± 4% +1.3 1.41 ± 16% perf-profile.children.cycles-pp.rcu_do_batch
0.10 ± 3% +1.3 1.39 ± 9% perf-profile.children.cycles-pp.free_pgtables
0.06 +1.5 1.53 ± 28% perf-profile.children.cycles-pp.__memcg_kmem_charge_page
0.28 ± 4% +1.5 1.75 ± 18% perf-profile.children.cycles-pp.mmap_region
0.00 +1.5 1.54 ± 34% perf-profile.children.cycles-pp.alloc_anon_folio
0.38 ± 2% +1.6 1.96 ± 17% perf-profile.children.cycles-pp.__alloc_pages_noprof
0.39 ± 2% +1.6 2.00 ± 16% perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
0.34 ± 2% +1.8 2.11 ± 19% perf-profile.children.cycles-pp.do_mmap
0.36 ± 2% +1.9 2.24 ± 19% perf-profile.children.cycles-pp.vm_mmap_pgoff
0.00 +2.0 2.02 ± 14% perf-profile.children.cycles-pp.zap_pte_range
0.37 ± 3% +2.1 2.48 ± 19% perf-profile.children.cycles-pp.__mmap
0.22 ± 2% +2.3 2.50 ± 16% perf-profile.children.cycles-pp.zap_pmd_range
0.23 ± 3% +2.3 2.55 ± 16% perf-profile.children.cycles-pp.unmap_page_range
0.24 ± 3% +2.3 2.58 ± 16% perf-profile.children.cycles-pp.unmap_vmas
0.16 +2.5 2.63 ± 12% perf-profile.children.cycles-pp.pte_alloc_one
0.08 ± 12% +2.6 2.71 ± 32% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.00 +2.7 2.69 ± 12% perf-profile.children.cycles-pp.__pte_alloc
0.05 +3.2 3.22 ± 38% perf-profile.children.cycles-pp.page_counter_cancel
0.22 ± 5% +3.2 3.40 ± 10% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
0.05 +4.0 4.03 ± 37% perf-profile.children.cycles-pp.page_counter_uncharge
0.06 +5.0 5.04 ± 23% perf-profile.children.cycles-pp.uncharge_batch
0.06 ± 7% +6.1 6.16 ± 7% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
0.00 +6.2 6.16 ± 12% perf-profile.children.cycles-pp.do_anonymous_page
0.10 ± 5% +37.8 37.88 ± 3% perf-profile.children.cycles-pp.__page_cache_release
0.12 ± 3% +38.3 38.46 perf-profile.children.cycles-pp.folio_batch_move_lru
0.00 +38.5 38.46 perf-profile.children.cycles-pp.lru_add_drain_cpu
0.00 +38.5 38.47 perf-profile.children.cycles-pp.lru_add_drain
0.79 ± 2% +44.2 44.99 perf-profile.children.cycles-pp.free_pages_and_swap_cache
0.79 ± 2% +44.2 45.01 perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
0.78 ± 2% +44.2 45.01 perf-profile.children.cycles-pp.folios_put_refs
0.86 ± 2% +44.4 45.24 perf-profile.children.cycles-pp.tlb_finish_mmu
0.12 ± 6% +73.4 73.50 ± 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.21 ± 4% +73.6 73.77 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.08 +74.3 74.35 ± 2% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
1.23 ± 2% +86.5 87.71 perf-profile.children.cycles-pp.vms_clear_ptes
1.26 +86.6 87.87 perf-profile.children.cycles-pp.vms_complete_munmap_vmas
1.44 +87.1 88.55 perf-profile.children.cycles-pp.do_vmi_align_munmap
1.48 +87.1 88.60 perf-profile.children.cycles-pp.do_vmi_munmap
1.50 +87.2 88.66 perf-profile.children.cycles-pp.__x64_sys_munmap
1.49 +87.2 88.66 perf-profile.children.cycles-pp.__vm_munmap
1.53 +87.4 88.93 perf-profile.children.cycles-pp.__munmap
2.30 +88.8 91.10 perf-profile.children.cycles-pp.do_syscall_64
2.30 +88.8 91.12 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
88.52 -88.3 0.20 ± 22% perf-profile.self.cycles-pp.clear_page_erms
3.36 -3.2 0.20 ± 23% perf-profile.self.cycles-pp.__cond_resched
0.49 -0.5 0.03 ± 70% perf-profile.self.cycles-pp.free_unref_folios
0.37 ± 33% -0.1 0.24 ± 8% perf-profile.self.cycles-pp.queue_event
0.00 +0.1 0.09 ± 22% perf-profile.self.cycles-pp.___perf_sw_event
0.00 +0.1 0.10 ± 64% perf-profile.self.cycles-pp.__count_memcg_events
0.00 +0.1 0.10 ± 22% perf-profile.self.cycles-pp.sync_regs
0.00 +0.1 0.10 ± 23% perf-profile.self.cycles-pp.perf_event_mmap_output
0.00 +0.1 0.10 ± 47% perf-profile.self.cycles-pp.obj_cgroup_uncharge_pages
0.00 +0.1 0.11 ± 24% perf-profile.self.cycles-pp.native_flush_tlb_local
0.00 +0.1 0.11 ± 21% perf-profile.self.cycles-pp.kmem_cache_free
0.00 +0.1 0.11 ± 25% perf-profile.self.cycles-pp.__memcg_slab_free_hook
0.00 +0.1 0.12 ± 22% perf-profile.self.cycles-pp.__page_cache_release
0.00 +0.1 0.13 ± 24% perf-profile.self.cycles-pp.mmap_region
0.00 +0.1 0.13 ± 26% perf-profile.self.cycles-pp.obj_cgroup_charge
0.00 +0.2 0.15 ± 22% perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
0.00 +0.2 0.16 ± 22% perf-profile.self.cycles-pp.clear_bhb_loop
0.00 +0.2 0.16 ± 25% perf-profile.self.cycles-pp.free_pud_range
0.00 +0.2 0.17 ± 24% perf-profile.self.cycles-pp.kmem_cache_alloc_noprof
0.00 +0.2 0.17 ± 23% perf-profile.self.cycles-pp.lru_add
0.00 +0.2 0.17 ± 24% perf-profile.self.cycles-pp.mas_walk
0.00 +0.2 0.18 ± 22% perf-profile.self.cycles-pp.folio_batch_move_lru
0.09 +0.2 0.28 ± 29% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.06 ± 9% +0.2 0.26 ± 25% perf-profile.self.cycles-pp.mas_wr_node_store
0.00 +0.2 0.23 ± 21% perf-profile.self.cycles-pp.zap_pmd_range
0.00 +0.3 0.25 ± 28% perf-profile.self.cycles-pp.rcu_cblist_dequeue
0.00 +0.3 0.28 ± 21% perf-profile.self.cycles-pp.folios_put_refs
0.00 +0.4 0.44 ± 26% perf-profile.self.cycles-pp.__slab_free
0.00 +0.5 0.46 ± 33% perf-profile.self.cycles-pp.mod_objcg_state
0.00 +0.5 0.48 ± 96% perf-profile.self.cycles-pp.__mem_cgroup_charge
0.00 +0.6 0.59 ± 32% perf-profile.self.cycles-pp.uncharge_batch
0.00 +0.7 0.69 ± 24% perf-profile.self.cycles-pp.__mod_memcg_state
0.00 +0.7 0.69 ± 27% perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
0.00 +0.8 0.81 ± 35% perf-profile.self.cycles-pp.propagate_protected_usage
0.00 +0.8 0.82 ± 36% perf-profile.self.cycles-pp.folio_lruvec_lock_irqsave
0.05 ± 7% +1.0 1.02 ± 33% perf-profile.self.cycles-pp.native_irq_return_iret
0.00 +1.0 1.02 ± 22% perf-profile.self.cycles-pp.zap_pte_range
0.00 +1.1 1.10 ± 76% perf-profile.self.cycles-pp.uncharge_folio
0.00 +1.1 1.14 ± 37% perf-profile.self.cycles-pp.__memcg_kmem_charge_page
0.08 ± 4% +1.7 1.78 ± 35% perf-profile.self.cycles-pp.__lruvec_stat_mod_folio
0.07 ± 16% +2.6 2.62 ± 33% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.05 +3.1 3.16 ± 38% perf-profile.self.cycles-pp.page_counter_cancel
0.12 ± 7% +73.4 73.50 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
***************************************************************************************************
lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/radixsort/stress-ng/60s
commit:
15e8156713 ("mm: shrinker: avoid memleak in alloc_shrinker_info")
d4148aeab4 ("mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes")
15e8156713cc3803 d4148aeab412432bf928f311eca
---------------- ---------------------------
%stddev %change %stddev
\ | \
18.98 ± 5% +30.5% 24.78 ± 14% sched_debug.cpu.clock.stddev
0.05 ± 7% +886.3% 0.50 ± 86% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
1.14 ± 74% -55.5% 0.51 ± 37% perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
265008 ± 13% -37.2% 166403 ± 22% meminfo.Active
264976 ± 13% -37.2% 166371 ± 22% meminfo.Active(anon)
1315990 -63.8% 475807 ± 3% meminfo.AnonHugePages
164713 -9.2% 149574 stress-ng.radixsort.ops
2745 -9.2% 2492 stress-ng.radixsort.ops_per_sec
78925 +289.9% 307739 stress-ng.time.minor_page_faults
327.88 ± 3% -67.1% 107.89 ± 3% numa-vmstat.node0.nr_anon_transparent_hugepages
63711 ± 11% -40.7% 37791 ± 25% numa-vmstat.node1.nr_active_anon
317.83 ± 3% -60.5% 125.70 ± 8% numa-vmstat.node1.nr_anon_transparent_hugepages
293623 ± 18% +35.1% 396637 ± 14% numa-vmstat.node1.nr_inactive_anon
63710 ± 11% -40.7% 37791 ± 25% numa-vmstat.node1.nr_zone_active_anon
293622 ± 18% +35.1% 396635 ± 14% numa-vmstat.node1.nr_zone_inactive_anon
668445 ± 3% -67.1% 220222 ± 3% numa-meminfo.node0.AnonHugePages
1317034 ± 13% -23.2% 1010970 ± 16% numa-meminfo.node0.AnonPages.max
254241 ± 11% -40.5% 151233 ± 26% numa-meminfo.node1.Active
254219 ± 11% -40.5% 151227 ± 26% numa-meminfo.node1.Active(anon)
647491 ± 3% -60.4% 256719 ± 8% numa-meminfo.node1.AnonHugePages
1170941 ± 18% +35.3% 1583801 ± 14% numa-meminfo.node1.Inactive
1170783 ± 18% +35.3% 1583760 ± 14% numa-meminfo.node1.Inactive(anon)
66923 ± 13% -36.6% 42442 ± 22% proc-vmstat.nr_active_anon
642.60 -63.8% 232.40 ± 3% proc-vmstat.nr_anon_transparent_hugepages
597472 ± 2% +5.6% 630776 proc-vmstat.nr_inactive_anon
66923 ± 13% -36.6% 42442 ± 22% proc-vmstat.nr_zone_active_anon
597472 ± 2% +5.6% 630776 proc-vmstat.nr_zone_inactive_anon
1188533 +19.2% 1416356 proc-vmstat.numa_hit
2502 -63.5% 913.67 proc-vmstat.numa_huge_pte_updates
956729 +23.2% 1178316 proc-vmstat.numa_local
1495630 ± 2% -54.0% 687360 ± 6% proc-vmstat.numa_pte_updates
820025 +26.8% 1040066 proc-vmstat.pgfault
6.20 +9.6% 6.79 perf-stat.i.MPKI
4.203e+10 -6.8% 3.916e+10 perf-stat.i.branch-instructions
6.15 +0.2 6.30 perf-stat.i.branch-miss-rate%
2.636e+09 -6.2% 2.474e+09 perf-stat.i.branch-misses
2.56 ± 2% +8.5% 2.77 perf-stat.i.cpi
2.444e+11 -6.9% 2.276e+11 perf-stat.i.instructions
0.40 -9.1% 0.36 perf-stat.i.ipc
2.20 ± 2% +90.0% 4.18 ± 2% perf-stat.i.metric.K/sec
10808 +45.4% 15713 ± 2% perf-stat.i.minor-faults
10809 +45.4% 15713 ± 2% perf-stat.i.page-faults
6.34 +7.9% 6.84 perf-stat.overall.MPKI
37.76 -0.6 37.12 perf-stat.overall.cache-miss-rate%
2.54 +9.9% 2.80 perf-stat.overall.cpi
401.20 +1.8% 408.57 perf-stat.overall.cycles-between-cache-misses
0.39 -9.0% 0.36 perf-stat.overall.ipc
4.134e+10 ± 2% -7.4% 3.829e+10 perf-stat.ps.branch-instructions
2.592e+09 -6.7% 2.419e+09 perf-stat.ps.branch-misses
2.404e+11 ± 2% -7.4% 2.226e+11 perf-stat.ps.instructions
10358 +36.2% 14109 ± 2% perf-stat.ps.minor-faults
10358 +36.2% 14109 ± 2% perf-stat.ps.page-faults
1.525e+13 -9.9% 1.374e+13 perf-stat.total.instructions
0.60 ± 10% -0.3 0.28 ±100% perf-profile.calltrace.cycles-pp.__perf_mmap__read_init.perf_mmap__read_init.perf_mmap__push.record__mmap_read_evlist.__cmd_record
0.71 ± 4% +0.1 0.77 ± 3% perf-profile.calltrace.cycles-pp.update_load_avg.task_tick_fair.sched_tick.update_process_times.tick_nohz_handler
1.25 ± 9% +0.3 1.54 ± 4% perf-profile.calltrace.cycles-pp.update_curr.task_tick_fair.sched_tick.update_process_times.tick_nohz_handler
0.26 ±100% +0.3 0.57 ± 3% perf-profile.calltrace.cycles-pp.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
2.64 ± 6% +0.4 3.09 ± 3% perf-profile.calltrace.cycles-pp.task_tick_fair.sched_tick.update_process_times.tick_nohz_handler.__hrtimer_run_queues
0.08 ±223% +0.5 0.55 ± 4% perf-profile.calltrace.cycles-pp.account_user_time.update_process_times.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt
5.00 ± 6% +0.7 5.75 ± 3% perf-profile.calltrace.cycles-pp.sched_tick.update_process_times.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt
7.66 ± 5% +1.1 8.72 ± 3% perf-profile.calltrace.cycles-pp.update_process_times.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
8.10 ± 5% +1.1 9.18 ± 3% perf-profile.calltrace.cycles-pp.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
8.38 ± 5% +1.1 9.51 ± 3% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
10.78 ± 5% +1.3 12.08 ± 3% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
11.04 ± 5% +1.3 12.36 ± 3% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
11.77 ± 5% +1.4 13.19 ± 2% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
12.68 ± 5% +1.6 14.23 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt
12.84 -0.4 12.47 ± 2% perf-profile.children.cycles-pp.strcmp@plt
0.65 ± 10% -0.2 0.50 ± 30% perf-profile.children.cycles-pp.__perf_mmap__read_init
0.12 ± 3% +0.0 0.14 ± 4% perf-profile.children.cycles-pp.sched_clock
0.05 ± 8% +0.0 0.08 ± 8% perf-profile.children.cycles-pp.rb_next
0.09 ± 7% +0.0 0.13 ± 10% perf-profile.children.cycles-pp.timerqueue_del
0.20 ± 4% +0.0 0.23 ± 4% perf-profile.children.cycles-pp.lapic_next_deadline
0.40 ± 7% +0.0 0.44 ± 5% perf-profile.children.cycles-pp.handle_softirqs
0.42 ± 4% +0.0 0.47 ± 5% perf-profile.children.cycles-pp.native_irq_return_iret
0.20 ± 9% +0.1 0.26 ± 6% perf-profile.children.cycles-pp.__cgroup_account_cputime_field
0.12 ± 4% +0.1 0.19 ± 10% perf-profile.children.cycles-pp._raw_spin_lock
0.75 ± 4% +0.1 0.82 ± 3% perf-profile.children.cycles-pp.update_load_avg
0.40 ± 4% +0.1 0.47 ± 2% perf-profile.children.cycles-pp.hrtimer_active
0.34 ± 9% +0.1 0.41 ± 5% perf-profile.children.cycles-pp.irqtime_account_irq
0.52 ± 5% +0.1 0.60 ± 3% perf-profile.children.cycles-pp.__irq_exit_rcu
0.12 ± 9% +0.1 0.20 ± 6% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
0.36 ± 2% +0.1 0.44 ± 3% perf-profile.children.cycles-pp.task_mm_cid_work
0.50 ± 6% +0.1 0.58 ± 4% perf-profile.children.cycles-pp.account_user_time
0.38 ± 2% +0.1 0.48 ± 3% perf-profile.children.cycles-pp.task_work_run
0.46 ± 2% +0.1 0.56 ± 3% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
1.32 ± 9% +0.3 1.64 ± 4% perf-profile.children.cycles-pp.update_curr
2.76 ± 6% +0.5 3.25 ± 3% perf-profile.children.cycles-pp.task_tick_fair
5.24 ± 6% +0.8 6.04 ± 3% perf-profile.children.cycles-pp.sched_tick
7.99 ± 5% +1.2 9.14 ± 3% perf-profile.children.cycles-pp.update_process_times
8.45 ± 5% +1.2 9.63 ± 3% perf-profile.children.cycles-pp.tick_nohz_handler
8.75 ± 5% +1.2 9.99 ± 3% perf-profile.children.cycles-pp.__hrtimer_run_queues
11.23 ± 5% +1.4 12.66 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt
11.50 ± 5% +1.4 12.94 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
12.26 ± 5% +1.6 13.82 ± 3% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
13.22 ± 5% +1.7 14.92 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
6.12 -0.2 5.92 ± 2% perf-profile.self.cycles-pp.strcmp@plt
0.09 ± 11% +0.0 0.11 ± 6% perf-profile.self.cycles-pp.__hrtimer_run_queues
0.10 ± 9% +0.0 0.12 ± 5% perf-profile.self.cycles-pp.task_tick_fair
0.05 ± 8% +0.0 0.08 ± 10% perf-profile.self.cycles-pp.rb_next
0.20 ± 4% +0.0 0.23 ± 3% perf-profile.self.cycles-pp.lapic_next_deadline
0.42 ± 4% +0.0 0.47 ± 5% perf-profile.self.cycles-pp.native_irq_return_iret
0.16 ± 14% +0.1 0.21 ± 4% perf-profile.self.cycles-pp.__cgroup_account_cputime_field
0.12 ± 4% +0.1 0.18 ± 10% perf-profile.self.cycles-pp._raw_spin_lock
0.24 ± 9% +0.1 0.30 ± 7% perf-profile.self.cycles-pp.irqtime_account_irq
0.40 ± 4% +0.1 0.46 ± 3% perf-profile.self.cycles-pp.hrtimer_active
0.10 ± 3% +0.1 0.18 ± 8% perf-profile.self.cycles-pp.hrtimer_interrupt
0.11 ± 11% +0.1 0.20 ± 6% perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
0.33 ± 3% +0.1 0.42 ± 4% perf-profile.self.cycles-pp.task_mm_cid_work
0.88 ± 11% +0.2 1.05 ± 5% perf-profile.self.cycles-pp.update_curr
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linus:master] [mm, mmap] d4148aeab4: will-it-scale.per_process_ops 3888.9% improvement
2024-11-07 14:10 [linus:master] [mm, mmap] d4148aeab4: will-it-scale.per_process_ops 3888.9% improvement kernel test robot
@ 2024-11-17 16:42 ` Vlastimil Babka
0 siblings, 0 replies; 2+ messages in thread
From: Vlastimil Babka @ 2024-11-17 16:42 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Michael Matz,
Gabriel Krisman Bertazi, Matthias Bodenbinder, Lorenzo Stoakes,
Yang Shi, Rik van Riel, Jann Horn, Liam R. Howlett, Petr Tesarik,
Thorsten Leemhuis, linux-mm, ying.huang, feng.tang, fengwei.yin,
Pedro Falcato
On 11/7/24 15:10, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a 3888.9% improvement of will-it-scale.per_process_ops on:
>
>
> commit: d4148aeab412432bf928f311eca8a2ba52bb05df ("mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linux-next/master 5b913f5d7d7fe0f567dea8605f21da6eaa1735fb]
>
> testcase: will-it-scale
> config: x86_64-rhel-8.3
> compiler: gcc-12
> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> parameters:
>
> nr_task: 100%
> mode: process
> test: malloc1
> cpufreq_governor: performance
Since this report is now linked from youtube videos with 200k views, maybe
someone will appreciate reading about what's actually happening here, in
more detail than would normally be sufficient for linux-mm. Thanks to Pedro,
who initially brought up on IRC what malloc1 does.
The test [1] just repeatedly performs a malloc()/free() of a 128MB buffer:
#include <assert.h>
#include <stdlib.h>
#define SIZE (128UL * 1024 * 1024)
int main(void)
{
	while (1) {
		void *addr = malloc(SIZE);
		assert(addr != NULL);
		free(addr);
	}
}
We can compile the same code outside of the test, without the iterations,
printing the addr and pausing the process, so that we can look at
/proc/$pid/smaps.
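The exact probe program wasn't posted; a minimal sketch of it could look
like this (pause() keeps the process alive so smaps can be read from
another shell):

/* Sketch of a probe: allocate the same 128MB once, print the address,
 * then stop so /proc/$pid/smaps can be inspected. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define SIZE (128UL * 1024 * 1024)

int main(void)
{
	void *addr = malloc(SIZE);

	printf("addr: %p\n", addr);
	pause();
	free(addr);
	return 0;
}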
Before the commit was applied I get:
addr: 0x7f0f62a00010
And the corresponding area in smaps is:
7f0f62a00000-7f0f6aa01000 rw-p 00000000 00:00 0
Size: 131076 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 2048 kB
Pss: 2048 kB
Pss_Dirty: 2048 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 2048 kB
Referenced: 2048 kB
Anonymous: 2048 kB
KSM: 0 kB
LazyFree: 0 kB
AnonHugePages: 2048 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 1
We can notice the 128MB became 128MB+4kB, and the address returned by
malloc() is offset by 16 bytes (0x10) from the start of the mmapped area.
So we have
7f0f62a00000 - start of mmapped area
7f0f62a00010 - start of malloc()-returned buffer
7f0f6aa00010 - end of malloc()'d buffer (128MB)
7f0f6aa01000 - end of mmapped area
malloc() AFAIK would normally manage some large-ish arena allocated by a
single mmap() and hand out addresses for smaller allocations from that, but
any request as large as 128MB is turned directly into its own mmap(). But
malloc() has to remember how large the allocation was in order to turn
free(addr) into an appropriate munmap(addr, size), so it prepends a 16-byte
header to store the size. Any mmap() needs to be rounded up to page size,
so these 16 bytes become 4kB, and the mmap() is not exactly 128MB but 4kB
larger.
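As a sketch of that arithmetic (my illustration of the rounding, not
glibc's actual code path):

/* Sketch of the size rounding described above; 4kB pages assumed. */
#include <stdio.h>

#define SIZE    (128UL * 1024 * 1024)	/* 128MB requested */
#define HEADER  16UL			/* malloc's size header */
#define PAGE_SZ 4096UL

int main(void)
{
	unsigned long len = (SIZE + HEADER + PAGE_SZ - 1) & ~(PAGE_SZ - 1);

	/* prints "mmap length: 131076 kB", matching Size: in the dump above */
	printf("mmap length: %lu kB\n", len / 1024);
	return 0;
}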
Before commit d4148aeab412, due to commit efa7df3e3bb5, the implementation
was that any mmap() of size >= 2MB would have the start of the area aligned
to 2MB, which we can see from the address 7f0f62a00000 being divisible by
2MB.
The fields Rss: 2048 kB and AnonHugePages: 2048 kB above also reveal that
when malloc() wrote its 16-byte header, a transparent hugepage (THP) was
allocated for the first 2MB of the area.
After commit d4148aeab412 we get
addr: 0x7fd7b9998010
7fd7b9998000-7fd7c199c000 rw-p 00000000 00:00 0
Size: 131088 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 12 kB
Pss: 12 kB
Pss_Dirty: 12 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 12 kB
Referenced: 12 kB
Anonymous: 12 kB
KSM: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 1
Again the mmap() was done with an extra page for the 16-byte header, but
the area is no longer aligned to 2MB, because its size is not a multiple of
2MB. Thus the page with the header cannot be populated with a 2MB THP, only
with a 4kB base page.
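Both alignments are easy to verify from the two start addresses above; a
throwaway sketch:

/* Sketch: check PMD (2MB) alignment of the two mapping starts above. */
#include <stdio.h>

int main(void)
{
	unsigned long pmd = 2UL * 1024 * 1024;
	unsigned long before = 0x7f0f62a00000UL;	/* first smaps dump */
	unsigned long after = 0x7fd7b9998000UL;		/* second smaps dump */

	printf("before: %s\n", before % pmd ? "unaligned" : "2MB aligned");
	printf("after:  %s\n", after % pmd ? "unaligned" : "2MB aligned");
	return 0;
}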
And that's the whole difference. The test allocates 128MB via malloc() but
doesn't actually touch the memory at all. So only the page with the header
is faulted in, and obviously it's much faster to allocate and clear (write
with zeros, as the kernel has to do for userspace pages) a 4kB page than a
2MB one. But it's an artifact of the benchmark - we can assume a program
that allocates memory with malloc() would also use it, and then most of the
128MB + 4kB would get backed with THPs and the difference would largely
disappear.
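Just as an illustration of that last point (not part of the benchmark), a
variant that actually writes the buffer would fault in nearly all of the
128MB on both kernels:

/* Sketch: a workload that uses what it allocates. The memset faults in
 * the whole buffer, so the cost difference between one THP and one base
 * page for the header becomes negligible. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SIZE (128UL * 1024 * 1024)

int main(void)
{
	void *addr = malloc(SIZE);

	assert(addr != NULL);
	memset(addr, 1, SIZE);
	free(addr);
	return 0;
}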
Also you might be wondering - if an improvement was reported for malloc1
due to d4148aeab412432, shouldn't a regression have been reported in the
past for efa7df3e3bb5? In fact it was - multiple regressions, actually, for
various prior versions of the patch that eventually became commit
efa7df3e3bb5 - but this particular one in malloc1 was (rightfully) dismissed
as not important enough [2].
[1] https://github.com/antonblanchard/will-it-scale/blob/master/tests/malloc1.c
[2] https://lore.kernel.org/all/87edv4r2ip.fsf@yhuang6-desk2.ccr.corp.intel.com/
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+---------------------------------------------------------------------------------------------+
> | testcase: change | stress-ng: stress-ng.radixsort.ops_per_sec 9.2% regression |
BTW, nobody seems to have noticed this part, which looks like a regression
report :)
> | test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
> | test parameters | cpufreq_governor=performance |
> | | nr_threads=100% |
> | | test=radixsort |
> | | testtime=60s |
> +------------------+---------------------------------------------------------------------------------------------+
>
>
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20241107/202411072132.a8d2cf0f-oliver.sang@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-cpl-4sp2/malloc1/will-it-scale
>
> commit:
> 15e8156713 ("mm: shrinker: avoid memleak in alloc_shrinker_info")
> d4148aeab4 ("mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes")
>
> 15e8156713cc3803 d4148aeab412432bf928f311eca
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
<snip>
> 89.46 -89.3 0.20 ± 22% perf-profile.children.cycles-pp.clear_page_erms
This confirms we are clearing much less memory (4kB instead of 2MB): we now
spend far less time in the clearing function.
> ***************************************************************************************************
> lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/radixsort/stress-ng/60s
And this is the regression report for the other benchmark, the one that
nobody noticed.
> commit:
> 15e8156713 ("mm: shrinker: avoid memleak in alloc_shrinker_info")
> d4148aeab4 ("mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes")
>
> 15e8156713cc3803 d4148aeab412432bf928f311eca
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 18.98 ± 5% +30.5% 24.78 ± 14% sched_debug.cpu.clock.stddev
> 0.05 ± 7% +886.3% 0.50 ± 86% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> 1.14 ± 74% -55.5% 0.51 ± 37% perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 265008 ± 13% -37.2% 166403 ± 22% meminfo.Active
> 264976 ± 13% -37.2% 166371 ± 22% meminfo.Active(anon)
> 1315990 -63.8% 475807 ± 3% meminfo.AnonHugePages
So we get fewer THPs, but unlike the malloc1 test, this one seems to be
benefiting from them, so having fewer of them causes a regression.
Maybe it does mmap()s of some unfortunate size like 3MB?
Before commit d4148aeab412 such a mmap() would be guaranteed a THP for the
first 2MB and no THPs for the other 1MB, i.e.:
| [ 2 MB | 1 MB ]
But now the result can be [ 1.5 MB | 1.5 MB ]
(where | is a 2MB boundary), hence no THP backing at all.
This probably just shows that no heuristic can be optimal for every possible
use case.
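To illustrate (a hypothetical 3MB mapping; the helper counts PMD-aligned
2MB extents lying fully inside the range, which is all that THPs can back):

/* Sketch: whole 2MB extents inside a range, for an aligned start vs. a
 * start offset by 0.5MB as in the [ 1.5 MB | 1.5 MB ] picture above. */
#include <stdio.h>

#define PMD (2UL * 1024 * 1024)

static unsigned long pmds_covered(unsigned long start, unsigned long len)
{
	unsigned long first = (start + PMD - 1) & ~(PMD - 1);	/* round up */
	unsigned long last = (start + len) & ~(PMD - 1);	/* round down */

	return last > first ? (last - first) / PMD : 0;
}

int main(void)
{
	unsigned long len = 3UL * 1024 * 1024;

	/* prints 1: one extent can be THP-backed */
	printf("aligned:   %lu\n", pmds_covered(0x200000UL, len));
	/* prints 0: no THP backing at all */
	printf("unaligned: %lu\n", pmds_covered(0x80000UL, len));
	return 0;
}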
> 164713 -9.2% 149574 stress-ng.radixsort.ops
> 2745 -9.2% 2492 stress-ng.radixsort.ops_per_sec
> 78925 +289.9% 307739 stress-ng.time.minor_page_faults
> 327.88 ± 3% -67.1% 107.89 ± 3% numa-vmstat.node0.nr_anon_transparent_hugepages
> 63711 ± 11% -40.7% 37791 ± 25% numa-vmstat.node1.nr_active_anon
> 317.83 ± 3% -60.5% 125.70 ± 8% numa-vmstat.node1.nr_anon_transparent_hugepages
> 293623 ± 18% +35.1% 396637 ± 14% numa-vmstat.node1.nr_inactive_anon
> 63710 ± 11% -40.7% 37791 ± 25% numa-vmstat.node1.nr_zone_active_anon
> 293622 ± 18% +35.1% 396635 ± 14% numa-vmstat.node1.nr_zone_inactive_anon
> 668445 ± 3% -67.1% 220222 ± 3% numa-meminfo.node0.AnonHugePages
> 1317034 ± 13% -23.2% 1010970 ± 16% numa-meminfo.node0.AnonPages.max
> 254241 ± 11% -40.5% 151233 ± 26% numa-meminfo.node1.Active
> 254219 ± 11% -40.5% 151227 ± 26% numa-meminfo.node1.Active(anon)
> 647491 ± 3% -60.4% 256719 ± 8% numa-meminfo.node1.AnonHugePages
> 1170941 ± 18% +35.3% 1583801 ± 14% numa-meminfo.node1.Inactive
> 1170783 ± 18% +35.3% 1583760 ± 14% numa-meminfo.node1.Inactive(anon)
> 66923 ± 13% -36.6% 42442 ± 22% proc-vmstat.nr_active_anon
> 642.60 -63.8% 232.40 ± 3% proc-vmstat.nr_anon_transparent_hugepages
> 597472 ± 2% +5.6% 630776 proc-vmstat.nr_inactive_anon
> 66923 ± 13% -36.6% 42442 ± 22% proc-vmstat.nr_zone_active_anon
> 597472 ± 2% +5.6% 630776 proc-vmstat.nr_zone_inactive_anon
> 1188533 +19.2% 1416356 proc-vmstat.numa_hit
> 2502 -63.5% 913.67 proc-vmstat.numa_huge_pte_updates
> 956729 +23.2% 1178316 proc-vmstat.numa_local
> 1495630 ± 2% -54.0% 687360 ± 6% proc-vmstat.numa_pte_updates
> 820025 +26.8% 1040066 proc-vmstat.pgfault
> 6.20 +9.6% 6.79 perf-stat.i.MPKI
> 4.203e+10 -6.8% 3.916e+10 perf-stat.i.branch-instructions
> 6.15 +0.2 6.30 perf-stat.i.branch-miss-rate%
> 2.636e+09 -6.2% 2.474e+09 perf-stat.i.branch-misses
> 2.56 ± 2% +8.5% 2.77 perf-stat.i.cpi
> 2.444e+11 -6.9% 2.276e+11 perf-stat.i.instructions
> 0.40 -9.1% 0.36 perf-stat.i.ipc
> 2.20 ± 2% +90.0% 4.18 ± 2% perf-stat.i.metric.K/sec
> 10808 +45.4% 15713 ± 2% perf-stat.i.minor-faults
> 10809 +45.4% 15713 ± 2% perf-stat.i.page-faults
> 6.34 +7.9% 6.84 perf-stat.overall.MPKI
> 37.76 -0.6 37.12 perf-stat.overall.cache-miss-rate%
> 2.54 +9.9% 2.80 perf-stat.overall.cpi
> 401.20 +1.8% 408.57 perf-stat.overall.cycles-between-cache-misses
> 0.39 -9.0% 0.36 perf-stat.overall.ipc
> 4.134e+10 ± 2% -7.4% 3.829e+10 perf-stat.ps.branch-instructions
> 2.592e+09 -6.7% 2.419e+09 perf-stat.ps.branch-misses
> 2.404e+11 ± 2% -7.4% 2.226e+11 perf-stat.ps.instructions
> 10358 +36.2% 14109 ± 2% perf-stat.ps.minor-faults
> 10358 +36.2% 14109 ± 2% perf-stat.ps.page-faults
> 1.525e+13 -9.9% 1.374e+13 perf-stat.total.instructions
> 0.60 ± 10% -0.3 0.28 ±100% perf-profile.calltrace.cycles-pp.__perf_mmap__read_init.perf_mmap__read_init.perf_mmap__push.record__mmap_read_evlist.__cmd_record
> 0.71 ± 4% +0.1 0.77 ± 3% perf-profile.calltrace.cycles-pp.update_load_avg.task_tick_fair.sched_tick.update_process_times.tick_nohz_handler
> 1.25 ± 9% +0.3 1.54 ± 4% perf-profile.calltrace.cycles-pp.update_curr.task_tick_fair.sched_tick.update_process_times.tick_nohz_handler
> 0.26 ±100% +0.3 0.57 ± 3% perf-profile.calltrace.cycles-pp.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
> 2.64 ± 6% +0.4 3.09 ± 3% perf-profile.calltrace.cycles-pp.task_tick_fair.sched_tick.update_process_times.tick_nohz_handler.__hrtimer_run_queues
> 0.08 ±223% +0.5 0.55 ± 4% perf-profile.calltrace.cycles-pp.account_user_time.update_process_times.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt
> 5.00 ± 6% +0.7 5.75 ± 3% perf-profile.calltrace.cycles-pp.sched_tick.update_process_times.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt
> 7.66 ± 5% +1.1 8.72 ± 3% perf-profile.calltrace.cycles-pp.update_process_times.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
> 8.10 ± 5% +1.1 9.18 ± 3% perf-profile.calltrace.cycles-pp.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
> 8.38 ± 5% +1.1 9.51 ± 3% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
> 10.78 ± 5% +1.3 12.08 ± 3% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
> 11.04 ± 5% +1.3 12.36 ± 3% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
> 11.77 ± 5% +1.4 13.19 ± 2% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
> 12.68 ± 5% +1.6 14.23 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt
> 12.84 -0.4 12.47 ± 2% perf-profile.children.cycles-pp.strcmp@plt
> 0.65 ± 10% -0.2 0.50 ± 30% perf-profile.children.cycles-pp.__perf_mmap__read_init
> 0.12 ± 3% +0.0 0.14 ± 4% perf-profile.children.cycles-pp.sched_clock
> 0.05 ± 8% +0.0 0.08 ± 8% perf-profile.children.cycles-pp.rb_next
> 0.09 ± 7% +0.0 0.13 ± 10% perf-profile.children.cycles-pp.timerqueue_del
> 0.20 ± 4% +0.0 0.23 ± 4% perf-profile.children.cycles-pp.lapic_next_deadline
> 0.40 ± 7% +0.0 0.44 ± 5% perf-profile.children.cycles-pp.handle_softirqs
> 0.42 ± 4% +0.0 0.47 ± 5% perf-profile.children.cycles-pp.native_irq_return_iret
> 0.20 ± 9% +0.1 0.26 ± 6% perf-profile.children.cycles-pp.__cgroup_account_cputime_field
> 0.12 ± 4% +0.1 0.19 ± 10% perf-profile.children.cycles-pp._raw_spin_lock
> 0.75 ± 4% +0.1 0.82 ± 3% perf-profile.children.cycles-pp.update_load_avg
> 0.40 ± 4% +0.1 0.47 ± 2% perf-profile.children.cycles-pp.hrtimer_active
> 0.34 ± 9% +0.1 0.41 ± 5% perf-profile.children.cycles-pp.irqtime_account_irq
> 0.52 ± 5% +0.1 0.60 ± 3% perf-profile.children.cycles-pp.__irq_exit_rcu
> 0.12 ± 9% +0.1 0.20 ± 6% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
> 0.36 ± 2% +0.1 0.44 ± 3% perf-profile.children.cycles-pp.task_mm_cid_work
> 0.50 ± 6% +0.1 0.58 ± 4% perf-profile.children.cycles-pp.account_user_time
> 0.38 ± 2% +0.1 0.48 ± 3% perf-profile.children.cycles-pp.task_work_run
> 0.46 ± 2% +0.1 0.56 ± 3% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
> 1.32 ± 9% +0.3 1.64 ± 4% perf-profile.children.cycles-pp.update_curr
> 2.76 ± 6% +0.5 3.25 ± 3% perf-profile.children.cycles-pp.task_tick_fair
> 5.24 ± 6% +0.8 6.04 ± 3% perf-profile.children.cycles-pp.sched_tick
> 7.99 ± 5% +1.2 9.14 ± 3% perf-profile.children.cycles-pp.update_process_times
> 8.45 ± 5% +1.2 9.63 ± 3% perf-profile.children.cycles-pp.tick_nohz_handler
> 8.75 ± 5% +1.2 9.99 ± 3% perf-profile.children.cycles-pp.__hrtimer_run_queues
> 11.23 ± 5% +1.4 12.66 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt
> 11.50 ± 5% +1.4 12.94 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
> 12.26 ± 5% +1.6 13.82 ± 3% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
> 13.22 ± 5% +1.7 14.92 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> 6.12 -0.2 5.92 ± 2% perf-profile.self.cycles-pp.strcmp@plt
> 0.09 ± 11% +0.0 0.11 ± 6% perf-profile.self.cycles-pp.__hrtimer_run_queues
> 0.10 ± 9% +0.0 0.12 ± 5% perf-profile.self.cycles-pp.task_tick_fair
> 0.05 ± 8% +0.0 0.08 ± 10% perf-profile.self.cycles-pp.rb_next
> 0.20 ± 4% +0.0 0.23 ± 3% perf-profile.self.cycles-pp.lapic_next_deadline
> 0.42 ± 4% +0.0 0.47 ± 5% perf-profile.self.cycles-pp.native_irq_return_iret
> 0.16 ± 14% +0.1 0.21 ± 4% perf-profile.self.cycles-pp.__cgroup_account_cputime_field
> 0.12 ± 4% +0.1 0.18 ± 10% perf-profile.self.cycles-pp._raw_spin_lock
> 0.24 ± 9% +0.1 0.30 ± 7% perf-profile.self.cycles-pp.irqtime_account_irq
> 0.40 ± 4% +0.1 0.46 ± 3% perf-profile.self.cycles-pp.hrtimer_active
> 0.10 ± 3% +0.1 0.18 ± 8% perf-profile.self.cycles-pp.hrtimer_interrupt
> 0.11 ± 11% +0.1 0.20 ± 6% perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
> 0.33 ± 3% +0.1 0.42 ± 4% perf-profile.self.cycles-pp.task_mm_cid_work
> 0.88 ± 11% +0.2 1.05 ± 5% perf-profile.self.cycles-pp.update_curr
>
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
^ permalink raw reply [flat|nested] 2+ messages in thread