Subject: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
From: kernel test robot
Date: 2023-12-19 15:41 UTC
To: Rik van Riel
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
    Yang Shi, Matthew Wilcox, Christopher Lameter, ying.huang,
    feng.tang, fengwei.yin, oliver.sang

Hello,
For this commit, we previously reported
"[mm] 96db82a66d: will-it-scale.per_process_ops -95.3% regression"
in August 2022, when it was in linux-next/master:
https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/

Later, we reported
"[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression"
in October 2022, when it was in linus/master:
https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/

The commit was finally reverted by
commit 0ba09b1733878afe838fe35c310715fda3d46428
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun Dec 4 12:51:59 2022 -0800

Now we have noticed that it has entered linux-next/master again.
We are not sure whether there is agreement that the benefit of this commit
outweighs the performance drop in some micro-benchmarks.

We also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/
that
"This patch was applied to v6.1, but was reverted due to a regression
report. However it turned out the regression was not due to this patch.
I ping'ed Andrew to reapply this patch, Andrew may forget it. This
patch helps promote THP, so I rebased it onto the latest mm-unstable."
However, in our latest tests we still observed the regression below
with this commit. Just FYI.

kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:
commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
testcase: stress-ng
test machine: 36 threads 1 socket Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
parameters:
nr_threads: 1
disk: 1HDD
testtime: 60s
fs: ext4
class: os
test: pthread
cpufreq_governor: performance
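
(As a quick sanity check on the headline numbers below, assuming the ops
counters accumulate over the whole run: stress-ng.pthread.ops_per_sec is
roughly stress-ng.pthread.ops divided by the 60s testtime, e.g.
2234397 / 60 ≈ 37240 ops/s before vs. 350161 / 60 ≈ 5836 ops/s after,
consistent with the reported 37237 -> 5834.)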
In addition to that, the commit also has a significant impact on the following tests:
+------------------+-----------------------------------------------------------------------------------------------+
| testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression |
| test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory |
| test parameters | array_size=50000000 |
| | cpufreq_governor=performance |
| | iterations=10x |
| | loop=100 |
| | nr_threads=25% |
| | omp=true |
+------------------+-----------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression |
| test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory |
| test parameters | cpufreq_governor=performance |
| | option_a=Average |
| | option_b=Integer |
| | test=ramspeed-1.4.3 |
+------------------+-----------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression |
| test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory |
| test parameters | cpufreq_governor=performance |
| | option_a=Average |
| | option_b=Floating Point |
| | test=ramspeed-1.4.3 |
+------------------+-----------------------------------------------------------------------------------------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com
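
As a quick spot check that does not require the full lkp environment, the
behaviour named in the commit title (larger anonymous mappings being aligned
to THP boundaries) can be observed from userspace with a small program such
as the sketch below. This is our own illustration, not part of the reproduce
materials above; the 2MB PMD size is an x86_64 assumption and the program
name is made up.

/*
 * check_thp_align.c - illustrative sketch only, not part of the lkp
 * reproduce materials. Build with: gcc -O2 -o check_thp_align check_thp_align.c
 *
 * With "mm: align larger anonymous mappings on THP boundaries" applied,
 * large anonymous mmap()s (>= 2MB here; 2MB PMD size is an x86_64
 * assumption) are expected to come back 2MB-aligned and therefore
 * THP-eligible, which is consistent with the thp_fault_alloc /
 * thp_split_pmd jump in the proc-vmstat numbers below.
 */
#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>

int main(void)
{
	const size_t pmd_size = 2UL << 20;	/* 2MB PMD on x86_64 */
	const size_t len = 8UL << 20;		/* a "larger" anon mapping */

	for (int i = 0; i < 4; i++) {
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		printf("mapping %d at %p, PMD-aligned: %s\n",
		       i, p, ((uintptr_t)p % pmd_size) ? "no" : "yes");
		munmap(p, len);
	}
	return 0;
}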
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
commit:
30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
---------------- ---------------------------
%stddev %change %stddev
\ | \
13405796 -65.5% 4620124 cpuidle..usage
8.00 +8.2% 8.66 ± 2% iostat.cpu.system
1.61 -60.6% 0.63 iostat.cpu.user
597.50 ± 14% -64.3% 213.50 ± 14% perf-c2c.DRAM.local
1882 ± 14% -74.7% 476.83 ± 7% perf-c2c.HITM.local
3768436 -12.9% 3283395 vmstat.memory.cache
355105 -75.7% 86344 ± 3% vmstat.system.cs
385435 -20.7% 305714 ± 3% vmstat.system.in
1.13 -0.2 0.88 mpstat.cpu.all.irq%
0.29 -0.2 0.10 ± 2% mpstat.cpu.all.soft%
6.76 ± 2% +1.1 7.88 ± 2% mpstat.cpu.all.sys%
1.62 -1.0 0.62 ± 2% mpstat.cpu.all.usr%
2234397 -84.3% 350161 ± 5% stress-ng.pthread.ops
37237 -84.3% 5834 ± 5% stress-ng.pthread.ops_per_sec
294706 ± 2% -68.0% 94191 ± 6% stress-ng.time.involuntary_context_switches
41442 ± 2% +5023.4% 2123284 stress-ng.time.maximum_resident_set_size
4466457 -83.9% 717053 ± 5% stress-ng.time.minor_page_faults
243.33 +13.5% 276.17 ± 3% stress-ng.time.percent_of_cpu_this_job_got
131.64 +27.7% 168.11 ± 3% stress-ng.time.system_time
19.73 -82.1% 3.53 ± 4% stress-ng.time.user_time
7715609 -80.2% 1530125 ± 4% stress-ng.time.voluntary_context_switches
494566 -59.5% 200338 ± 3% meminfo.Active
478287 -61.5% 184050 ± 3% meminfo.Active(anon)
58549 ± 17% +1532.8% 956006 ± 14% meminfo.AnonHugePages
424631 +194.9% 1252445 ± 10% meminfo.AnonPages
3677263 -13.0% 3197755 meminfo.Cached
5829485 ± 4% -19.0% 4724784 ± 10% meminfo.Committed_AS
692486 +108.6% 1444669 ± 8% meminfo.Inactive
662179 +113.6% 1414338 ± 9% meminfo.Inactive(anon)
182416 -50.2% 90759 meminfo.Mapped
4614466 +10.0% 5076604 ± 2% meminfo.Memused
6985 +47.6% 10307 ± 4% meminfo.PageTables
718445 -66.7% 238913 ± 3% meminfo.Shmem
35906 -20.7% 28471 ± 3% meminfo.VmallocUsed
4838522 +25.6% 6075302 meminfo.max_used_kB
488.83 -20.9% 386.67 ± 2% turbostat.Avg_MHz
12.95 -2.7 10.26 ± 2% turbostat.Busy%
7156734 -87.2% 919149 ± 4% turbostat.C1
10.59 -8.9 1.65 ± 5% turbostat.C1%
3702647 -55.1% 1663518 ± 2% turbostat.C1E
32.99 -20.6 12.36 ± 3% turbostat.C1E%
1161078 +64.5% 1909611 turbostat.C6
44.25 +31.8 76.10 turbostat.C6%
0.18 -33.3% 0.12 turbostat.IPC
74338573 ± 2% -33.9% 49159610 ± 4% turbostat.IRQ
1381661 -91.0% 124075 ± 6% turbostat.POLL
0.26 -0.2 0.04 ± 12% turbostat.POLL%
96.15 -5.4% 90.95 turbostat.PkgWatt
12.12 +19.3% 14.46 turbostat.RAMWatt
119573 -61.5% 46012 ± 3% proc-vmstat.nr_active_anon
106168 +195.8% 314047 ± 10% proc-vmstat.nr_anon_pages
28.60 ± 17% +1538.5% 468.68 ± 14% proc-vmstat.nr_anon_transparent_hugepages
923365 -13.0% 803489 proc-vmstat.nr_file_pages
165571 +113.5% 353493 ± 9% proc-vmstat.nr_inactive_anon
45605 -50.2% 22690 proc-vmstat.nr_mapped
1752 +47.1% 2578 ± 4% proc-vmstat.nr_page_table_pages
179613 -66.7% 59728 ± 3% proc-vmstat.nr_shmem
21490 -2.4% 20981 proc-vmstat.nr_slab_reclaimable
28260 -7.3% 26208 proc-vmstat.nr_slab_unreclaimable
119573 -61.5% 46012 ± 3% proc-vmstat.nr_zone_active_anon
165570 +113.5% 353492 ± 9% proc-vmstat.nr_zone_inactive_anon
17343640 -76.3% 4116748 ± 4% proc-vmstat.numa_hit
17364975 -76.3% 4118098 ± 4% proc-vmstat.numa_local
249252 -66.2% 84187 ± 2% proc-vmstat.pgactivate
27528916 +567.1% 1.836e+08 ± 5% proc-vmstat.pgalloc_normal
4912427 -79.2% 1019949 ± 3% proc-vmstat.pgfault
27227124 +574.1% 1.835e+08 ± 5% proc-vmstat.pgfree
8728 +3896.4% 348802 ± 5% proc-vmstat.thp_deferred_split_page
8730 +3895.3% 348814 ± 5% proc-vmstat.thp_fault_alloc
8728 +3896.4% 348802 ± 5% proc-vmstat.thp_split_pmd
316745 -21.5% 248756 ± 4% sched_debug.cfs_rq:/.avg_vruntime.avg
112735 ± 4% -34.3% 74061 ± 6% sched_debug.cfs_rq:/.avg_vruntime.min
0.49 ± 6% -17.2% 0.41 ± 8% sched_debug.cfs_rq:/.h_nr_running.stddev
12143 ±120% -99.9% 15.70 ±116% sched_debug.cfs_rq:/.left_vruntime.avg
414017 ±126% -99.9% 428.50 ±102% sched_debug.cfs_rq:/.left_vruntime.max
68492 ±125% -99.9% 78.15 ±106% sched_debug.cfs_rq:/.left_vruntime.stddev
41917 ± 24% -48.3% 21690 ± 57% sched_debug.cfs_rq:/.load.avg
176151 ± 30% -56.9% 75963 ± 57% sched_debug.cfs_rq:/.load.stddev
6489 ± 17% -29.0% 4608 ± 12% sched_debug.cfs_rq:/.load_avg.max
4.42 ± 45% -81.1% 0.83 ± 74% sched_debug.cfs_rq:/.load_avg.min
1112 ± 17% -31.0% 767.62 ± 11% sched_debug.cfs_rq:/.load_avg.stddev
316745 -21.5% 248756 ± 4% sched_debug.cfs_rq:/.min_vruntime.avg
112735 ± 4% -34.3% 74061 ± 6% sched_debug.cfs_rq:/.min_vruntime.min
0.49 ± 6% -17.2% 0.41 ± 8% sched_debug.cfs_rq:/.nr_running.stddev
12144 ±120% -99.9% 15.70 ±116% sched_debug.cfs_rq:/.right_vruntime.avg
414017 ±126% -99.9% 428.50 ±102% sched_debug.cfs_rq:/.right_vruntime.max
68492 ±125% -99.9% 78.15 ±106% sched_debug.cfs_rq:/.right_vruntime.stddev
14.25 ± 44% -76.6% 3.33 ± 58% sched_debug.cfs_rq:/.runnable_avg.min
11.58 ± 49% -77.7% 2.58 ± 58% sched_debug.cfs_rq:/.util_avg.min
423972 ± 23% +59.3% 675379 ± 3% sched_debug.cpu.avg_idle.avg
5720 ± 43% +439.5% 30864 sched_debug.cpu.avg_idle.min
99.79 ± 2% -23.7% 76.11 ± 2% sched_debug.cpu.clock_task.stddev
162475 ± 49% -95.8% 6813 ± 26% sched_debug.cpu.curr->pid.avg
1061268 -84.0% 170212 ± 4% sched_debug.cpu.curr->pid.max
365404 ± 20% -91.3% 31839 ± 10% sched_debug.cpu.curr->pid.stddev
0.51 ± 3% -20.1% 0.41 ± 9% sched_debug.cpu.nr_running.stddev
311923 -74.2% 80615 ± 2% sched_debug.cpu.nr_switches.avg
565973 ± 4% -77.8% 125597 ± 10% sched_debug.cpu.nr_switches.max
192666 ± 4% -70.6% 56695 ± 6% sched_debug.cpu.nr_switches.min
67485 ± 8% -79.9% 13558 ± 10% sched_debug.cpu.nr_switches.stddev
2.62 +102.1% 5.30 perf-stat.i.MPKI
2.09e+09 -47.6% 1.095e+09 ± 4% perf-stat.i.branch-instructions
1.56 -0.5 1.01 perf-stat.i.branch-miss-rate%
31951200 -60.9% 12481432 ± 2% perf-stat.i.branch-misses
19.38 +23.7 43.08 perf-stat.i.cache-miss-rate%
26413597 -5.7% 24899132 ± 4% perf-stat.i.cache-misses
1.363e+08 -58.3% 56906133 ± 4% perf-stat.i.cache-references
370628 -75.8% 89743 ± 3% perf-stat.i.context-switches
1.77 +65.1% 2.92 ± 2% perf-stat.i.cpi
1.748e+10 -21.8% 1.367e+10 ± 2% perf-stat.i.cpu-cycles
61611 -79.1% 12901 ± 6% perf-stat.i.cpu-migrations
716.97 ± 2% -17.2% 593.35 ± 2% perf-stat.i.cycles-between-cache-misses
0.12 ± 4% -0.1 0.05 perf-stat.i.dTLB-load-miss-rate%
3066100 ± 3% -81.3% 573066 ± 5% perf-stat.i.dTLB-load-misses
2.652e+09 -50.1% 1.324e+09 ± 4% perf-stat.i.dTLB-loads
0.08 ± 2% -0.0 0.03 perf-stat.i.dTLB-store-miss-rate%
1168195 ± 2% -82.9% 199438 ± 5% perf-stat.i.dTLB-store-misses
1.478e+09 -56.8% 6.384e+08 ± 3% perf-stat.i.dTLB-stores
8080423 -73.2% 2169371 ± 3% perf-stat.i.iTLB-load-misses
5601321 -74.3% 1440571 ± 2% perf-stat.i.iTLB-loads
1.028e+10 -49.7% 5.173e+09 ± 4% perf-stat.i.instructions
1450 +73.1% 2511 ± 2% perf-stat.i.instructions-per-iTLB-miss
0.61 -35.9% 0.39 perf-stat.i.ipc
0.48 -21.4% 0.38 ± 2% perf-stat.i.metric.GHz
616.28 -17.6% 507.69 ± 4% perf-stat.i.metric.K/sec
175.16 -50.8% 86.18 ± 4% perf-stat.i.metric.M/sec
76728 -80.8% 14724 ± 4% perf-stat.i.minor-faults
5600408 -61.4% 2160997 ± 5% perf-stat.i.node-loads
8873996 +52.1% 13499744 ± 5% perf-stat.i.node-stores
112409 -81.9% 20305 ± 4% perf-stat.i.page-faults
2.55 +89.6% 4.83 perf-stat.overall.MPKI
1.51 -0.4 1.13 perf-stat.overall.branch-miss-rate%
19.26 +24.5 43.71 perf-stat.overall.cache-miss-rate%
1.70 +56.4% 2.65 perf-stat.overall.cpi
665.84 -17.5% 549.51 ± 2% perf-stat.overall.cycles-between-cache-misses
0.12 ± 4% -0.1 0.04 perf-stat.overall.dTLB-load-miss-rate%
0.08 ± 2% -0.0 0.03 perf-stat.overall.dTLB-store-miss-rate%
59.16 +0.9 60.04 perf-stat.overall.iTLB-load-miss-rate%
1278 +86.1% 2379 ± 2% perf-stat.overall.instructions-per-iTLB-miss
0.59 -36.1% 0.38 perf-stat.overall.ipc
2.078e+09 -48.3% 1.074e+09 ± 4% perf-stat.ps.branch-instructions
31292687 -61.2% 12133349 ± 2% perf-stat.ps.branch-misses
26057291 -5.9% 24512034 ± 4% perf-stat.ps.cache-misses
1.353e+08 -58.6% 56072195 ± 4% perf-stat.ps.cache-references
365254 -75.8% 88464 ± 3% perf-stat.ps.context-switches
1.735e+10 -22.4% 1.346e+10 ± 2% perf-stat.ps.cpu-cycles
60838 -79.1% 12727 ± 6% perf-stat.ps.cpu-migrations
3056601 ± 4% -81.5% 565354 ± 4% perf-stat.ps.dTLB-load-misses
2.636e+09 -50.7% 1.3e+09 ± 4% perf-stat.ps.dTLB-loads
1155253 ± 2% -83.0% 196581 ± 5% perf-stat.ps.dTLB-store-misses
1.473e+09 -57.4% 6.268e+08 ± 3% perf-stat.ps.dTLB-stores
7997726 -73.3% 2131477 ± 3% perf-stat.ps.iTLB-load-misses
5521346 -74.3% 1418623 ± 2% perf-stat.ps.iTLB-loads
1.023e+10 -50.4% 5.073e+09 ± 4% perf-stat.ps.instructions
75671 -80.9% 14479 ± 4% perf-stat.ps.minor-faults
5549722 -61.4% 2141750 ± 4% perf-stat.ps.node-loads
8769156 +51.6% 13296579 ± 5% perf-stat.ps.node-stores
110795 -82.0% 19977 ± 4% perf-stat.ps.page-faults
6.482e+11 -50.7% 3.197e+11 ± 4% perf-stat.total.instructions
0.00 ± 37% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
0.01 ± 18% +8373.1% 0.73 ± 49% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
0.01 ± 16% +4600.0% 0.38 ± 24% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
0.01 ±204% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
0.01 ± 8% +3678.9% 0.36 ± 79% perf-sched.sch_delay.avg.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
0.01 ± 14% -38.5% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
0.01 ± 5% +2946.2% 0.26 ± 43% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
0.00 ± 14% +125.0% 0.01 ± 12% perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.02 ±170% -83.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 ± 69% +6578.6% 0.31 ± 4% perf-sched.sch_delay.avg.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
0.00 +100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
0.02 ± 86% +4234.4% 0.65 ± 4% perf-sched.sch_delay.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
0.01 ± 6% +6054.3% 0.47 perf-sched.sch_delay.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
0.00 ± 14% +195.2% 0.01 ± 89% perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.00 ±102% +340.0% 0.01 ± 85% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
0.00 +100.0% 0.00 perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
0.00 ± 11% +66.7% 0.01 ± 21% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.01 ± 89% +1096.1% 0.15 ± 30% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
0.00 +141.7% 0.01 ± 61% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.00 ±223% +9975.0% 0.07 ±203% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
0.00 ± 10% +789.3% 0.04 ± 69% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
0.00 ± 31% +6691.3% 0.26 ± 5% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
0.00 ± 28% +14612.5% 0.59 ± 4% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.exit_mm
0.00 ± 24% +4904.2% 0.20 ± 4% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
0.00 ± 28% +450.0% 0.01 ± 74% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.00 ± 17% +984.6% 0.02 ± 79% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.00 ± 20% +231.8% 0.01 ± 89% perf-sched.sch_delay.avg.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.submit_bio_wait
0.00 +350.0% 0.01 ± 16% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.02 ± 16% +320.2% 0.07 ± 2% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.02 ± 2% +282.1% 0.09 ± 5% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.00 ± 14% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
0.05 ± 35% +3784.5% 1.92 ± 16% perf-sched.sch_delay.max.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
0.29 ±128% +563.3% 1.92 ± 7% perf-sched.sch_delay.max.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
0.14 ±217% -99.7% 0.00 ±223% perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
0.03 ± 49% -74.0% 0.01 ± 51% perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
0.01 ± 54% -57.4% 0.00 ± 75% perf-sched.sch_delay.max.ms.__cond_resched.dput.__ns_get_path.ns_get_path.proc_ns_get_link
0.12 ± 21% +873.0% 1.19 ± 60% perf-sched.sch_delay.max.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
2.27 ±220% -99.7% 0.01 ± 19% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
0.02 ± 36% -54.4% 0.01 ± 55% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
0.04 ± 36% -77.1% 0.01 ± 31% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
0.12 ± 32% +1235.8% 1.58 ± 31% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
2.25 ±218% -99.3% 0.02 ± 52% perf-sched.sch_delay.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.01 ± 85% +19836.4% 2.56 ± 7% perf-sched.sch_delay.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
0.03 ± 70% -93.6% 0.00 ±223% perf-sched.sch_delay.max.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
0.10 ± 16% +2984.2% 3.21 ± 6% perf-sched.sch_delay.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
0.01 ± 20% +883.9% 0.05 ±177% perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.01 ± 15% +694.7% 0.08 ±123% perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.00 ±223% +6966.7% 0.07 ±199% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
0.01 ± 38% +8384.6% 0.55 ± 72% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.01 ± 13% +12995.7% 1.51 ±103% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
117.80 ± 56% -96.4% 4.26 ± 36% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.01 ± 68% +331.9% 0.03 perf-sched.total_sch_delay.average.ms
4.14 +242.6% 14.20 ± 4% perf-sched.total_wait_and_delay.average.ms
700841 -69.6% 212977 ± 3% perf-sched.total_wait_and_delay.count.ms
4.14 +242.4% 14.16 ± 4% perf-sched.total_wait_time.average.ms
11.68 ± 8% +213.3% 36.59 ± 28% perf-sched.wait_and_delay.avg.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
10.00 ± 2% +226.1% 32.62 ± 20% perf-sched.wait_and_delay.avg.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
10.55 ± 3% +259.8% 37.96 ± 7% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
9.80 ± 12% +196.5% 29.07 ± 32% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
9.80 ± 4% +234.9% 32.83 ± 14% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
10.32 ± 2% +223.8% 33.42 ± 6% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
8.15 ± 14% +271.3% 30.25 ± 35% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
9.60 ± 4% +240.8% 32.73 ± 16% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
10.37 ± 4% +232.0% 34.41 ± 10% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
7.32 ± 46% +269.7% 27.07 ± 49% perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
9.88 +236.2% 33.23 ± 4% perf-sched.wait_and_delay.avg.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
4.44 ± 4% +379.0% 21.27 ± 18% perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
10.05 ± 2% +235.6% 33.73 ± 11% perf-sched.wait_and_delay.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.03 +462.6% 0.15 ± 6% perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.78 ± 4% +482.1% 39.46 ± 3% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
3.17 +683.3% 24.85 ± 8% perf-sched.wait_and_delay.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
36.64 ± 13% +244.7% 126.32 ± 6% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
9.81 +302.4% 39.47 ± 4% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
1.05 +48.2% 1.56 perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
0.93 +14.2% 1.06 ± 2% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
9.93 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
12.02 ± 3% +139.8% 28.83 ± 6% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
6.09 ± 2% +403.0% 30.64 ± 5% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
23.17 ± 19% -83.5% 3.83 ±143% perf-sched.wait_and_delay.count.__cond_resched.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio
79.83 ± 9% -55.1% 35.83 ± 16% perf-sched.wait_and_delay.count.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
14.83 ± 14% -59.6% 6.00 ± 56% perf-sched.wait_and_delay.count.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
8.50 ± 17% -80.4% 1.67 ± 89% perf-sched.wait_and_delay.count.__cond_resched.dput.__ns_get_path.ns_get_path.proc_ns_get_link
114.00 ± 14% -62.4% 42.83 ± 11% perf-sched.wait_and_delay.count.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
94.67 ± 7% -48.1% 49.17 ± 13% perf-sched.wait_and_delay.count.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
59.83 ± 13% -76.0% 14.33 ± 48% perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
103.00 ± 12% -48.1% 53.50 ± 20% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
19.33 ± 16% -56.0% 8.50 ± 29% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
68.17 ± 11% -39.1% 41.50 ± 19% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
36.67 ± 22% -79.1% 7.67 ± 46% perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
465.50 ± 9% -47.4% 244.83 ± 11% perf-sched.wait_and_delay.count.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
14492 ± 3% -96.3% 533.67 ± 10% perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
128.67 ± 7% -53.5% 59.83 ± 10% perf-sched.wait_and_delay.count.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.67 ± 34% -80.4% 1.50 ±107% perf-sched.wait_and_delay.count.__cond_resched.vunmap_p4d_range.__vunmap_range_noflush.remove_vm_area.vfree
147533 -81.0% 28023 ± 5% perf-sched.wait_and_delay.count.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
4394 ± 4% -78.5% 942.83 ± 7% perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
228791 -79.3% 47383 ± 4% perf-sched.wait_and_delay.count.futex_wait_queue.__futex_wait.futex_wait.do_futex
368.50 ± 2% -67.1% 121.33 ± 3% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
147506 -81.0% 28010 ± 5% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
5387 ± 6% -16.7% 4488 ± 5% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
8303 ± 2% -56.9% 3579 ± 5% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
14.67 ± 7% -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
370.50 ±141% +221.9% 1192 ± 5% perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
24395 ± 2% -51.2% 11914 ± 6% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
31053 ± 2% -80.5% 6047 ± 5% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
16.41 ± 2% +342.7% 72.65 ± 29% perf-sched.wait_and_delay.max.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
16.49 ± 3% +463.3% 92.90 ± 27% perf-sched.wait_and_delay.max.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
17.32 ± 5% +520.9% 107.52 ± 14% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
15.38 ± 6% +325.2% 65.41 ± 22% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
16.73 ± 4% +456.2% 93.04 ± 11% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
17.14 ± 3% +510.6% 104.68 ± 14% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
15.70 ± 4% +379.4% 75.25 ± 28% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
15.70 ± 3% +422.1% 81.97 ± 19% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
16.38 +528.4% 102.91 ± 21% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
45.20 ± 48% +166.0% 120.23 ± 27% perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
17.25 +495.5% 102.71 ± 2% perf-sched.wait_and_delay.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
402.57 ± 15% -52.8% 189.90 ± 14% perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
16.96 ± 4% +521.3% 105.40 ± 15% perf-sched.wait_and_delay.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
28.45 +517.3% 175.65 ± 14% perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
22.49 +628.5% 163.83 ± 16% perf-sched.wait_and_delay.max.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
26.53 ± 30% +326.9% 113.25 ± 16% perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
15.54 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
1.67 ±141% +284.6% 6.44 ± 4% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.07 ± 34% -93.6% 0.00 ±105% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
10.21 ± 15% +295.8% 40.43 ± 50% perf-sched.wait_time.avg.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.89 ± 40% -99.8% 0.01 ±113% perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
11.67 ± 8% +213.5% 36.58 ± 28% perf-sched.wait_time.avg.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
9.98 ± 2% +226.8% 32.61 ± 20% perf-sched.wait_time.avg.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
1.03 +71.2% 1.77 ± 20% perf-sched.wait_time.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
0.06 ± 79% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.__split_vma.vma_modify.mprotect_fixup
0.05 ± 22% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_expand.mmap_region.do_mmap
0.08 ± 82% -98.2% 0.00 ±223% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
10.72 ± 10% +166.9% 28.61 ± 29% perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
10.53 ± 3% +260.5% 37.95 ± 7% perf-sched.wait_time.avg.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
9.80 ± 12% +196.6% 29.06 ± 32% perf-sched.wait_time.avg.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
9.80 ± 4% +235.1% 32.82 ± 14% perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
9.50 ± 12% +281.9% 36.27 ± 70% perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
10.31 ± 2% +223.9% 33.40 ± 6% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
8.04 ± 15% +276.1% 30.25 ± 35% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
9.60 ± 4% +240.9% 32.72 ± 16% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
0.06 ± 66% -98.3% 0.00 ±223% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.__split_vma
10.36 ± 4% +232.1% 34.41 ± 10% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
0.08 ± 50% -95.7% 0.00 ±100% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.vma_modify
0.01 ± 49% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range
0.03 ± 73% -87.4% 0.00 ±145% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.dup_task_struct.copy_process.kernel_clone
8.01 ± 25% +238.0% 27.07 ± 49% perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
9.86 +237.0% 33.23 ± 4% perf-sched.wait_time.avg.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
4.44 ± 4% +379.2% 21.26 ± 18% perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
10.03 +236.3% 33.73 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.97 ± 8% -87.8% 0.12 ±221% perf-sched.wait_time.avg.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
0.02 ± 13% +1846.8% 0.45 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
1.01 +64.7% 1.66 perf-sched.wait_time.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
0.75 ± 4% +852.1% 7.10 ± 5% perf-sched.wait_time.avg.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.03 +462.6% 0.15 ± 6% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.24 ± 4% +25.3% 0.30 ± 8% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
1.98 ± 15% +595.7% 13.80 ± 90% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
2.78 ± 14% +444.7% 15.12 ± 16% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function
6.77 ± 4% +483.0% 39.44 ± 3% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
3.17 +684.7% 24.85 ± 8% perf-sched.wait_time.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
36.64 ± 13% +244.7% 126.32 ± 6% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
9.79 +303.0% 39.45 ± 4% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
1.05 +23.8% 1.30 perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
0.86 +101.2% 1.73 ± 3% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.exit_mm
0.11 ± 21% +438.9% 0.61 ± 15% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.32 ± 4% +28.5% 0.41 ± 13% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
12.00 ± 3% +139.6% 28.76 ± 6% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
6.07 ± 2% +403.5% 30.56 ± 5% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.38 ± 41% -98.8% 0.00 ±105% perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
0.36 ± 34% -84.3% 0.06 ±200% perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.vma_alloc_folio.do_anonymous_page
0.36 ± 51% -92.9% 0.03 ±114% perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
15.98 ± 5% +361.7% 73.80 ± 23% perf-sched.wait_time.max.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.51 ± 14% -92.8% 0.04 ±196% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.__vmalloc_area_node.__vmalloc_node_range
8.56 ± 11% -99.9% 0.01 ±126% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
0.43 ± 32% -68.2% 0.14 ±119% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_node_trace.__get_vm_area_node.__vmalloc_node_range
0.46 ± 20% -89.3% 0.05 ±184% perf-sched.wait_time.max.ms.__cond_resched.__vmalloc_area_node.__vmalloc_node_range.alloc_thread_stack_node.dup_task_struct
16.40 ± 2% +342.9% 72.65 ± 29% perf-sched.wait_time.max.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
0.31 ± 63% -76.2% 0.07 ±169% perf-sched.wait_time.max.ms.__cond_resched.cgroup_css_set_fork.cgroup_can_fork.copy_process.kernel_clone
0.14 ± 93% +258.7% 0.49 ± 14% perf-sched.wait_time.max.ms.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
16.49 ± 3% +463.5% 92.89 ± 27% perf-sched.wait_time.max.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
1.09 +171.0% 2.96 ± 10% perf-sched.wait_time.max.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
1.16 ± 7% +155.1% 2.97 ± 4% perf-sched.wait_time.max.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
0.19 ± 78% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.__split_vma.vma_modify.mprotect_fixup
0.33 ± 35% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_expand.mmap_region.do_mmap
0.20 ±101% -99.3% 0.00 ±223% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
17.31 ± 5% +521.0% 107.51 ± 14% perf-sched.wait_time.max.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
15.38 ± 6% +325.3% 65.40 ± 22% perf-sched.wait_time.max.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
16.72 ± 4% +456.6% 93.04 ± 11% perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
1.16 ± 2% +88.7% 2.20 ± 33% perf-sched.wait_time.max.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
53.96 ± 32% +444.0% 293.53 ±109% perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
17.13 ± 2% +511.2% 104.68 ± 14% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
15.69 ± 4% +379.5% 75.25 ± 28% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
15.70 ± 3% +422.2% 81.97 ± 19% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
0.27 ± 80% -99.6% 0.00 ±223% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.__split_vma
16.37 +528.6% 102.90 ± 21% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
0.44 ± 33% -99.1% 0.00 ±104% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.vma_modify
0.02 ± 83% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range
0.08 ± 83% -95.4% 0.00 ±147% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.dup_task_struct.copy_process.kernel_clone
1.16 ± 2% +134.7% 2.72 ± 19% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
49.88 ± 25% +141.0% 120.23 ± 27% perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
17.24 +495.7% 102.70 ± 2% perf-sched.wait_time.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
402.56 ± 15% -52.8% 189.89 ± 14% perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
16.96 ± 4% +521.4% 105.39 ± 15% perf-sched.wait_time.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.06 +241.7% 3.61 ± 4% perf-sched.wait_time.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
1.07 -88.9% 0.12 ±221% perf-sched.wait_time.max.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
0.28 ± 27% +499.0% 1.67 ± 18% perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
1.21 ± 2% +207.2% 3.71 ± 3% perf-sched.wait_time.max.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
13.43 ± 26% +38.8% 18.64 perf-sched.wait_time.max.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
28.45 +517.3% 175.65 ± 14% perf-sched.wait_time.max.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.79 ± 10% +62.2% 1.28 ± 25% perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
13.22 ± 2% +317.2% 55.16 ± 35% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function
834.29 ± 28% -48.5% 429.53 ± 94% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
22.48 +628.6% 163.83 ± 16% perf-sched.wait_time.max.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
22.74 ± 18% +398.0% 113.25 ± 16% perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
7.72 ± 7% +80.6% 13.95 ± 2% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.74 ± 4% +77.2% 1.31 ± 32% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
5.01 +14.1% 5.72 ± 2% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
44.98 -19.7 25.32 ± 2% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
43.21 -19.6 23.65 ± 3% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
43.21 -19.6 23.65 ± 3% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
43.18 -19.5 23.63 ± 3% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
40.30 -17.5 22.75 ± 3% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
41.10 -17.4 23.66 ± 2% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
39.55 -17.3 22.24 ± 3% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
24.76 ± 2% -8.5 16.23 ± 3% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
8.68 ± 4% -6.5 2.22 ± 6% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
7.23 ± 4% -5.8 1.46 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
7.23 ± 4% -5.8 1.46 ± 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.11 ± 4% -5.7 1.39 ± 7% perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.09 ± 4% -5.7 1.39 ± 7% perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.59 ± 3% -5.1 1.47 ± 7% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
6.59 ± 3% -5.1 1.47 ± 7% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
6.59 ± 3% -5.1 1.47 ± 7% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
5.76 ± 2% -5.0 0.80 ± 9% perf-profile.calltrace.cycles-pp.start_thread
7.43 ± 2% -4.9 2.52 ± 7% perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
5.51 ± 3% -4.8 0.70 ± 7% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.start_thread
5.50 ± 3% -4.8 0.70 ± 7% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
5.48 ± 3% -4.8 0.69 ± 7% perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
5.42 ± 3% -4.7 0.69 ± 7% perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
5.90 ± 5% -3.9 2.01 ± 4% perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
4.18 ± 5% -3.8 0.37 ± 71% perf-profile.calltrace.cycles-pp.exit_notify.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.76 ± 5% -3.8 1.98 ± 4% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
5.04 ± 7% -3.7 1.32 ± 9% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__clone
5.03 ± 7% -3.7 1.32 ± 9% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
5.02 ± 7% -3.7 1.32 ± 9% perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
5.02 ± 7% -3.7 1.32 ± 9% perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
5.62 ± 5% -3.7 1.96 ± 3% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
4.03 ± 4% -3.1 0.92 ± 7% perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
6.03 ± 5% -3.1 2.94 ± 3% perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
3.43 ± 5% -2.8 0.67 ± 13% perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
3.43 ± 5% -2.8 0.67 ± 13% perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
3.41 ± 5% -2.7 0.66 ± 13% perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread
3.40 ± 5% -2.7 0.66 ± 13% perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn
3.67 ± 7% -2.7 0.94 ± 10% perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.92 ± 7% -2.4 0.50 ± 46% perf-profile.calltrace.cycles-pp.stress_pthread
2.54 ± 6% -2.2 0.38 ± 70% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2.46 ± 6% -1.8 0.63 ± 10% perf-profile.calltrace.cycles-pp.dup_task_struct.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
3.00 ± 6% -1.6 1.43 ± 7% perf-profile.calltrace.cycles-pp.__munmap
2.96 ± 6% -1.5 1.42 ± 7% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
2.96 ± 6% -1.5 1.42 ± 7% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
2.95 ± 6% -1.5 1.41 ± 7% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
2.95 ± 6% -1.5 1.41 ± 7% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
2.02 ± 4% -1.5 0.52 ± 46% perf-profile.calltrace.cycles-pp.__lll_lock_wait
1.78 ± 3% -1.5 0.30 ±100% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__lll_lock_wait
1.77 ± 3% -1.5 0.30 ±100% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__lll_lock_wait
1.54 ± 6% -1.3 0.26 ±100% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
2.54 ± 6% -1.2 1.38 ± 6% perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.51 ± 6% -1.1 1.37 ± 7% perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
1.13 -0.7 0.40 ± 70% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.15 ± 5% -0.7 0.46 ± 45% perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
1.58 ± 5% -0.6 0.94 ± 7% perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
0.99 ± 5% -0.5 0.51 ± 45% perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
1.01 ± 5% -0.5 0.54 ± 45% perf-profile.calltrace.cycles-pp.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
0.82 ± 4% -0.2 0.59 ± 5% perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
0.00 +0.5 0.54 ± 5% perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
0.00 +0.6 0.60 ± 5% perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
0.00 +0.6 0.61 ± 6% perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
0.00 +0.6 0.62 ± 6% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
0.53 ± 5% +0.6 1.17 ± 13% perf-profile.calltrace.cycles-pp.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
1.94 ± 2% +0.7 2.64 ± 9% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
0.00 +0.7 0.73 ± 5% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range
0.00 +0.8 0.75 ± 20% perf-profile.calltrace.cycles-pp.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
2.02 ± 2% +0.8 2.85 ± 9% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.74 ± 5% +0.8 1.57 ± 11% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.00 +0.9 0.90 ± 4% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
0.00 +0.9 0.92 ± 13% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues
0.86 ± 4% +1.0 1.82 ± 10% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
0.86 ± 4% +1.0 1.83 ± 10% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
0.00 +1.0 0.98 ± 7% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked
0.09 ±223% +1.0 1.07 ± 11% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt
0.00 +1.0 0.99 ± 6% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd
0.00 +1.0 1.00 ± 7% perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range
0.09 ±223% +1.0 1.10 ± 12% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
0.00 +1.0 1.01 ± 6% perf-profile.calltrace.cycles-pp.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range
0.00 +1.1 1.10 ± 5% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath
0.00 +1.1 1.12 ± 5% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock
0.00 +1.2 1.23 ± 4% perf-profile.calltrace.cycles-pp.page_add_anon_rmap.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range
0.00 +1.3 1.32 ± 4% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd
0.00 +1.4 1.38 ± 5% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range
0.00 +2.4 2.44 ± 10% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd.zap_pmd_range
0.00 +3.1 3.10 ± 5% perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single
0.00 +3.5 3.52 ± 5% perf-profile.calltrace.cycles-pp.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single
0.88 ± 4% +3.8 4.69 ± 4% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
6.30 ± 6% +13.5 19.85 ± 7% perf-profile.calltrace.cycles-pp.__clone
0.00 +16.7 16.69 ± 7% perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
1.19 ± 29% +17.1 18.32 ± 7% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00 +17.6 17.56 ± 7% perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.63 ± 7% +17.7 18.35 ± 7% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.__clone
0.59 ± 5% +17.8 18.34 ± 7% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.__clone
0.59 ± 5% +17.8 18.34 ± 7% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__clone
0.00 +17.9 17.90 ± 7% perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.36 ± 71% +18.0 18.33 ± 7% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__clone
0.00 +32.0 32.03 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd.zap_pmd_range.unmap_page_range
0.00 +32.6 32.62 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single
0.00 +36.2 36.19 ± 2% perf-profile.calltrace.cycles-pp.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
7.97 ± 4% +36.6 44.52 ± 2% perf-profile.calltrace.cycles-pp.__madvise
7.91 ± 4% +36.6 44.46 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
7.90 ± 4% +36.6 44.46 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
7.87 ± 4% +36.6 44.44 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
7.86 ± 4% +36.6 44.44 ± 2% perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
7.32 ± 4% +36.8 44.07 ± 2% perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.25 ± 4% +36.8 44.06 ± 2% perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
1.04 ± 4% +40.0 41.08 ± 2% perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
1.00 ± 3% +40.1 41.06 ± 2% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
44.98 -19.7 25.32 ± 2% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
44.98 -19.7 25.32 ± 2% perf-profile.children.cycles-pp.cpu_startup_entry
44.96 -19.6 25.31 ± 2% perf-profile.children.cycles-pp.do_idle
43.21 -19.6 23.65 ± 3% perf-profile.children.cycles-pp.start_secondary
41.98 -17.6 24.40 ± 2% perf-profile.children.cycles-pp.cpuidle_idle_call
41.21 -17.3 23.86 ± 2% perf-profile.children.cycles-pp.cpuidle_enter
41.20 -17.3 23.86 ± 2% perf-profile.children.cycles-pp.cpuidle_enter_state
12.69 ± 3% -10.6 2.12 ± 6% perf-profile.children.cycles-pp.do_exit
12.60 ± 3% -10.5 2.08 ± 7% perf-profile.children.cycles-pp.__x64_sys_exit
24.76 ± 2% -8.5 16.31 ± 2% perf-profile.children.cycles-pp.intel_idle
12.34 ± 2% -8.4 3.90 ± 5% perf-profile.children.cycles-pp.intel_idle_irq
6.96 ± 4% -5.4 1.58 ± 7% perf-profile.children.cycles-pp.ret_from_fork_asm
6.69 ± 4% -5.2 1.51 ± 7% perf-profile.children.cycles-pp.ret_from_fork
6.59 ± 3% -5.1 1.47 ± 7% perf-profile.children.cycles-pp.kthread
5.78 ± 2% -5.0 0.80 ± 8% perf-profile.children.cycles-pp.start_thread
4.68 ± 4% -4.5 0.22 ± 10% perf-profile.children.cycles-pp._raw_spin_lock_irq
5.03 ± 7% -3.7 1.32 ± 9% perf-profile.children.cycles-pp.__do_sys_clone
5.02 ± 7% -3.7 1.32 ± 9% perf-profile.children.cycles-pp.kernel_clone
4.20 ± 5% -3.7 0.53 ± 9% perf-profile.children.cycles-pp.exit_notify
4.67 ± 5% -3.6 1.10 ± 9% perf-profile.children.cycles-pp.rcu_core
4.60 ± 4% -3.5 1.06 ± 10% perf-profile.children.cycles-pp.rcu_do_batch
4.89 ± 5% -3.4 1.44 ± 11% perf-profile.children.cycles-pp.__do_softirq
5.64 ± 3% -3.2 2.39 ± 6% perf-profile.children.cycles-pp.__schedule
6.27 ± 5% -3.2 3.03 ± 4% perf-profile.children.cycles-pp.flush_tlb_mm_range
4.03 ± 4% -3.1 0.92 ± 7% perf-profile.children.cycles-pp.smpboot_thread_fn
6.68 ± 4% -3.1 3.61 ± 3% perf-profile.children.cycles-pp.tlb_finish_mmu
6.04 ± 5% -3.1 2.99 ± 4% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
6.04 ± 5% -3.0 2.99 ± 4% perf-profile.children.cycles-pp.smp_call_function_many_cond
3.77 ± 2% -3.0 0.73 ± 16% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
7.78 -3.0 4.77 ± 5% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
3.43 ± 5% -2.8 0.67 ± 13% perf-profile.children.cycles-pp.run_ksoftirqd
3.67 ± 7% -2.7 0.94 ± 10% perf-profile.children.cycles-pp.copy_process
2.80 ± 6% -2.5 0.34 ± 15% perf-profile.children.cycles-pp.queued_write_lock_slowpath
3.41 ± 2% -2.5 0.96 ± 16% perf-profile.children.cycles-pp.do_futex
3.06 ± 5% -2.4 0.68 ± 16% perf-profile.children.cycles-pp.free_unref_page_commit
3.02 ± 5% -2.4 0.67 ± 16% perf-profile.children.cycles-pp.free_pcppages_bulk
2.92 ± 7% -2.3 0.58 ± 14% perf-profile.children.cycles-pp.stress_pthread
3.22 ± 3% -2.3 0.90 ± 18% perf-profile.children.cycles-pp.__x64_sys_futex
2.52 ± 5% -2.2 0.35 ± 7% perf-profile.children.cycles-pp.release_task
2.54 ± 6% -2.0 0.53 ± 10% perf-profile.children.cycles-pp.worker_thread
3.12 ± 5% -1.9 1.17 ± 11% perf-profile.children.cycles-pp.free_unref_page
2.31 ± 6% -1.9 0.45 ± 11% perf-profile.children.cycles-pp.process_one_work
2.47 ± 6% -1.8 0.63 ± 10% perf-profile.children.cycles-pp.dup_task_struct
2.19 ± 5% -1.8 0.41 ± 12% perf-profile.children.cycles-pp.delayed_vfree_work
2.14 ± 5% -1.7 0.40 ± 11% perf-profile.children.cycles-pp.vfree
3.19 ± 2% -1.6 1.58 ± 8% perf-profile.children.cycles-pp.schedule
2.06 ± 3% -1.6 0.46 ± 7% perf-profile.children.cycles-pp.__sigtimedwait
3.02 ± 6% -1.6 1.44 ± 7% perf-profile.children.cycles-pp.__munmap
1.94 ± 4% -1.6 0.39 ± 14% perf-profile.children.cycles-pp.__unfreeze_partials
2.95 ± 6% -1.5 1.41 ± 7% perf-profile.children.cycles-pp.__x64_sys_munmap
2.95 ± 6% -1.5 1.41 ± 7% perf-profile.children.cycles-pp.__vm_munmap
2.14 ± 3% -1.5 0.60 ± 21% perf-profile.children.cycles-pp.futex_wait
2.08 ± 4% -1.5 0.60 ± 19% perf-profile.children.cycles-pp.__lll_lock_wait
2.04 ± 3% -1.5 0.56 ± 20% perf-profile.children.cycles-pp.__futex_wait
1.77 ± 5% -1.5 0.32 ± 10% perf-profile.children.cycles-pp.remove_vm_area
1.86 ± 5% -1.4 0.46 ± 10% perf-profile.children.cycles-pp.open64
1.74 ± 4% -1.4 0.37 ± 7% perf-profile.children.cycles-pp.__x64_sys_rt_sigtimedwait
1.71 ± 4% -1.4 0.36 ± 8% perf-profile.children.cycles-pp.do_sigtimedwait
1.79 ± 5% -1.3 0.46 ± 9% perf-profile.children.cycles-pp.__x64_sys_openat
1.78 ± 5% -1.3 0.46 ± 8% perf-profile.children.cycles-pp.do_sys_openat2
1.61 ± 4% -1.3 0.32 ± 12% perf-profile.children.cycles-pp.poll_idle
1.65 ± 9% -1.3 0.37 ± 14% perf-profile.children.cycles-pp.pthread_create@@GLIBC_2.2.5
1.56 ± 8% -1.2 0.35 ± 7% perf-profile.children.cycles-pp.alloc_thread_stack_node
2.32 ± 3% -1.2 1.13 ± 8% perf-profile.children.cycles-pp.pick_next_task_fair
2.59 ± 6% -1.2 1.40 ± 7% perf-profile.children.cycles-pp.do_vmi_munmap
1.55 ± 4% -1.2 0.40 ± 19% perf-profile.children.cycles-pp.futex_wait_queue
1.37 ± 5% -1.1 0.22 ± 12% perf-profile.children.cycles-pp.find_unlink_vmap_area
2.52 ± 6% -1.1 1.38 ± 6% perf-profile.children.cycles-pp.do_vmi_align_munmap
1.53 ± 5% -1.1 0.39 ± 8% perf-profile.children.cycles-pp.do_filp_open
1.52 ± 5% -1.1 0.39 ± 7% perf-profile.children.cycles-pp.path_openat
1.25 ± 3% -1.1 0.14 ± 12% perf-profile.children.cycles-pp.sigpending
1.58 ± 5% -1.1 0.50 ± 6% perf-profile.children.cycles-pp.schedule_idle
1.29 ± 5% -1.1 0.21 ± 21% perf-profile.children.cycles-pp.__mprotect
1.40 ± 8% -1.1 0.32 ± 4% perf-profile.children.cycles-pp.__vmalloc_node_range
2.06 ± 3% -1.0 1.02 ± 9% perf-profile.children.cycles-pp.newidle_balance
1.04 ± 3% -1.0 0.08 ± 23% perf-profile.children.cycles-pp.__x64_sys_rt_sigpending
1.14 ± 6% -1.0 0.18 ± 18% perf-profile.children.cycles-pp.__x64_sys_mprotect
1.13 ± 6% -1.0 0.18 ± 17% perf-profile.children.cycles-pp.do_mprotect_pkey
1.30 ± 7% -0.9 0.36 ± 10% perf-profile.children.cycles-pp.wake_up_new_task
1.14 ± 9% -0.9 0.22 ± 16% perf-profile.children.cycles-pp.do_anonymous_page
0.95 ± 3% -0.9 0.04 ± 71% perf-profile.children.cycles-pp.do_sigpending
1.24 ± 3% -0.9 0.34 ± 9% perf-profile.children.cycles-pp.futex_wake
1.02 ± 6% -0.9 0.14 ± 15% perf-profile.children.cycles-pp.mprotect_fixup
1.91 ± 2% -0.9 1.06 ± 9% perf-profile.children.cycles-pp.load_balance
1.38 ± 5% -0.8 0.53 ± 6% perf-profile.children.cycles-pp.select_task_rq_fair
1.14 ± 4% -0.8 0.31 ± 12% perf-profile.children.cycles-pp.__pthread_mutex_unlock_usercnt
2.68 ± 3% -0.8 1.91 ± 6% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
1.00 ± 4% -0.7 0.26 ± 10% perf-profile.children.cycles-pp.flush_smp_call_function_queue
1.44 ± 3% -0.7 0.73 ± 10% perf-profile.children.cycles-pp.find_busiest_group
0.81 ± 6% -0.7 0.10 ± 18% perf-profile.children.cycles-pp.vma_modify
1.29 ± 3% -0.7 0.60 ± 8% perf-profile.children.cycles-pp.exit_mm
1.40 ± 3% -0.7 0.71 ± 10% perf-profile.children.cycles-pp.update_sd_lb_stats
0.78 ± 7% -0.7 0.10 ± 19% perf-profile.children.cycles-pp.__split_vma
0.90 ± 8% -0.7 0.22 ± 10% perf-profile.children.cycles-pp.__vmalloc_area_node
0.75 ± 4% -0.7 0.10 ± 5% perf-profile.children.cycles-pp.__exit_signal
1.49 ± 2% -0.7 0.84 ± 7% perf-profile.children.cycles-pp.try_to_wake_up
0.89 ± 7% -0.6 0.24 ± 10% perf-profile.children.cycles-pp.find_idlest_cpu
1.59 ± 5% -0.6 0.95 ± 7% perf-profile.children.cycles-pp.unmap_region
0.86 ± 3% -0.6 0.22 ± 26% perf-profile.children.cycles-pp.pthread_cond_timedwait@@GLIBC_2.3.2
1.59 ± 3% -0.6 0.95 ± 9% perf-profile.children.cycles-pp.irq_exit_rcu
1.24 ± 3% -0.6 0.61 ± 10% perf-profile.children.cycles-pp.update_sg_lb_stats
0.94 ± 5% -0.6 0.32 ± 11% perf-profile.children.cycles-pp.do_task_dead
0.87 ± 3% -0.6 0.25 ± 19% perf-profile.children.cycles-pp.perf_iterate_sb
0.82 ± 4% -0.6 0.22 ± 10% perf-profile.children.cycles-pp.sched_ttwu_pending
1.14 ± 3% -0.6 0.54 ± 10% perf-profile.children.cycles-pp.activate_task
0.84 -0.6 0.25 ± 10% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.81 ± 6% -0.6 0.22 ± 11% perf-profile.children.cycles-pp.find_idlest_group
0.75 ± 5% -0.6 0.18 ± 14% perf-profile.children.cycles-pp.step_into
0.74 ± 8% -0.6 0.18 ± 14% perf-profile.children.cycles-pp.__alloc_pages_bulk
0.74 ± 6% -0.5 0.19 ± 11% perf-profile.children.cycles-pp.update_sg_wakeup_stats
0.72 ± 5% -0.5 0.18 ± 15% perf-profile.children.cycles-pp.pick_link
1.06 ± 2% -0.5 0.52 ± 9% perf-profile.children.cycles-pp.enqueue_task_fair
0.77 ± 6% -0.5 0.23 ± 12% perf-profile.children.cycles-pp.unmap_vmas
0.76 ± 2% -0.5 0.22 ± 8% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.94 ± 2% -0.5 0.42 ± 10% perf-profile.children.cycles-pp.dequeue_task_fair
0.65 ± 5% -0.5 0.15 ± 18% perf-profile.children.cycles-pp.open_last_lookups
1.37 ± 3% -0.5 0.87 ± 4% perf-profile.children.cycles-pp.llist_add_batch
0.70 ± 4% -0.5 0.22 ± 19% perf-profile.children.cycles-pp.memcpy_orig
0.91 ± 4% -0.5 0.44 ± 7% perf-profile.children.cycles-pp.update_load_avg
0.67 -0.5 0.20 ± 8% perf-profile.children.cycles-pp.switch_fpu_return
0.88 ± 3% -0.5 0.42 ± 8% perf-profile.children.cycles-pp.enqueue_entity
0.91 ± 4% -0.5 0.45 ± 12% perf-profile.children.cycles-pp.ttwu_do_activate
0.77 ± 4% -0.5 0.32 ± 10% perf-profile.children.cycles-pp.schedule_hrtimeout_range_clock
0.63 ± 5% -0.4 0.20 ± 21% perf-profile.children.cycles-pp.arch_dup_task_struct
0.74 ± 3% -0.4 0.32 ± 15% perf-profile.children.cycles-pp.dequeue_entity
0.62 ± 5% -0.4 0.21 ± 5% perf-profile.children.cycles-pp.finish_task_switch
0.56 -0.4 0.16 ± 7% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
0.53 ± 4% -0.4 0.13 ± 9% perf-profile.children.cycles-pp.syscall
0.50 ± 9% -0.4 0.11 ± 18% perf-profile.children.cycles-pp.__get_vm_area_node
0.51 ± 3% -0.4 0.12 ± 12% perf-profile.children.cycles-pp.__slab_free
0.52 ± 2% -0.4 0.14 ± 10% perf-profile.children.cycles-pp.kmem_cache_free
0.75 ± 3% -0.4 0.37 ± 9% perf-profile.children.cycles-pp.exit_mm_release
0.50 ± 6% -0.4 0.12 ± 21% perf-profile.children.cycles-pp.do_send_specific
0.74 ± 3% -0.4 0.37 ± 8% perf-profile.children.cycles-pp.futex_exit_release
0.45 ± 10% -0.4 0.09 ± 17% perf-profile.children.cycles-pp.alloc_vmap_area
0.47 ± 3% -0.4 0.11 ± 20% perf-profile.children.cycles-pp.tgkill
0.68 ± 11% -0.4 0.32 ± 12% perf-profile.children.cycles-pp.__mmap
0.48 ± 3% -0.4 0.13 ± 6% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.76 ± 5% -0.3 0.41 ± 10% perf-profile.children.cycles-pp.wake_up_q
0.42 ± 7% -0.3 0.08 ± 22% perf-profile.children.cycles-pp.__close
0.49 ± 7% -0.3 0.14 ± 25% perf-profile.children.cycles-pp.kmem_cache_alloc
0.49 ± 9% -0.3 0.15 ± 14% perf-profile.children.cycles-pp.mas_store_gfp
0.46 ± 4% -0.3 0.12 ± 23% perf-profile.children.cycles-pp.perf_event_task_output
0.44 ± 10% -0.3 0.10 ± 28% perf-profile.children.cycles-pp.pthread_sigqueue
0.46 ± 4% -0.3 0.12 ± 15% perf-profile.children.cycles-pp.link_path_walk
0.42 ± 8% -0.3 0.10 ± 20% perf-profile.children.cycles-pp.proc_ns_get_link
0.63 ± 10% -0.3 0.32 ± 12% perf-profile.children.cycles-pp.vm_mmap_pgoff
0.45 ± 4% -0.3 0.14 ± 13% perf-profile.children.cycles-pp.sched_move_task
0.36 ± 8% -0.3 0.06 ± 49% perf-profile.children.cycles-pp.__x64_sys_close
0.46 ± 8% -0.3 0.17 ± 14% perf-profile.children.cycles-pp.prctl
0.65 ± 3% -0.3 0.35 ± 7% perf-profile.children.cycles-pp.futex_cleanup
0.42 ± 7% -0.3 0.12 ± 15% perf-profile.children.cycles-pp.mas_store_prealloc
0.49 ± 5% -0.3 0.20 ± 13% perf-profile.children.cycles-pp.__rmqueue_pcplist
0.37 ± 7% -0.3 0.08 ± 16% perf-profile.children.cycles-pp.do_tkill
0.36 ± 10% -0.3 0.08 ± 20% perf-profile.children.cycles-pp.ns_get_path
0.37 ± 4% -0.3 0.09 ± 18% perf-profile.children.cycles-pp.setns
0.67 ± 3% -0.3 0.41 ± 8% perf-profile.children.cycles-pp.hrtimer_wakeup
0.35 ± 5% -0.3 0.10 ± 16% perf-profile.children.cycles-pp.__task_pid_nr_ns
0.41 ± 5% -0.3 0.16 ± 12% perf-profile.children.cycles-pp.mas_wr_bnode
0.35 ± 4% -0.3 0.10 ± 20% perf-profile.children.cycles-pp.rcu_cblist_dequeue
0.37 ± 5% -0.2 0.12 ± 17% perf-profile.children.cycles-pp.exit_task_stack_account
0.56 ± 4% -0.2 0.31 ± 12% perf-profile.children.cycles-pp.select_task_rq
0.29 ± 6% -0.2 0.05 ± 46% perf-profile.children.cycles-pp.mas_wr_store_entry
0.34 ± 4% -0.2 0.10 ± 27% perf-profile.children.cycles-pp.perf_event_task
0.39 ± 9% -0.2 0.15 ± 12% perf-profile.children.cycles-pp.__switch_to_asm
0.35 ± 5% -0.2 0.11 ± 11% perf-profile.children.cycles-pp.account_kernel_stack
0.30 ± 7% -0.2 0.06 ± 48% perf-profile.children.cycles-pp.__ns_get_path
0.31 ± 9% -0.2 0.07 ± 17% perf-profile.children.cycles-pp.free_vmap_area_noflush
0.31 ± 5% -0.2 0.08 ± 19% perf-profile.children.cycles-pp.__do_sys_setns
0.33 ± 7% -0.2 0.10 ± 7% perf-profile.children.cycles-pp.__free_one_page
0.31 ± 11% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.__pte_alloc
0.36 ± 6% -0.2 0.13 ± 12% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.27 ± 12% -0.2 0.05 ± 71% perf-profile.children.cycles-pp.__fput
0.53 ± 9% -0.2 0.31 ± 12% perf-profile.children.cycles-pp.do_mmap
0.27 ± 12% -0.2 0.05 ± 77% perf-profile.children.cycles-pp.__x64_sys_rt_tgsigqueueinfo
0.28 ± 5% -0.2 0.06 ± 50% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.34 ± 10% -0.2 0.12 ± 29% perf-profile.children.cycles-pp.futex_wait_setup
0.27 ± 6% -0.2 0.06 ± 45% perf-profile.children.cycles-pp.__x64_sys_tgkill
0.31 ± 7% -0.2 0.11 ± 18% perf-profile.children.cycles-pp.__switch_to
0.26 ± 8% -0.2 0.06 ± 21% perf-profile.children.cycles-pp.__call_rcu_common
0.33 ± 9% -0.2 0.13 ± 18% perf-profile.children.cycles-pp.__do_sys_prctl
0.28 ± 5% -0.2 0.08 ± 17% perf-profile.children.cycles-pp.mm_release
0.52 ± 2% -0.2 0.32 ± 9% perf-profile.children.cycles-pp.__get_user_8
0.24 ± 10% -0.2 0.04 ± 72% perf-profile.children.cycles-pp.dput
0.25 ± 14% -0.2 0.05 ± 46% perf-profile.children.cycles-pp.perf_event_mmap
0.24 ± 7% -0.2 0.06 ± 50% perf-profile.children.cycles-pp.mas_walk
0.28 ± 6% -0.2 0.10 ± 24% perf-profile.children.cycles-pp.rmqueue_bulk
0.23 ± 15% -0.2 0.05 ± 46% perf-profile.children.cycles-pp.perf_event_mmap_event
0.25 ± 15% -0.2 0.08 ± 45% perf-profile.children.cycles-pp.___slab_alloc
0.20 ± 14% -0.2 0.03 ±100% perf-profile.children.cycles-pp.lookup_fast
0.20 ± 10% -0.2 0.04 ± 75% perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
0.28 ± 7% -0.2 0.12 ± 24% perf-profile.children.cycles-pp.prepare_task_switch
0.22 ± 11% -0.2 0.05 ± 8% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.63 ± 5% -0.2 0.47 ± 12% perf-profile.children.cycles-pp.llist_reverse_order
0.25 ± 11% -0.2 0.09 ± 34% perf-profile.children.cycles-pp.futex_q_lock
0.21 ± 6% -0.2 0.06 ± 47% perf-profile.children.cycles-pp.kmem_cache_alloc_node
0.18 ± 11% -0.2 0.03 ±100% perf-profile.children.cycles-pp.alloc_empty_file
0.19 ± 5% -0.2 0.04 ± 71% perf-profile.children.cycles-pp.__put_task_struct
0.19 ± 15% -0.2 0.03 ± 70% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.24 ± 6% -0.2 0.09 ± 20% perf-profile.children.cycles-pp.___perf_sw_event
0.18 ± 7% -0.2 0.03 ±100% perf-profile.children.cycles-pp.perf_event_fork
0.19 ± 11% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.select_idle_core
0.30 ± 11% -0.1 0.15 ± 7% perf-profile.children.cycles-pp.pte_alloc_one
0.25 ± 6% -0.1 0.11 ± 10% perf-profile.children.cycles-pp.set_next_entity
0.20 ± 10% -0.1 0.06 ± 49% perf-profile.children.cycles-pp.__perf_event_header__init_id
0.18 ± 15% -0.1 0.03 ±101% perf-profile.children.cycles-pp.__radix_tree_lookup
0.22 ± 11% -0.1 0.08 ± 21% perf-profile.children.cycles-pp.mas_spanning_rebalance
0.20 ± 9% -0.1 0.06 ± 9% perf-profile.children.cycles-pp.stress_pthread_func
0.18 ± 12% -0.1 0.04 ± 73% perf-profile.children.cycles-pp.__getpid
0.16 ± 13% -0.1 0.02 ± 99% perf-profile.children.cycles-pp.walk_component
0.28 ± 5% -0.1 0.15 ± 13% perf-profile.children.cycles-pp.update_curr
0.25 ± 5% -0.1 0.11 ± 22% perf-profile.children.cycles-pp.balance_fair
0.16 ± 9% -0.1 0.03 ±100% perf-profile.children.cycles-pp.futex_wake_mark
0.16 ± 12% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.get_futex_key
0.17 ± 6% -0.1 0.05 ± 47% perf-profile.children.cycles-pp.memcg_account_kmem
0.25 ± 11% -0.1 0.12 ± 11% perf-profile.children.cycles-pp._find_next_bit
0.15 ± 13% -0.1 0.02 ± 99% perf-profile.children.cycles-pp.do_open
0.20 ± 8% -0.1 0.08 ± 16% perf-profile.children.cycles-pp.mas_rebalance
0.17 ± 13% -0.1 0.05 ± 45% perf-profile.children.cycles-pp.__memcg_kmem_charge_page
0.33 ± 6% -0.1 0.21 ± 10% perf-profile.children.cycles-pp.select_idle_sibling
0.14 ± 11% -0.1 0.03 ±100% perf-profile.children.cycles-pp.get_user_pages_fast
0.18 ± 7% -0.1 0.07 ± 14% perf-profile.children.cycles-pp.mas_alloc_nodes
0.14 ± 11% -0.1 0.03 ±101% perf-profile.children.cycles-pp.set_task_cpu
0.14 ± 12% -0.1 0.03 ±101% perf-profile.children.cycles-pp.vm_unmapped_area
0.38 ± 6% -0.1 0.27 ± 7% perf-profile.children.cycles-pp.native_sched_clock
0.16 ± 10% -0.1 0.05 ± 47% perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
0.36 ± 9% -0.1 0.25 ± 12% perf-profile.children.cycles-pp.mmap_region
0.23 ± 7% -0.1 0.12 ± 9% perf-profile.children.cycles-pp.available_idle_cpu
0.13 ± 11% -0.1 0.02 ± 99% perf-profile.children.cycles-pp.internal_get_user_pages_fast
0.16 ± 10% -0.1 0.06 ± 18% perf-profile.children.cycles-pp.get_unmapped_area
0.50 ± 7% -0.1 0.40 ± 6% perf-profile.children.cycles-pp.menu_select
0.24 ± 9% -0.1 0.14 ± 13% perf-profile.children.cycles-pp.rmqueue
0.17 ± 14% -0.1 0.07 ± 26% perf-profile.children.cycles-pp.perf_event_comm
0.17 ± 15% -0.1 0.07 ± 23% perf-profile.children.cycles-pp.perf_event_comm_event
0.17 ± 11% -0.1 0.07 ± 14% perf-profile.children.cycles-pp.pick_next_entity
0.13 ± 14% -0.1 0.03 ±102% perf-profile.children.cycles-pp.perf_output_begin
0.23 ± 6% -0.1 0.13 ± 21% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.14 ± 18% -0.1 0.04 ± 72% perf-profile.children.cycles-pp.perf_event_comm_output
0.21 ± 9% -0.1 0.12 ± 9% perf-profile.children.cycles-pp.update_rq_clock
0.16 ± 8% -0.1 0.06 ± 19% perf-profile.children.cycles-pp.mas_split
0.13 ± 14% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
0.13 ± 6% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.13 ± 7% -0.1 0.04 ± 72% perf-profile.children.cycles-pp.mas_topiary_replace
0.14 ± 8% -0.1 0.06 ± 9% perf-profile.children.cycles-pp.mas_preallocate
0.16 ± 11% -0.1 0.07 ± 18% perf-profile.children.cycles-pp.__pick_eevdf
0.11 ± 14% -0.1 0.02 ± 99% perf-profile.children.cycles-pp.mas_empty_area_rev
0.25 ± 7% -0.1 0.17 ± 10% perf-profile.children.cycles-pp.select_idle_cpu
0.14 ± 12% -0.1 0.06 ± 14% perf-profile.children.cycles-pp.cpu_stopper_thread
0.14 ± 10% -0.1 0.06 ± 13% perf-profile.children.cycles-pp.active_load_balance_cpu_stop
0.14 ± 14% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.os_xsave
0.18 ± 6% -0.1 0.11 ± 14% perf-profile.children.cycles-pp.idle_cpu
0.17 ± 4% -0.1 0.10 ± 15% perf-profile.children.cycles-pp.hrtimer_start_range_ns
0.11 ± 14% -0.1 0.03 ±100% perf-profile.children.cycles-pp.__pthread_mutex_lock
0.32 ± 5% -0.1 0.25 ± 5% perf-profile.children.cycles-pp.sched_clock
0.11 ± 6% -0.1 0.03 ± 70% perf-profile.children.cycles-pp.wakeup_preempt
0.23 ± 7% -0.1 0.16 ± 13% perf-profile.children.cycles-pp.update_rq_clock_task
0.13 ± 8% -0.1 0.06 ± 16% perf-profile.children.cycles-pp.local_clock_noinstr
0.11 ± 10% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.kmem_cache_alloc_bulk
0.34 ± 4% -0.1 0.27 ± 6% perf-profile.children.cycles-pp.sched_clock_cpu
0.11 ± 9% -0.1 0.04 ± 76% perf-profile.children.cycles-pp.avg_vruntime
0.15 ± 8% -0.1 0.08 ± 14% perf-profile.children.cycles-pp.update_cfs_group
0.10 ± 8% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
0.13 ± 8% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.sched_use_asym_prio
0.09 ± 12% -0.1 0.02 ± 99% perf-profile.children.cycles-pp.getname_flags
0.18 ± 9% -0.1 0.12 ± 12% perf-profile.children.cycles-pp.__update_load_avg_se
0.11 ± 8% -0.1 0.05 ± 46% perf-profile.children.cycles-pp.place_entity
0.08 ± 12% -0.0 0.02 ± 99% perf-profile.children.cycles-pp.folio_add_lru_vma
0.10 ± 7% -0.0 0.05 ± 46% perf-profile.children.cycles-pp._find_next_and_bit
0.10 ± 6% -0.0 0.06 ± 24% perf-profile.children.cycles-pp.reweight_entity
0.03 ± 70% +0.0 0.08 ± 14% perf-profile.children.cycles-pp.perf_rotate_context
0.19 ± 10% +0.1 0.25 ± 7% perf-profile.children.cycles-pp.irqtime_account_irq
0.08 ± 11% +0.1 0.14 ± 21% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.00 +0.1 0.06 ± 14% perf-profile.children.cycles-pp.rcu_pending
0.10 ± 17% +0.1 0.16 ± 13% perf-profile.children.cycles-pp.rebalance_domains
0.14 ± 16% +0.1 0.21 ± 12% perf-profile.children.cycles-pp.downgrade_write
0.14 ± 14% +0.1 0.21 ± 10% perf-profile.children.cycles-pp.down_read_killable
0.00 +0.1 0.07 ± 11% perf-profile.children.cycles-pp.free_tail_page_prepare
0.02 ±141% +0.1 0.09 ± 20% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.01 ±223% +0.1 0.08 ± 25% perf-profile.children.cycles-pp.arch_scale_freq_tick
0.55 ± 9% +0.1 0.62 ± 9% perf-profile.children.cycles-pp.__alloc_pages
0.34 ± 5% +0.1 0.41 ± 9% perf-profile.children.cycles-pp.clock_nanosleep
0.00 +0.1 0.08 ± 23% perf-profile.children.cycles-pp.tick_nohz_next_event
0.70 ± 2% +0.1 0.78 ± 5% perf-profile.children.cycles-pp.flush_tlb_func
0.14 ± 10% +0.1 0.23 ± 13% perf-profile.children.cycles-pp.__intel_pmu_enable_all
0.07 ± 19% +0.1 0.17 ± 17% perf-profile.children.cycles-pp.cgroup_rstat_updated
0.04 ± 71% +0.1 0.14 ± 11% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.25 ± 9% +0.1 0.38 ± 11% perf-profile.children.cycles-pp.down_read
0.43 ± 9% +0.1 0.56 ± 10% perf-profile.children.cycles-pp.get_page_from_freelist
0.00 +0.1 0.15 ± 6% perf-profile.children.cycles-pp.vm_normal_page
0.31 ± 7% +0.2 0.46 ± 9% perf-profile.children.cycles-pp.native_flush_tlb_local
0.00 +0.2 0.16 ± 8% perf-profile.children.cycles-pp.__tlb_remove_page_size
0.28 ± 11% +0.2 0.46 ± 13% perf-profile.children.cycles-pp.vma_alloc_folio
0.00 +0.2 0.24 ± 5% perf-profile.children.cycles-pp._compound_head
0.07 ± 16% +0.2 0.31 ± 6% perf-profile.children.cycles-pp.__mod_node_page_state
0.38 ± 5% +0.2 0.62 ± 7% perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
0.22 ± 12% +0.2 0.47 ± 10% perf-profile.children.cycles-pp.schedule_preempt_disabled
0.38 ± 5% +0.3 0.64 ± 7% perf-profile.children.cycles-pp.perf_event_task_tick
0.00 +0.3 0.27 ± 5% perf-profile.children.cycles-pp.free_swap_cache
0.30 ± 10% +0.3 0.58 ± 10% perf-profile.children.cycles-pp.rwsem_down_read_slowpath
0.00 +0.3 0.30 ± 4% perf-profile.children.cycles-pp.free_pages_and_swap_cache
0.09 ± 10% +0.3 0.42 ± 7% perf-profile.children.cycles-pp.__mod_lruvec_state
0.00 +0.3 0.34 ± 9% perf-profile.children.cycles-pp.deferred_split_folio
0.00 +0.4 0.36 ± 13% perf-profile.children.cycles-pp.prep_compound_page
0.09 ± 10% +0.4 0.50 ± 9% perf-profile.children.cycles-pp.free_unref_page_prepare
0.00 +0.4 0.42 ± 11% perf-profile.children.cycles-pp.do_huge_pmd_anonymous_page
1.67 ± 3% +0.4 2.12 ± 8% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.63 ± 3% +0.5 1.11 ± 12% perf-profile.children.cycles-pp.scheduler_tick
1.93 ± 3% +0.5 2.46 ± 8% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
1.92 ± 3% +0.5 2.45 ± 8% perf-profile.children.cycles-pp.hrtimer_interrupt
0.73 ± 3% +0.6 1.31 ± 11% perf-profile.children.cycles-pp.update_process_times
0.74 ± 3% +0.6 1.34 ± 11% perf-profile.children.cycles-pp.tick_sched_handle
0.20 ± 8% +0.6 0.83 ± 18% perf-profile.children.cycles-pp.__cond_resched
0.78 ± 4% +0.6 1.43 ± 12% perf-profile.children.cycles-pp.tick_nohz_highres_handler
0.12 ± 7% +0.7 0.81 ± 5% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.28 ± 7% +0.9 1.23 ± 4% perf-profile.children.cycles-pp.release_pages
0.00 +1.0 1.01 ± 6% perf-profile.children.cycles-pp.pmdp_invalidate
0.35 ± 6% +1.2 1.56 ± 5% perf-profile.children.cycles-pp.__mod_lruvec_page_state
0.30 ± 8% +1.2 1.53 ± 4% perf-profile.children.cycles-pp.tlb_batch_pages_flush
0.00 +1.3 1.26 ± 4% perf-profile.children.cycles-pp.page_add_anon_rmap
0.09 ± 11% +3.1 3.20 ± 5% perf-profile.children.cycles-pp.page_remove_rmap
1.60 ± 2% +3.4 5.04 ± 4% perf-profile.children.cycles-pp.zap_pte_range
0.03 ±100% +3.5 3.55 ± 5% perf-profile.children.cycles-pp.__split_huge_pmd_locked
41.36 +11.6 52.92 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
41.22 +11.7 52.88 ± 2% perf-profile.children.cycles-pp.do_syscall_64
6.42 ± 6% +13.5 19.88 ± 7% perf-profile.children.cycles-pp.__clone
0.82 ± 6% +16.2 16.98 ± 7% perf-profile.children.cycles-pp.clear_page_erms
2.62 ± 5% +16.4 19.04 ± 7% perf-profile.children.cycles-pp.asm_exc_page_fault
2.18 ± 5% +16.8 18.94 ± 7% perf-profile.children.cycles-pp.exc_page_fault
2.06 ± 6% +16.8 18.90 ± 7% perf-profile.children.cycles-pp.do_user_addr_fault
1.60 ± 8% +17.0 18.60 ± 7% perf-profile.children.cycles-pp.handle_mm_fault
1.52 ± 7% +17.1 18.58 ± 7% perf-profile.children.cycles-pp.__handle_mm_fault
0.30 ± 7% +17.4 17.72 ± 7% perf-profile.children.cycles-pp.clear_huge_page
0.31 ± 8% +17.6 17.90 ± 7% perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
11.66 ± 3% +22.2 33.89 ± 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
3.29 ± 3% +30.2 33.46 perf-profile.children.cycles-pp._raw_spin_lock
0.04 ± 71% +36.2 36.21 ± 2% perf-profile.children.cycles-pp.__split_huge_pmd
8.00 ± 4% +36.5 44.54 ± 2% perf-profile.children.cycles-pp.__madvise
7.87 ± 4% +36.6 44.44 ± 2% perf-profile.children.cycles-pp.__x64_sys_madvise
7.86 ± 4% +36.6 44.44 ± 2% perf-profile.children.cycles-pp.do_madvise
7.32 ± 4% +36.8 44.07 ± 2% perf-profile.children.cycles-pp.madvise_vma_behavior
7.26 ± 4% +36.8 44.06 ± 2% perf-profile.children.cycles-pp.zap_page_range_single
1.78 +39.5 41.30 ± 2% perf-profile.children.cycles-pp.unmap_page_range
1.72 +39.6 41.28 ± 2% perf-profile.children.cycles-pp.zap_pmd_range
24.76 ± 2% -8.5 16.31 ± 2% perf-profile.self.cycles-pp.intel_idle
11.46 ± 2% -7.8 3.65 ± 5% perf-profile.self.cycles-pp.intel_idle_irq
3.16 ± 7% -2.1 1.04 ± 6% perf-profile.self.cycles-pp.smp_call_function_many_cond
1.49 ± 4% -1.2 0.30 ± 12% perf-profile.self.cycles-pp.poll_idle
1.15 ± 3% -0.6 0.50 ± 9% perf-profile.self.cycles-pp._raw_spin_lock
0.60 ± 6% -0.6 0.03 ±100% perf-profile.self.cycles-pp.queued_write_lock_slowpath
0.69 ± 4% -0.5 0.22 ± 20% perf-profile.self.cycles-pp.memcpy_orig
0.66 ± 7% -0.5 0.18 ± 11% perf-profile.self.cycles-pp.update_sg_wakeup_stats
0.59 ± 4% -0.5 0.13 ± 8% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.86 ± 3% -0.4 0.43 ± 12% perf-profile.self.cycles-pp.update_sg_lb_stats
0.56 -0.4 0.16 ± 7% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
0.48 ± 3% -0.4 0.12 ± 10% perf-profile.self.cycles-pp.__slab_free
1.18 ± 2% -0.4 0.82 ± 3% perf-profile.self.cycles-pp.llist_add_batch
0.54 ± 5% -0.3 0.19 ± 6% perf-profile.self.cycles-pp.__schedule
0.47 ± 7% -0.3 0.18 ± 13% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.34 ± 5% -0.2 0.09 ± 18% perf-profile.self.cycles-pp.kmem_cache_free
0.43 ± 4% -0.2 0.18 ± 11% perf-profile.self.cycles-pp.update_load_avg
0.35 ± 4% -0.2 0.10 ± 23% perf-profile.self.cycles-pp.rcu_cblist_dequeue
0.38 ± 9% -0.2 0.15 ± 10% perf-profile.self.cycles-pp.__switch_to_asm
0.33 ± 5% -0.2 0.10 ± 16% perf-profile.self.cycles-pp.__task_pid_nr_ns
0.36 ± 6% -0.2 0.13 ± 14% perf-profile.self.cycles-pp.switch_mm_irqs_off
0.31 ± 6% -0.2 0.09 ± 6% perf-profile.self.cycles-pp.__free_one_page
0.28 ± 5% -0.2 0.06 ± 50% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.27 ± 13% -0.2 0.06 ± 23% perf-profile.self.cycles-pp.pthread_create@@GLIBC_2.2.5
0.30 ± 7% -0.2 0.10 ± 19% perf-profile.self.cycles-pp.__switch_to
0.27 ± 4% -0.2 0.10 ± 17% perf-profile.self.cycles-pp.finish_task_switch
0.23 ± 7% -0.2 0.06 ± 50% perf-profile.self.cycles-pp.mas_walk
0.22 ± 9% -0.2 0.05 ± 48% perf-profile.self.cycles-pp.__clone
0.63 ± 5% -0.2 0.46 ± 12% perf-profile.self.cycles-pp.llist_reverse_order
0.20 ± 4% -0.2 0.04 ± 72% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.24 ± 10% -0.1 0.09 ± 19% perf-profile.self.cycles-pp.rmqueue_bulk
0.18 ± 13% -0.1 0.03 ±101% perf-profile.self.cycles-pp.__radix_tree_lookup
0.18 ± 11% -0.1 0.04 ± 71% perf-profile.self.cycles-pp.stress_pthread_func
0.36 ± 8% -0.1 0.22 ± 11% perf-profile.self.cycles-pp.menu_select
0.22 ± 4% -0.1 0.08 ± 19% perf-profile.self.cycles-pp.___perf_sw_event
0.20 ± 13% -0.1 0.07 ± 20% perf-profile.self.cycles-pp.start_thread
0.16 ± 13% -0.1 0.03 ±101% perf-profile.self.cycles-pp.alloc_vmap_area
0.17 ± 10% -0.1 0.04 ± 73% perf-profile.self.cycles-pp.kmem_cache_alloc
0.14 ± 9% -0.1 0.03 ±100% perf-profile.self.cycles-pp.futex_wake
0.17 ± 4% -0.1 0.06 ± 11% perf-profile.self.cycles-pp.dequeue_task_fair
0.23 ± 6% -0.1 0.12 ± 11% perf-profile.self.cycles-pp.available_idle_cpu
0.22 ± 13% -0.1 0.11 ± 12% perf-profile.self.cycles-pp._find_next_bit
0.21 ± 7% -0.1 0.10 ± 6% perf-profile.self.cycles-pp.__rmqueue_pcplist
0.37 ± 7% -0.1 0.26 ± 8% perf-profile.self.cycles-pp.native_sched_clock
0.22 ± 7% -0.1 0.12 ± 21% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.19 ± 7% -0.1 0.10 ± 11% perf-profile.self.cycles-pp.enqueue_entity
0.15 ± 5% -0.1 0.06 ± 45% perf-profile.self.cycles-pp.enqueue_task_fair
0.15 ± 11% -0.1 0.06 ± 17% perf-profile.self.cycles-pp.__pick_eevdf
0.13 ± 13% -0.1 0.05 ± 72% perf-profile.self.cycles-pp.prepare_task_switch
0.17 ± 10% -0.1 0.08 ± 8% perf-profile.self.cycles-pp.update_rq_clock_task
0.54 ± 4% -0.1 0.46 ± 6% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.14 ± 14% -0.1 0.06 ± 11% perf-profile.self.cycles-pp.os_xsave
0.11 ± 10% -0.1 0.03 ± 70% perf-profile.self.cycles-pp.try_to_wake_up
0.10 ± 8% -0.1 0.03 ±100% perf-profile.self.cycles-pp.futex_wait
0.14 ± 9% -0.1 0.07 ± 10% perf-profile.self.cycles-pp.update_curr
0.18 ± 9% -0.1 0.11 ± 14% perf-profile.self.cycles-pp.idle_cpu
0.11 ± 11% -0.1 0.04 ± 76% perf-profile.self.cycles-pp.avg_vruntime
0.15 ± 10% -0.1 0.08 ± 14% perf-profile.self.cycles-pp.update_cfs_group
0.09 ± 9% -0.1 0.03 ±100% perf-profile.self.cycles-pp.reweight_entity
0.12 ± 13% -0.1 0.06 ± 8% perf-profile.self.cycles-pp.do_idle
0.18 ± 10% -0.1 0.12 ± 13% perf-profile.self.cycles-pp.__update_load_avg_se
0.09 ± 17% -0.1 0.04 ± 71% perf-profile.self.cycles-pp.cpuidle_idle_call
0.10 ± 11% -0.0 0.06 ± 45% perf-profile.self.cycles-pp.update_rq_clock
0.12 ± 15% -0.0 0.07 ± 16% perf-profile.self.cycles-pp.update_sd_lb_stats
0.09 ± 5% -0.0 0.05 ± 46% perf-profile.self.cycles-pp._find_next_and_bit
0.01 ±223% +0.1 0.08 ± 25% perf-profile.self.cycles-pp.arch_scale_freq_tick
0.78 ± 4% +0.1 0.87 ± 4% perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
0.14 ± 10% +0.1 0.23 ± 13% perf-profile.self.cycles-pp.__intel_pmu_enable_all
0.06 ± 46% +0.1 0.15 ± 19% perf-profile.self.cycles-pp.cgroup_rstat_updated
0.19 ± 3% +0.1 0.29 ± 4% perf-profile.self.cycles-pp.cpuidle_enter_state
0.00 +0.1 0.10 ± 11% perf-profile.self.cycles-pp.__mod_lruvec_state
0.00 +0.1 0.11 ± 18% perf-profile.self.cycles-pp.__tlb_remove_page_size
0.00 +0.1 0.12 ± 9% perf-profile.self.cycles-pp.vm_normal_page
0.23 ± 7% +0.1 0.36 ± 8% perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
0.20 ± 8% +0.2 0.35 ± 7% perf-profile.self.cycles-pp.__mod_lruvec_page_state
1.12 ± 2% +0.2 1.28 ± 4% perf-profile.self.cycles-pp.zap_pte_range
0.31 ± 8% +0.2 0.46 ± 9% perf-profile.self.cycles-pp.native_flush_tlb_local
0.00 +0.2 0.16 ± 5% perf-profile.self.cycles-pp._compound_head
0.06 ± 17% +0.2 0.26 ± 4% perf-profile.self.cycles-pp.__mod_node_page_state
0.00 +0.2 0.24 ± 6% perf-profile.self.cycles-pp.free_swap_cache
0.00 +0.3 0.27 ± 15% perf-profile.self.cycles-pp.clear_huge_page
0.00 +0.3 0.27 ± 11% perf-profile.self.cycles-pp.deferred_split_folio
0.00 +0.4 0.36 ± 13% perf-profile.self.cycles-pp.prep_compound_page
0.05 ± 47% +0.4 0.43 ± 9% perf-profile.self.cycles-pp.free_unref_page_prepare
0.08 ± 7% +0.5 0.57 ± 23% perf-profile.self.cycles-pp.__cond_resched
0.08 ± 12% +0.5 0.58 ± 5% perf-profile.self.cycles-pp.release_pages
0.10 ± 10% +0.5 0.63 ± 6% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.00 +1.1 1.11 ± 7% perf-profile.self.cycles-pp.__split_huge_pmd_locked
0.00 +1.2 1.18 ± 4% perf-profile.self.cycles-pp.page_add_anon_rmap
0.03 ±101% +1.3 1.35 ± 7% perf-profile.self.cycles-pp.page_remove_rmap
0.82 ± 5% +16.1 16.88 ± 7% perf-profile.self.cycles-pp.clear_page_erms
11.65 ± 3% +20.2 31.88 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
***************************************************************************************************
lkp-spr-2sp4: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
=========================================================================================
array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
commit:
30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
---------------- ---------------------------
%stddev %change %stddev
\ | \
10.50 ± 14% +55.6% 16.33 ± 16% perf-c2c.DRAM.local
6724 -11.4% 5954 ± 2% vmstat.system.cs
2.746e+09 +16.7% 3.205e+09 ± 2% cpuidle..time
2771516 +16.0% 3213723 ± 2% cpuidle..usage
0.06 ± 4% -0.0 0.05 ± 5% mpstat.cpu.all.soft%
0.47 ± 2% -0.1 0.39 ± 2% mpstat.cpu.all.sys%
0.01 ± 85% +1700.0% 0.20 ±188% perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
15.11 ± 13% -28.8% 10.76 ± 34% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
15.09 ± 13% -30.3% 10.51 ± 38% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1023952 +13.4% 1161219 meminfo.AnonHugePages
1319741 +10.8% 1461995 meminfo.AnonPages
1331039 +11.2% 1480149 meminfo.Inactive
1330865 +11.2% 1479975 meminfo.Inactive(anon)
1266202 +16.0% 1469399 ± 2% turbostat.C1E
1509871 +16.6% 1760853 ± 2% turbostat.C6
3521203 +17.4% 4134075 ± 3% turbostat.IRQ
580.32 -3.8% 558.30 turbostat.PkgWatt
77.42 -14.0% 66.60 ± 2% turbostat.RAMWatt
330416 +10.8% 366020 proc-vmstat.nr_anon_pages
500.90 +13.4% 567.99 proc-vmstat.nr_anon_transparent_hugepages
333197 +11.2% 370536 proc-vmstat.nr_inactive_anon
333197 +11.2% 370536 proc-vmstat.nr_zone_inactive_anon
129879 ± 11% -46.7% 69207 ± 12% proc-vmstat.numa_pages_migrated
3879028 +5.9% 4109180 proc-vmstat.pgalloc_normal
3403414 +6.6% 3628929 proc-vmstat.pgfree
129879 ± 11% -46.7% 69207 ± 12% proc-vmstat.pgmigrate_success
5763 +9.8% 6327 proc-vmstat.thp_fault_alloc
350993 -15.6% 296081 ± 2% stream.add_bandwidth_MBps
349830 -16.1% 293492 ± 2% stream.add_bandwidth_MBps_harmonicMean
333973 -20.5% 265439 ± 3% stream.copy_bandwidth_MBps
332930 -21.7% 260548 ± 3% stream.copy_bandwidth_MBps_harmonicMean
302788 -16.2% 253817 ± 2% stream.scale_bandwidth_MBps
302157 -17.1% 250577 ± 2% stream.scale_bandwidth_MBps_harmonicMean
1177276 +9.3% 1286614 stream.time.maximum_resident_set_size
5038 +1.1% 5095 stream.time.percent_of_cpu_this_job_got
694.19 ± 2% +19.5% 829.85 ± 2% stream.time.user_time
339047 -12.1% 298061 stream.triad_bandwidth_MBps
338186 -12.4% 296218 stream.triad_bandwidth_MBps_harmonicMean
8.42 ±100% -8.4 0.00 perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi
8.42 ±100% -8.4 0.00 perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
8.42 ±100% -8.4 0.00 perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
8.42 ±100% -8.4 0.00 perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
8.42 ±100% -8.4 0.00 perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
8.42 ±100% -8.4 0.00 perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode
0.84 ±103% +1.7 2.57 ± 59% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.84 ±103% +1.7 2.57 ± 59% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
0.31 ±223% +2.0 2.33 ± 44% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
0.31 ±223% +2.0 2.33 ± 44% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
3.07 ± 56% +2.8 5.88 ± 28% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
8.42 ±100% -8.4 0.00 perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
8.42 ±100% -8.1 0.36 ±223% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
12.32 ± 25% -6.6 5.69 ± 69% perf-profile.children.cycles-pp.vsnprintf
12.76 ± 27% -6.6 6.19 ± 67% perf-profile.children.cycles-pp.seq_printf
3.07 ± 56% +2.8 5.88 ± 28% perf-profile.children.cycles-pp.__x64_sys_exit_group
40.11 -11.0% 35.71 ± 2% perf-stat.i.MPKI
1.563e+10 -12.3% 1.371e+10 ± 2% perf-stat.i.branch-instructions
3.721e+09 ± 2% -23.2% 2.858e+09 ± 4% perf-stat.i.cache-misses
4.471e+09 ± 3% -22.7% 3.458e+09 ± 4% perf-stat.i.cache-references
5970 ± 5% -15.9% 5021 ± 4% perf-stat.i.context-switches
1.66 ± 2% +15.8% 1.92 ± 2% perf-stat.i.cpi
41.83 ± 4% +30.6% 54.63 ± 4% perf-stat.i.cycles-between-cache-misses
2.282e+10 ± 2% -14.5% 1.952e+10 ± 2% perf-stat.i.dTLB-loads
572602 ± 3% -9.2% 519922 ± 5% perf-stat.i.dTLB-store-misses
1.483e+10 ± 2% -15.7% 1.25e+10 ± 2% perf-stat.i.dTLB-stores
9.179e+10 -13.7% 7.924e+10 ± 2% perf-stat.i.instructions
0.61 -13.4% 0.52 ± 2% perf-stat.i.ipc
373.79 ± 4% -37.8% 232.60 ± 9% perf-stat.i.metric.K/sec
251.45 -13.4% 217.72 ± 2% perf-stat.i.metric.M/sec
21446 ± 3% -24.1% 16278 ± 8% perf-stat.i.minor-faults
15.07 ± 5% -6.0 9.10 ± 10% perf-stat.i.node-load-miss-rate%
68275790 ± 5% -44.9% 37626128 ± 12% perf-stat.i.node-load-misses
21448 ± 3% -24.1% 16281 ± 8% perf-stat.i.page-faults
40.71 -11.3% 36.10 ± 2% perf-stat.overall.MPKI
1.67 +15.3% 1.93 ± 2% perf-stat.overall.cpi
41.07 ± 3% +30.1% 53.42 ± 4% perf-stat.overall.cycles-between-cache-misses
0.00 ± 2% +0.0 0.00 ± 2% perf-stat.overall.dTLB-store-miss-rate%
0.60 -13.2% 0.52 ± 2% perf-stat.overall.ipc
15.19 ± 5% -6.2 9.03 ± 11% perf-stat.overall.node-load-miss-rate%
1.4e+10 -9.3% 1.269e+10 perf-stat.ps.branch-instructions
3.352e+09 ± 3% -20.9% 2.652e+09 ± 4% perf-stat.ps.cache-misses
4.026e+09 ± 3% -20.3% 3.208e+09 ± 4% perf-stat.ps.cache-references
4888 ± 4% -10.8% 4362 ± 3% perf-stat.ps.context-switches
206092 +2.1% 210375 perf-stat.ps.cpu-clock
1.375e+11 +2.8% 1.414e+11 perf-stat.ps.cpu-cycles
258.23 ± 5% +8.8% 280.85 ± 4% perf-stat.ps.cpu-migrations
2.048e+10 -11.7% 1.809e+10 ± 2% perf-stat.ps.dTLB-loads
1.333e+10 ± 2% -13.0% 1.16e+10 ± 2% perf-stat.ps.dTLB-stores
8.231e+10 -10.8% 7.342e+10 perf-stat.ps.instructions
15755 ± 3% -16.3% 13187 ± 6% perf-stat.ps.minor-faults
61706790 ± 6% -43.8% 34699716 ± 11% perf-stat.ps.node-load-misses
15757 ± 3% -16.3% 13189 ± 6% perf-stat.ps.page-faults
206092 +2.1% 210375 perf-stat.ps.task-clock
1.217e+12 +4.1% 1.267e+12 ± 2% perf-stat.total.instructions
***************************************************************************************************
lkp-cfl-d1: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
commit:
30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
---------------- ---------------------------
%stddev %change %stddev
\ | \
232.12 ± 7% -12.0% 204.18 ± 8% sched_debug.cfs_rq:/.load_avg.stddev
6797 -3.3% 6576 vmstat.system.cs
15161 -0.9% 15029 vmstat.system.in
349927 +44.3% 504820 meminfo.AnonHugePages
507807 +27.1% 645169 meminfo.AnonPages
1499332 +10.2% 1652612 meminfo.Inactive(anon)
8.67 ± 62% +184.6% 24.67 ± 25% turbostat.C10
1.50 -0.1 1.45 turbostat.C1E%
3.30 -3.2% 3.20 turbostat.RAMWatt
1.40 ± 14% -0.3 1.09 ± 13% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
1.44 ± 12% -0.3 1.12 ± 13% perf-profile.children.cycles-pp.asm_exc_page_fault
0.03 ±141% +0.1 0.10 ± 30% perf-profile.children.cycles-pp.next_uptodate_folio
0.02 ±141% +0.1 0.10 ± 22% perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup
0.02 ±143% +0.1 0.10 ± 25% perf-profile.self.cycles-pp.next_uptodate_folio
0.01 ±223% +0.1 0.09 ± 19% perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup
19806 -3.5% 19109 phoronix-test-suite.ramspeed.Average.Integer.mb_s
283.70 +3.8% 294.50 phoronix-test-suite.time.elapsed_time
283.70 +3.8% 294.50 phoronix-test-suite.time.elapsed_time.max
120454 +1.6% 122334 phoronix-test-suite.time.maximum_resident_set_size
281337 -54.8% 127194 phoronix-test-suite.time.minor_page_faults
259.13 +4.1% 269.81 phoronix-test-suite.time.user_time
126951 +27.0% 161291 proc-vmstat.nr_anon_pages
170.86 +44.3% 246.49 proc-vmstat.nr_anon_transparent_hugepages
355917 -1.0% 352250 proc-vmstat.nr_dirty_background_threshold
712705 -1.0% 705362 proc-vmstat.nr_dirty_threshold
3265201 -1.1% 3228465 proc-vmstat.nr_free_pages
374833 +10.2% 413153 proc-vmstat.nr_inactive_anon
1767 +4.8% 1853 proc-vmstat.nr_page_table_pages
374833 +10.2% 413153 proc-vmstat.nr_zone_inactive_anon
854665 -34.3% 561406 proc-vmstat.numa_hit
854632 -34.3% 561397 proc-vmstat.numa_local
5548755 +1.1% 5610598 proc-vmstat.pgalloc_normal
1083315 -26.2% 799129 proc-vmstat.pgfault
113425 +3.7% 117656 proc-vmstat.pgreuse
9025 +7.6% 9714 proc-vmstat.thp_fault_alloc
3.38 +0.1 3.45 perf-stat.i.branch-miss-rate%
4.135e+08 -3.2% 4.003e+08 perf-stat.i.cache-misses
5.341e+08 -2.7% 5.197e+08 perf-stat.i.cache-references
6832 -3.4% 6600 perf-stat.i.context-switches
4.06 +3.1% 4.19 perf-stat.i.cpi
438639 ± 5% -18.7% 356730 ± 6% perf-stat.i.dTLB-load-misses
1.119e+09 -3.8% 1.077e+09 perf-stat.i.dTLB-loads
0.02 ± 15% -0.0 0.01 ± 26% perf-stat.i.dTLB-store-miss-rate%
80407 ± 10% -63.5% 29387 ± 23% perf-stat.i.dTLB-store-misses
7.319e+08 -3.8% 7.043e+08 perf-stat.i.dTLB-stores
57.72 +0.8 58.52 perf-stat.i.iTLB-load-miss-rate%
129846 -3.8% 124973 perf-stat.i.iTLB-load-misses
144448 -5.3% 136837 perf-stat.i.iTLB-loads
2.389e+09 -3.5% 2.305e+09 perf-stat.i.instructions
0.28 -2.9% 0.27 perf-stat.i.ipc
220.59 -3.4% 213.11 perf-stat.i.metric.M/sec
3610 -31.2% 2483 perf-stat.i.minor-faults
49238342 +1.1% 49776834 perf-stat.i.node-loads
98106028 -3.1% 95018390 perf-stat.i.node-stores
3615 -31.2% 2487 perf-stat.i.page-faults
3.65 +3.7% 3.78 perf-stat.overall.cpi
21.08 +3.3% 21.79 perf-stat.overall.cycles-between-cache-misses
0.04 ± 5% -0.0 0.03 ± 6% perf-stat.overall.dTLB-load-miss-rate%
0.01 ± 10% -0.0 0.00 ± 23% perf-stat.overall.dTLB-store-miss-rate%
0.27 -3.6% 0.26 perf-stat.overall.ipc
4.122e+08 -3.2% 3.99e+08 perf-stat.ps.cache-misses
5.324e+08 -2.7% 5.181e+08 perf-stat.ps.cache-references
6809 -3.4% 6580 perf-stat.ps.context-switches
437062 ± 5% -18.7% 355481 ± 6% perf-stat.ps.dTLB-load-misses
1.115e+09 -3.8% 1.073e+09 perf-stat.ps.dTLB-loads
80134 ± 10% -63.5% 29283 ± 23% perf-stat.ps.dTLB-store-misses
7.295e+08 -3.8% 7.021e+08 perf-stat.ps.dTLB-stores
129362 -3.7% 124535 perf-stat.ps.iTLB-load-misses
143865 -5.2% 136338 perf-stat.ps.iTLB-loads
2.381e+09 -3.5% 2.297e+09 perf-stat.ps.instructions
3596 -31.2% 2473 perf-stat.ps.minor-faults
49081949 +1.1% 49621463 perf-stat.ps.node-loads
97795918 -3.1% 94724831 perf-stat.ps.node-stores
3600 -31.2% 2477 perf-stat.ps.page-faults
***************************************************************************************************
lkp-cfl-d1: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
commit:
30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
---------------- ---------------------------
%stddev %change %stddev
\ | \
167.28 ± 5% -13.1% 145.32 ± 6% sched_debug.cfs_rq:/.util_est_enqueued.avg
6845 -2.5% 6674 vmstat.system.cs
351910 ± 2% +40.2% 493341 meminfo.AnonHugePages
505908 +27.2% 643328 meminfo.AnonPages
1497656 +10.2% 1650453 meminfo.Inactive(anon)
18957 ± 13% +26.3% 23947 ± 17% turbostat.C1
1.52 -0.0 1.48 turbostat.C1E%
3.32 -2.9% 3.23 turbostat.RAMWatt
19978 -3.0% 19379 phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s
280.71 +3.3% 289.93 phoronix-test-suite.time.elapsed_time
280.71 +3.3% 289.93 phoronix-test-suite.time.elapsed_time.max
120465 +1.5% 122257 phoronix-test-suite.time.maximum_resident_set_size
281047 -54.7% 127190 phoronix-test-suite.time.minor_page_faults
257.03 +3.5% 265.95 phoronix-test-suite.time.user_time
126473 +27.2% 160831 proc-vmstat.nr_anon_pages
171.83 ± 2% +40.2% 240.89 proc-vmstat.nr_anon_transparent_hugepages
355973 -1.0% 352304 proc-vmstat.nr_dirty_background_threshold
712818 -1.0% 705471 proc-vmstat.nr_dirty_threshold
3265800 -1.1% 3228879 proc-vmstat.nr_free_pages
374410 +10.2% 412613 proc-vmstat.nr_inactive_anon
1770 +4.4% 1848 proc-vmstat.nr_page_table_pages
374410 +10.2% 412613 proc-vmstat.nr_zone_inactive_anon
852082 -34.9% 555093 proc-vmstat.numa_hit
852125 -34.9% 555018 proc-vmstat.numa_local
1078293 -26.6% 791038 proc-vmstat.pgfault
112693 +2.9% 116004 proc-vmstat.pgreuse
9025 +7.6% 9713 proc-vmstat.thp_fault_alloc
3.63 ± 6% +0.6 4.25 ± 9% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.25 ± 55% -0.2 0.08 ± 68% perf-profile.children.cycles-pp.ret_from_fork_asm
0.25 ± 55% -0.2 0.08 ± 68% perf-profile.children.cycles-pp.ret_from_fork
0.23 ± 56% -0.2 0.07 ± 69% perf-profile.children.cycles-pp.kthread
0.14 ± 36% -0.1 0.05 ±120% perf-profile.children.cycles-pp.do_anonymous_page
0.14 ± 35% -0.1 0.05 ± 76% perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
0.04 ± 72% +0.0 0.08 ± 19% perf-profile.children.cycles-pp.try_to_wake_up
0.04 ±118% +0.1 0.10 ± 36% perf-profile.children.cycles-pp.update_rq_clock
0.07 ± 79% +0.1 0.17 ± 21% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
7.99 ± 11% +1.0 9.02 ± 5% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.23 ± 28% -0.1 0.14 ± 49% perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
0.14 ± 35% -0.1 0.05 ± 76% perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
0.06 ± 79% +0.1 0.16 ± 21% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.21 ± 34% +0.2 0.36 ± 18% perf-profile.self.cycles-pp.ktime_get
1.187e+08 -4.6% 1.133e+08 perf-stat.i.branch-instructions
3.36 +0.1 3.42 perf-stat.i.branch-miss-rate%
5492420 -3.9% 5275592 perf-stat.i.branch-misses
4.148e+08 -2.8% 4.034e+08 perf-stat.i.cache-misses
5.251e+08 -2.6% 5.114e+08 perf-stat.i.cache-references
6880 -2.5% 6711 perf-stat.i.context-switches
4.30 +2.9% 4.43 perf-stat.i.cpi
0.10 ± 7% -0.0 0.09 ± 2% perf-stat.i.dTLB-load-miss-rate%
472268 ± 6% -19.9% 378489 perf-stat.i.dTLB-load-misses
8.107e+08 -3.4% 7.831e+08 perf-stat.i.dTLB-loads
0.02 ± 16% -0.0 0.01 ± 2% perf-stat.i.dTLB-store-miss-rate%
90535 ± 11% -59.8% 36371 ± 2% perf-stat.i.dTLB-store-misses
5.323e+08 -3.3% 5.145e+08 perf-stat.i.dTLB-stores
129981 -3.0% 126061 perf-stat.i.iTLB-load-misses
143662 -3.1% 139223 perf-stat.i.iTLB-loads
2.253e+09 -3.6% 2.172e+09 perf-stat.i.instructions
0.26 -3.2% 0.25 perf-stat.i.ipc
4.71 ± 2% -6.4% 4.41 ± 2% perf-stat.i.major-faults
180.03 -3.0% 174.57 perf-stat.i.metric.M/sec
3627 -30.8% 2510 ± 2% perf-stat.i.minor-faults
3632 -30.8% 2514 ± 2% perf-stat.i.page-faults
3.88 +3.6% 4.02 perf-stat.overall.cpi
21.08 +2.7% 21.65 perf-stat.overall.cycles-between-cache-misses
0.06 ± 6% -0.0 0.05 perf-stat.overall.dTLB-load-miss-rate%
0.02 ± 11% -0.0 0.01 ± 2% perf-stat.overall.dTLB-store-miss-rate%
0.26 -3.5% 0.25 perf-stat.overall.ipc
1.182e+08 -4.6% 1.128e+08 perf-stat.ps.branch-instructions
5468166 -4.0% 5251939 perf-stat.ps.branch-misses
4.135e+08 -2.7% 4.021e+08 perf-stat.ps.cache-misses
5.234e+08 -2.6% 5.098e+08 perf-stat.ps.cache-references
6859 -2.5% 6685 perf-stat.ps.context-switches
470567 ± 6% -19.9% 377127 perf-stat.ps.dTLB-load-misses
8.079e+08 -3.4% 7.805e+08 perf-stat.ps.dTLB-loads
90221 ± 11% -59.8% 36239 ± 2% perf-stat.ps.dTLB-store-misses
5.305e+08 -3.3% 5.128e+08 perf-stat.ps.dTLB-stores
129499 -3.0% 125601 perf-stat.ps.iTLB-load-misses
143121 -3.1% 138638 perf-stat.ps.iTLB-loads
2.246e+09 -3.6% 2.165e+09 perf-stat.ps.instructions
4.69 ± 2% -6.3% 4.39 ± 2% perf-stat.ps.major-faults
3613 -30.8% 2500 ± 2% perf-stat.ps.minor-faults
3617 -30.8% 2504 ± 2% perf-stat.ps.page-faults
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-19 15:41 [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression kernel test robot
@ 2023-12-20 5:27 ` Yang Shi
2023-12-20 8:29 ` Yin Fengwei
0 siblings, 1 reply; 24+ messages in thread
From: Yang Shi @ 2023-12-20 5:27 UTC (permalink / raw)
To: kernel test robot
Cc: Rik van Riel, oe-lkp, lkp, Linux Memory Management List,
Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang,
feng.tang, fengwei.yin
On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote:
>
>
>
> Hello,
>
> for this commit, we reported
> "[mm] 96db82a66d: will-it-scale.per_process_ops -95.3% regression"
> in Aug, 2022 when it's in linux-next/master
> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/
>
> later, we reported
> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression"
> in Oct, 2022 when it's in linus/master
> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/
>
> and the commit was reverted finally by
> commit 0ba09b1733878afe838fe35c310715fda3d46428
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Sun Dec 4 12:51:59 2022 -0800
>
> now we noticed it goes into linux-next/master again.
>
> we are not sure if there is an agreement that the benefit of this commit
> has already overweight performance drop in some mirco benchmark.
>
> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/
> that
> "This patch was applied to v6.1, but was reverted due to a regression
> report. However it turned out the regression was not due to this patch.
> I ping'ed Andrew to reapply this patch, Andrew may forget it. This
> patch helps promote THP, so I rebased it onto the latest mm-unstable."
IIRC, Huang Ying's analysis showed the regression in the will-it-scale
micro benchmark was acceptable; the patch was actually reverted due to a
kernel build regression with LLVM reported by Nathan Chancellor. That
regression was then resolved by commit
81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out
if page in deferred queue already"). And this patch did improve the
kernel build with GCC by ~3%, if I remember correctly.
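
(For anyone unfamiliar with what the patch changes, here is a minimal
userspace sketch, not taken from this thread: with the patch applied,
an anonymous mapping of at least PMD size is expected to come back
2 MiB-aligned on x86-64, so it is eligible for a THP at fault time.
The 8 MiB length, the file name and the alignment check below are
illustrative assumptions only.)

/* build: gcc -O2 thp-align.c   (file name is arbitrary) */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 8UL << 20;   /* 8 MiB, well above PMD size (assumed) */
	size_t pmd = 2UL << 20;   /* PMD/THP size on x86-64 */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}
	/* report whether the kernel handed back a PMD-aligned address */
	printf("mapping at %p, PMD-aligned: %s\n", p,
	       ((uintptr_t)p & (pmd - 1)) ? "no" : "yes");
	munmap(p, len);
	return EXIT_SUCCESS;
}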
>
> however, unfortunately, in our latest tests, we still observed below regression
> upon this commit. just FYI.
>
>
>
> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:
Interesting, wasn't the same regression seen last time? And I'm a
little bit confused about how pthread got regressed. I didn't see the
pthread benchmark do any intensive memory alloc/free operations. Do
the pthread APIs do any intensive memory operations? The benchmark
does allocate memory for each thread stack, but that should be just
8K per thread, so it should not trigger what this patch does. With
1024 threads, the thread stacks may get merged into one single VMA (8M
total), but that merging can happen even without this patch applied.
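
(For illustration only, a minimal sketch of the per-thread stack
pattern described above, not the stress-ng source: each thread gets
its own small mmap'ed stack. The 16 KiB size is just glibc's usual
minimum and is an assumption here. A mapping this small is far below
the 2 MiB PMD size, so on its own it should not be affected by the
alignment change; only a much larger merged or separately mmap'ed
region would be.)

/* build: gcc -O2 -pthread small-stack.c   (file name is arbitrary) */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

static void *worker(void *arg)
{
	(void)arg;			/* a real stressor would do work here */
	return NULL;
}

int main(void)
{
	size_t stack_sz = 16 * 1024;	/* small per-thread stack */
	void *stack = mmap(NULL, stack_sz, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	pthread_attr_t attr;
	pthread_t tid;
	int err;

	if (stack == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}
	pthread_attr_init(&attr);
	if (pthread_attr_setstack(&attr, stack, stack_sz)) {
		fprintf(stderr, "stack smaller than PTHREAD_STACK_MIN\n");
		return EXIT_FAILURE;
	}
	err = pthread_create(&tid, &attr, worker, NULL);
	if (err) {
		fprintf(stderr, "pthread_create: %d\n", err);
		return EXIT_FAILURE;
	}
	pthread_join(tid, NULL);
	pthread_attr_destroy(&attr);
	munmap(stack, stack_sz);
	return EXIT_SUCCESS;
}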
>
>
> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> testcase: stress-ng
> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
> parameters:
>
> nr_threads: 1
> disk: 1HDD
> testtime: 60s
> fs: ext4
> class: os
> test: pthread
> cpufreq_governor: performance
>
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+-----------------------------------------------------------------------------------------------+
> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression |
> | test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory |
> | test parameters | array_size=50000000 |
> | | cpufreq_governor=performance |
> | | iterations=10x |
> | | loop=100 |
> | | nr_threads=25% |
> | | omp=true |
> +------------------+-----------------------------------------------------------------------------------------------+
> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression |
> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory |
> | test parameters | cpufreq_governor=performance |
> | | option_a=Average |
> | | option_b=Integer |
> | | test=ramspeed-1.4.3 |
> +------------------+-----------------------------------------------------------------------------------------------+
> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression |
> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory |
> | test parameters | cpufreq_governor=performance |
> | | option_a=Average |
> | | option_b=Floating Point |
> | | test=ramspeed-1.4.3 |
> +------------------+-----------------------------------------------------------------------------------------------+
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com
>
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
>
> commit:
> 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
> 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 13405796 -65.5% 4620124 cpuidle..usage
> 8.00 +8.2% 8.66 ± 2% iostat.cpu.system
> 1.61 -60.6% 0.63 iostat.cpu.user
> 597.50 ± 14% -64.3% 213.50 ± 14% perf-c2c.DRAM.local
> 1882 ± 14% -74.7% 476.83 ± 7% perf-c2c.HITM.local
> 3768436 -12.9% 3283395 vmstat.memory.cache
> 355105 -75.7% 86344 ± 3% vmstat.system.cs
> 385435 -20.7% 305714 ± 3% vmstat.system.in
> 1.13 -0.2 0.88 mpstat.cpu.all.irq%
> 0.29 -0.2 0.10 ± 2% mpstat.cpu.all.soft%
> 6.76 ± 2% +1.1 7.88 ± 2% mpstat.cpu.all.sys%
> 1.62 -1.0 0.62 ± 2% mpstat.cpu.all.usr%
> 2234397 -84.3% 350161 ± 5% stress-ng.pthread.ops
> 37237 -84.3% 5834 ± 5% stress-ng.pthread.ops_per_sec
> 294706 ± 2% -68.0% 94191 ± 6% stress-ng.time.involuntary_context_switches
> 41442 ± 2% +5023.4% 2123284 stress-ng.time.maximum_resident_set_size
> 4466457 -83.9% 717053 ± 5% stress-ng.time.minor_page_faults
The larger RSS and fewer page faults are expected.
> 243.33 +13.5% 276.17 ± 3% stress-ng.time.percent_of_cpu_this_job_got
> 131.64 +27.7% 168.11 ± 3% stress-ng.time.system_time
> 19.73 -82.1% 3.53 ± 4% stress-ng.time.user_time
Much less user time, and it seems to match the drop in the pthread metric.
> 7715609 -80.2% 1530125 ± 4% stress-ng.time.voluntary_context_switches
> 494566 -59.5% 200338 ± 3% meminfo.Active
> 478287 -61.5% 184050 ± 3% meminfo.Active(anon)
> 58549 ± 17% +1532.8% 956006 ± 14% meminfo.AnonHugePages
> 424631 +194.9% 1252445 ± 10% meminfo.AnonPages
> 3677263 -13.0% 3197755 meminfo.Cached
> 5829485 ± 4% -19.0% 4724784 ± 10% meminfo.Committed_AS
> 692486 +108.6% 1444669 ± 8% meminfo.Inactive
> 662179 +113.6% 1414338 ± 9% meminfo.Inactive(anon)
> 182416 -50.2% 90759 meminfo.Mapped
> 4614466 +10.0% 5076604 ± 2% meminfo.Memused
> 6985 +47.6% 10307 ± 4% meminfo.PageTables
> 718445 -66.7% 238913 ± 3% meminfo.Shmem
> 35906 -20.7% 28471 ± 3% meminfo.VmallocUsed
> 4838522 +25.6% 6075302 meminfo.max_used_kB
> 488.83 -20.9% 386.67 ± 2% turbostat.Avg_MHz
> 12.95 -2.7 10.26 ± 2% turbostat.Busy%
> 7156734 -87.2% 919149 ± 4% turbostat.C1
> 10.59 -8.9 1.65 ± 5% turbostat.C1%
> 3702647 -55.1% 1663518 ± 2% turbostat.C1E
> 32.99 -20.6 12.36 ± 3% turbostat.C1E%
> 1161078 +64.5% 1909611 turbostat.C6
> 44.25 +31.8 76.10 turbostat.C6%
> 0.18 -33.3% 0.12 turbostat.IPC
> 74338573 ą 2% -33.9% 49159610 ą 4% turbostat.IRQ
> 1381661 -91.0% 124075 ą 6% turbostat.POLL
> 0.26 -0.2 0.04 ą 12% turbostat.POLL%
> 96.15 -5.4% 90.95 turbostat.PkgWatt
> 12.12 +19.3% 14.46 turbostat.RAMWatt
> 119573 -61.5% 46012 ą 3% proc-vmstat.nr_active_anon
> 106168 +195.8% 314047 ą 10% proc-vmstat.nr_anon_pages
> 28.60 ą 17% +1538.5% 468.68 ą 14% proc-vmstat.nr_anon_transparent_hugepages
> 923365 -13.0% 803489 proc-vmstat.nr_file_pages
> 165571 +113.5% 353493 ą 9% proc-vmstat.nr_inactive_anon
> 45605 -50.2% 22690 proc-vmstat.nr_mapped
> 1752 +47.1% 2578 ą 4% proc-vmstat.nr_page_table_pages
> 179613 -66.7% 59728 ą 3% proc-vmstat.nr_shmem
> 21490 -2.4% 20981 proc-vmstat.nr_slab_reclaimable
> 28260 -7.3% 26208 proc-vmstat.nr_slab_unreclaimable
> 119573 -61.5% 46012 ą 3% proc-vmstat.nr_zone_active_anon
> 165570 +113.5% 353492 ą 9% proc-vmstat.nr_zone_inactive_anon
> 17343640 -76.3% 4116748 ą 4% proc-vmstat.numa_hit
> 17364975 -76.3% 4118098 ą 4% proc-vmstat.numa_local
> 249252 -66.2% 84187 ą 2% proc-vmstat.pgactivate
> 27528916 +567.1% 1.836e+08 ą 5% proc-vmstat.pgalloc_normal
> 4912427 -79.2% 1019949 ą 3% proc-vmstat.pgfault
> 27227124 +574.1% 1.835e+08 ą 5% proc-vmstat.pgfree
> 8728 +3896.4% 348802 ą 5% proc-vmstat.thp_deferred_split_page
> 8730 +3895.3% 348814 ą 5% proc-vmstat.thp_fault_alloc
> 8728 +3896.4% 348802 ą 5% proc-vmstat.thp_split_pmd
> 316745 -21.5% 248756 ą 4% sched_debug.cfs_rq:/.avg_vruntime.avg
> 112735 ą 4% -34.3% 74061 ą 6% sched_debug.cfs_rq:/.avg_vruntime.min
> 0.49 ą 6% -17.2% 0.41 ą 8% sched_debug.cfs_rq:/.h_nr_running.stddev
> 12143 ą120% -99.9% 15.70 ą116% sched_debug.cfs_rq:/.left_vruntime.avg
> 414017 ą126% -99.9% 428.50 ą102% sched_debug.cfs_rq:/.left_vruntime.max
> 68492 ą125% -99.9% 78.15 ą106% sched_debug.cfs_rq:/.left_vruntime.stddev
> 41917 ą 24% -48.3% 21690 ą 57% sched_debug.cfs_rq:/.load.avg
> 176151 ą 30% -56.9% 75963 ą 57% sched_debug.cfs_rq:/.load.stddev
> 6489 ą 17% -29.0% 4608 ą 12% sched_debug.cfs_rq:/.load_avg.max
> 4.42 ą 45% -81.1% 0.83 ą 74% sched_debug.cfs_rq:/.load_avg.min
> 1112 ą 17% -31.0% 767.62 ą 11% sched_debug.cfs_rq:/.load_avg.stddev
> 316745 -21.5% 248756 ą 4% sched_debug.cfs_rq:/.min_vruntime.avg
> 112735 ą 4% -34.3% 74061 ą 6% sched_debug.cfs_rq:/.min_vruntime.min
> 0.49 ą 6% -17.2% 0.41 ą 8% sched_debug.cfs_rq:/.nr_running.stddev
> 12144 ą120% -99.9% 15.70 ą116% sched_debug.cfs_rq:/.right_vruntime.avg
> 414017 ą126% -99.9% 428.50 ą102% sched_debug.cfs_rq:/.right_vruntime.max
> 68492 ą125% -99.9% 78.15 ą106% sched_debug.cfs_rq:/.right_vruntime.stddev
> 14.25 ą 44% -76.6% 3.33 ą 58% sched_debug.cfs_rq:/.runnable_avg.min
> 11.58 ą 49% -77.7% 2.58 ą 58% sched_debug.cfs_rq:/.util_avg.min
> 423972 ą 23% +59.3% 675379 ą 3% sched_debug.cpu.avg_idle.avg
> 5720 ą 43% +439.5% 30864 sched_debug.cpu.avg_idle.min
> 99.79 ą 2% -23.7% 76.11 ą 2% sched_debug.cpu.clock_task.stddev
> 162475 ą 49% -95.8% 6813 ą 26% sched_debug.cpu.curr->pid.avg
> 1061268 -84.0% 170212 ą 4% sched_debug.cpu.curr->pid.max
> 365404 ą 20% -91.3% 31839 ą 10% sched_debug.cpu.curr->pid.stddev
> 0.51 ą 3% -20.1% 0.41 ą 9% sched_debug.cpu.nr_running.stddev
> 311923 -74.2% 80615 ą 2% sched_debug.cpu.nr_switches.avg
> 565973 ą 4% -77.8% 125597 ą 10% sched_debug.cpu.nr_switches.max
> 192666 ą 4% -70.6% 56695 ą 6% sched_debug.cpu.nr_switches.min
> 67485 ą 8% -79.9% 13558 ą 10% sched_debug.cpu.nr_switches.stddev
> 2.62 +102.1% 5.30 perf-stat.i.MPKI
> 2.09e+09 -47.6% 1.095e+09 ą 4% perf-stat.i.branch-instructions
> 1.56 -0.5 1.01 perf-stat.i.branch-miss-rate%
> 31951200 -60.9% 12481432 ą 2% perf-stat.i.branch-misses
> 19.38 +23.7 43.08 perf-stat.i.cache-miss-rate%
> 26413597 -5.7% 24899132 ą 4% perf-stat.i.cache-misses
> 1.363e+08 -58.3% 56906133 ą 4% perf-stat.i.cache-references
> 370628 -75.8% 89743 ą 3% perf-stat.i.context-switches
> 1.77 +65.1% 2.92 ą 2% perf-stat.i.cpi
> 1.748e+10 -21.8% 1.367e+10 ą 2% perf-stat.i.cpu-cycles
> 61611 -79.1% 12901 ą 6% perf-stat.i.cpu-migrations
> 716.97 ą 2% -17.2% 593.35 ą 2% perf-stat.i.cycles-between-cache-misses
> 0.12 ą 4% -0.1 0.05 perf-stat.i.dTLB-load-miss-rate%
> 3066100 ą 3% -81.3% 573066 ą 5% perf-stat.i.dTLB-load-misses
> 2.652e+09 -50.1% 1.324e+09 ą 4% perf-stat.i.dTLB-loads
> 0.08 ą 2% -0.0 0.03 perf-stat.i.dTLB-store-miss-rate%
> 1168195 ą 2% -82.9% 199438 ą 5% perf-stat.i.dTLB-store-misses
> 1.478e+09 -56.8% 6.384e+08 ą 3% perf-stat.i.dTLB-stores
> 8080423 -73.2% 2169371 ą 3% perf-stat.i.iTLB-load-misses
> 5601321 -74.3% 1440571 ą 2% perf-stat.i.iTLB-loads
> 1.028e+10 -49.7% 5.173e+09 ą 4% perf-stat.i.instructions
> 1450 +73.1% 2511 ą 2% perf-stat.i.instructions-per-iTLB-miss
> 0.61 -35.9% 0.39 perf-stat.i.ipc
> 0.48 -21.4% 0.38 ą 2% perf-stat.i.metric.GHz
> 616.28 -17.6% 507.69 ą 4% perf-stat.i.metric.K/sec
> 175.16 -50.8% 86.18 ą 4% perf-stat.i.metric.M/sec
> 76728 -80.8% 14724 ą 4% perf-stat.i.minor-faults
> 5600408 -61.4% 2160997 ą 5% perf-stat.i.node-loads
> 8873996 +52.1% 13499744 ą 5% perf-stat.i.node-stores
> 112409 -81.9% 20305 ą 4% perf-stat.i.page-faults
> 2.55 +89.6% 4.83 perf-stat.overall.MPKI
Much higher MPKI, i.e. more cache misses per kilo-instruction (the TLB miss rates themselves actually improve, as expected with THP).
> 1.51 -0.4 1.13 perf-stat.overall.branch-miss-rate%
> 19.26 +24.5 43.71 perf-stat.overall.cache-miss-rate%
> 1.70 +56.4% 2.65 perf-stat.overall.cpi
> 665.84 -17.5% 549.51 ą 2% perf-stat.overall.cycles-between-cache-misses
> 0.12 ą 4% -0.1 0.04 perf-stat.overall.dTLB-load-miss-rate%
> 0.08 ą 2% -0.0 0.03 perf-stat.overall.dTLB-store-miss-rate%
> 59.16 +0.9 60.04 perf-stat.overall.iTLB-load-miss-rate%
> 1278 +86.1% 2379 ą 2% perf-stat.overall.instructions-per-iTLB-miss
> 0.59 -36.1% 0.38 perf-stat.overall.ipc
Worse IPC and CPI.
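Just to sanity-check the derived metrics against the per-second counters below (back-of-the-envelope only, assuming perf's usual definitions):

\[ \mathrm{MPKI} = \frac{1000 \cdot \mathrm{cache\ misses}}{\mathrm{instructions}}:\quad \frac{1000 \cdot 2.606\times10^{7}}{1.023\times10^{10}} \approx 2.55 \;\rightarrow\; \frac{1000 \cdot 2.451\times10^{7}}{5.073\times10^{9}} \approx 4.83 \]

\[ \mathrm{CPI} = \frac{\mathrm{cycles}}{\mathrm{instructions}}:\quad \frac{1.735\times10^{10}}{1.023\times10^{10}} \approx 1.70 \;\rightarrow\; \frac{1.346\times10^{10}}{5.073\times10^{9}} \approx 2.65 \]

So the jump in MPKI and CPI is mostly the retired instruction count halving while absolute cache misses stay nearly flat (-5.9%), rather than a rise in misses in absolute terms.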
> 2.078e+09 -48.3% 1.074e+09 ą 4% perf-stat.ps.branch-instructions
> 31292687 -61.2% 12133349 ą 2% perf-stat.ps.branch-misses
> 26057291 -5.9% 24512034 ą 4% perf-stat.ps.cache-misses
> 1.353e+08 -58.6% 56072195 ą 4% perf-stat.ps.cache-references
> 365254 -75.8% 88464 ą 3% perf-stat.ps.context-switches
> 1.735e+10 -22.4% 1.346e+10 ą 2% perf-stat.ps.cpu-cycles
> 60838 -79.1% 12727 ą 6% perf-stat.ps.cpu-migrations
> 3056601 ą 4% -81.5% 565354 ą 4% perf-stat.ps.dTLB-load-misses
> 2.636e+09 -50.7% 1.3e+09 ą 4% perf-stat.ps.dTLB-loads
> 1155253 ą 2% -83.0% 196581 ą 5% perf-stat.ps.dTLB-store-misses
> 1.473e+09 -57.4% 6.268e+08 ą 3% perf-stat.ps.dTLB-stores
> 7997726 -73.3% 2131477 ą 3% perf-stat.ps.iTLB-load-misses
> 5521346 -74.3% 1418623 ą 2% perf-stat.ps.iTLB-loads
> 1.023e+10 -50.4% 5.073e+09 ą 4% perf-stat.ps.instructions
> 75671 -80.9% 14479 ą 4% perf-stat.ps.minor-faults
> 5549722 -61.4% 2141750 ą 4% perf-stat.ps.node-loads
> 8769156 +51.6% 13296579 ą 5% perf-stat.ps.node-stores
> 110795 -82.0% 19977 ą 4% perf-stat.ps.page-faults
> 6.482e+11 -50.7% 3.197e+11 ą 4% perf-stat.total.instructions
> 0.00 ą 37% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
> 0.01 ą 18% +8373.1% 0.73 ą 49% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
> 0.01 ą 16% +4600.0% 0.38 ą 24% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
More time is spent in madvise and munmap, but I'm not sure whether this
is caused by tearing down the address space when the test exits. If so,
it should not count toward the regression.
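The madvise path here is also where the huge-PMD splits in the cycles profile further down come from: MADV_DONTNEED on a sub-range of a THP-backed mapping has to split the PMD before a 4K range can be zapped. A minimal sketch of that mechanism (hypothetical example, not the stress-ng code; I assume the pthread stressor reaches it via glibc's thread-stack reuse):

  /* sketch: zapping 4K inside a huge page forces a PMD split first */
  #include <string.h>
  #include <sys/mman.h>

  int main(void)
  {
          size_t len = 4UL << 20;        /* 4MB anonymous mapping */
          char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (p == MAP_FAILED)
                  return 1;

          memset(p, 1, len);             /* populate, THP-eligible */

          /* if this 4K lands inside a 2MB huge page, the kernel must
             split the huge PMD before it can zap just one page */
          madvise(p + 4096, 4096, MADV_DONTNEED);

          munmap(p, len);
          return 0;
  }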
> 0.01 ą204% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
> 0.01 ą 8% +3678.9% 0.36 ą 79% perf-sched.sch_delay.avg.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
> 0.01 ą 14% -38.5% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
> 0.01 ą 5% +2946.2% 0.26 ą 43% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
> 0.00 ą 14% +125.0% 0.01 ą 12% perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 0.02 ą170% -83.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.00 ą 69% +6578.6% 0.31 ą 4% perf-sched.sch_delay.avg.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
> 0.00 +100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
> 0.02 ą 86% +4234.4% 0.65 ą 4% perf-sched.sch_delay.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
> 0.01 ą 6% +6054.3% 0.47 perf-sched.sch_delay.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
> 0.00 ą 14% +195.2% 0.01 ą 89% perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 0.00 ą102% +340.0% 0.01 ą 85% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> 0.00 +100.0% 0.00 perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
> 0.00 ą 11% +66.7% 0.01 ą 21% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
> 0.01 ą 89% +1096.1% 0.15 ą 30% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
> 0.00 +141.7% 0.01 ą 61% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
> 0.00 ą223% +9975.0% 0.07 ą203% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
> 0.00 ą 10% +789.3% 0.04 ą 69% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
> 0.00 ą 31% +6691.3% 0.26 ą 5% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
> 0.00 ą 28% +14612.5% 0.59 ą 4% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.exit_mm
> 0.00 ą 24% +4904.2% 0.20 ą 4% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
> 0.00 ą 28% +450.0% 0.01 ą 74% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
> 0.00 ą 17% +984.6% 0.02 ą 79% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 0.00 ą 20% +231.8% 0.01 ą 89% perf-sched.sch_delay.avg.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.submit_bio_wait
> 0.00 +350.0% 0.01 ą 16% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 0.02 ą 16% +320.2% 0.07 ą 2% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 0.02 ą 2% +282.1% 0.09 ą 5% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 0.00 ą 14% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
> 0.05 ą 35% +3784.5% 1.92 ą 16% perf-sched.sch_delay.max.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
> 0.29 ą128% +563.3% 1.92 ą 7% perf-sched.sch_delay.max.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
> 0.14 ą217% -99.7% 0.00 ą223% perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
> 0.03 ą 49% -74.0% 0.01 ą 51% perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
> 0.01 ą 54% -57.4% 0.00 ą 75% perf-sched.sch_delay.max.ms.__cond_resched.dput.__ns_get_path.ns_get_path.proc_ns_get_link
> 0.12 ą 21% +873.0% 1.19 ą 60% perf-sched.sch_delay.max.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
> 2.27 ą220% -99.7% 0.01 ą 19% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
> 0.02 ą 36% -54.4% 0.01 ą 55% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
> 0.04 ą 36% -77.1% 0.01 ą 31% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
> 0.12 ą 32% +1235.8% 1.58 ą 31% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
> 2.25 ą218% -99.3% 0.02 ą 52% perf-sched.sch_delay.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.01 ą 85% +19836.4% 2.56 ą 7% perf-sched.sch_delay.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
> 0.03 ą 70% -93.6% 0.00 ą223% perf-sched.sch_delay.max.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
> 0.10 ą 16% +2984.2% 3.21 ą 6% perf-sched.sch_delay.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
> 0.01 ą 20% +883.9% 0.05 ą177% perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 0.01 ą 15% +694.7% 0.08 ą123% perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
> 0.00 ą223% +6966.7% 0.07 ą199% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
> 0.01 ą 38% +8384.6% 0.55 ą 72% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 0.01 ą 13% +12995.7% 1.51 ą103% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 117.80 ą 56% -96.4% 4.26 ą 36% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 0.01 ą 68% +331.9% 0.03 perf-sched.total_sch_delay.average.ms
> 4.14 +242.6% 14.20 ą 4% perf-sched.total_wait_and_delay.average.ms
> 700841 -69.6% 212977 ą 3% perf-sched.total_wait_and_delay.count.ms
> 4.14 +242.4% 14.16 ą 4% perf-sched.total_wait_time.average.ms
> 11.68 ą 8% +213.3% 36.59 ą 28% perf-sched.wait_and_delay.avg.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
> 10.00 ą 2% +226.1% 32.62 ą 20% perf-sched.wait_and_delay.avg.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
> 10.55 ą 3% +259.8% 37.96 ą 7% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
> 9.80 ą 12% +196.5% 29.07 ą 32% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
> 9.80 ą 4% +234.9% 32.83 ą 14% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
> 10.32 ą 2% +223.8% 33.42 ą 6% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
> 8.15 ą 14% +271.3% 30.25 ą 35% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
> 9.60 ą 4% +240.8% 32.73 ą 16% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
> 10.37 ą 4% +232.0% 34.41 ą 10% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
> 7.32 ą 46% +269.7% 27.07 ą 49% perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
> 9.88 +236.2% 33.23 ą 4% perf-sched.wait_and_delay.avg.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
> 4.44 ą 4% +379.0% 21.27 ą 18% perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 10.05 ą 2% +235.6% 33.73 ą 11% perf-sched.wait_and_delay.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.03 +462.6% 0.15 ą 6% perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 6.78 ą 4% +482.1% 39.46 ą 3% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> 3.17 +683.3% 24.85 ą 8% perf-sched.wait_and_delay.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
> 36.64 ą 13% +244.7% 126.32 ą 6% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
> 9.81 +302.4% 39.47 ą 4% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
> 1.05 +48.2% 1.56 perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
> 0.93 +14.2% 1.06 ą 2% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
> 9.93 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
> 12.02 ą 3% +139.8% 28.83 ą 6% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 6.09 ą 2% +403.0% 30.64 ą 5% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 23.17 ą 19% -83.5% 3.83 ą143% perf-sched.wait_and_delay.count.__cond_resched.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio
> 79.83 ą 9% -55.1% 35.83 ą 16% perf-sched.wait_and_delay.count.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
> 14.83 ą 14% -59.6% 6.00 ą 56% perf-sched.wait_and_delay.count.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
> 8.50 ą 17% -80.4% 1.67 ą 89% perf-sched.wait_and_delay.count.__cond_resched.dput.__ns_get_path.ns_get_path.proc_ns_get_link
> 114.00 ą 14% -62.4% 42.83 ą 11% perf-sched.wait_and_delay.count.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
> 94.67 ą 7% -48.1% 49.17 ą 13% perf-sched.wait_and_delay.count.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
> 59.83 ą 13% -76.0% 14.33 ą 48% perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
> 103.00 ą 12% -48.1% 53.50 ą 20% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
> 19.33 ą 16% -56.0% 8.50 ą 29% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
> 68.17 ą 11% -39.1% 41.50 ą 19% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
> 36.67 ą 22% -79.1% 7.67 ą 46% perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
> 465.50 ą 9% -47.4% 244.83 ą 11% perf-sched.wait_and_delay.count.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
> 14492 ą 3% -96.3% 533.67 ą 10% perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 128.67 ą 7% -53.5% 59.83 ą 10% perf-sched.wait_and_delay.count.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 7.67 ą 34% -80.4% 1.50 ą107% perf-sched.wait_and_delay.count.__cond_resched.vunmap_p4d_range.__vunmap_range_noflush.remove_vm_area.vfree
> 147533 -81.0% 28023 ą 5% perf-sched.wait_and_delay.count.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 4394 ą 4% -78.5% 942.83 ą 7% perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> 228791 -79.3% 47383 ą 4% perf-sched.wait_and_delay.count.futex_wait_queue.__futex_wait.futex_wait.do_futex
> 368.50 ą 2% -67.1% 121.33 ą 3% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
> 147506 -81.0% 28010 ą 5% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
> 5387 ą 6% -16.7% 4488 ą 5% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
> 8303 ą 2% -56.9% 3579 ą 5% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
> 14.67 ą 7% -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
> 370.50 ą141% +221.9% 1192 ą 5% perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 24395 ą 2% -51.2% 11914 ą 6% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 31053 ą 2% -80.5% 6047 ą 5% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 16.41 ą 2% +342.7% 72.65 ą 29% perf-sched.wait_and_delay.max.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
> 16.49 ą 3% +463.3% 92.90 ą 27% perf-sched.wait_and_delay.max.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
> 17.32 ą 5% +520.9% 107.52 ą 14% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
> 15.38 ą 6% +325.2% 65.41 ą 22% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
> 16.73 ą 4% +456.2% 93.04 ą 11% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
> 17.14 ą 3% +510.6% 104.68 ą 14% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
> 15.70 ą 4% +379.4% 75.25 ą 28% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
> 15.70 ą 3% +422.1% 81.97 ą 19% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
> 16.38 +528.4% 102.91 ą 21% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
> 45.20 ą 48% +166.0% 120.23 ą 27% perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
> 17.25 +495.5% 102.71 ą 2% perf-sched.wait_and_delay.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
> 402.57 ą 15% -52.8% 189.90 ą 14% perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 16.96 ą 4% +521.3% 105.40 ą 15% perf-sched.wait_and_delay.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 28.45 +517.3% 175.65 ą 14% perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 22.49 +628.5% 163.83 ą 16% perf-sched.wait_and_delay.max.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
> 26.53 ą 30% +326.9% 113.25 ą 16% perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
> 15.54 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
> 1.67 ą141% +284.6% 6.44 ą 4% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 0.07 ą 34% -93.6% 0.00 ą105% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
> 10.21 ą 15% +295.8% 40.43 ą 50% perf-sched.wait_time.avg.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 3.89 ą 40% -99.8% 0.01 ą113% perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
> 11.67 ą 8% +213.5% 36.58 ą 28% perf-sched.wait_time.avg.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
> 9.98 ą 2% +226.8% 32.61 ą 20% perf-sched.wait_time.avg.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
> 1.03 +71.2% 1.77 ą 20% perf-sched.wait_time.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
> 0.06 ą 79% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.__split_vma.vma_modify.mprotect_fixup
> 0.05 ą 22% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_expand.mmap_region.do_mmap
> 0.08 ą 82% -98.2% 0.00 ą223% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 10.72 ą 10% +166.9% 28.61 ą 29% perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
> 10.53 ą 3% +260.5% 37.95 ą 7% perf-sched.wait_time.avg.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
> 9.80 ą 12% +196.6% 29.06 ą 32% perf-sched.wait_time.avg.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
> 9.80 ą 4% +235.1% 32.82 ą 14% perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
> 9.50 ą 12% +281.9% 36.27 ą 70% perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
> 10.31 ą 2% +223.9% 33.40 ą 6% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
> 8.04 ą 15% +276.1% 30.25 ą 35% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
> 9.60 ą 4% +240.9% 32.72 ą 16% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
> 0.06 ą 66% -98.3% 0.00 ą223% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.__split_vma
> 10.36 ą 4% +232.1% 34.41 ą 10% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
> 0.08 ą 50% -95.7% 0.00 ą100% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.vma_modify
> 0.01 ą 49% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range
> 0.03 ą 73% -87.4% 0.00 ą145% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.dup_task_struct.copy_process.kernel_clone
> 8.01 ą 25% +238.0% 27.07 ą 49% perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
> 9.86 +237.0% 33.23 ą 4% perf-sched.wait_time.avg.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
> 4.44 ą 4% +379.2% 21.26 ą 18% perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 10.03 +236.3% 33.73 ą 11% perf-sched.wait_time.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.97 ą 8% -87.8% 0.12 ą221% perf-sched.wait_time.avg.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
> 0.02 ą 13% +1846.8% 0.45 ą 11% perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
> 1.01 +64.7% 1.66 perf-sched.wait_time.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
> 0.75 ą 4% +852.1% 7.10 ą 5% perf-sched.wait_time.avg.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 0.03 +462.6% 0.15 ą 6% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.24 ą 4% +25.3% 0.30 ą 8% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
> 1.98 ą 15% +595.7% 13.80 ą 90% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
> 2.78 ą 14% +444.7% 15.12 ą 16% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function
> 6.77 ą 4% +483.0% 39.44 ą 3% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> 3.17 +684.7% 24.85 ą 8% perf-sched.wait_time.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
> 36.64 ą 13% +244.7% 126.32 ą 6% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
> 9.79 +303.0% 39.45 ą 4% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
> 1.05 +23.8% 1.30 perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
> 0.86 +101.2% 1.73 ą 3% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.exit_mm
> 0.11 ą 21% +438.9% 0.61 ą 15% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
> 0.32 ą 4% +28.5% 0.41 ą 13% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 12.00 ą 3% +139.6% 28.76 ą 6% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 6.07 ą 2% +403.5% 30.56 ą 5% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 0.38 ą 41% -98.8% 0.00 ą105% perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
> 0.36 ą 34% -84.3% 0.06 ą200% perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.vma_alloc_folio.do_anonymous_page
> 0.36 ą 51% -92.9% 0.03 ą114% perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
> 15.98 ą 5% +361.7% 73.80 ą 23% perf-sched.wait_time.max.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.51 ą 14% -92.8% 0.04 ą196% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.__vmalloc_area_node.__vmalloc_node_range
> 8.56 ą 11% -99.9% 0.01 ą126% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
> 0.43 ą 32% -68.2% 0.14 ą119% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_node_trace.__get_vm_area_node.__vmalloc_node_range
> 0.46 ą 20% -89.3% 0.05 ą184% perf-sched.wait_time.max.ms.__cond_resched.__vmalloc_area_node.__vmalloc_node_range.alloc_thread_stack_node.dup_task_struct
> 16.40 ą 2% +342.9% 72.65 ą 29% perf-sched.wait_time.max.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
> 0.31 ą 63% -76.2% 0.07 ą169% perf-sched.wait_time.max.ms.__cond_resched.cgroup_css_set_fork.cgroup_can_fork.copy_process.kernel_clone
> 0.14 ą 93% +258.7% 0.49 ą 14% perf-sched.wait_time.max.ms.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
> 16.49 ą 3% +463.5% 92.89 ą 27% perf-sched.wait_time.max.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
> 1.09 +171.0% 2.96 ą 10% perf-sched.wait_time.max.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
> 1.16 ą 7% +155.1% 2.97 ą 4% perf-sched.wait_time.max.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
> 0.19 ą 78% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.__split_vma.vma_modify.mprotect_fixup
> 0.33 ą 35% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_expand.mmap_region.do_mmap
> 0.20 ą101% -99.3% 0.00 ą223% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 17.31 ą 5% +521.0% 107.51 ą 14% perf-sched.wait_time.max.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
> 15.38 ą 6% +325.3% 65.40 ą 22% perf-sched.wait_time.max.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
> 16.72 ą 4% +456.6% 93.04 ą 11% perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
> 1.16 ą 2% +88.7% 2.20 ą 33% perf-sched.wait_time.max.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
> 53.96 ą 32% +444.0% 293.53 ą109% perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
> 17.13 ą 2% +511.2% 104.68 ą 14% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
> 15.69 ą 4% +379.5% 75.25 ą 28% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
> 15.70 ą 3% +422.2% 81.97 ą 19% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
> 0.27 ą 80% -99.6% 0.00 ą223% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.__split_vma
> 16.37 +528.6% 102.90 ą 21% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
> 0.44 ą 33% -99.1% 0.00 ą104% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.vma_modify
> 0.02 ą 83% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range
> 0.08 ą 83% -95.4% 0.00 ą147% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.dup_task_struct.copy_process.kernel_clone
> 1.16 ą 2% +134.7% 2.72 ą 19% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
> 49.88 ą 25% +141.0% 120.23 ą 27% perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
> 17.24 +495.7% 102.70 ą 2% perf-sched.wait_time.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
> 402.56 ą 15% -52.8% 189.89 ą 14% perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 16.96 ą 4% +521.4% 105.39 ą 15% perf-sched.wait_time.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 1.06 +241.7% 3.61 ą 4% perf-sched.wait_time.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
> 1.07 -88.9% 0.12 ą221% perf-sched.wait_time.max.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
> 0.28 ą 27% +499.0% 1.67 ą 18% perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
> 1.21 ą 2% +207.2% 3.71 ą 3% perf-sched.wait_time.max.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
> 13.43 ą 26% +38.8% 18.64 perf-sched.wait_time.max.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
> 28.45 +517.3% 175.65 ą 14% perf-sched.wait_time.max.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.79 ą 10% +62.2% 1.28 ą 25% perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
> 13.22 ą 2% +317.2% 55.16 ą 35% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function
> 834.29 ą 28% -48.5% 429.53 ą 94% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
> 22.48 +628.6% 163.83 ą 16% perf-sched.wait_time.max.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
> 22.74 ą 18% +398.0% 113.25 ą 16% perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
> 7.72 ą 7% +80.6% 13.95 ą 2% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
> 0.74 ą 4% +77.2% 1.31 ą 32% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
> 5.01 +14.1% 5.72 ą 2% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
> 44.98 -19.7 25.32 ą 2% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
> 43.21 -19.6 23.65 ą 3% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
> 43.21 -19.6 23.65 ą 3% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> 43.18 -19.5 23.63 ą 3% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> 40.30 -17.5 22.75 ą 3% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> 41.10 -17.4 23.66 ą 2% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
> 39.55 -17.3 22.24 ą 3% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
> 24.76 ą 2% -8.5 16.23 ą 3% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> 8.68 ą 4% -6.5 2.22 ą 6% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
> 7.23 ą 4% -5.8 1.46 ą 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> 7.23 ą 4% -5.8 1.46 ą 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 7.11 ą 4% -5.7 1.39 ą 7% perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 7.09 ą 4% -5.7 1.39 ą 7% perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 6.59 ą 3% -5.1 1.47 ą 7% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
> 6.59 ą 3% -5.1 1.47 ą 7% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
> 6.59 ą 3% -5.1 1.47 ą 7% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
> 5.76 ą 2% -5.0 0.80 ą 9% perf-profile.calltrace.cycles-pp.start_thread
> 7.43 ą 2% -4.9 2.52 ą 7% perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> 5.51 ą 3% -4.8 0.70 ą 7% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.start_thread
> 5.50 ą 3% -4.8 0.70 ą 7% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
> 5.48 ą 3% -4.8 0.69 ą 7% perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
> 5.42 ą 3% -4.7 0.69 ą 7% perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
> 5.90 ą 5% -3.9 2.01 ą 4% perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
> 4.18 ą 5% -3.8 0.37 ą 71% perf-profile.calltrace.cycles-pp.exit_notify.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 5.76 ą 5% -3.8 1.98 ą 4% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
> 5.04 ą 7% -3.7 1.32 ą 9% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__clone
> 5.03 ą 7% -3.7 1.32 ą 9% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
> 5.02 ą 7% -3.7 1.32 ą 9% perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
> 5.02 ą 7% -3.7 1.32 ą 9% perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
> 5.62 ą 5% -3.7 1.96 ą 3% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
> 4.03 ą 4% -3.1 0.92 ą 7% perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 6.03 ą 5% -3.1 2.94 ą 3% perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
> 3.43 ą 5% -2.8 0.67 ą 13% perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
> 3.43 ą 5% -2.8 0.67 ą 13% perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
> 3.41 ą 5% -2.7 0.66 ą 13% perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread
> 3.40 ą 5% -2.7 0.66 ą 13% perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn
> 3.67 ą 7% -2.7 0.94 ą 10% perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 2.92 ą 7% -2.4 0.50 ą 46% perf-profile.calltrace.cycles-pp.stress_pthread
> 2.54 ą 6% -2.2 0.38 ą 70% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 2.46 ą 6% -1.8 0.63 ą 10% perf-profile.calltrace.cycles-pp.dup_task_struct.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
> 3.00 ą 6% -1.6 1.43 ą 7% perf-profile.calltrace.cycles-pp.__munmap
> 2.96 ą 6% -1.5 1.42 ą 7% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
> 2.96 ą 6% -1.5 1.42 ą 7% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
> 2.95 ą 6% -1.5 1.41 ą 7% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
> 2.95 ą 6% -1.5 1.41 ą 7% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
> 2.02 ą 4% -1.5 0.52 ą 46% perf-profile.calltrace.cycles-pp.__lll_lock_wait
> 1.78 ą 3% -1.5 0.30 ą100% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__lll_lock_wait
> 1.77 ą 3% -1.5 0.30 ą100% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__lll_lock_wait
> 1.54 ą 6% -1.3 0.26 ą100% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> 2.54 ą 6% -1.2 1.38 ą 6% perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 2.51 ą 6% -1.1 1.37 ą 7% perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
> 1.13 -0.7 0.40 ą 70% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 1.15 ą 5% -0.7 0.46 ą 45% perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
> 1.58 ą 5% -0.6 0.94 ą 7% perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
> 0.99 ą 5% -0.5 0.51 ą 45% perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
> 1.01 ą 5% -0.5 0.54 ą 45% perf-profile.calltrace.cycles-pp.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
> 0.82 ą 4% -0.2 0.59 ą 5% perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
> 0.00 +0.5 0.54 ą 5% perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
> 0.00 +0.6 0.60 ą 5% perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
> 0.00 +0.6 0.61 ą 6% perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
> 0.00 +0.6 0.62 ą 6% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
> 0.53 ą 5% +0.6 1.17 ą 13% perf-profile.calltrace.cycles-pp.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
> 1.94 ą 2% +0.7 2.64 ą 9% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
> 0.00 +0.7 0.73 ą 5% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range
> 0.00 +0.8 0.75 ą 20% perf-profile.calltrace.cycles-pp.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
> 2.02 ą 2% +0.8 2.85 ą 9% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> 0.74 ą 5% +0.8 1.57 ą 11% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
> 0.00 +0.9 0.90 ą 4% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
> 0.00 +0.9 0.92 ą 13% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues
> 0.86 ą 4% +1.0 1.82 ą 10% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
> 0.86 ą 4% +1.0 1.83 ą 10% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
> 0.00 +1.0 0.98 ą 7% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked
> 0.09 ą223% +1.0 1.07 ą 11% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt
> 0.00 +1.0 0.99 ą 6% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd
> 0.00 +1.0 1.00 ą 7% perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range
> 0.09 ą223% +1.0 1.10 ą 12% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
> 0.00 +1.0 1.01 ą 6% perf-profile.calltrace.cycles-pp.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range
> 0.00 +1.1 1.10 ą 5% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath
> 0.00 +1.1 1.12 ą 5% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock
> 0.00 +1.2 1.23 ą 4% perf-profile.calltrace.cycles-pp.page_add_anon_rmap.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range
> 0.00 +1.3 1.32 ą 4% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd
> 0.00 +1.4 1.38 ą 5% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range
> 0.00 +2.4 2.44 ą 10% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd.zap_pmd_range
> 0.00 +3.1 3.10 ą 5% perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single
> 0.00 +3.5 3.52 ą 5% perf-profile.calltrace.cycles-pp.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single
> 0.88 ą 4% +3.8 4.69 ą 4% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
> 6.30 ą 6% +13.5 19.85 ą 7% perf-profile.calltrace.cycles-pp.__clone
> 0.00 +16.7 16.69 ą 7% perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
> 1.19 ą 29% +17.1 18.32 ą 7% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
> 0.00 +17.6 17.56 ą 7% perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
> 0.63 ą 7% +17.7 18.35 ą 7% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.__clone
> 0.59 ą 5% +17.8 18.34 ą 7% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.__clone
> 0.59 ą 5% +17.8 18.34 ą 7% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__clone
> 0.00 +17.9 17.90 ą 7% perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
> 0.36 ą 71% +18.0 18.33 ą 7% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__clone
> 0.00 +32.0 32.03 ą 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd.zap_pmd_range.unmap_page_range
> 0.00 +32.6 32.62 ą 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single
> 0.00 +36.2 36.19 ą 2% perf-profile.calltrace.cycles-pp.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
> 7.97 ą 4% +36.6 44.52 ą 2% perf-profile.calltrace.cycles-pp.__madvise
> 7.91 ą 4% +36.6 44.46 ą 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
> 7.90 ą 4% +36.6 44.46 ą 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
> 7.87 ą 4% +36.6 44.44 ą 2% perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
> 7.86 ą 4% +36.6 44.44 ą 2% perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
> 7.32 ą 4% +36.8 44.07 ą 2% perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 7.25 ą 4% +36.8 44.06 ą 2% perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
> 1.04 ą 4% +40.0 41.08 ą 2% perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
> 1.00 ą 3% +40.1 41.06 ą 2% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
> 44.98 -19.7 25.32 ą 2% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
> 44.98 -19.7 25.32 ą 2% perf-profile.children.cycles-pp.cpu_startup_entry
> 44.96 -19.6 25.31 ą 2% perf-profile.children.cycles-pp.do_idle
> 43.21 -19.6 23.65 ą 3% perf-profile.children.cycles-pp.start_secondary
> 41.98 -17.6 24.40 ą 2% perf-profile.children.cycles-pp.cpuidle_idle_call
> 41.21 -17.3 23.86 ą 2% perf-profile.children.cycles-pp.cpuidle_enter
> 41.20 -17.3 23.86 ą 2% perf-profile.children.cycles-pp.cpuidle_enter_state
> 12.69 ą 3% -10.6 2.12 ą 6% perf-profile.children.cycles-pp.do_exit
> 12.60 ą 3% -10.5 2.08 ą 7% perf-profile.children.cycles-pp.__x64_sys_exit
> 24.76 ą 2% -8.5 16.31 ą 2% perf-profile.children.cycles-pp.intel_idle
> 12.34 ą 2% -8.4 3.90 ą 5% perf-profile.children.cycles-pp.intel_idle_irq
> 6.96 ą 4% -5.4 1.58 ą 7% perf-profile.children.cycles-pp.ret_from_fork_asm
> 6.69 ą 4% -5.2 1.51 ą 7% perf-profile.children.cycles-pp.ret_from_fork
> 6.59 ą 3% -5.1 1.47 ą 7% perf-profile.children.cycles-pp.kthread
> 5.78 ą 2% -5.0 0.80 ą 8% perf-profile.children.cycles-pp.start_thread
> 4.68 ą 4% -4.5 0.22 ą 10% perf-profile.children.cycles-pp._raw_spin_lock_irq
> 5.03 ą 7% -3.7 1.32 ą 9% perf-profile.children.cycles-pp.__do_sys_clone
> 5.02 ą 7% -3.7 1.32 ą 9% perf-profile.children.cycles-pp.kernel_clone
> 4.20 ą 5% -3.7 0.53 ą 9% perf-profile.children.cycles-pp.exit_notify
> 4.67 ą 5% -3.6 1.10 ą 9% perf-profile.children.cycles-pp.rcu_core
> 4.60 ą 4% -3.5 1.06 ą 10% perf-profile.children.cycles-pp.rcu_do_batch
> 4.89 ą 5% -3.4 1.44 ą 11% perf-profile.children.cycles-pp.__do_softirq
> 5.64 ą 3% -3.2 2.39 ą 6% perf-profile.children.cycles-pp.__schedule
> 6.27 ą 5% -3.2 3.03 ą 4% perf-profile.children.cycles-pp.flush_tlb_mm_range
> 4.03 ą 4% -3.1 0.92 ą 7% perf-profile.children.cycles-pp.smpboot_thread_fn
> 6.68 ą 4% -3.1 3.61 ą 3% perf-profile.children.cycles-pp.tlb_finish_mmu
> 6.04 ą 5% -3.1 2.99 ą 4% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
> 6.04 ą 5% -3.0 2.99 ą 4% perf-profile.children.cycles-pp.smp_call_function_many_cond
> 3.77 ą 2% -3.0 0.73 ą 16% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 7.78 -3.0 4.77 ą 5% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> 3.43 ą 5% -2.8 0.67 ą 13% perf-profile.children.cycles-pp.run_ksoftirqd
> 3.67 ą 7% -2.7 0.94 ą 10% perf-profile.children.cycles-pp.copy_process
> 2.80 ą 6% -2.5 0.34 ą 15% perf-profile.children.cycles-pp.queued_write_lock_slowpath
> 3.41 ą 2% -2.5 0.96 ą 16% perf-profile.children.cycles-pp.do_futex
> 3.06 ą 5% -2.4 0.68 ą 16% perf-profile.children.cycles-pp.free_unref_page_commit
> 3.02 ą 5% -2.4 0.67 ą 16% perf-profile.children.cycles-pp.free_pcppages_bulk
> 2.92 ą 7% -2.3 0.58 ą 14% perf-profile.children.cycles-pp.stress_pthread
> 3.22 ą 3% -2.3 0.90 ą 18% perf-profile.children.cycles-pp.__x64_sys_futex
> 2.52 ą 5% -2.2 0.35 ą 7% perf-profile.children.cycles-pp.release_task
> 2.54 ą 6% -2.0 0.53 ą 10% perf-profile.children.cycles-pp.worker_thread
> 3.12 ą 5% -1.9 1.17 ą 11% perf-profile.children.cycles-pp.free_unref_page
> 2.31 ą 6% -1.9 0.45 ą 11% perf-profile.children.cycles-pp.process_one_work
> 2.47 ą 6% -1.8 0.63 ą 10% perf-profile.children.cycles-pp.dup_task_struct
> 2.19 ą 5% -1.8 0.41 ą 12% perf-profile.children.cycles-pp.delayed_vfree_work
> 2.14 ą 5% -1.7 0.40 ą 11% perf-profile.children.cycles-pp.vfree
> 3.19 ą 2% -1.6 1.58 ą 8% perf-profile.children.cycles-pp.schedule
> 2.06 ą 3% -1.6 0.46 ą 7% perf-profile.children.cycles-pp.__sigtimedwait
> 3.02 ą 6% -1.6 1.44 ą 7% perf-profile.children.cycles-pp.__munmap
> 1.94 ą 4% -1.6 0.39 ą 14% perf-profile.children.cycles-pp.__unfreeze_partials
> 2.95 ą 6% -1.5 1.41 ą 7% perf-profile.children.cycles-pp.__x64_sys_munmap
> 2.95 ą 6% -1.5 1.41 ą 7% perf-profile.children.cycles-pp.__vm_munmap
> 2.14 ą 3% -1.5 0.60 ą 21% perf-profile.children.cycles-pp.futex_wait
> 2.08 ą 4% -1.5 0.60 ą 19% perf-profile.children.cycles-pp.__lll_lock_wait
> 2.04 ą 3% -1.5 0.56 ą 20% perf-profile.children.cycles-pp.__futex_wait
> 1.77 ą 5% -1.5 0.32 ą 10% perf-profile.children.cycles-pp.remove_vm_area
> 1.86 ą 5% -1.4 0.46 ą 10% perf-profile.children.cycles-pp.open64
> 1.74 ą 4% -1.4 0.37 ą 7% perf-profile.children.cycles-pp.__x64_sys_rt_sigtimedwait
> 1.71 ą 4% -1.4 0.36 ą 8% perf-profile.children.cycles-pp.do_sigtimedwait
> 1.79 ą 5% -1.3 0.46 ą 9% perf-profile.children.cycles-pp.__x64_sys_openat
> 1.78 ą 5% -1.3 0.46 ą 8% perf-profile.children.cycles-pp.do_sys_openat2
> 1.61 ą 4% -1.3 0.32 ą 12% perf-profile.children.cycles-pp.poll_idle
> 1.65 ą 9% -1.3 0.37 ą 14% perf-profile.children.cycles-pp.pthread_create@@GLIBC_2.2.5
> 1.56 ą 8% -1.2 0.35 ą 7% perf-profile.children.cycles-pp.alloc_thread_stack_node
> 2.32 ą 3% -1.2 1.13 ą 8% perf-profile.children.cycles-pp.pick_next_task_fair
> 2.59 ą 6% -1.2 1.40 ą 7% perf-profile.children.cycles-pp.do_vmi_munmap
> 1.55 ą 4% -1.2 0.40 ą 19% perf-profile.children.cycles-pp.futex_wait_queue
> 1.37 ą 5% -1.1 0.22 ą 12% perf-profile.children.cycles-pp.find_unlink_vmap_area
> 2.52 ą 6% -1.1 1.38 ą 6% perf-profile.children.cycles-pp.do_vmi_align_munmap
> 1.53 ą 5% -1.1 0.39 ą 8% perf-profile.children.cycles-pp.do_filp_open
> 1.52 ą 5% -1.1 0.39 ą 7% perf-profile.children.cycles-pp.path_openat
> 1.25 ą 3% -1.1 0.14 ą 12% perf-profile.children.cycles-pp.sigpending
> 1.58 ą 5% -1.1 0.50 ą 6% perf-profile.children.cycles-pp.schedule_idle
> 1.29 ą 5% -1.1 0.21 ą 21% perf-profile.children.cycles-pp.__mprotect
> 1.40 ą 8% -1.1 0.32 ą 4% perf-profile.children.cycles-pp.__vmalloc_node_range
> 2.06 ą 3% -1.0 1.02 ą 9% perf-profile.children.cycles-pp.newidle_balance
> 1.04 ą 3% -1.0 0.08 ą 23% perf-profile.children.cycles-pp.__x64_sys_rt_sigpending
> 1.14 ą 6% -1.0 0.18 ą 18% perf-profile.children.cycles-pp.__x64_sys_mprotect
> 1.13 ą 6% -1.0 0.18 ą 17% perf-profile.children.cycles-pp.do_mprotect_pkey
> 1.30 ą 7% -0.9 0.36 ą 10% perf-profile.children.cycles-pp.wake_up_new_task
> 1.14 ą 9% -0.9 0.22 ą 16% perf-profile.children.cycles-pp.do_anonymous_page
> 0.95 ą 3% -0.9 0.04 ą 71% perf-profile.children.cycles-pp.do_sigpending
> 1.24 ą 3% -0.9 0.34 ą 9% perf-profile.children.cycles-pp.futex_wake
> 1.02 ą 6% -0.9 0.14 ą 15% perf-profile.children.cycles-pp.mprotect_fixup
> 1.91 ą 2% -0.9 1.06 ą 9% perf-profile.children.cycles-pp.load_balance
> 1.38 ą 5% -0.8 0.53 ą 6% perf-profile.children.cycles-pp.select_task_rq_fair
> 1.14 ą 4% -0.8 0.31 ą 12% perf-profile.children.cycles-pp.__pthread_mutex_unlock_usercnt
> 2.68 ą 3% -0.8 1.91 ą 6% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> 1.00 ą 4% -0.7 0.26 ą 10% perf-profile.children.cycles-pp.flush_smp_call_function_queue
> 1.44 ą 3% -0.7 0.73 ą 10% perf-profile.children.cycles-pp.find_busiest_group
> 0.81 ą 6% -0.7 0.10 ą 18% perf-profile.children.cycles-pp.vma_modify
> 1.29 ą 3% -0.7 0.60 ą 8% perf-profile.children.cycles-pp.exit_mm
> 1.40 ą 3% -0.7 0.71 ą 10% perf-profile.children.cycles-pp.update_sd_lb_stats
> 0.78 ą 7% -0.7 0.10 ą 19% perf-profile.children.cycles-pp.__split_vma
> 0.90 ą 8% -0.7 0.22 ą 10% perf-profile.children.cycles-pp.__vmalloc_area_node
> 0.75 ą 4% -0.7 0.10 ą 5% perf-profile.children.cycles-pp.__exit_signal
> 1.49 ą 2% -0.7 0.84 ą 7% perf-profile.children.cycles-pp.try_to_wake_up
> 0.89 ą 7% -0.6 0.24 ą 10% perf-profile.children.cycles-pp.find_idlest_cpu
> 1.59 ą 5% -0.6 0.95 ą 7% perf-profile.children.cycles-pp.unmap_region
> 0.86 ą 3% -0.6 0.22 ą 26% perf-profile.children.cycles-pp.pthread_cond_timedwait@@GLIBC_2.3.2
> 1.59 ą 3% -0.6 0.95 ą 9% perf-profile.children.cycles-pp.irq_exit_rcu
> 1.24 ą 3% -0.6 0.61 ą 10% perf-profile.children.cycles-pp.update_sg_lb_stats
> 0.94 ą 5% -0.6 0.32 ą 11% perf-profile.children.cycles-pp.do_task_dead
> 0.87 ą 3% -0.6 0.25 ą 19% perf-profile.children.cycles-pp.perf_iterate_sb
> 0.82 ą 4% -0.6 0.22 ą 10% perf-profile.children.cycles-pp.sched_ttwu_pending
> 1.14 ą 3% -0.6 0.54 ą 10% perf-profile.children.cycles-pp.activate_task
> 0.84 -0.6 0.25 ą 10% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> 0.81 ą 6% -0.6 0.22 ą 11% perf-profile.children.cycles-pp.find_idlest_group
> 0.75 ą 5% -0.6 0.18 ą 14% perf-profile.children.cycles-pp.step_into
> 0.74 ą 8% -0.6 0.18 ą 14% perf-profile.children.cycles-pp.__alloc_pages_bulk
> 0.74 ą 6% -0.5 0.19 ą 11% perf-profile.children.cycles-pp.update_sg_wakeup_stats
> 0.72 ą 5% -0.5 0.18 ą 15% perf-profile.children.cycles-pp.pick_link
> 1.06 ą 2% -0.5 0.52 ą 9% perf-profile.children.cycles-pp.enqueue_task_fair
> 0.77 ą 6% -0.5 0.23 ą 12% perf-profile.children.cycles-pp.unmap_vmas
> 0.76 ą 2% -0.5 0.22 ą 8% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
> 0.94 ą 2% -0.5 0.42 ą 10% perf-profile.children.cycles-pp.dequeue_task_fair
> 0.65 ą 5% -0.5 0.15 ą 18% perf-profile.children.cycles-pp.open_last_lookups
> 1.37 ą 3% -0.5 0.87 ą 4% perf-profile.children.cycles-pp.llist_add_batch
> 0.70 ą 4% -0.5 0.22 ą 19% perf-profile.children.cycles-pp.memcpy_orig
> 0.91 ą 4% -0.5 0.44 ą 7% perf-profile.children.cycles-pp.update_load_avg
> 0.67 -0.5 0.20 ą 8% perf-profile.children.cycles-pp.switch_fpu_return
> 0.88 ą 3% -0.5 0.42 ą 8% perf-profile.children.cycles-pp.enqueue_entity
> 0.91 ą 4% -0.5 0.45 ą 12% perf-profile.children.cycles-pp.ttwu_do_activate
> 0.77 ą 4% -0.5 0.32 ą 10% perf-profile.children.cycles-pp.schedule_hrtimeout_range_clock
> 0.63 ą 5% -0.4 0.20 ą 21% perf-profile.children.cycles-pp.arch_dup_task_struct
> 0.74 ą 3% -0.4 0.32 ą 15% perf-profile.children.cycles-pp.dequeue_entity
> 0.62 ą 5% -0.4 0.21 ą 5% perf-profile.children.cycles-pp.finish_task_switch
> 0.56 -0.4 0.16 ą 7% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
> 0.53 ą 4% -0.4 0.13 ą 9% perf-profile.children.cycles-pp.syscall
> 0.50 ą 9% -0.4 0.11 ą 18% perf-profile.children.cycles-pp.__get_vm_area_node
> 0.51 ą 3% -0.4 0.12 ą 12% perf-profile.children.cycles-pp.__slab_free
> 0.52 ą 2% -0.4 0.14 ą 10% perf-profile.children.cycles-pp.kmem_cache_free
> 0.75 ą 3% -0.4 0.37 ą 9% perf-profile.children.cycles-pp.exit_mm_release
> 0.50 ą 6% -0.4 0.12 ą 21% perf-profile.children.cycles-pp.do_send_specific
> 0.74 ą 3% -0.4 0.37 ą 8% perf-profile.children.cycles-pp.futex_exit_release
> 0.45 ą 10% -0.4 0.09 ą 17% perf-profile.children.cycles-pp.alloc_vmap_area
> 0.47 ą 3% -0.4 0.11 ą 20% perf-profile.children.cycles-pp.tgkill
> 0.68 ą 11% -0.4 0.32 ą 12% perf-profile.children.cycles-pp.__mmap
> 0.48 ą 3% -0.4 0.13 ą 6% perf-profile.children.cycles-pp.entry_SYSCALL_64
> 0.76 ą 5% -0.3 0.41 ą 10% perf-profile.children.cycles-pp.wake_up_q
> 0.42 ą 7% -0.3 0.08 ą 22% perf-profile.children.cycles-pp.__close
> 0.49 ą 7% -0.3 0.14 ą 25% perf-profile.children.cycles-pp.kmem_cache_alloc
> 0.49 ą 9% -0.3 0.15 ą 14% perf-profile.children.cycles-pp.mas_store_gfp
> 0.46 ą 4% -0.3 0.12 ą 23% perf-profile.children.cycles-pp.perf_event_task_output
> 0.44 ą 10% -0.3 0.10 ą 28% perf-profile.children.cycles-pp.pthread_sigqueue
> 0.46 ą 4% -0.3 0.12 ą 15% perf-profile.children.cycles-pp.link_path_walk
> 0.42 ą 8% -0.3 0.10 ą 20% perf-profile.children.cycles-pp.proc_ns_get_link
> 0.63 ą 10% -0.3 0.32 ą 12% perf-profile.children.cycles-pp.vm_mmap_pgoff
> 0.45 ą 4% -0.3 0.14 ą 13% perf-profile.children.cycles-pp.sched_move_task
> 0.36 ą 8% -0.3 0.06 ą 49% perf-profile.children.cycles-pp.__x64_sys_close
> 0.46 ą 8% -0.3 0.17 ą 14% perf-profile.children.cycles-pp.prctl
> 0.65 ą 3% -0.3 0.35 ą 7% perf-profile.children.cycles-pp.futex_cleanup
> 0.42 ą 7% -0.3 0.12 ą 15% perf-profile.children.cycles-pp.mas_store_prealloc
> 0.49 ą 5% -0.3 0.20 ą 13% perf-profile.children.cycles-pp.__rmqueue_pcplist
> 0.37 ą 7% -0.3 0.08 ą 16% perf-profile.children.cycles-pp.do_tkill
> 0.36 ą 10% -0.3 0.08 ą 20% perf-profile.children.cycles-pp.ns_get_path
> 0.37 ą 4% -0.3 0.09 ą 18% perf-profile.children.cycles-pp.setns
> 0.67 ą 3% -0.3 0.41 ą 8% perf-profile.children.cycles-pp.hrtimer_wakeup
> 0.35 ą 5% -0.3 0.10 ą 16% perf-profile.children.cycles-pp.__task_pid_nr_ns
> 0.41 ą 5% -0.3 0.16 ą 12% perf-profile.children.cycles-pp.mas_wr_bnode
> 0.35 ą 4% -0.3 0.10 ą 20% perf-profile.children.cycles-pp.rcu_cblist_dequeue
> 0.37 ą 5% -0.2 0.12 ą 17% perf-profile.children.cycles-pp.exit_task_stack_account
> 0.56 ą 4% -0.2 0.31 ą 12% perf-profile.children.cycles-pp.select_task_rq
> 0.29 ą 6% -0.2 0.05 ą 46% perf-profile.children.cycles-pp.mas_wr_store_entry
> 0.34 ą 4% -0.2 0.10 ą 27% perf-profile.children.cycles-pp.perf_event_task
> 0.39 ą 9% -0.2 0.15 ą 12% perf-profile.children.cycles-pp.__switch_to_asm
> 0.35 ą 5% -0.2 0.11 ą 11% perf-profile.children.cycles-pp.account_kernel_stack
> 0.30 ą 7% -0.2 0.06 ą 48% perf-profile.children.cycles-pp.__ns_get_path
> 0.31 ą 9% -0.2 0.07 ą 17% perf-profile.children.cycles-pp.free_vmap_area_noflush
> 0.31 ą 5% -0.2 0.08 ą 19% perf-profile.children.cycles-pp.__do_sys_setns
> 0.33 ą 7% -0.2 0.10 ą 7% perf-profile.children.cycles-pp.__free_one_page
> 0.31 ą 11% -0.2 0.08 ą 13% perf-profile.children.cycles-pp.__pte_alloc
> 0.36 ą 6% -0.2 0.13 ą 12% perf-profile.children.cycles-pp.switch_mm_irqs_off
> 0.27 ą 12% -0.2 0.05 ą 71% perf-profile.children.cycles-pp.__fput
> 0.53 ą 9% -0.2 0.31 ą 12% perf-profile.children.cycles-pp.do_mmap
> 0.27 ą 12% -0.2 0.05 ą 77% perf-profile.children.cycles-pp.__x64_sys_rt_tgsigqueueinfo
> 0.28 ą 5% -0.2 0.06 ą 50% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> 0.34 ą 10% -0.2 0.12 ą 29% perf-profile.children.cycles-pp.futex_wait_setup
> 0.27 ą 6% -0.2 0.06 ą 45% perf-profile.children.cycles-pp.__x64_sys_tgkill
> 0.31 ą 7% -0.2 0.11 ą 18% perf-profile.children.cycles-pp.__switch_to
> 0.26 ą 8% -0.2 0.06 ą 21% perf-profile.children.cycles-pp.__call_rcu_common
> 0.33 ą 9% -0.2 0.13 ą 18% perf-profile.children.cycles-pp.__do_sys_prctl
> 0.28 ą 5% -0.2 0.08 ą 17% perf-profile.children.cycles-pp.mm_release
> 0.52 ą 2% -0.2 0.32 ą 9% perf-profile.children.cycles-pp.__get_user_8
> 0.24 ą 10% -0.2 0.04 ą 72% perf-profile.children.cycles-pp.dput
> 0.25 ą 14% -0.2 0.05 ą 46% perf-profile.children.cycles-pp.perf_event_mmap
> 0.24 ą 7% -0.2 0.06 ą 50% perf-profile.children.cycles-pp.mas_walk
> 0.28 ą 6% -0.2 0.10 ą 24% perf-profile.children.cycles-pp.rmqueue_bulk
> 0.23 ą 15% -0.2 0.05 ą 46% perf-profile.children.cycles-pp.perf_event_mmap_event
> 0.25 ą 15% -0.2 0.08 ą 45% perf-profile.children.cycles-pp.___slab_alloc
> 0.20 ą 14% -0.2 0.03 ą100% perf-profile.children.cycles-pp.lookup_fast
> 0.20 ą 10% -0.2 0.04 ą 75% perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
> 0.28 ą 7% -0.2 0.12 ą 24% perf-profile.children.cycles-pp.prepare_task_switch
> 0.22 ą 11% -0.2 0.05 ą 8% perf-profile.children.cycles-pp.ttwu_queue_wakelist
> 0.63 ą 5% -0.2 0.47 ą 12% perf-profile.children.cycles-pp.llist_reverse_order
> 0.25 ą 11% -0.2 0.09 ą 34% perf-profile.children.cycles-pp.futex_q_lock
> 0.21 ą 6% -0.2 0.06 ą 47% perf-profile.children.cycles-pp.kmem_cache_alloc_node
> 0.18 ą 11% -0.2 0.03 ą100% perf-profile.children.cycles-pp.alloc_empty_file
> 0.19 ą 5% -0.2 0.04 ą 71% perf-profile.children.cycles-pp.__put_task_struct
> 0.19 ą 15% -0.2 0.03 ą 70% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
> 0.24 ą 6% -0.2 0.09 ą 20% perf-profile.children.cycles-pp.___perf_sw_event
> 0.18 ą 7% -0.2 0.03 ą100% perf-profile.children.cycles-pp.perf_event_fork
> 0.19 ą 11% -0.1 0.04 ą 71% perf-profile.children.cycles-pp.select_idle_core
> 0.30 ą 11% -0.1 0.15 ą 7% perf-profile.children.cycles-pp.pte_alloc_one
> 0.25 ą 6% -0.1 0.11 ą 10% perf-profile.children.cycles-pp.set_next_entity
> 0.20 ą 10% -0.1 0.06 ą 49% perf-profile.children.cycles-pp.__perf_event_header__init_id
> 0.18 ą 15% -0.1 0.03 ą101% perf-profile.children.cycles-pp.__radix_tree_lookup
> 0.22 ą 11% -0.1 0.08 ą 21% perf-profile.children.cycles-pp.mas_spanning_rebalance
> 0.20 ą 9% -0.1 0.06 ą 9% perf-profile.children.cycles-pp.stress_pthread_func
> 0.18 ą 12% -0.1 0.04 ą 73% perf-profile.children.cycles-pp.__getpid
> 0.16 ą 13% -0.1 0.02 ą 99% perf-profile.children.cycles-pp.walk_component
> 0.28 ą 5% -0.1 0.15 ą 13% perf-profile.children.cycles-pp.update_curr
> 0.25 ą 5% -0.1 0.11 ą 22% perf-profile.children.cycles-pp.balance_fair
> 0.16 ą 9% -0.1 0.03 ą100% perf-profile.children.cycles-pp.futex_wake_mark
> 0.16 ą 12% -0.1 0.04 ą 71% perf-profile.children.cycles-pp.get_futex_key
> 0.17 ą 6% -0.1 0.05 ą 47% perf-profile.children.cycles-pp.memcg_account_kmem
> 0.25 ą 11% -0.1 0.12 ą 11% perf-profile.children.cycles-pp._find_next_bit
> 0.15 ą 13% -0.1 0.02 ą 99% perf-profile.children.cycles-pp.do_open
> 0.20 ą 8% -0.1 0.08 ą 16% perf-profile.children.cycles-pp.mas_rebalance
> 0.17 ą 13% -0.1 0.05 ą 45% perf-profile.children.cycles-pp.__memcg_kmem_charge_page
> 0.33 ą 6% -0.1 0.21 ą 10% perf-profile.children.cycles-pp.select_idle_sibling
> 0.14 ą 11% -0.1 0.03 ą100% perf-profile.children.cycles-pp.get_user_pages_fast
> 0.18 ą 7% -0.1 0.07 ą 14% perf-profile.children.cycles-pp.mas_alloc_nodes
> 0.14 ą 11% -0.1 0.03 ą101% perf-profile.children.cycles-pp.set_task_cpu
> 0.14 ą 12% -0.1 0.03 ą101% perf-profile.children.cycles-pp.vm_unmapped_area
> 0.38 ą 6% -0.1 0.27 ą 7% perf-profile.children.cycles-pp.native_sched_clock
> 0.16 ą 10% -0.1 0.05 ą 47% perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
> 0.36 ą 9% -0.1 0.25 ą 12% perf-profile.children.cycles-pp.mmap_region
> 0.23 ą 7% -0.1 0.12 ą 9% perf-profile.children.cycles-pp.available_idle_cpu
> 0.13 ą 11% -0.1 0.02 ą 99% perf-profile.children.cycles-pp.internal_get_user_pages_fast
> 0.16 ą 10% -0.1 0.06 ą 18% perf-profile.children.cycles-pp.get_unmapped_area
> 0.50 ą 7% -0.1 0.40 ą 6% perf-profile.children.cycles-pp.menu_select
> 0.24 ą 9% -0.1 0.14 ą 13% perf-profile.children.cycles-pp.rmqueue
> 0.17 ą 14% -0.1 0.07 ą 26% perf-profile.children.cycles-pp.perf_event_comm
> 0.17 ą 15% -0.1 0.07 ą 23% perf-profile.children.cycles-pp.perf_event_comm_event
> 0.17 ą 11% -0.1 0.07 ą 14% perf-profile.children.cycles-pp.pick_next_entity
> 0.13 ą 14% -0.1 0.03 ą102% perf-profile.children.cycles-pp.perf_output_begin
> 0.23 ą 6% -0.1 0.13 ą 21% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
> 0.14 ą 18% -0.1 0.04 ą 72% perf-profile.children.cycles-pp.perf_event_comm_output
> 0.21 ą 9% -0.1 0.12 ą 9% perf-profile.children.cycles-pp.update_rq_clock
> 0.16 ą 8% -0.1 0.06 ą 19% perf-profile.children.cycles-pp.mas_split
> 0.13 ą 14% -0.1 0.04 ą 71% perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
> 0.13 ą 6% -0.1 0.04 ą 71% perf-profile.children.cycles-pp.syscall_return_via_sysret
> 0.13 ą 7% -0.1 0.04 ą 72% perf-profile.children.cycles-pp.mas_topiary_replace
> 0.14 ą 8% -0.1 0.06 ą 9% perf-profile.children.cycles-pp.mas_preallocate
> 0.16 ą 11% -0.1 0.07 ą 18% perf-profile.children.cycles-pp.__pick_eevdf
> 0.11 ą 14% -0.1 0.02 ą 99% perf-profile.children.cycles-pp.mas_empty_area_rev
> 0.25 ą 7% -0.1 0.17 ą 10% perf-profile.children.cycles-pp.select_idle_cpu
> 0.14 ą 12% -0.1 0.06 ą 14% perf-profile.children.cycles-pp.cpu_stopper_thread
> 0.14 ą 10% -0.1 0.06 ą 13% perf-profile.children.cycles-pp.active_load_balance_cpu_stop
> 0.14 ą 14% -0.1 0.06 ą 11% perf-profile.children.cycles-pp.os_xsave
> 0.18 ą 6% -0.1 0.11 ą 14% perf-profile.children.cycles-pp.idle_cpu
> 0.17 ą 4% -0.1 0.10 ą 15% perf-profile.children.cycles-pp.hrtimer_start_range_ns
> 0.11 ą 14% -0.1 0.03 ą100% perf-profile.children.cycles-pp.__pthread_mutex_lock
> 0.32 ą 5% -0.1 0.25 ą 5% perf-profile.children.cycles-pp.sched_clock
> 0.11 ą 6% -0.1 0.03 ą 70% perf-profile.children.cycles-pp.wakeup_preempt
> 0.23 ą 7% -0.1 0.16 ą 13% perf-profile.children.cycles-pp.update_rq_clock_task
> 0.13 ą 8% -0.1 0.06 ą 16% perf-profile.children.cycles-pp.local_clock_noinstr
> 0.11 ą 10% -0.1 0.04 ą 71% perf-profile.children.cycles-pp.kmem_cache_alloc_bulk
> 0.34 ą 4% -0.1 0.27 ą 6% perf-profile.children.cycles-pp.sched_clock_cpu
> 0.11 ą 9% -0.1 0.04 ą 76% perf-profile.children.cycles-pp.avg_vruntime
> 0.15 ą 8% -0.1 0.08 ą 14% perf-profile.children.cycles-pp.update_cfs_group
> 0.10 ą 8% -0.1 0.04 ą 71% perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
> 0.13 ą 8% -0.1 0.06 ą 11% perf-profile.children.cycles-pp.sched_use_asym_prio
> 0.09 ą 12% -0.1 0.02 ą 99% perf-profile.children.cycles-pp.getname_flags
> 0.18 ą 9% -0.1 0.12 ą 12% perf-profile.children.cycles-pp.__update_load_avg_se
> 0.11 ą 8% -0.1 0.05 ą 46% perf-profile.children.cycles-pp.place_entity
> 0.08 ą 12% -0.0 0.02 ą 99% perf-profile.children.cycles-pp.folio_add_lru_vma
> 0.10 ą 7% -0.0 0.05 ą 46% perf-profile.children.cycles-pp._find_next_and_bit
> 0.10 ą 6% -0.0 0.06 ą 24% perf-profile.children.cycles-pp.reweight_entity
> 0.03 ą 70% +0.0 0.08 ą 14% perf-profile.children.cycles-pp.perf_rotate_context
> 0.19 ą 10% +0.1 0.25 ą 7% perf-profile.children.cycles-pp.irqtime_account_irq
> 0.08 ą 11% +0.1 0.14 ą 21% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
> 0.00 +0.1 0.06 ą 14% perf-profile.children.cycles-pp.rcu_pending
> 0.10 ą 17% +0.1 0.16 ą 13% perf-profile.children.cycles-pp.rebalance_domains
> 0.14 ą 16% +0.1 0.21 ą 12% perf-profile.children.cycles-pp.downgrade_write
> 0.14 ą 14% +0.1 0.21 ą 10% perf-profile.children.cycles-pp.down_read_killable
> 0.00 +0.1 0.07 ą 11% perf-profile.children.cycles-pp.free_tail_page_prepare
> 0.02 ą141% +0.1 0.09 ą 20% perf-profile.children.cycles-pp.rcu_sched_clock_irq
> 0.01 ą223% +0.1 0.08 ą 25% perf-profile.children.cycles-pp.arch_scale_freq_tick
> 0.55 ą 9% +0.1 0.62 ą 9% perf-profile.children.cycles-pp.__alloc_pages
> 0.34 ą 5% +0.1 0.41 ą 9% perf-profile.children.cycles-pp.clock_nanosleep
> 0.00 +0.1 0.08 ą 23% perf-profile.children.cycles-pp.tick_nohz_next_event
> 0.70 ą 2% +0.1 0.78 ą 5% perf-profile.children.cycles-pp.flush_tlb_func
> 0.14 ą 10% +0.1 0.23 ą 13% perf-profile.children.cycles-pp.__intel_pmu_enable_all
> 0.07 ą 19% +0.1 0.17 ą 17% perf-profile.children.cycles-pp.cgroup_rstat_updated
> 0.04 ą 71% +0.1 0.14 ą 11% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
> 0.25 ą 9% +0.1 0.38 ą 11% perf-profile.children.cycles-pp.down_read
> 0.43 ą 9% +0.1 0.56 ą 10% perf-profile.children.cycles-pp.get_page_from_freelist
> 0.00 +0.1 0.15 ą 6% perf-profile.children.cycles-pp.vm_normal_page
> 0.31 ą 7% +0.2 0.46 ą 9% perf-profile.children.cycles-pp.native_flush_tlb_local
> 0.00 +0.2 0.16 ą 8% perf-profile.children.cycles-pp.__tlb_remove_page_size
> 0.28 ą 11% +0.2 0.46 ą 13% perf-profile.children.cycles-pp.vma_alloc_folio
> 0.00 +0.2 0.24 ą 5% perf-profile.children.cycles-pp._compound_head
> 0.07 ą 16% +0.2 0.31 ą 6% perf-profile.children.cycles-pp.__mod_node_page_state
> 0.38 ą 5% +0.2 0.62 ą 7% perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
> 0.22 ą 12% +0.2 0.47 ą 10% perf-profile.children.cycles-pp.schedule_preempt_disabled
> 0.38 ą 5% +0.3 0.64 ą 7% perf-profile.children.cycles-pp.perf_event_task_tick
> 0.00 +0.3 0.27 ą 5% perf-profile.children.cycles-pp.free_swap_cache
> 0.30 ą 10% +0.3 0.58 ą 10% perf-profile.children.cycles-pp.rwsem_down_read_slowpath
> 0.00 +0.3 0.30 ą 4% perf-profile.children.cycles-pp.free_pages_and_swap_cache
> 0.09 ą 10% +0.3 0.42 ą 7% perf-profile.children.cycles-pp.__mod_lruvec_state
> 0.00 +0.3 0.34 ą 9% perf-profile.children.cycles-pp.deferred_split_folio
> 0.00 +0.4 0.36 ą 13% perf-profile.children.cycles-pp.prep_compound_page
> 0.09 ą 10% +0.4 0.50 ą 9% perf-profile.children.cycles-pp.free_unref_page_prepare
> 0.00 +0.4 0.42 ą 11% perf-profile.children.cycles-pp.do_huge_pmd_anonymous_page
> 1.67 ą 3% +0.4 2.12 ą 8% perf-profile.children.cycles-pp.__hrtimer_run_queues
> 0.63 ą 3% +0.5 1.11 ą 12% perf-profile.children.cycles-pp.scheduler_tick
> 1.93 ą 3% +0.5 2.46 ą 8% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
> 1.92 ą 3% +0.5 2.45 ą 8% perf-profile.children.cycles-pp.hrtimer_interrupt
> 0.73 ą 3% +0.6 1.31 ą 11% perf-profile.children.cycles-pp.update_process_times
> 0.74 ą 3% +0.6 1.34 ą 11% perf-profile.children.cycles-pp.tick_sched_handle
> 0.20 ą 8% +0.6 0.83 ą 18% perf-profile.children.cycles-pp.__cond_resched
> 0.78 ą 4% +0.6 1.43 ą 12% perf-profile.children.cycles-pp.tick_nohz_highres_handler
> 0.12 ą 7% +0.7 0.81 ą 5% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
> 0.28 ą 7% +0.9 1.23 ą 4% perf-profile.children.cycles-pp.release_pages
> 0.00 +1.0 1.01 ą 6% perf-profile.children.cycles-pp.pmdp_invalidate
> 0.35 ą 6% +1.2 1.56 ą 5% perf-profile.children.cycles-pp.__mod_lruvec_page_state
> 0.30 ą 8% +1.2 1.53 ą 4% perf-profile.children.cycles-pp.tlb_batch_pages_flush
> 0.00 +1.3 1.26 ą 4% perf-profile.children.cycles-pp.page_add_anon_rmap
> 0.09 ą 11% +3.1 3.20 ą 5% perf-profile.children.cycles-pp.page_remove_rmap
> 1.60 ą 2% +3.4 5.04 ą 4% perf-profile.children.cycles-pp.zap_pte_range
> 0.03 ą100% +3.5 3.55 ą 5% perf-profile.children.cycles-pp.__split_huge_pmd_locked
> 41.36 +11.6 52.92 ą 2% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 41.22 +11.7 52.88 ą 2% perf-profile.children.cycles-pp.do_syscall_64
> 6.42 ą 6% +13.5 19.88 ą 7% perf-profile.children.cycles-pp.__clone
> 0.82 ą 6% +16.2 16.98 ą 7% perf-profile.children.cycles-pp.clear_page_erms
> 2.62 ą 5% +16.4 19.04 ą 7% perf-profile.children.cycles-pp.asm_exc_page_fault
> 2.18 ą 5% +16.8 18.94 ą 7% perf-profile.children.cycles-pp.exc_page_fault
> 2.06 ą 6% +16.8 18.90 ą 7% perf-profile.children.cycles-pp.do_user_addr_fault
> 1.60 ą 8% +17.0 18.60 ą 7% perf-profile.children.cycles-pp.handle_mm_fault
> 1.52 ą 7% +17.1 18.58 ą 7% perf-profile.children.cycles-pp.__handle_mm_fault
> 0.30 ą 7% +17.4 17.72 ą 7% perf-profile.children.cycles-pp.clear_huge_page
> 0.31 ą 8% +17.6 17.90 ą 7% perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
> 11.66 ą 3% +22.2 33.89 ą 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> 3.29 ą 3% +30.2 33.46 perf-profile.children.cycles-pp._raw_spin_lock
> 0.04 ą 71% +36.2 36.21 ą 2% perf-profile.children.cycles-pp.__split_huge_pmd
> 8.00 ą 4% +36.5 44.54 ą 2% perf-profile.children.cycles-pp.__madvise
> 7.87 ą 4% +36.6 44.44 ą 2% perf-profile.children.cycles-pp.__x64_sys_madvise
> 7.86 ą 4% +36.6 44.44 ą 2% perf-profile.children.cycles-pp.do_madvise
> 7.32 ą 4% +36.8 44.07 ą 2% perf-profile.children.cycles-pp.madvise_vma_behavior
> 7.26 ą 4% +36.8 44.06 ą 2% perf-profile.children.cycles-pp.zap_page_range_single
> 1.78 +39.5 41.30 ą 2% perf-profile.children.cycles-pp.unmap_page_range
> 1.72 +39.6 41.28 ą 2% perf-profile.children.cycles-pp.zap_pmd_range
> 24.76 ą 2% -8.5 16.31 ą 2% perf-profile.self.cycles-pp.intel_idle
> 11.46 ą 2% -7.8 3.65 ą 5% perf-profile.self.cycles-pp.intel_idle_irq
> 3.16 ą 7% -2.1 1.04 ą 6% perf-profile.self.cycles-pp.smp_call_function_many_cond
> 1.49 ą 4% -1.2 0.30 ą 12% perf-profile.self.cycles-pp.poll_idle
> 1.15 ą 3% -0.6 0.50 ą 9% perf-profile.self.cycles-pp._raw_spin_lock
> 0.60 ą 6% -0.6 0.03 ą100% perf-profile.self.cycles-pp.queued_write_lock_slowpath
> 0.69 ą 4% -0.5 0.22 ą 20% perf-profile.self.cycles-pp.memcpy_orig
> 0.66 ą 7% -0.5 0.18 ą 11% perf-profile.self.cycles-pp.update_sg_wakeup_stats
> 0.59 ą 4% -0.5 0.13 ą 8% perf-profile.self.cycles-pp._raw_spin_lock_irq
> 0.86 ą 3% -0.4 0.43 ą 12% perf-profile.self.cycles-pp.update_sg_lb_stats
> 0.56 -0.4 0.16 ą 7% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
> 0.48 ą 3% -0.4 0.12 ą 10% perf-profile.self.cycles-pp.__slab_free
> 1.18 ą 2% -0.4 0.82 ą 3% perf-profile.self.cycles-pp.llist_add_batch
> 0.54 ą 5% -0.3 0.19 ą 6% perf-profile.self.cycles-pp.__schedule
> 0.47 ą 7% -0.3 0.18 ą 13% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> 0.34 ą 5% -0.2 0.09 ą 18% perf-profile.self.cycles-pp.kmem_cache_free
> 0.43 ą 4% -0.2 0.18 ą 11% perf-profile.self.cycles-pp.update_load_avg
> 0.35 ą 4% -0.2 0.10 ą 23% perf-profile.self.cycles-pp.rcu_cblist_dequeue
> 0.38 ą 9% -0.2 0.15 ą 10% perf-profile.self.cycles-pp.__switch_to_asm
> 0.33 ą 5% -0.2 0.10 ą 16% perf-profile.self.cycles-pp.__task_pid_nr_ns
> 0.36 ą 6% -0.2 0.13 ą 14% perf-profile.self.cycles-pp.switch_mm_irqs_off
> 0.31 ą 6% -0.2 0.09 ą 6% perf-profile.self.cycles-pp.__free_one_page
> 0.28 ą 5% -0.2 0.06 ą 50% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> 0.27 ą 13% -0.2 0.06 ą 23% perf-profile.self.cycles-pp.pthread_create@@GLIBC_2.2.5
> 0.30 ą 7% -0.2 0.10 ą 19% perf-profile.self.cycles-pp.__switch_to
> 0.27 ą 4% -0.2 0.10 ą 17% perf-profile.self.cycles-pp.finish_task_switch
> 0.23 ą 7% -0.2 0.06 ą 50% perf-profile.self.cycles-pp.mas_walk
> 0.22 ą 9% -0.2 0.05 ą 48% perf-profile.self.cycles-pp.__clone
> 0.63 ą 5% -0.2 0.46 ą 12% perf-profile.self.cycles-pp.llist_reverse_order
> 0.20 ą 4% -0.2 0.04 ą 72% perf-profile.self.cycles-pp.entry_SYSCALL_64
> 0.24 ą 10% -0.1 0.09 ą 19% perf-profile.self.cycles-pp.rmqueue_bulk
> 0.18 ą 13% -0.1 0.03 ą101% perf-profile.self.cycles-pp.__radix_tree_lookup
> 0.18 ą 11% -0.1 0.04 ą 71% perf-profile.self.cycles-pp.stress_pthread_func
> 0.36 ą 8% -0.1 0.22 ą 11% perf-profile.self.cycles-pp.menu_select
> 0.22 ą 4% -0.1 0.08 ą 19% perf-profile.self.cycles-pp.___perf_sw_event
> 0.20 ą 13% -0.1 0.07 ą 20% perf-profile.self.cycles-pp.start_thread
> 0.16 ą 13% -0.1 0.03 ą101% perf-profile.self.cycles-pp.alloc_vmap_area
> 0.17 ą 10% -0.1 0.04 ą 73% perf-profile.self.cycles-pp.kmem_cache_alloc
> 0.14 ą 9% -0.1 0.03 ą100% perf-profile.self.cycles-pp.futex_wake
> 0.17 ą 4% -0.1 0.06 ą 11% perf-profile.self.cycles-pp.dequeue_task_fair
> 0.23 ą 6% -0.1 0.12 ą 11% perf-profile.self.cycles-pp.available_idle_cpu
> 0.22 ą 13% -0.1 0.11 ą 12% perf-profile.self.cycles-pp._find_next_bit
> 0.21 ą 7% -0.1 0.10 ą 6% perf-profile.self.cycles-pp.__rmqueue_pcplist
> 0.37 ą 7% -0.1 0.26 ą 8% perf-profile.self.cycles-pp.native_sched_clock
> 0.22 ą 7% -0.1 0.12 ą 21% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
> 0.19 ą 7% -0.1 0.10 ą 11% perf-profile.self.cycles-pp.enqueue_entity
> 0.15 ą 5% -0.1 0.06 ą 45% perf-profile.self.cycles-pp.enqueue_task_fair
> 0.15 ą 11% -0.1 0.06 ą 17% perf-profile.self.cycles-pp.__pick_eevdf
> 0.13 ą 13% -0.1 0.05 ą 72% perf-profile.self.cycles-pp.prepare_task_switch
> 0.17 ą 10% -0.1 0.08 ą 8% perf-profile.self.cycles-pp.update_rq_clock_task
> 0.54 ą 4% -0.1 0.46 ą 6% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
> 0.14 ą 14% -0.1 0.06 ą 11% perf-profile.self.cycles-pp.os_xsave
> 0.11 ą 10% -0.1 0.03 ą 70% perf-profile.self.cycles-pp.try_to_wake_up
> 0.10 ą 8% -0.1 0.03 ą100% perf-profile.self.cycles-pp.futex_wait
> 0.14 ą 9% -0.1 0.07 ą 10% perf-profile.self.cycles-pp.update_curr
> 0.18 ą 9% -0.1 0.11 ą 14% perf-profile.self.cycles-pp.idle_cpu
> 0.11 ą 11% -0.1 0.04 ą 76% perf-profile.self.cycles-pp.avg_vruntime
> 0.15 ą 10% -0.1 0.08 ą 14% perf-profile.self.cycles-pp.update_cfs_group
> 0.09 ą 9% -0.1 0.03 ą100% perf-profile.self.cycles-pp.reweight_entity
> 0.12 ą 13% -0.1 0.06 ą 8% perf-profile.self.cycles-pp.do_idle
> 0.18 ą 10% -0.1 0.12 ą 13% perf-profile.self.cycles-pp.__update_load_avg_se
> 0.09 ą 17% -0.1 0.04 ą 71% perf-profile.self.cycles-pp.cpuidle_idle_call
> 0.10 ą 11% -0.0 0.06 ą 45% perf-profile.self.cycles-pp.update_rq_clock
> 0.12 ą 15% -0.0 0.07 ą 16% perf-profile.self.cycles-pp.update_sd_lb_stats
> 0.09 ą 5% -0.0 0.05 ą 46% perf-profile.self.cycles-pp._find_next_and_bit
> 0.01 ą223% +0.1 0.08 ą 25% perf-profile.self.cycles-pp.arch_scale_freq_tick
> 0.78 ą 4% +0.1 0.87 ą 4% perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
> 0.14 ą 10% +0.1 0.23 ą 13% perf-profile.self.cycles-pp.__intel_pmu_enable_all
> 0.06 ą 46% +0.1 0.15 ą 19% perf-profile.self.cycles-pp.cgroup_rstat_updated
> 0.19 ą 3% +0.1 0.29 ą 4% perf-profile.self.cycles-pp.cpuidle_enter_state
> 0.00 +0.1 0.10 ą 11% perf-profile.self.cycles-pp.__mod_lruvec_state
> 0.00 +0.1 0.11 ą 18% perf-profile.self.cycles-pp.__tlb_remove_page_size
> 0.00 +0.1 0.12 ą 9% perf-profile.self.cycles-pp.vm_normal_page
> 0.23 ą 7% +0.1 0.36 ą 8% perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
> 0.20 ą 8% +0.2 0.35 ą 7% perf-profile.self.cycles-pp.__mod_lruvec_page_state
> 1.12 ą 2% +0.2 1.28 ą 4% perf-profile.self.cycles-pp.zap_pte_range
> 0.31 ą 8% +0.2 0.46 ą 9% perf-profile.self.cycles-pp.native_flush_tlb_local
> 0.00 +0.2 0.16 ą 5% perf-profile.self.cycles-pp._compound_head
> 0.06 ą 17% +0.2 0.26 ą 4% perf-profile.self.cycles-pp.__mod_node_page_state
> 0.00 +0.2 0.24 ą 6% perf-profile.self.cycles-pp.free_swap_cache
> 0.00 +0.3 0.27 ą 15% perf-profile.self.cycles-pp.clear_huge_page
> 0.00 +0.3 0.27 ą 11% perf-profile.self.cycles-pp.deferred_split_folio
> 0.00 +0.4 0.36 ą 13% perf-profile.self.cycles-pp.prep_compound_page
> 0.05 ą 47% +0.4 0.43 ą 9% perf-profile.self.cycles-pp.free_unref_page_prepare
> 0.08 ą 7% +0.5 0.57 ą 23% perf-profile.self.cycles-pp.__cond_resched
> 0.08 ą 12% +0.5 0.58 ą 5% perf-profile.self.cycles-pp.release_pages
> 0.10 ą 10% +0.5 0.63 ą 6% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
> 0.00 +1.1 1.11 ą 7% perf-profile.self.cycles-pp.__split_huge_pmd_locked
> 0.00 +1.2 1.18 ą 4% perf-profile.self.cycles-pp.page_add_anon_rmap
> 0.03 ą101% +1.3 1.35 ą 7% perf-profile.self.cycles-pp.page_remove_rmap
> 0.82 ą 5% +16.1 16.88 ą 7% perf-profile.self.cycles-pp.clear_page_erms
> 11.65 ą 3% +20.2 31.88 ą 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>
>
> ***************************************************************************************************
> lkp-spr-2sp4: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
> =========================================================================================
> array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
> 50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
>
> commit:
> 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
> 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 10.50 ą 14% +55.6% 16.33 ą 16% perf-c2c.DRAM.local
> 6724 -11.4% 5954 ą 2% vmstat.system.cs
> 2.746e+09 +16.7% 3.205e+09 ą 2% cpuidle..time
> 2771516 +16.0% 3213723 ą 2% cpuidle..usage
> 0.06 ą 4% -0.0 0.05 ą 5% mpstat.cpu.all.soft%
> 0.47 ą 2% -0.1 0.39 ą 2% mpstat.cpu.all.sys%
> 0.01 ą 85% +1700.0% 0.20 ą188% perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
> 15.11 ą 13% -28.8% 10.76 ą 34% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 15.09 ą 13% -30.3% 10.51 ą 38% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
> 1023952 +13.4% 1161219 meminfo.AnonHugePages
> 1319741 +10.8% 1461995 meminfo.AnonPages
> 1331039 +11.2% 1480149 meminfo.Inactive
> 1330865 +11.2% 1479975 meminfo.Inactive(anon)
> 1266202 +16.0% 1469399 ą 2% turbostat.C1E
> 1509871 +16.6% 1760853 ą 2% turbostat.C6
> 3521203 +17.4% 4134075 ą 3% turbostat.IRQ
> 580.32 -3.8% 558.30 turbostat.PkgWatt
> 77.42 -14.0% 66.60 ą 2% turbostat.RAMWatt
> 330416 +10.8% 366020 proc-vmstat.nr_anon_pages
> 500.90 +13.4% 567.99 proc-vmstat.nr_anon_transparent_hugepages
> 333197 +11.2% 370536 proc-vmstat.nr_inactive_anon
> 333197 +11.2% 370536 proc-vmstat.nr_zone_inactive_anon
> 129879 ą 11% -46.7% 69207 ą 12% proc-vmstat.numa_pages_migrated
> 3879028 +5.9% 4109180 proc-vmstat.pgalloc_normal
> 3403414 +6.6% 3628929 proc-vmstat.pgfree
> 129879 ą 11% -46.7% 69207 ą 12% proc-vmstat.pgmigrate_success
> 5763 +9.8% 6327 proc-vmstat.thp_fault_alloc
> 350993 -15.6% 296081 ą 2% stream.add_bandwidth_MBps
> 349830 -16.1% 293492 ą 2% stream.add_bandwidth_MBps_harmonicMean
> 333973 -20.5% 265439 ą 3% stream.copy_bandwidth_MBps
> 332930 -21.7% 260548 ą 3% stream.copy_bandwidth_MBps_harmonicMean
> 302788 -16.2% 253817 ą 2% stream.scale_bandwidth_MBps
> 302157 -17.1% 250577 ą 2% stream.scale_bandwidth_MBps_harmonicMean
> 1177276 +9.3% 1286614 stream.time.maximum_resident_set_size
> 5038 +1.1% 5095 stream.time.percent_of_cpu_this_job_got
> 694.19 ą 2% +19.5% 829.85 ą 2% stream.time.user_time
> 339047 -12.1% 298061 stream.triad_bandwidth_MBps
> 338186 -12.4% 296218 stream.triad_bandwidth_MBps_harmonicMean
> 8.42 ą100% -8.4 0.00 perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi
> 8.42 ą100% -8.4 0.00 perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
> 8.42 ą100% -8.4 0.00 perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
> 8.42 ą100% -8.4 0.00 perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
> 8.42 ą100% -8.4 0.00 perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
> 8.42 ą100% -8.4 0.00 perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode
> 0.84 ą103% +1.7 2.57 ą 59% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> 0.84 ą103% +1.7 2.57 ą 59% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
> 0.31 ą223% +2.0 2.33 ą 44% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
> 0.31 ą223% +2.0 2.33 ą 44% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
> 3.07 ą 56% +2.8 5.88 ą 28% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 8.42 ą100% -8.4 0.00 perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
> 8.42 ą100% -8.1 0.36 ą223% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
> 12.32 ą 25% -6.6 5.69 ą 69% perf-profile.children.cycles-pp.vsnprintf
> 12.76 ą 27% -6.6 6.19 ą 67% perf-profile.children.cycles-pp.seq_printf
> 3.07 ą 56% +2.8 5.88 ą 28% perf-profile.children.cycles-pp.__x64_sys_exit_group
> 40.11 -11.0% 35.71 ą 2% perf-stat.i.MPKI
> 1.563e+10 -12.3% 1.371e+10 ą 2% perf-stat.i.branch-instructions
> 3.721e+09 ą 2% -23.2% 2.858e+09 ą 4% perf-stat.i.cache-misses
> 4.471e+09 ą 3% -22.7% 3.458e+09 ą 4% perf-stat.i.cache-references
> 5970 ą 5% -15.9% 5021 ą 4% perf-stat.i.context-switches
> 1.66 ą 2% +15.8% 1.92 ą 2% perf-stat.i.cpi
> 41.83 ą 4% +30.6% 54.63 ą 4% perf-stat.i.cycles-between-cache-misses
> 2.282e+10 ą 2% -14.5% 1.952e+10 ą 2% perf-stat.i.dTLB-loads
> 572602 ą 3% -9.2% 519922 ą 5% perf-stat.i.dTLB-store-misses
> 1.483e+10 ą 2% -15.7% 1.25e+10 ą 2% perf-stat.i.dTLB-stores
> 9.179e+10 -13.7% 7.924e+10 ą 2% perf-stat.i.instructions
> 0.61 -13.4% 0.52 ą 2% perf-stat.i.ipc
> 373.79 ą 4% -37.8% 232.60 ą 9% perf-stat.i.metric.K/sec
> 251.45 -13.4% 217.72 ą 2% perf-stat.i.metric.M/sec
> 21446 ą 3% -24.1% 16278 ą 8% perf-stat.i.minor-faults
> 15.07 ą 5% -6.0 9.10 ą 10% perf-stat.i.node-load-miss-rate%
> 68275790 ą 5% -44.9% 37626128 ą 12% perf-stat.i.node-load-misses
> 21448 ą 3% -24.1% 16281 ą 8% perf-stat.i.page-faults
> 40.71 -11.3% 36.10 ą 2% perf-stat.overall.MPKI
> 1.67 +15.3% 1.93 ą 2% perf-stat.overall.cpi
> 41.07 ą 3% +30.1% 53.42 ą 4% perf-stat.overall.cycles-between-cache-misses
> 0.00 ą 2% +0.0 0.00 ą 2% perf-stat.overall.dTLB-store-miss-rate%
> 0.60 -13.2% 0.52 ą 2% perf-stat.overall.ipc
> 15.19 ą 5% -6.2 9.03 ą 11% perf-stat.overall.node-load-miss-rate%
> 1.4e+10 -9.3% 1.269e+10 perf-stat.ps.branch-instructions
> 3.352e+09 ą 3% -20.9% 2.652e+09 ą 4% perf-stat.ps.cache-misses
> 4.026e+09 ą 3% -20.3% 3.208e+09 ą 4% perf-stat.ps.cache-references
> 4888 ą 4% -10.8% 4362 ą 3% perf-stat.ps.context-switches
> 206092 +2.1% 210375 perf-stat.ps.cpu-clock
> 1.375e+11 +2.8% 1.414e+11 perf-stat.ps.cpu-cycles
> 258.23 ą 5% +8.8% 280.85 ą 4% perf-stat.ps.cpu-migrations
> 2.048e+10 -11.7% 1.809e+10 ą 2% perf-stat.ps.dTLB-loads
> 1.333e+10 ą 2% -13.0% 1.16e+10 ą 2% perf-stat.ps.dTLB-stores
> 8.231e+10 -10.8% 7.342e+10 perf-stat.ps.instructions
> 15755 ą 3% -16.3% 13187 ą 6% perf-stat.ps.minor-faults
> 61706790 ą 6% -43.8% 34699716 ą 11% perf-stat.ps.node-load-misses
> 15757 ą 3% -16.3% 13189 ą 6% perf-stat.ps.page-faults
> 206092 +2.1% 210375 perf-stat.ps.task-clock
> 1.217e+12 +4.1% 1.267e+12 ą 2% perf-stat.total.instructions
>
>
>
> ***************************************************************************************************
> lkp-cfl-d1: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> commit:
> 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
> 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 232.12 ą 7% -12.0% 204.18 ą 8% sched_debug.cfs_rq:/.load_avg.stddev
> 6797 -3.3% 6576 vmstat.system.cs
> 15161 -0.9% 15029 vmstat.system.in
> 349927 +44.3% 504820 meminfo.AnonHugePages
> 507807 +27.1% 645169 meminfo.AnonPages
> 1499332 +10.2% 1652612 meminfo.Inactive(anon)
> 8.67 ą 62% +184.6% 24.67 ą 25% turbostat.C10
> 1.50 -0.1 1.45 turbostat.C1E%
> 3.30 -3.2% 3.20 turbostat.RAMWatt
> 1.40 ą 14% -0.3 1.09 ą 13% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
> 1.44 ą 12% -0.3 1.12 ą 13% perf-profile.children.cycles-pp.asm_exc_page_fault
> 0.03 ą141% +0.1 0.10 ą 30% perf-profile.children.cycles-pp.next_uptodate_folio
> 0.02 ą141% +0.1 0.10 ą 22% perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup
> 0.02 ą143% +0.1 0.10 ą 25% perf-profile.self.cycles-pp.next_uptodate_folio
> 0.01 ą223% +0.1 0.09 ą 19% perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup
> 19806 -3.5% 19109 phoronix-test-suite.ramspeed.Average.Integer.mb_s
> 283.70 +3.8% 294.50 phoronix-test-suite.time.elapsed_time
> 283.70 +3.8% 294.50 phoronix-test-suite.time.elapsed_time.max
> 120454 +1.6% 122334 phoronix-test-suite.time.maximum_resident_set_size
> 281337 -54.8% 127194 phoronix-test-suite.time.minor_page_faults
> 259.13 +4.1% 269.81 phoronix-test-suite.time.user_time
> 126951 +27.0% 161291 proc-vmstat.nr_anon_pages
> 170.86 +44.3% 246.49 proc-vmstat.nr_anon_transparent_hugepages
> 355917 -1.0% 352250 proc-vmstat.nr_dirty_background_threshold
> 712705 -1.0% 705362 proc-vmstat.nr_dirty_threshold
> 3265201 -1.1% 3228465 proc-vmstat.nr_free_pages
> 374833 +10.2% 413153 proc-vmstat.nr_inactive_anon
> 1767 +4.8% 1853 proc-vmstat.nr_page_table_pages
> 374833 +10.2% 413153 proc-vmstat.nr_zone_inactive_anon
> 854665 -34.3% 561406 proc-vmstat.numa_hit
> 854632 -34.3% 561397 proc-vmstat.numa_local
> 5548755 +1.1% 5610598 proc-vmstat.pgalloc_normal
> 1083315 -26.2% 799129 proc-vmstat.pgfault
> 113425 +3.7% 117656 proc-vmstat.pgreuse
> 9025 +7.6% 9714 proc-vmstat.thp_fault_alloc
> 3.38 +0.1 3.45 perf-stat.i.branch-miss-rate%
> 4.135e+08 -3.2% 4.003e+08 perf-stat.i.cache-misses
> 5.341e+08 -2.7% 5.197e+08 perf-stat.i.cache-references
> 6832 -3.4% 6600 perf-stat.i.context-switches
> 4.06 +3.1% 4.19 perf-stat.i.cpi
> 438639 ą 5% -18.7% 356730 ą 6% perf-stat.i.dTLB-load-misses
> 1.119e+09 -3.8% 1.077e+09 perf-stat.i.dTLB-loads
> 0.02 ą 15% -0.0 0.01 ą 26% perf-stat.i.dTLB-store-miss-rate%
> 80407 ą 10% -63.5% 29387 ą 23% perf-stat.i.dTLB-store-misses
> 7.319e+08 -3.8% 7.043e+08 perf-stat.i.dTLB-stores
> 57.72 +0.8 58.52 perf-stat.i.iTLB-load-miss-rate%
> 129846 -3.8% 124973 perf-stat.i.iTLB-load-misses
> 144448 -5.3% 136837 perf-stat.i.iTLB-loads
> 2.389e+09 -3.5% 2.305e+09 perf-stat.i.instructions
> 0.28 -2.9% 0.27 perf-stat.i.ipc
> 220.59 -3.4% 213.11 perf-stat.i.metric.M/sec
> 3610 -31.2% 2483 perf-stat.i.minor-faults
> 49238342 +1.1% 49776834 perf-stat.i.node-loads
> 98106028 -3.1% 95018390 perf-stat.i.node-stores
> 3615 -31.2% 2487 perf-stat.i.page-faults
> 3.65 +3.7% 3.78 perf-stat.overall.cpi
> 21.08 +3.3% 21.79 perf-stat.overall.cycles-between-cache-misses
> 0.04 ą 5% -0.0 0.03 ą 6% perf-stat.overall.dTLB-load-miss-rate%
> 0.01 ą 10% -0.0 0.00 ą 23% perf-stat.overall.dTLB-store-miss-rate%
> 0.27 -3.6% 0.26 perf-stat.overall.ipc
> 4.122e+08 -3.2% 3.99e+08 perf-stat.ps.cache-misses
> 5.324e+08 -2.7% 5.181e+08 perf-stat.ps.cache-references
> 6809 -3.4% 6580 perf-stat.ps.context-switches
> 437062 ą 5% -18.7% 355481 ą 6% perf-stat.ps.dTLB-load-misses
> 1.115e+09 -3.8% 1.073e+09 perf-stat.ps.dTLB-loads
> 80134 ą 10% -63.5% 29283 ą 23% perf-stat.ps.dTLB-store-misses
> 7.295e+08 -3.8% 7.021e+08 perf-stat.ps.dTLB-stores
> 129362 -3.7% 124535 perf-stat.ps.iTLB-load-misses
> 143865 -5.2% 136338 perf-stat.ps.iTLB-loads
> 2.381e+09 -3.5% 2.297e+09 perf-stat.ps.instructions
> 3596 -31.2% 2473 perf-stat.ps.minor-faults
> 49081949 +1.1% 49621463 perf-stat.ps.node-loads
> 97795918 -3.1% 94724831 perf-stat.ps.node-stores
> 3600 -31.2% 2477 perf-stat.ps.page-faults
>
>
>
> ***************************************************************************************************
> lkp-cfl-d1: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> commit:
> 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
> 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 167.28 ą 5% -13.1% 145.32 ą 6% sched_debug.cfs_rq:/.util_est_enqueued.avg
> 6845 -2.5% 6674 vmstat.system.cs
> 351910 ą 2% +40.2% 493341 meminfo.AnonHugePages
> 505908 +27.2% 643328 meminfo.AnonPages
> 1497656 +10.2% 1650453 meminfo.Inactive(anon)
> 18957 ą 13% +26.3% 23947 ą 17% turbostat.C1
> 1.52 -0.0 1.48 turbostat.C1E%
> 3.32 -2.9% 3.23 turbostat.RAMWatt
> 19978 -3.0% 19379 phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s
> 280.71 +3.3% 289.93 phoronix-test-suite.time.elapsed_time
> 280.71 +3.3% 289.93 phoronix-test-suite.time.elapsed_time.max
> 120465 +1.5% 122257 phoronix-test-suite.time.maximum_resident_set_size
> 281047 -54.7% 127190 phoronix-test-suite.time.minor_page_faults
> 257.03 +3.5% 265.95 phoronix-test-suite.time.user_time
> 126473 +27.2% 160831 proc-vmstat.nr_anon_pages
> 171.83 ą 2% +40.2% 240.89 proc-vmstat.nr_anon_transparent_hugepages
> 355973 -1.0% 352304 proc-vmstat.nr_dirty_background_threshold
> 712818 -1.0% 705471 proc-vmstat.nr_dirty_threshold
> 3265800 -1.1% 3228879 proc-vmstat.nr_free_pages
> 374410 +10.2% 412613 proc-vmstat.nr_inactive_anon
> 1770 +4.4% 1848 proc-vmstat.nr_page_table_pages
> 374410 +10.2% 412613 proc-vmstat.nr_zone_inactive_anon
> 852082 -34.9% 555093 proc-vmstat.numa_hit
> 852125 -34.9% 555018 proc-vmstat.numa_local
> 1078293 -26.6% 791038 proc-vmstat.pgfault
> 112693 +2.9% 116004 proc-vmstat.pgreuse
> 9025 +7.6% 9713 proc-vmstat.thp_fault_alloc
> 3.63 ą 6% +0.6 4.25 ą 9% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
> 0.25 ą 55% -0.2 0.08 ą 68% perf-profile.children.cycles-pp.ret_from_fork_asm
> 0.25 ą 55% -0.2 0.08 ą 68% perf-profile.children.cycles-pp.ret_from_fork
> 0.23 ą 56% -0.2 0.07 ą 69% perf-profile.children.cycles-pp.kthread
> 0.14 ą 36% -0.1 0.05 ą120% perf-profile.children.cycles-pp.do_anonymous_page
> 0.14 ą 35% -0.1 0.05 ą 76% perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
> 0.04 ą 72% +0.0 0.08 ą 19% perf-profile.children.cycles-pp.try_to_wake_up
> 0.04 ą118% +0.1 0.10 ą 36% perf-profile.children.cycles-pp.update_rq_clock
> 0.07 ą 79% +0.1 0.17 ą 21% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 7.99 ą 11% +1.0 9.02 ą 5% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> 0.23 ą 28% -0.1 0.14 ą 49% perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
> 0.14 ą 35% -0.1 0.05 ą 76% perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
> 0.06 ą 79% +0.1 0.16 ą 21% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> 0.21 ą 34% +0.2 0.36 ą 18% perf-profile.self.cycles-pp.ktime_get
> 1.187e+08 -4.6% 1.133e+08 perf-stat.i.branch-instructions
> 3.36 +0.1 3.42 perf-stat.i.branch-miss-rate%
> 5492420 -3.9% 5275592 perf-stat.i.branch-misses
> 4.148e+08 -2.8% 4.034e+08 perf-stat.i.cache-misses
> 5.251e+08 -2.6% 5.114e+08 perf-stat.i.cache-references
> 6880 -2.5% 6711 perf-stat.i.context-switches
> 4.30 +2.9% 4.43 perf-stat.i.cpi
> 0.10 ą 7% -0.0 0.09 ą 2% perf-stat.i.dTLB-load-miss-rate%
> 472268 ą 6% -19.9% 378489 perf-stat.i.dTLB-load-misses
> 8.107e+08 -3.4% 7.831e+08 perf-stat.i.dTLB-loads
> 0.02 ą 16% -0.0 0.01 ą 2% perf-stat.i.dTLB-store-miss-rate%
> 90535 ą 11% -59.8% 36371 ą 2% perf-stat.i.dTLB-store-misses
> 5.323e+08 -3.3% 5.145e+08 perf-stat.i.dTLB-stores
> 129981 -3.0% 126061 perf-stat.i.iTLB-load-misses
> 143662 -3.1% 139223 perf-stat.i.iTLB-loads
> 2.253e+09 -3.6% 2.172e+09 perf-stat.i.instructions
> 0.26 -3.2% 0.25 perf-stat.i.ipc
> 4.71 ą 2% -6.4% 4.41 ą 2% perf-stat.i.major-faults
> 180.03 -3.0% 174.57 perf-stat.i.metric.M/sec
> 3627 -30.8% 2510 ą 2% perf-stat.i.minor-faults
> 3632 -30.8% 2514 ą 2% perf-stat.i.page-faults
> 3.88 +3.6% 4.02 perf-stat.overall.cpi
> 21.08 +2.7% 21.65 perf-stat.overall.cycles-between-cache-misses
> 0.06 ą 6% -0.0 0.05 perf-stat.overall.dTLB-load-miss-rate%
> 0.02 ą 11% -0.0 0.01 ą 2% perf-stat.overall.dTLB-store-miss-rate%
> 0.26 -3.5% 0.25 perf-stat.overall.ipc
> 1.182e+08 -4.6% 1.128e+08 perf-stat.ps.branch-instructions
> 5468166 -4.0% 5251939 perf-stat.ps.branch-misses
> 4.135e+08 -2.7% 4.021e+08 perf-stat.ps.cache-misses
> 5.234e+08 -2.6% 5.098e+08 perf-stat.ps.cache-references
> 6859 -2.5% 6685 perf-stat.ps.context-switches
> 470567 ą 6% -19.9% 377127 perf-stat.ps.dTLB-load-misses
> 8.079e+08 -3.4% 7.805e+08 perf-stat.ps.dTLB-loads
> 90221 ą 11% -59.8% 36239 ą 2% perf-stat.ps.dTLB-store-misses
> 5.305e+08 -3.3% 5.128e+08 perf-stat.ps.dTLB-stores
> 129499 -3.0% 125601 perf-stat.ps.iTLB-load-misses
> 143121 -3.1% 138638 perf-stat.ps.iTLB-loads
> 2.246e+09 -3.6% 2.165e+09 perf-stat.ps.instructions
> 4.69 ą 2% -6.3% 4.39 ą 2% perf-stat.ps.major-faults
> 3613 -30.8% 2500 ą 2% perf-stat.ps.minor-faults
> 3617 -30.8% 2504 ą 2% perf-stat.ps.page-faults
>
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-20 5:27 ` Yang Shi
@ 2023-12-20 8:29 ` Yin Fengwei
2023-12-20 15:42 ` Christoph Lameter (Ampere)
2023-12-20 20:09 ` Yang Shi
0 siblings, 2 replies; 24+ messages in thread
From: Yin Fengwei @ 2023-12-20 8:29 UTC (permalink / raw)
To: Yang Shi, kernel test robot
Cc: Rik van Riel, oe-lkp, lkp, Linux Memory Management List,
Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang,
feng.tang
On 2023/12/20 13:27, Yang Shi wrote:
> On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote:
>>
>>
>>
>> Hello,
>>
>> for this commit, we reported
>> "[mm] 96db82a66d: will-it-scale.per_process_ops -95.3% regression"
>> in Aug, 2022 when it's in linux-next/master
>> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/
>>
>> later, we reported
>> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression"
>> in Oct, 2022 when it's in linus/master
>> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/
>>
>> and the commit was reverted finally by
>> commit 0ba09b1733878afe838fe35c310715fda3d46428
>> Author: Linus Torvalds <torvalds@linux-foundation.org>
>> Date: Sun Dec 4 12:51:59 2022 -0800
>>
>> now we noticed it goes into linux-next/master again.
>>
>> we are not sure if there is an agreement that the benefit of this commit
>> has already overweight performance drop in some mirco benchmark.
>>
>> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/
>> that
>> "This patch was applied to v6.1, but was reverted due to a regression
>> report. However it turned out the regression was not due to this patch.
>> I ping'ed Andrew to reapply this patch, Andrew may forget it. This
>> patch helps promote THP, so I rebased it onto the latest mm-unstable."
>
> IIRC, Huang Ying's analysis showed the regression in the will-it-scale
> micro benchmark was acceptable; the patch was actually reverted due to a
> kernel build regression with LLVM reported by Nathan Chancellor. That
> regression was then resolved by commit
> 81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out
> if page in deferred queue already"). And this patch did improve the kernel
> build with GCC by ~3%, if I remember correctly.
>
>>
>> however, unfortunately, in our latest tests, we still observed below regression
>> upon this commit. just FYI.
>>
>>
>>
>> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:
>
> Interesting, wasn't the same regression seen last time? And I'm a
> little bit confused about how pthread got regressed. I didn't see the
> pthread benchmark do any intensive memory alloc/free operations. Do
> the pthread APIs do any intensive memory operations? I saw that the
> benchmark does allocate memory for the thread stack, but it should be just
> 8K per thread, so it should not trigger what this patch does. With
> 1024 threads, the thread stacks may get merged into one single VMA (8M
> total), but they may do so even when the patch is not applied.
The stress-ng.pthread test code is strange here:
https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573
Even though it allocates its own stack buffer, that attr is never passed
to pthread_create(). So it is still glibc that allocates the stack for
each pthread, and that stack is 8M in size. This is why this patch can
impact the stress-ng.pthread testing.
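
For illustration only, here is a minimal sketch of that pattern (it is not
the actual stress-ng code; the calloc() buffer and the 8M size are just
assumptions for the example):

#include <pthread.h>
#include <stdlib.h>

static void *worker(void *arg) { (void)arg; return NULL; }

int main(void)
{
    pthread_t tid;
    pthread_attr_t attr;
    size_t sz = 8UL << 20;
    void *stack = calloc(1, sz);        /* the test's own stack buffer */

    pthread_attr_init(&attr);
    pthread_attr_setstack(&attr, stack, sz);

    /* attr is prepared but NULL is passed instead, so glibc mmap()s its
     * own default-sized (8M) stack and the buffer above goes unused. */
    pthread_create(&tid, NULL, worker, NULL);

    pthread_join(tid, NULL);
    free(stack);
    return 0;
}

Passing &attr instead of NULL would make the thread run on the calloc()ed
buffer and the glibc stack allocation would not happen.
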
My understanding is that this is a different regression (if it is a valid
regression at all). The previous hotspot was in:
deferred_split_huge_page
deferred_split_huge_page
deferred_split_huge_page
spin_lock
while this time the hotspot is in (the pmd_lock taken from do_madvise, I suppose):
- 55.02% zap_pmd_range.isra.0
- 53.42% __split_huge_pmd
- 51.74% _raw_spin_lock
- 51.73% native_queued_spin_lock_slowpath
+ 3.03% asm_sysvec_call_function
- 1.67% __split_huge_pmd_locked
- 0.87% pmdp_invalidate
+ 0.86% flush_tlb_mm_range
- 1.60% zap_pte_range
- 1.04% page_remove_rmap
0.55% __mod_lruvec_page_state
>
>>
>>
>> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>
>> testcase: stress-ng
>> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
>> parameters:
>>
>> nr_threads: 1
>> disk: 1HDD
>> testtime: 60s
>> fs: ext4
>> class: os
>> test: pthread
>> cpufreq_governor: performance
>>
>>
>> In addition to that, the commit also has significant impact on the following tests:
>>
>> +------------------+-----------------------------------------------------------------------------------------------+
>> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression |
>> | test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory |
>> | test parameters | array_size=50000000 |
>> | | cpufreq_governor=performance |
>> | | iterations=10x |
>> | | loop=100 |
>> | | nr_threads=25% |
>> | | omp=true |
>> +------------------+-----------------------------------------------------------------------------------------------+
>> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression |
>> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory |
>> | test parameters | cpufreq_governor=performance |
>> | | option_a=Average |
>> | | option_b=Integer |
>> | | test=ramspeed-1.4.3 |
>> +------------------+-----------------------------------------------------------------------------------------------+
>> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression |
>> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory |
>> | test parameters | cpufreq_governor=performance |
>> | | option_a=Average |
>> | | option_b=Floating Point |
>> | | test=ramspeed-1.4.3 |
>> +------------------+-----------------------------------------------------------------------------------------------+
>>
>>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com
>>
>>
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>>
>>
>> The kernel config and materials to reproduce are available at:
>> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com
>>
>> =========================================================================================
>> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
>>
>> commit:
>> 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
>> 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>>
>> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
>> ---------------- ---------------------------
>> %stddev %change %stddev
>> \ | \
>> 13405796 -65.5% 4620124 cpuidle..usage
>> 8.00 +8.2% 8.66 ą 2% iostat.cpu.system
>> 1.61 -60.6% 0.63 iostat.cpu.user
>> 597.50 ą 14% -64.3% 213.50 ą 14% perf-c2c.DRAM.local
>> 1882 ą 14% -74.7% 476.83 ą 7% perf-c2c.HITM.local
>> 3768436 -12.9% 3283395 vmstat.memory.cache
>> 355105 -75.7% 86344 ą 3% vmstat.system.cs
>> 385435 -20.7% 305714 ą 3% vmstat.system.in
>> 1.13 -0.2 0.88 mpstat.cpu.all.irq%
>> 0.29 -0.2 0.10 ą 2% mpstat.cpu.all.soft%
>> 6.76 ą 2% +1.1 7.88 ą 2% mpstat.cpu.all.sys%
>> 1.62 -1.0 0.62 ą 2% mpstat.cpu.all.usr%
>> 2234397 -84.3% 350161 ą 5% stress-ng.pthread.ops
>> 37237 -84.3% 5834 ą 5% stress-ng.pthread.ops_per_sec
>> 294706 ą 2% -68.0% 94191 ą 6% stress-ng.time.involuntary_context_switches
>> 41442 ą 2% +5023.4% 2123284 stress-ng.time.maximum_resident_set_size
>> 4466457 -83.9% 717053 ą 5% stress-ng.time.minor_page_faults
>
> The larger RSS and fewer page faults are expected.
>
>> 243.33 +13.5% 276.17 ą 3% stress-ng.time.percent_of_cpu_this_job_got
>> 131.64 +27.7% 168.11 ą 3% stress-ng.time.system_time
>> 19.73 -82.1% 3.53 ą 4% stress-ng.time.user_time
>
> Much less user time. And it seems to match the drop of the pthread metric.
>
>> 7715609 -80.2% 1530125 ą 4% stress-ng.time.voluntary_context_switches
>> 76728 -80.8% 14724 ą 4% perf-stat.i.minor-faults
>> 5600408 -61.4% 2160997 ą 5% perf-stat.i.node-loads
>> 8873996 +52.1% 13499744 ą 5% perf-stat.i.node-stores
>> 112409 -81.9% 20305 ą 4% perf-stat.i.page-faults
>> 2.55 +89.6% 4.83 perf-stat.overall.MPKI
>
> Many more TLB misses.
>
>> 1.51 -0.4 1.13 perf-stat.overall.branch-miss-rate%
>> 19.26 +24.5 43.71 perf-stat.overall.cache-miss-rate%
>> 1.70 +56.4% 2.65 perf-stat.overall.cpi
>> 665.84 -17.5% 549.51 ą 2% perf-stat.overall.cycles-between-cache-misses
>> 0.12 ą 4% -0.1 0.04 perf-stat.overall.dTLB-load-miss-rate%
>> 0.08 ą 2% -0.0 0.03 perf-stat.overall.dTLB-store-miss-rate%
>> 59.16 +0.9 60.04 perf-stat.overall.iTLB-load-miss-rate%
>> 1278 +86.1% 2379 ą 2% perf-stat.overall.instructions-per-iTLB-miss
>> 0.59 -36.1% 0.38 perf-stat.overall.ipc
>
> Worse IPC and CPI.
>
>> 2.078e+09 -48.3% 1.074e+09 ą 4% perf-stat.ps.branch-instructions
>> 31292687 -61.2% 12133349 ą 2% perf-stat.ps.branch-misses
>> 26057291 -5.9% 24512034 ą 4% perf-stat.ps.cache-misses
>> 1.353e+08 -58.6% 56072195 ą 4% perf-stat.ps.cache-references
>> 365254 -75.8% 88464 ą 3% perf-stat.ps.context-switches
>> 1.735e+10 -22.4% 1.346e+10 ą 2% perf-stat.ps.cpu-cycles
>> 60838 -79.1% 12727 ą 6% perf-stat.ps.cpu-migrations
>> 3056601 ą 4% -81.5% 565354 ą 4% perf-stat.ps.dTLB-load-misses
>> 2.636e+09 -50.7% 1.3e+09 ą 4% perf-stat.ps.dTLB-loads
>> 1155253 ą 2% -83.0% 196581 ą 5% perf-stat.ps.dTLB-store-misses
>> 1.473e+09 -57.4% 6.268e+08 ą 3% perf-stat.ps.dTLB-stores
>> 7997726 -73.3% 2131477 ą 3% perf-stat.ps.iTLB-load-misses
>> 5521346 -74.3% 1418623 ą 2% perf-stat.ps.iTLB-loads
>> 1.023e+10 -50.4% 5.073e+09 ą 4% perf-stat.ps.instructions
>> 75671 -80.9% 14479 ą 4% perf-stat.ps.minor-faults
>> 5549722 -61.4% 2141750 ą 4% perf-stat.ps.node-loads
>> 8769156 +51.6% 13296579 ą 5% perf-stat.ps.node-stores
>> 110795 -82.0% 19977 ą 4% perf-stat.ps.page-faults
>> 6.482e+11 -50.7% 3.197e+11 ą 4% perf-stat.total.instructions
>> 0.00 ą 37% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
>> 0.01 ą 18% +8373.1% 0.73 ą 49% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
>> 0.01 ą 16% +4600.0% 0.38 ą 24% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
>
> More time spent in madvise and munmap, but I'm not sure whether this
> is caused by tearing down the address space when exiting the test. If
> so, it should not count toward the regression.
It's not the whole address space being torn down. It's the pthread
stack being torn down when a pthread exits (which can perhaps be treated
as address-space teardown, I suppose).
https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576
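
As a minimal standalone sketch of the pattern that teardown path hits (not
the glibc code itself; it assumes the region got PMD-mapped by THP and that
the discarded sub-range is not 2M-aligned):

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

#define SZ   (8UL << 20)
#define PAGE 4096UL

int main(void)
{
    char *p = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return 1;
    memset(p, 1, SZ);                   /* fault in, possibly as THPs */

    /* Discard everything except the last 4K page, roughly what the
     * cached-stack teardown does; a huge PMD straddling the end of the
     * range has to go through __split_huge_pmd() first. */
    madvise(p, SZ - PAGE, MADV_DONTNEED);

    munmap(p, SZ);
    return 0;
}
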
Another question is whether it is worthwhile to back thread stacks with
THP at all. It may be useful for some apps that need a large stack size?
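
If an app does want a THP-backed stack, one possibility (just a sketch, and
it still depends on the system-wide THP settings) is to opt in explicitly
instead of relying on the kernel's alignment heuristic:

#define _GNU_SOURCE
#include <sys/mman.h>

#define STACK_SZ (8UL << 20)

/* Allocate a stack region the kernel is explicitly told may use THP. */
static void *alloc_thp_stack(void)
{
    void *stack = mmap(NULL, STACK_SZ, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);

    if (stack == MAP_FAILED)
        return NULL;
    madvise(stack, STACK_SZ, MADV_HUGEPAGE);    /* explicit opt-in */
    return stack;
}

The buffer could then be handed to pthread_attr_setstack() as in the sketch
earlier in the thread, so only the threads that really want a huge stack
pay for it.
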
Regards
Yin, Fengwei
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-20 8:29 ` Yin Fengwei
@ 2023-12-20 15:42 ` Christoph Lameter (Ampere)
2023-12-20 20:14 ` Yang Shi
2023-12-20 20:09 ` Yang Shi
1 sibling, 1 reply; 24+ messages in thread
From: Christoph Lameter (Ampere) @ 2023-12-20 15:42 UTC (permalink / raw)
To: Yin Fengwei
Cc: Yang Shi, kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
ying.huang, feng.tang
On Wed, 20 Dec 2023, Yin Fengwei wrote:
>> Interesting, wasn't the same regression seen last time? And I'm a
>> little bit confused about how pthread got regressed. I didn't see the
>> pthread benchmark do any intensive memory alloc/free operations. Do
>> the pthread APIs do any intensive memory operations? I saw the
>> benchmark does allocate memory for thread stack, but it should be just
>> 8K per thread, so it should not trigger what this patch does. With
>> 1024 threads, the thread stacks may get merged into one single VMA (8M
>> total), but it may do so even though the patch is not applied.
> stress-ng.pthread test code is strange here:
>
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573
>
> Even it allocates its own stack, but that attr is not passed
> to pthread_create. So it's still glibc to allocate stack for
> pthread which is 8M size. This is why this patch can impact
> the stress-ng.pthread testing.
Hmmm... The use of calloc() for 8M triggers an mmap I guess.
Why is that memory slower if we align the address to a 2M boundary? Because
THP can act faster and creates more overhead?
> while this time, the hotspot is in (pmd_lock from do_madvise I suppose):
> - 55.02% zap_pmd_range.isra.0
> - 53.42% __split_huge_pmd
> - 51.74% _raw_spin_lock
> - 51.73% native_queued_spin_lock_slowpath
> + 3.03% asm_sysvec_call_function
> - 1.67% __split_huge_pmd_locked
> - 0.87% pmdp_invalidate
> + 0.86% flush_tlb_mm_range
> - 1.60% zap_pte_range
> - 1.04% page_remove_rmap
> 0.55% __mod_lruvec_page_state
Ok so we have 2M mappings and they are split because of some action on 4K
segments? Guess because of the guard pages?
>> More time spent in madvise and munmap. but I'm not sure whether this
>> is caused by tearing down the address space when exiting the test. If
>> so it should not count in the regression.
> It's not for the whole address space tearing down. It's for pthread
> stack tearing down when pthread exit (can be treated as address space
> tearing down? I suppose so).
>
> https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
> https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576
>
> Another thing is whether it's worthy to make stack use THP? It may be
> useful for some apps which need large stack size?
No can do since a calloc is used to allocate the stack. How can the kernel
distinguish the allocation?
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-20 8:29 ` Yin Fengwei
2023-12-20 15:42 ` Christoph Lameter (Ampere)
@ 2023-12-20 20:09 ` Yang Shi
2023-12-21 0:26 ` Yang Shi
1 sibling, 1 reply; 24+ messages in thread
From: Yang Shi @ 2023-12-20 20:09 UTC (permalink / raw)
To: Yin Fengwei
Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Christopher Lameter, ying.huang, feng.tang
On Wed, Dec 20, 2023 at 12:34 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
>
>
>
> On 2023/12/20 13:27, Yang Shi wrote:
> > On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote:
> >>
> >>
> >>
> >> Hello,
> >>
> >> for this commit, we reported
> >> "[mm] 96db82a66d: will-it-scale.per_process_ops -95.3% regression"
> >> in Aug, 2022 when it's in linux-next/master
> >> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/
> >>
> >> later, we reported
> >> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression"
> >> in Oct, 2022 when it's in linus/master
> >> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/
> >>
> >> and the commit was reverted finally by
> >> commit 0ba09b1733878afe838fe35c310715fda3d46428
> >> Author: Linus Torvalds <torvalds@linux-foundation.org>
> >> Date: Sun Dec 4 12:51:59 2022 -0800
> >>
> >> now we noticed it goes into linux-next/master again.
> >>
> >> we are not sure if there is an agreement that the benefit of this commit
> >> has already overweight performance drop in some mirco benchmark.
> >>
> >> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/
> >> that
> >> "This patch was applied to v6.1, but was reverted due to a regression
> >> report. However it turned out the regression was not due to this patch.
> >> I ping'ed Andrew to reapply this patch, Andrew may forget it. This
> >> patch helps promote THP, so I rebased it onto the latest mm-unstable."
> >
> > IIRC, Huang Ying's analysis showed the regression for will-it-scale
> > micro benchmark is fine, it was actually reverted due to kernel build
> > regression with LLVM reported by Nathan Chancellor. Then the
> > regression was resolved by commit
> > 81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out
> > if page in deferred queue already"). And this patch did improve kernel
> > build with GCC by ~3% if I remember correctly.
> >
> >>
> >> however, unfortunately, in our latest tests, we still observed below regression
> >> upon this commit. just FYI.
> >>
> >>
> >>
> >> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:
> >
> > Interesting, wasn't the same regression seen last time? And I'm a
> > little bit confused about how pthread got regressed. I didn't see the
> > pthread benchmark do any intensive memory alloc/free operations. Do
> > the pthread APIs do any intensive memory operations? I saw the
> > benchmark does allocate memory for thread stack, but it should be just
> > 8K per thread, so it should not trigger what this patch does. With
> > 1024 threads, the thread stacks may get merged into one single VMA (8M
> > total), but it may do so even though the patch is not applied.
> stress-ng.pthread test code is strange here:
>
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573
>
> Even it allocates its own stack, but that attr is not passed
> to pthread_create. So it's still glibc to allocate stack for
> pthread which is 8M size. This is why this patch can impact
> the stress-ng.pthread testing.
Aha, nice catch, I overlooked that.
>
>
> My understanding is this is different regression (if it's a valid
> regression). The previous hotspot was in:
> deferred_split_huge_page
> deferred_split_huge_page
> deferred_split_huge_page
> spin_lock
>
> while this time, the hotspot is in (pmd_lock from do_madvise I suppose):
> - 55.02% zap_pmd_range.isra.0
> - 53.42% __split_huge_pmd
> - 51.74% _raw_spin_lock
> - 51.73% native_queued_spin_lock_slowpath
> + 3.03% asm_sysvec_call_function
> - 1.67% __split_huge_pmd_locked
> - 0.87% pmdp_invalidate
> + 0.86% flush_tlb_mm_range
> - 1.60% zap_pte_range
> - 1.04% page_remove_rmap
> 0.55% __mod_lruvec_page_state
>
>
> >
> >>
> >>
> >> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries")
> >> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >>
> >> testcase: stress-ng
> >> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
> >> parameters:
> >>
> >> nr_threads: 1
> >> disk: 1HDD
> >> testtime: 60s
> >> fs: ext4
> >> class: os
> >> test: pthread
> >> cpufreq_governor: performance
> >>
> >>
> >> In addition to that, the commit also has significant impact on the following tests:
> >>
> >> +------------------+-----------------------------------------------------------------------------------------------+
> >> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression |
> >> | test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory |
> >> | test parameters | array_size=50000000 |
> >> | | cpufreq_governor=performance |
> >> | | iterations=10x |
> >> | | loop=100 |
> >> | | nr_threads=25% |
> >> | | omp=true |
> >> +------------------+-----------------------------------------------------------------------------------------------+
> >> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression |
> >> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory |
> >> | test parameters | cpufreq_governor=performance |
> >> | | option_a=Average |
> >> | | option_b=Integer |
> >> | | test=ramspeed-1.4.3 |
> >> +------------------+-----------------------------------------------------------------------------------------------+
> >> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression |
> >> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory |
> >> | test parameters | cpufreq_governor=performance |
> >> | | option_a=Average |
> >> | | option_b=Floating Point |
> >> | | test=ramspeed-1.4.3 |
> >> +------------------+-----------------------------------------------------------------------------------------------+
> >>
> >>
> >> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> >> the same patch/commit), kindly add following tags
> >> | Reported-by: kernel test robot <oliver.sang@intel.com>
> >> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com
> >>
> >>
> >> Details are as below:
> >> -------------------------------------------------------------------------------------------------->
> >>
> >>
> >> The kernel config and materials to reproduce are available at:
> >> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com
> >>
> >> =========================================================================================
> >> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> >> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
> >>
> >> commit:
> >> 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
> >> 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
> >>
> >> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> >> ---------------- ---------------------------
> >> %stddev %change %stddev
> >> \ | \
> >> 13405796 -65.5% 4620124 cpuidle..usage
> >> 8.00 +8.2% 8.66 ą 2% iostat.cpu.system
> >> 1.61 -60.6% 0.63 iostat.cpu.user
> >> 597.50 ą 14% -64.3% 213.50 ą 14% perf-c2c.DRAM.local
> >> 1882 ą 14% -74.7% 476.83 ą 7% perf-c2c.HITM.local
> >> 3768436 -12.9% 3283395 vmstat.memory.cache
> >> 355105 -75.7% 86344 ą 3% vmstat.system.cs
> >> 385435 -20.7% 305714 ą 3% vmstat.system.in
> >> 1.13 -0.2 0.88 mpstat.cpu.all.irq%
> >> 0.29 -0.2 0.10 ą 2% mpstat.cpu.all.soft%
> >> 6.76 ą 2% +1.1 7.88 ą 2% mpstat.cpu.all.sys%
> >> 1.62 -1.0 0.62 ą 2% mpstat.cpu.all.usr%
> >> 2234397 -84.3% 350161 ą 5% stress-ng.pthread.ops
> >> 37237 -84.3% 5834 ą 5% stress-ng.pthread.ops_per_sec
> >> 294706 ą 2% -68.0% 94191 ą 6% stress-ng.time.involuntary_context_switches
> >> 41442 ą 2% +5023.4% 2123284 stress-ng.time.maximum_resident_set_size
> >> 4466457 -83.9% 717053 ą 5% stress-ng.time.minor_page_faults
> >
> > The larger RSS and fewer page faults are expected.
> >
> >> 243.33 +13.5% 276.17 ą 3% stress-ng.time.percent_of_cpu_this_job_got
> >> 131.64 +27.7% 168.11 ą 3% stress-ng.time.system_time
> >> 19.73 -82.1% 3.53 ą 4% stress-ng.time.user_time
> >
> > Much less user time. And it seems to match the drop of the pthread metric.
> >
> >> 7715609 -80.2% 1530125 ą 4% stress-ng.time.voluntary_context_switches
> >> 76728 -80.8% 14724 ą 4% perf-stat.i.minor-faults
> >> 5600408 -61.4% 2160997 ą 5% perf-stat.i.node-loads
> >> 8873996 +52.1% 13499744 ą 5% perf-stat.i.node-stores
> >> 112409 -81.9% 20305 ą 4% perf-stat.i.page-faults
> >> 2.55 +89.6% 4.83 perf-stat.overall.MPKI
> >
> > Much more TLB misses.
> >
> >> 1.51 -0.4 1.13 perf-stat.overall.branch-miss-rate%
> >> 19.26 +24.5 43.71 perf-stat.overall.cache-miss-rate%
> >> 1.70 +56.4% 2.65 perf-stat.overall.cpi
> >> 665.84 -17.5% 549.51 ą 2% perf-stat.overall.cycles-between-cache-misses
> >> 0.12 ą 4% -0.1 0.04 perf-stat.overall.dTLB-load-miss-rate%
> >> 0.08 ą 2% -0.0 0.03 perf-stat.overall.dTLB-store-miss-rate%
> >> 59.16 +0.9 60.04 perf-stat.overall.iTLB-load-miss-rate%
> >> 1278 +86.1% 2379 ą 2% perf-stat.overall.instructions-per-iTLB-miss
> >> 0.59 -36.1% 0.38 perf-stat.overall.ipc
> >
> > Worse IPC and CPI.
> >
> >> 2.078e+09 -48.3% 1.074e+09 ą 4% perf-stat.ps.branch-instructions
> >> 31292687 -61.2% 12133349 ą 2% perf-stat.ps.branch-misses
> >> 26057291 -5.9% 24512034 ą 4% perf-stat.ps.cache-misses
> >> 1.353e+08 -58.6% 56072195 ą 4% perf-stat.ps.cache-references
> >> 365254 -75.8% 88464 ą 3% perf-stat.ps.context-switches
> >> 1.735e+10 -22.4% 1.346e+10 ą 2% perf-stat.ps.cpu-cycles
> >> 60838 -79.1% 12727 ą 6% perf-stat.ps.cpu-migrations
> >> 3056601 ą 4% -81.5% 565354 ą 4% perf-stat.ps.dTLB-load-misses
> >> 2.636e+09 -50.7% 1.3e+09 ą 4% perf-stat.ps.dTLB-loads
> >> 1155253 ą 2% -83.0% 196581 ą 5% perf-stat.ps.dTLB-store-misses
> >> 1.473e+09 -57.4% 6.268e+08 ą 3% perf-stat.ps.dTLB-stores
> >> 7997726 -73.3% 2131477 ą 3% perf-stat.ps.iTLB-load-misses
> >> 5521346 -74.3% 1418623 ą 2% perf-stat.ps.iTLB-loads
> >> 1.023e+10 -50.4% 5.073e+09 ą 4% perf-stat.ps.instructions
> >> 75671 -80.9% 14479 ą 4% perf-stat.ps.minor-faults
> >> 5549722 -61.4% 2141750 ą 4% perf-stat.ps.node-loads
> >> 8769156 +51.6% 13296579 ą 5% perf-stat.ps.node-stores
> >> 110795 -82.0% 19977 ą 4% perf-stat.ps.page-faults
> >> 6.482e+11 -50.7% 3.197e+11 ą 4% perf-stat.total.instructions
> >> 0.00 ą 37% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
> >> 0.01 ą 18% +8373.1% 0.73 ą 49% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
> >> 0.01 ą 16% +4600.0% 0.38 ą 24% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
> >
> > More time spent in madvise and munmap. but I'm not sure whether this
> > is caused by tearing down the address space when exiting the test. If
> > so it should not count in the regression.
> It's not for the whole address space tearing down. It's for pthread
> stack tearing down when pthread exit (can be treated as address space
> tearing down? I suppose so).
>
> https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
> https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576
It explains the problem. The madvise() does have some extra overhead
for handling THP (splitting pmd, deferred split queue, etc).
>
> Another thing is whether it's worthy to make stack use THP? It may be
> useful for some apps which need large stack size?
The kernel actually doesn't apply THP to stacks (see
vma_is_temporary_stack()). But the kernel can only tell whether a VMA is
a stack by checking the VM_GROWSDOWN | VM_GROWSUP flags. So if glibc
doesn't set the proper flags to tell the kernel the area is a stack, the
kernel just treats it as a normal anonymous area. So glibc should set up
the stack properly IMHO.
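Conceptually the only signal available is something like the sketch
below (simplified, not the actual kernel helper; the function name is
made up for illustration):

    /*
     * Simplified sketch, not real kernel code: the only hint the kernel
     * gets that a VMA is a stack is the grows-down/up flags, and a plain
     * MAP_PRIVATE|MAP_ANONYMOUS mmap() from glibc never sets them.
     */
    static bool vma_looks_like_stack(struct vm_area_struct *vma)
    {
            return !!(vma->vm_flags & (VM_GROWSDOWN | VM_GROWSUP));
    }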
>
>
> Regards
> Yin, Fengwei
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-20 15:42 ` Christoph Lameter (Ampere)
@ 2023-12-20 20:14 ` Yang Shi
0 siblings, 0 replies; 24+ messages in thread
From: Yang Shi @ 2023-12-20 20:14 UTC (permalink / raw)
To: Christoph Lameter (Ampere)
Cc: Yin Fengwei, kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
ying.huang, feng.tang
On Wed, Dec 20, 2023 at 7:42 AM Christoph Lameter (Ampere) <cl@linux.com> wrote:
>
> On Wed, 20 Dec 2023, Yin Fengwei wrote:
>
> >> Interesting, wasn't the same regression seen last time? And I'm a
> >> little bit confused about how pthread got regressed. I didn't see the
> >> pthread benchmark do any intensive memory alloc/free operations. Do
> >> the pthread APIs do any intensive memory operations? I saw the
> >> benchmark does allocate memory for thread stack, but it should be just
> >> 8K per thread, so it should not trigger what this patch does. With
> >> 1024 threads, the thread stacks may get merged into one single VMA (8M
> >> total), but it may do so even though the patch is not applied.
> > stress-ng.pthread test code is strange here:
> >
> > https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573
> >
> > Even it allocates its own stack, but that attr is not passed
> > to pthread_create. So it's still glibc to allocate stack for
> > pthread which is 8M size. This is why this patch can impact
> > the stress-ng.pthread testing.
>
> Hmmm... The use of calloc() for 8M triggers an mmap I guess.
>
> Why is that memory slower if we align the adress to a 2M boundary? Because
> THP can act faster and creates more overhead?
glibc calls madvise() to free the unused stack, which may have a higher
cost due to THP (splitting the pmd, the deferred split queue, etc.).
>
> > while this time, the hotspot is in (pmd_lock from do_madvise I suppose):
> > - 55.02% zap_pmd_range.isra.0
> > - 53.42% __split_huge_pmd
> > - 51.74% _raw_spin_lock
> > - 51.73% native_queued_spin_lock_slowpath
> > + 3.03% asm_sysvec_call_function
> > - 1.67% __split_huge_pmd_locked
> > - 0.87% pmdp_invalidate
> > + 0.86% flush_tlb_mm_range
> > - 1.60% zap_pte_range
> > - 1.04% page_remove_rmap
> > 0.55% __mod_lruvec_page_state
>
> Ok so we have 2M mappings and they are split because of some action on 4K
> segments? Guess because of the guard pages?
It should not be related to the guard pages; it's just due to freeing
the unused stack, which may cover only part of a 2M range.
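A toy userspace demonstration of that cost (assumptions on my side: THP
set to "always", 2M huge pages, and the mapping coming back 2M-aligned,
which is exactly what the patch under discussion arranges):

    #include <stddef.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
            size_t len = 4UL << 20;         /* 4M anonymous mapping */
            char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED)
                    return 1;
            memset(p, 1, len);              /* fault in, likely as THP */
            /*
             * Sub-2M range: the kernel has to split the huge PMD first,
             * which is the __split_huge_pmd/_raw_spin_lock hotspot in
             * the profile above.
             */
            madvise(p, 1UL << 20, MADV_DONTNEED);
            return 0;
    }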
>
> >> More time spent in madvise and munmap. but I'm not sure whether this
> >> is caused by tearing down the address space when exiting the test. If
> >> so it should not count in the regression.
> > It's not for the whole address space tearing down. It's for pthread
> > stack tearing down when pthread exit (can be treated as address space
> > tearing down? I suppose so).
> >
> > https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
> > https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576
> >
> > Another thing is whether it's worthy to make stack use THP? It may be
> > useful for some apps which need large stack size?
>
> No can do since a calloc is used to allocate the stack. How can the kernel
> distinguish the allocation?
Just by VM_GROWSDOWN | VM_GROWSUP. The user space needs to tell kernel
this area is stack by setting proper flags. For example,
ffffca1df000-ffffca200000 rw-p 00000000 00:00 0 [stack]
Size: 132 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 60 kB
Pss: 60 kB
Pss_Dirty: 60 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 60 kB
Referenced: 60 kB
Anonymous: 60 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 0
VmFlags: rd wr mr mw me gd ac
The "gd" flag means GROWSDOWN. But it totally depends on how glibc
treats the "stack", and glibc just uses calloc() to allocate the stack
area.
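For illustration, a mapping that does get the "gd" VmFlag in
/proc/<pid>/smaps (like the [stack] entry above) would be created with
MAP_GROWSDOWN, which is what translates into VM_GROWSDOWN; this is a
hedged example, not something glibc does today, since its thread stacks
are plain MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK mappings:

    #define _GNU_SOURCE
    #include <stddef.h>
    #include <sys/mman.h>

    /* Illustrative helper: the returned VMA carries VM_GROWSDOWN ("gd"). */
    void *make_growsdown_area(size_t size)
    {
            return mmap(NULL, size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0);
    }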
>
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-20 20:09 ` Yang Shi
@ 2023-12-21 0:26 ` Yang Shi
2023-12-21 0:58 ` Yin Fengwei
0 siblings, 1 reply; 24+ messages in thread
From: Yang Shi @ 2023-12-21 0:26 UTC (permalink / raw)
To: Yin Fengwei
Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Christopher Lameter, ying.huang, feng.tang
On Wed, Dec 20, 2023 at 12:09 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Wed, Dec 20, 2023 at 12:34 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
> >
> >
> >
> > On 2023/12/20 13:27, Yang Shi wrote:
> > > On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote:
> > >>
> > >>
> > >>
> > >> Hello,
> > >>
> > >> for this commit, we reported
> > >> "[mm] 96db82a66d: will-it-scale.per_process_ops -95.3% regression"
> > >> in Aug, 2022 when it's in linux-next/master
> > >> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/
> > >>
> > >> later, we reported
> > >> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression"
> > >> in Oct, 2022 when it's in linus/master
> > >> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/
> > >>
> > >> and the commit was reverted finally by
> > >> commit 0ba09b1733878afe838fe35c310715fda3d46428
> > >> Author: Linus Torvalds <torvalds@linux-foundation.org>
> > >> Date: Sun Dec 4 12:51:59 2022 -0800
> > >>
> > >> now we noticed it goes into linux-next/master again.
> > >>
> > >> we are not sure if there is an agreement that the benefit of this commit
> > >> has already overweight performance drop in some mirco benchmark.
> > >>
> > >> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/
> > >> that
> > >> "This patch was applied to v6.1, but was reverted due to a regression
> > >> report. However it turned out the regression was not due to this patch.
> > >> I ping'ed Andrew to reapply this patch, Andrew may forget it. This
> > >> patch helps promote THP, so I rebased it onto the latest mm-unstable."
> > >
> > > IIRC, Huang Ying's analysis showed the regression for will-it-scale
> > > micro benchmark is fine, it was actually reverted due to kernel build
> > > regression with LLVM reported by Nathan Chancellor. Then the
> > > regression was resolved by commit
> > > 81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out
> > > if page in deferred queue already"). And this patch did improve kernel
> > > build with GCC by ~3% if I remember correctly.
> > >
> > >>
> > >> however, unfortunately, in our latest tests, we still observed below regression
> > >> upon this commit. just FYI.
> > >>
> > >>
> > >>
> > >> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:
> > >
> > > Interesting, wasn't the same regression seen last time? And I'm a
> > > little bit confused about how pthread got regressed. I didn't see the
> > > pthread benchmark do any intensive memory alloc/free operations. Do
> > > the pthread APIs do any intensive memory operations? I saw the
> > > benchmark does allocate memory for thread stack, but it should be just
> > > 8K per thread, so it should not trigger what this patch does. With
> > > 1024 threads, the thread stacks may get merged into one single VMA (8M
> > > total), but it may do so even though the patch is not applied.
> > stress-ng.pthread test code is strange here:
> >
> > https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573
> >
> > Even it allocates its own stack, but that attr is not passed
> > to pthread_create. So it's still glibc to allocate stack for
> > pthread which is 8M size. This is why this patch can impact
> > the stress-ng.pthread testing.
>
> Aha, nice catch, I overlooked that.
>
> >
> >
> > My understanding is this is different regression (if it's a valid
> > regression). The previous hotspot was in:
> > deferred_split_huge_page
> > deferred_split_huge_page
> > deferred_split_huge_page
> > spin_lock
> >
> > while this time, the hotspot is in (pmd_lock from do_madvise I suppose):
> > - 55.02% zap_pmd_range.isra.0
> > - 53.42% __split_huge_pmd
> > - 51.74% _raw_spin_lock
> > - 51.73% native_queued_spin_lock_slowpath
> > + 3.03% asm_sysvec_call_function
> > - 1.67% __split_huge_pmd_locked
> > - 0.87% pmdp_invalidate
> > + 0.86% flush_tlb_mm_range
> > - 1.60% zap_pte_range
> > - 1.04% page_remove_rmap
> > 0.55% __mod_lruvec_page_state
> >
> >
> > >
> > >>
> > >>
> > >> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries")
> > >> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > >>
> > >> testcase: stress-ng
> > >> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
> > >> parameters:
> > >>
> > >> nr_threads: 1
> > >> disk: 1HDD
> > >> testtime: 60s
> > >> fs: ext4
> > >> class: os
> > >> test: pthread
> > >> cpufreq_governor: performance
> > >>
> > >>
> > >> In addition to that, the commit also has significant impact on the following tests:
> > >>
> > >> +------------------+-----------------------------------------------------------------------------------------------+
> > >> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression |
> > >> | test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory |
> > >> | test parameters | array_size=50000000 |
> > >> | | cpufreq_governor=performance |
> > >> | | iterations=10x |
> > >> | | loop=100 |
> > >> | | nr_threads=25% |
> > >> | | omp=true |
> > >> +------------------+-----------------------------------------------------------------------------------------------+
> > >> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression |
> > >> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory |
> > >> | test parameters | cpufreq_governor=performance |
> > >> | | option_a=Average |
> > >> | | option_b=Integer |
> > >> | | test=ramspeed-1.4.3 |
> > >> +------------------+-----------------------------------------------------------------------------------------------+
> > >> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression |
> > >> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory |
> > >> | test parameters | cpufreq_governor=performance |
> > >> | | option_a=Average |
> > >> | | option_b=Floating Point |
> > >> | | test=ramspeed-1.4.3 |
> > >> +------------------+-----------------------------------------------------------------------------------------------+
> > >>
> > >>
> > >> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > >> the same patch/commit), kindly add following tags
> > >> | Reported-by: kernel test robot <oliver.sang@intel.com>
> > >> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com
> > >>
> > >>
> > >> Details are as below:
> > >> -------------------------------------------------------------------------------------------------->
> > >>
> > >>
> > >> The kernel config and materials to reproduce are available at:
> > >> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com
> > >>
> > >> =========================================================================================
> > >> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> > >> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
> > >>
> > >> commit:
> > >> 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
> > >> 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
> > >>
> > >> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> > >> ---------------- ---------------------------
> > >> %stddev %change %stddev
> > >> \ | \
> > >> 13405796 -65.5% 4620124 cpuidle..usage
> > >> 8.00 +8.2% 8.66 ą 2% iostat.cpu.system
> > >> 1.61 -60.6% 0.63 iostat.cpu.user
> > >> 597.50 ą 14% -64.3% 213.50 ą 14% perf-c2c.DRAM.local
> > >> 1882 ą 14% -74.7% 476.83 ą 7% perf-c2c.HITM.local
> > >> 3768436 -12.9% 3283395 vmstat.memory.cache
> > >> 355105 -75.7% 86344 ą 3% vmstat.system.cs
> > >> 385435 -20.7% 305714 ą 3% vmstat.system.in
> > >> 1.13 -0.2 0.88 mpstat.cpu.all.irq%
> > >> 0.29 -0.2 0.10 ą 2% mpstat.cpu.all.soft%
> > >> 6.76 ą 2% +1.1 7.88 ą 2% mpstat.cpu.all.sys%
> > >> 1.62 -1.0 0.62 ą 2% mpstat.cpu.all.usr%
> > >> 2234397 -84.3% 350161 ą 5% stress-ng.pthread.ops
> > >> 37237 -84.3% 5834 ą 5% stress-ng.pthread.ops_per_sec
> > >> 294706 ą 2% -68.0% 94191 ą 6% stress-ng.time.involuntary_context_switches
> > >> 41442 ą 2% +5023.4% 2123284 stress-ng.time.maximum_resident_set_size
> > >> 4466457 -83.9% 717053 ą 5% stress-ng.time.minor_page_faults
> > >
> > > The larger RSS and fewer page faults are expected.
> > >
> > >> 243.33 +13.5% 276.17 ą 3% stress-ng.time.percent_of_cpu_this_job_got
> > >> 131.64 +27.7% 168.11 ą 3% stress-ng.time.system_time
> > >> 19.73 -82.1% 3.53 ą 4% stress-ng.time.user_time
> > >
> > > Much less user time. And it seems to match the drop of the pthread metric.
> > >
> > >> 7715609 -80.2% 1530125 ą 4% stress-ng.time.voluntary_context_switches
> > >> 76728 -80.8% 14724 ą 4% perf-stat.i.minor-faults
> > >> 5600408 -61.4% 2160997 ą 5% perf-stat.i.node-loads
> > >> 8873996 +52.1% 13499744 ą 5% perf-stat.i.node-stores
> > >> 112409 -81.9% 20305 ą 4% perf-stat.i.page-faults
> > >> 2.55 +89.6% 4.83 perf-stat.overall.MPKI
> > >
> > > Much more TLB misses.
> > >
> > >> 1.51 -0.4 1.13 perf-stat.overall.branch-miss-rate%
> > >> 19.26 +24.5 43.71 perf-stat.overall.cache-miss-rate%
> > >> 1.70 +56.4% 2.65 perf-stat.overall.cpi
> > >> 665.84 -17.5% 549.51 ą 2% perf-stat.overall.cycles-between-cache-misses
> > >> 0.12 ą 4% -0.1 0.04 perf-stat.overall.dTLB-load-miss-rate%
> > >> 0.08 ą 2% -0.0 0.03 perf-stat.overall.dTLB-store-miss-rate%
> > >> 59.16 +0.9 60.04 perf-stat.overall.iTLB-load-miss-rate%
> > >> 1278 +86.1% 2379 ą 2% perf-stat.overall.instructions-per-iTLB-miss
> > >> 0.59 -36.1% 0.38 perf-stat.overall.ipc
> > >
> > > Worse IPC and CPI.
> > >
> > >> 2.078e+09 -48.3% 1.074e+09 ą 4% perf-stat.ps.branch-instructions
> > >> 31292687 -61.2% 12133349 ą 2% perf-stat.ps.branch-misses
> > >> 26057291 -5.9% 24512034 ą 4% perf-stat.ps.cache-misses
> > >> 1.353e+08 -58.6% 56072195 ą 4% perf-stat.ps.cache-references
> > >> 365254 -75.8% 88464 ą 3% perf-stat.ps.context-switches
> > >> 1.735e+10 -22.4% 1.346e+10 ą 2% perf-stat.ps.cpu-cycles
> > >> 60838 -79.1% 12727 ą 6% perf-stat.ps.cpu-migrations
> > >> 3056601 ą 4% -81.5% 565354 ą 4% perf-stat.ps.dTLB-load-misses
> > >> 2.636e+09 -50.7% 1.3e+09 ą 4% perf-stat.ps.dTLB-loads
> > >> 1155253 ą 2% -83.0% 196581 ą 5% perf-stat.ps.dTLB-store-misses
> > >> 1.473e+09 -57.4% 6.268e+08 ą 3% perf-stat.ps.dTLB-stores
> > >> 7997726 -73.3% 2131477 ą 3% perf-stat.ps.iTLB-load-misses
> > >> 5521346 -74.3% 1418623 ą 2% perf-stat.ps.iTLB-loads
> > >> 1.023e+10 -50.4% 5.073e+09 ą 4% perf-stat.ps.instructions
> > >> 75671 -80.9% 14479 ą 4% perf-stat.ps.minor-faults
> > >> 5549722 -61.4% 2141750 ą 4% perf-stat.ps.node-loads
> > >> 8769156 +51.6% 13296579 ą 5% perf-stat.ps.node-stores
> > >> 110795 -82.0% 19977 ą 4% perf-stat.ps.page-faults
> > >> 6.482e+11 -50.7% 3.197e+11 ą 4% perf-stat.total.instructions
> > >> 0.00 ą 37% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
> > >> 0.01 ą 18% +8373.1% 0.73 ą 49% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
> > >> 0.01 ą 16% +4600.0% 0.38 ą 24% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
> > >
> > > More time spent in madvise and munmap. but I'm not sure whether this
> > > is caused by tearing down the address space when exiting the test. If
> > > so it should not count in the regression.
> > It's not for the whole address space tearing down. It's for pthread
> > stack tearing down when pthread exit (can be treated as address space
> > tearing down? I suppose so).
> >
> > https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
> > https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576
>
> It explains the problem. The madvise() does have some extra overhead
> for handling THP (splitting pmd, deferred split queue, etc).
>
> >
> > Another thing is whether it's worthy to make stack use THP? It may be
> > useful for some apps which need large stack size?
>
> Kernel actually doesn't apply THP to stack (see
> vma_is_temporary_stack()). But kernel can't know whether the VMA is
> stack or not by checking VM_GROWSDOWN | VM_GROWSUP flags. So if glibc
> doesn't set the proper flags to tell kernel the area is stack, kernel
> just treats it as normal anonymous area. So glibc should set up stack
> properly IMHO.
If I read the code correctly, nptl allocates stack by the below code:
mem = __mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
See https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L563
The MAP_STACK is used, but it is a no-op on Linux. So the alternative
is to make MAP_STACK useful on Linux instead of changing glibc. But
the blast radius seems much wider.
>
> >
> >
> > Regards
> > Yin, Fengwei
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-21 0:26 ` Yang Shi
@ 2023-12-21 0:58 ` Yin Fengwei
2023-12-21 1:02 ` Yin Fengwei
` (2 more replies)
0 siblings, 3 replies; 24+ messages in thread
From: Yin Fengwei @ 2023-12-21 0:58 UTC (permalink / raw)
To: Yang Shi
Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Christopher Lameter, ying.huang, feng.tang
On 2023/12/21 08:26, Yang Shi wrote:
> On Wed, Dec 20, 2023 at 12:09 PM Yang Shi <shy828301@gmail.com> wrote:
>>
>> On Wed, Dec 20, 2023 at 12:34 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
>>>
>>>
>>>
>>> On 2023/12/20 13:27, Yang Shi wrote:
>>>> On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> for this commit, we reported
>>>>> "[mm] 96db82a66d: will-it-scale.per_process_ops -95.3% regression"
>>>>> in Aug, 2022 when it's in linux-next/master
>>>>> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/
>>>>>
>>>>> later, we reported
>>>>> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression"
>>>>> in Oct, 2022 when it's in linus/master
>>>>> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/
>>>>>
>>>>> and the commit was reverted finally by
>>>>> commit 0ba09b1733878afe838fe35c310715fda3d46428
>>>>> Author: Linus Torvalds <torvalds@linux-foundation.org>
>>>>> Date: Sun Dec 4 12:51:59 2022 -0800
>>>>>
>>>>> now we noticed it goes into linux-next/master again.
>>>>>
>>>>> we are not sure if there is an agreement that the benefit of this commit
>>>>> has already overweight performance drop in some mirco benchmark.
>>>>>
>>>>> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/
>>>>> that
>>>>> "This patch was applied to v6.1, but was reverted due to a regression
>>>>> report. However it turned out the regression was not due to this patch.
>>>>> I ping'ed Andrew to reapply this patch, Andrew may forget it. This
>>>>> patch helps promote THP, so I rebased it onto the latest mm-unstable."
>>>>
>>>> IIRC, Huang Ying's analysis showed the regression for will-it-scale
>>>> micro benchmark is fine, it was actually reverted due to kernel build
>>>> regression with LLVM reported by Nathan Chancellor. Then the
>>>> regression was resolved by commit
>>>> 81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out
>>>> if page in deferred queue already"). And this patch did improve kernel
>>>> build with GCC by ~3% if I remember correctly.
>>>>
>>>>>
>>>>> however, unfortunately, in our latest tests, we still observed below regression
>>>>> upon this commit. just FYI.
>>>>>
>>>>>
>>>>>
>>>>> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:
>>>>
>>>> Interesting, wasn't the same regression seen last time? And I'm a
>>>> little bit confused about how pthread got regressed. I didn't see the
>>>> pthread benchmark do any intensive memory alloc/free operations. Do
>>>> the pthread APIs do any intensive memory operations? I saw the
>>>> benchmark does allocate memory for thread stack, but it should be just
>>>> 8K per thread, so it should not trigger what this patch does. With
>>>> 1024 threads, the thread stacks may get merged into one single VMA (8M
>>>> total), but it may do so even though the patch is not applied.
>>> stress-ng.pthread test code is strange here:
>>>
>>> https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573
>>>
>>> Even it allocates its own stack, but that attr is not passed
>>> to pthread_create. So it's still glibc to allocate stack for
>>> pthread which is 8M size. This is why this patch can impact
>>> the stress-ng.pthread testing.
>>
>> Aha, nice catch, I overlooked that.
>>
>>>
>>>
>>> My understanding is this is different regression (if it's a valid
>>> regression). The previous hotspot was in:
>>> deferred_split_huge_page
>>> deferred_split_huge_page
>>> deferred_split_huge_page
>>> spin_lock
>>>
>>> while this time, the hotspot is in (pmd_lock from do_madvise I suppose):
>>> - 55.02% zap_pmd_range.isra.0
>>> - 53.42% __split_huge_pmd
>>> - 51.74% _raw_spin_lock
>>> - 51.73% native_queued_spin_lock_slowpath
>>> + 3.03% asm_sysvec_call_function
>>> - 1.67% __split_huge_pmd_locked
>>> - 0.87% pmdp_invalidate
>>> + 0.86% flush_tlb_mm_range
>>> - 1.60% zap_pte_range
>>> - 1.04% page_remove_rmap
>>> 0.55% __mod_lruvec_page_state
>>>
>>>
>>>>
>>>>>
>>>>>
>>>>> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries")
>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>>>
>>>>> testcase: stress-ng
>>>>> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
>>>>> parameters:
>>>>>
>>>>> nr_threads: 1
>>>>> disk: 1HDD
>>>>> testtime: 60s
>>>>> fs: ext4
>>>>> class: os
>>>>> test: pthread
>>>>> cpufreq_governor: performance
>>>>>
>>>>>
>>>>> In addition to that, the commit also has significant impact on the following tests:
>>>>>
>>>>> +------------------+-----------------------------------------------------------------------------------------------+
>>>>> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression |
>>>>> | test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory |
>>>>> | test parameters | array_size=50000000 |
>>>>> | | cpufreq_governor=performance |
>>>>> | | iterations=10x |
>>>>> | | loop=100 |
>>>>> | | nr_threads=25% |
>>>>> | | omp=true |
>>>>> +------------------+-----------------------------------------------------------------------------------------------+
>>>>> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression |
>>>>> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory |
>>>>> | test parameters | cpufreq_governor=performance |
>>>>> | | option_a=Average |
>>>>> | | option_b=Integer |
>>>>> | | test=ramspeed-1.4.3 |
>>>>> +------------------+-----------------------------------------------------------------------------------------------+
>>>>> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression |
>>>>> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory |
>>>>> | test parameters | cpufreq_governor=performance |
>>>>> | | option_a=Average |
>>>>> | | option_b=Floating Point |
>>>>> | | test=ramspeed-1.4.3 |
>>>>> +------------------+-----------------------------------------------------------------------------------------------+
>>>>>
>>>>>
>>>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>>>>> the same patch/commit), kindly add following tags
>>>>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>>>>> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com
>>>>>
>>>>>
>>>>> Details are as below:
>>>>> -------------------------------------------------------------------------------------------------->
>>>>>
>>>>>
>>>>> The kernel config and materials to reproduce are available at:
>>>>> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com
>>>>>
>>>>> =========================================================================================
>>>>> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>>>>> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
>>>>>
>>>>> commit:
>>>>> 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
>>>>> 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>>>>>
>>>>> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
>>>>> ---------------- ---------------------------
>>>>> %stddev %change %stddev
>>>>> \ | \
>>>>> 13405796 -65.5% 4620124 cpuidle..usage
>>>>> 8.00 +8.2% 8.66 ą 2% iostat.cpu.system
>>>>> 1.61 -60.6% 0.63 iostat.cpu.user
>>>>> 597.50 ą 14% -64.3% 213.50 ą 14% perf-c2c.DRAM.local
>>>>> 1882 ą 14% -74.7% 476.83 ą 7% perf-c2c.HITM.local
>>>>> 3768436 -12.9% 3283395 vmstat.memory.cache
>>>>> 355105 -75.7% 86344 ą 3% vmstat.system.cs
>>>>> 385435 -20.7% 305714 ą 3% vmstat.system.in
>>>>> 1.13 -0.2 0.88 mpstat.cpu.all.irq%
>>>>> 0.29 -0.2 0.10 ą 2% mpstat.cpu.all.soft%
>>>>> 6.76 ą 2% +1.1 7.88 ą 2% mpstat.cpu.all.sys%
>>>>> 1.62 -1.0 0.62 ą 2% mpstat.cpu.all.usr%
>>>>> 2234397 -84.3% 350161 ą 5% stress-ng.pthread.ops
>>>>> 37237 -84.3% 5834 ą 5% stress-ng.pthread.ops_per_sec
>>>>> 294706 ą 2% -68.0% 94191 ą 6% stress-ng.time.involuntary_context_switches
>>>>> 41442 ą 2% +5023.4% 2123284 stress-ng.time.maximum_resident_set_size
>>>>> 4466457 -83.9% 717053 ą 5% stress-ng.time.minor_page_faults
>>>>
>>>> The larger RSS and fewer page faults are expected.
>>>>
>>>>> 243.33 +13.5% 276.17 ą 3% stress-ng.time.percent_of_cpu_this_job_got
>>>>> 131.64 +27.7% 168.11 ą 3% stress-ng.time.system_time
>>>>> 19.73 -82.1% 3.53 ą 4% stress-ng.time.user_time
>>>>
>>>> Much less user time. And it seems to match the drop of the pthread metric.
>>>>
>>>>> 7715609 -80.2% 1530125 ą 4% stress-ng.time.voluntary_context_switches
>>>>> 76728 -80.8% 14724 ą 4% perf-stat.i.minor-faults
>>>>> 5600408 -61.4% 2160997 ą 5% perf-stat.i.node-loads
>>>>> 8873996 +52.1% 13499744 ą 5% perf-stat.i.node-stores
>>>>> 112409 -81.9% 20305 ą 4% perf-stat.i.page-faults
>>>>> 2.55 +89.6% 4.83 perf-stat.overall.MPKI
>>>>
>>>> Much more TLB misses.
>>>>
>>>>> 1.51 -0.4 1.13 perf-stat.overall.branch-miss-rate%
>>>>> 19.26 +24.5 43.71 perf-stat.overall.cache-miss-rate%
>>>>> 1.70 +56.4% 2.65 perf-stat.overall.cpi
>>>>> 665.84 -17.5% 549.51 ą 2% perf-stat.overall.cycles-between-cache-misses
>>>>> 0.12 ą 4% -0.1 0.04 perf-stat.overall.dTLB-load-miss-rate%
>>>>> 0.08 ą 2% -0.0 0.03 perf-stat.overall.dTLB-store-miss-rate%
>>>>> 59.16 +0.9 60.04 perf-stat.overall.iTLB-load-miss-rate%
>>>>> 1278 +86.1% 2379 ą 2% perf-stat.overall.instructions-per-iTLB-miss
>>>>> 0.59 -36.1% 0.38 perf-stat.overall.ipc
>>>>
>>>> Worse IPC and CPI.
>>>>
>>>>> 2.078e+09 -48.3% 1.074e+09 ą 4% perf-stat.ps.branch-instructions
>>>>> 31292687 -61.2% 12133349 ą 2% perf-stat.ps.branch-misses
>>>>> 26057291 -5.9% 24512034 ą 4% perf-stat.ps.cache-misses
>>>>> 1.353e+08 -58.6% 56072195 ą 4% perf-stat.ps.cache-references
>>>>> 365254 -75.8% 88464 ą 3% perf-stat.ps.context-switches
>>>>> 1.735e+10 -22.4% 1.346e+10 ą 2% perf-stat.ps.cpu-cycles
>>>>> 60838 -79.1% 12727 ą 6% perf-stat.ps.cpu-migrations
>>>>> 3056601 ą 4% -81.5% 565354 ą 4% perf-stat.ps.dTLB-load-misses
>>>>> 2.636e+09 -50.7% 1.3e+09 ą 4% perf-stat.ps.dTLB-loads
>>>>> 1155253 ą 2% -83.0% 196581 ą 5% perf-stat.ps.dTLB-store-misses
>>>>> 1.473e+09 -57.4% 6.268e+08 ą 3% perf-stat.ps.dTLB-stores
>>>>> 7997726 -73.3% 2131477 ą 3% perf-stat.ps.iTLB-load-misses
>>>>> 5521346 -74.3% 1418623 ą 2% perf-stat.ps.iTLB-loads
>>>>> 1.023e+10 -50.4% 5.073e+09 ą 4% perf-stat.ps.instructions
>>>>> 75671 -80.9% 14479 ą 4% perf-stat.ps.minor-faults
>>>>> 5549722 -61.4% 2141750 ą 4% perf-stat.ps.node-loads
>>>>> 8769156 +51.6% 13296579 ą 5% perf-stat.ps.node-stores
>>>>> 110795 -82.0% 19977 ą 4% perf-stat.ps.page-faults
>>>>> 6.482e+11 -50.7% 3.197e+11 ą 4% perf-stat.total.instructions
>>>>> 0.00 ą 37% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
>>>>> 0.01 ą 18% +8373.1% 0.73 ą 49% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
>>>>> 0.01 ą 16% +4600.0% 0.38 ą 24% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
>>>>
>>>> More time spent in madvise and munmap. but I'm not sure whether this
>>>> is caused by tearing down the address space when exiting the test. If
>>>> so it should not count in the regression.
>>> It's not for the whole address space tearing down. It's for pthread
>>> stack tearing down when pthread exit (can be treated as address space
>>> tearing down? I suppose so).
>>>
>>> https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
>>> https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576
>>
>> It explains the problem. The madvise() does have some extra overhead
>> for handling THP (splitting pmd, deferred split queue, etc).
>>
>>>
>>> Another thing is whether it's worthy to make stack use THP? It may be
>>> useful for some apps which need large stack size?
>>
>> Kernel actually doesn't apply THP to stack (see
>> vma_is_temporary_stack()). But kernel can't know whether the VMA is
>> stack or not by checking VM_GROWSDOWN | VM_GROWSUP flags. So if glibc
>> doesn't set the proper flags to tell kernel the area is stack, kernel
>> just treats it as normal anonymous area. So glibc should set up stack
>> properly IMHO.
>
> If I read the code correctly, nptl allocates stack by the below code:
>
> mem = __mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE,
> MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
>
> See https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L563
>
> The MAP_STACK is used, but it is a no-op on Linux. So the alternative
> is to make MAP_STACK useful on Linux instead of changing glibc. But
> the blast radius seems much wider.
Yes. MAP_STACK is also mentioned in the mmap manpage. I did a test that
filters out the MAP_STACK mappings on top of this patch, and the
regression in stress-ng.pthread was gone. I suppose this is kind of safe
because the madvise call is only applied to the glibc-allocated stack.
But what I am not sure about is whether it's worth making such a change,
as the regression is only obvious in a micro-benchmark. No evidence
shows the other regressions in this report are related to madvise, at
least from the perf statistics. Need to check more on stream/ramspeed.
Thanks.
Regards
Yin, Fengwei
>
>>
>>>
>>>
>>> Regards
>>> Yin, Fengwei
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-21 0:58 ` Yin Fengwei
@ 2023-12-21 1:02 ` Yin Fengwei
2023-12-21 4:49 ` Matthew Wilcox
2023-12-21 13:39 ` Yin, Fengwei
2 siblings, 0 replies; 24+ messages in thread
From: Yin Fengwei @ 2023-12-21 1:02 UTC (permalink / raw)
To: Yang Shi
Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Christopher Lameter, ying.huang, feng.tang
>>>>
>>>> https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
>>>> https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576
>>>
>>> It explains the problem. The madvise() does have some extra overhead
>>> for handling THP (splitting pmd, deferred split queue, etc).
>>>
>>>>
>>>> Another thing is whether it's worthy to make stack use THP? It may be
>>>> useful for some apps which need large stack size?
>>>
>>> Kernel actually doesn't apply THP to stack (see
>>> vma_is_temporary_stack()). But kernel can't know whether the VMA is
>>> stack or not by checking VM_GROWSDOWN | VM_GROWSUP flags. So if glibc
>>> doesn't set the proper flags to tell kernel the area is stack, kernel
>>> just treats it as normal anonymous area. So glibc should set up stack
>>> properly IMHO.
>>
>> If I read the code correctly, nptl allocates stack by the below code:
>>
>> mem = __mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE,
>> MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
>>
>> See
>> https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L563
>>
>> The MAP_STACK is used, but it is a no-op on Linux. So the alternative
>> is to make MAP_STACK useful on Linux instead of changing glibc. But
>> the blast radius seems much wider.
> Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to
> filter out of the MAP_STACK mapping based on this patch. The regression
> in stress-ng.pthread was gone. I suppose this is kind of safe because
> the madvise call is only applied to glibc allocated stack.
The patch I tested against stress-ng.pthread:
diff --git a/mm/mmap.c b/mm/mmap.c
index b78e83d351d2..1fd510aef82e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1829,7 +1829,8 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
 		 */
 		pgoff = 0;
 		get_area = shmem_get_unmapped_area;
-	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+		   !(flags & MAP_STACK)) {
 		/* Ensures that larger anonymous mappings are THP aligned. */
 		get_area = thp_get_unmapped_area;
 	}
>
>
> But what I am not sure was whether it's worthy to do such kind of change
> as the regression only is seen obviously in micro-benchmark. No evidence
> showed the other regressionsin this report is related with madvise. At
> least from the perf statstics. Need to check more on stream/ramspeed.
> Thanks.
>
>
> Regards
> Yin, Fengwei
>
>>
>>>
>>>>
>>>>
>>>> Regards
>>>> Yin, Fengwei
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-21 0:58 ` Yin Fengwei
2023-12-21 1:02 ` Yin Fengwei
@ 2023-12-21 4:49 ` Matthew Wilcox
2023-12-21 4:58 ` Yin Fengwei
2023-12-21 18:07 ` Yang Shi
2023-12-21 13:39 ` Yin, Fengwei
2 siblings, 2 replies; 24+ messages in thread
From: Matthew Wilcox @ 2023-12-21 4:49 UTC (permalink / raw)
To: Yin Fengwei
Cc: Yang Shi, kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Christopher Lameter,
ying.huang, feng.tang
On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote:
> Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to
> filter out of the MAP_STACK mapping based on this patch. The regression
> in stress-ng.pthread was gone. I suppose this is kind of safe because
> the madvise call is only applied to glibc allocated stack.
>
>
> But what I am not sure was whether it's worthy to do such kind of change
> as the regression only is seen obviously in micro-benchmark. No evidence
> showed the other regressionsin this report is related with madvise. At
> least from the perf statstics. Need to check more on stream/ramspeed.
FWIW, we had a customer report a significant performance problem when
inadvertently using 2MB pages for stacks. They were able to avoid it by
using 2044KiB sized stacks ...
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-21 4:49 ` Matthew Wilcox
@ 2023-12-21 4:58 ` Yin Fengwei
2023-12-21 18:07 ` Yang Shi
1 sibling, 0 replies; 24+ messages in thread
From: Yin Fengwei @ 2023-12-21 4:58 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Yang Shi, kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Christopher Lameter,
ying.huang, feng.tang
On 2023/12/21 12:49, Matthew Wilcox wrote:
> On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote:
>> Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to
>> filter out of the MAP_STACK mapping based on this patch. The regression
>> in stress-ng.pthread was gone. I suppose this is kind of safe because
>> the madvise call is only applied to glibc allocated stack.
>>
>>
>> But what I am not sure was whether it's worthy to do such kind of change
>> as the regression only is seen obviously in micro-benchmark. No evidence
>> showed the other regressionsin this report is related with madvise. At
>> least from the perf statstics. Need to check more on stream/ramspeed.
>
> FWIW, we had a customer report a significant performance problem when
> inadvertently using 2MB pages for stacks. They were able to avoid it by
> using 2044KiB sized stacks ...
Looks like it's related to this regression. So we may need to consider
avoiding THP for stacks.
Regards
Yin, Fengwei
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-21 0:58 ` Yin Fengwei
2023-12-21 1:02 ` Yin Fengwei
2023-12-21 4:49 ` Matthew Wilcox
@ 2023-12-21 13:39 ` Yin, Fengwei
2023-12-21 18:11 ` Yang Shi
2 siblings, 1 reply; 24+ messages in thread
From: Yin, Fengwei @ 2023-12-21 13:39 UTC (permalink / raw)
To: Yang Shi
Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Christopher Lameter, ying.huang, feng.tang
On 12/21/2023 8:58 AM, Yin Fengwei wrote:
> But what I am not sure was whether it's worthy to do such kind of change
> as the regression only is seen obviously in micro-benchmark. No evidence
> showed the other regressionsin this report is related with madvise. At
> least from the perf statstics. Need to check more on stream/ramspeed.
> Thanks.
With the debugging patch (filtering the stack mappings out of THP
alignment), the stream regression shrinks to around 2%:
commit:
  30749e6fbb3d391a7939ac347e9612afe8c26e94
  1111d46b5cbad57486e7a3fab75888accac2f072
  89f60532d82b9ecd39303a74589f76e4758f176f  (1111d46b5cbad with the debugging patch)

30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589
---------------- --------------------------- ---------------------------
          350993      -15.6%     296081 ± 2%       -1.5%     345689        stream.add_bandwidth_MBps
          349830      -16.1%     293492 ± 2%       -2.3%     341860 ± 2%   stream.add_bandwidth_MBps_harmonicMean
          333973      -20.5%     265439 ± 3%       -1.7%     328403        stream.copy_bandwidth_MBps
          332930      -21.7%     260548 ± 3%       -2.5%     324711 ± 2%   stream.copy_bandwidth_MBps_harmonicMean
          302788      -16.2%     253817 ± 2%       -1.4%     298421        stream.scale_bandwidth_MBps
          302157      -17.1%     250577 ± 2%       -2.0%     296054        stream.scale_bandwidth_MBps_harmonicMean
          339047      -12.1%     298061            -1.4%     334206        stream.triad_bandwidth_MBps
          338186      -12.4%     296218            -2.0%     331469        stream.triad_bandwidth_MBps_harmonicMean
The regression of ramspeed is still there.
Regards
Yin, Fengwei
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-21 4:49 ` Matthew Wilcox
2023-12-21 4:58 ` Yin Fengwei
@ 2023-12-21 18:07 ` Yang Shi
2023-12-21 18:14 ` Matthew Wilcox
1 sibling, 1 reply; 24+ messages in thread
From: Yang Shi @ 2023-12-21 18:07 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Yin Fengwei, kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Christopher Lameter,
ying.huang, feng.tang
On Wed, Dec 20, 2023 at 8:49 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote:
> > Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to
> > filter out of the MAP_STACK mapping based on this patch. The regression
> > in stress-ng.pthread was gone. I suppose this is kind of safe because
> > the madvise call is only applied to glibc allocated stack.
> >
> >
> > But what I am not sure was whether it's worthy to do such kind of change
> > as the regression only is seen obviously in micro-benchmark. No evidence
> > showed the other regressionsin this report is related with madvise. At
> > least from the perf statstics. Need to check more on stream/ramspeed.
>
> FWIW, we had a customer report a significant performance problem when
> inadvertently using 2MB pages for stacks. They were able to avoid it by
> using 2044KiB sized stacks ...
Thanks for the report. This provides more justification for honoring
MAP_STACK on Linux. Some applications, pthread for example, just
allocate a fixed-size area for the stack. This confuses the kernel
because the kernel identifies a stack by VM_GROWSDOWN | VM_GROWSUP.
But I'm still a little confused about why THP for stacks could result in
significant performance problems, unless the applications resize the
stack quite often.
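For what it's worth, the pattern the benchmark exercises presumably
boils down to something like this minimal sketch (assumptions: default
glibc stack size, trivial thread body), so every create/join cycle goes
through the 8M stack setup and madvise teardown discussed above rather
than any stack resizing:

    #include <pthread.h>

    static void *worker(void *arg)
    {
            return arg;     /* do almost nothing, exit immediately */
    }

    int main(void)
    {
            for (int i = 0; i < 100000; i++) {
                    pthread_t t;

                    /* glibc mmaps (or reuses) an 8M stack per create. */
                    if (pthread_create(&t, NULL, worker, NULL))
                            break;
                    pthread_join(t, NULL);
            }
            return 0;
    }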
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-21 13:39 ` Yin, Fengwei
@ 2023-12-21 18:11 ` Yang Shi
2023-12-22 1:13 ` Yin, Fengwei
0 siblings, 1 reply; 24+ messages in thread
From: Yang Shi @ 2023-12-21 18:11 UTC (permalink / raw)
To: Yin, Fengwei
Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Christopher Lameter, ying.huang, feng.tang
On Thu, Dec 21, 2023 at 5:40 AM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>
>
>
> On 12/21/2023 8:58 AM, Yin Fengwei wrote:
> > But what I am not sure was whether it's worthy to do such kind of change
> > as the regression only is seen obviously in micro-benchmark. No evidence
> > showed the other regressionsin this report is related with madvise. At
> > least from the perf statstics. Need to check more on stream/ramspeed.
> > Thanks.
>
> With debugging patch (filter out the stack mapping from THP aligned),
> the result of stream can be restored to around 2%:
>
> commit:
> 30749e6fbb3d391a7939ac347e9612afe8c26e94
> 1111d46b5cbad57486e7a3fab75888accac2f072
> 89f60532d82b9ecd39303a74589f76e4758f176f -> 1111d46b5cbad with
> debugging patch
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589
> ---------------- --------------------------- ---------------------------
> 350993 -15.6% 296081 ± 2% -1.5% 345689
> stream.add_bandwidth_MBps
> 349830 -16.1% 293492 ± 2% -2.3% 341860 ±
> 2% stream.add_bandwidth_MBps_harmonicMean
> 333973 -20.5% 265439 ± 3% -1.7% 328403
> stream.copy_bandwidth_MBps
> 332930 -21.7% 260548 ± 3% -2.5% 324711 ±
> 2% stream.copy_bandwidth_MBps_harmonicMean
> 302788 -16.2% 253817 ± 2% -1.4% 298421
> stream.scale_bandwidth_MBps
> 302157 -17.1% 250577 ± 2% -2.0% 296054
> stream.scale_bandwidth_MBps_harmonicMean
> 339047 -12.1% 298061 -1.4% 334206
> stream.triad_bandwidth_MBps
> 338186 -12.4% 296218 -2.0% 331469
> stream.triad_bandwidth_MBps_harmonicMean
>
>
> The regression of ramspeed is still there.
Thanks for the debugging patch and the test. If no one objects to
honoring MAP_STACK, I'm going to come up with a more formal patch.
Even though thp_get_unmapped_area() is not called for MAP_STACK, a stack
area may still, theoretically, be allocated at a 2M-aligned address. And
it may be worse with multi-sized THP, e.g. at 1M.
Do you have any instructions on how to run ramspeed? Anyway, I may not
have time to debug it until after the holidays.
>
>
> Regards
> Yin, Fengwei
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-21 18:07 ` Yang Shi
@ 2023-12-21 18:14 ` Matthew Wilcox
2023-12-22 1:06 ` Yin, Fengwei
0 siblings, 1 reply; 24+ messages in thread
From: Matthew Wilcox @ 2023-12-21 18:14 UTC (permalink / raw)
To: Yang Shi
Cc: Yin Fengwei, kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Christopher Lameter,
ying.huang, feng.tang
On Thu, Dec 21, 2023 at 10:07:09AM -0800, Yang Shi wrote:
> On Wed, Dec 20, 2023 at 8:49 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote:
> > > Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to
> > > filter out of the MAP_STACK mapping based on this patch. The regression
> > > in stress-ng.pthread was gone. I suppose this is kind of safe because
> > > the madvise call is only applied to glibc allocated stack.
> > >
> > >
> > > But what I am not sure was whether it's worthy to do such kind of change
> > > as the regression only is seen obviously in micro-benchmark. No evidence
> > > showed the other regressionsin this report is related with madvise. At
> > > least from the perf statstics. Need to check more on stream/ramspeed.
> >
> > FWIW, we had a customer report a significant performance problem when
> > inadvertently using 2MB pages for stacks. They were able to avoid it by
> > using 2044KiB sized stacks ...
>
> Thanks for the report. This provided more justification regarding
> honoring MAP_STACK on Linux. Some applications, for example, pthread,
> just allocate a fixed size area for stack. This confuses kernel
> because kernel tell stack by VM_GROWSDOWN | VM_GROWSUP.
>
> But I'm still a little confused by why THP for stack could result in
> significant performance problems. Unless the applications resize the
> stack quite often.
We didn't delve into what was causing the problem, only that it was
happening. The application had many threads, so it could have been as
simple as consuming all the available THP and leaving fewer available
for other uses. Or it could have been a memory consumption problem;
maybe the app would only have been using 16-32kB per thread but was
now using 2MB per thread and if there were, say, 100 threads, that's an
extra 199MB of memory in use.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-21 18:14 ` Matthew Wilcox
@ 2023-12-22 1:06 ` Yin, Fengwei
2023-12-22 2:23 ` Huang, Ying
0 siblings, 1 reply; 24+ messages in thread
From: Yin, Fengwei @ 2023-12-22 1:06 UTC (permalink / raw)
To: Matthew Wilcox, Yang Shi
Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Christopher Lameter,
ying.huang, feng.tang
On 12/22/2023 2:14 AM, Matthew Wilcox wrote:
> On Thu, Dec 21, 2023 at 10:07:09AM -0800, Yang Shi wrote:
>> On Wed, Dec 20, 2023 at 8:49 PM Matthew Wilcox <willy@infradead.org> wrote:
>>>
>>> On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote:
>>>> Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to
>>>> filter out of the MAP_STACK mapping based on this patch. The regression
>>>> in stress-ng.pthread was gone. I suppose this is kind of safe because
>>>> the madvise call is only applied to glibc allocated stack.
>>>>
>>>>
>>>> But what I am not sure was whether it's worthy to do such kind of change
>>>> as the regression only is seen obviously in micro-benchmark. No evidence
>>>> showed the other regressionsin this report is related with madvise. At
>>>> least from the perf statstics. Need to check more on stream/ramspeed.
>>>
>>> FWIW, we had a customer report a significant performance problem when
>>> inadvertently using 2MB pages for stacks. They were able to avoid it by
>>> using 2044KiB sized stacks ...
>>
>> Thanks for the report. This provided more justification regarding
>> honoring MAP_STACK on Linux. Some applications, for example, pthread,
>> just allocate a fixed size area for stack. This confuses kernel
>> because kernel tell stack by VM_GROWSDOWN | VM_GROWSUP.
>>
>> But I'm still a little confused by why THP for stack could result in
>> significant performance problems. Unless the applications resize the
>> stack quite often.
>
> We didn't delve into what was causing the problem, only that it was
> happening. The application had many threads, so it could have been as
> simple as consuming all the available THP and leaving fewer available
> for other uses. Or it could have been a memory consumption problem;
> maybe the app would only have been using 16-32kB per thread but was
> now using 2MB per thread and if there were, say, 100 threads, that's an
> extra 199MB of memory in use.
One thing I know is related to memory zeroing. This is from the perf
data in this report:
0.00 +16.7 16.69 ± 7% perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
Zeroing 2M of memory costs much more CPU than zeroing 16-32KB, and that
adds up when there are many threads.
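To make that asymmetry concrete, here is a rough, assumed sketch (error
handling omitted, numbers only indicative; it presumes 2M PMD-sized THP
with the "madvise" or "always" policy). Touching a single byte of a
huge-page-backed region still faults in and clears the whole 2 MiB,
while with base pages only one 4 KiB page is cleared, which is roughly
the extra cost a freshly created thread pays on its first stack access:

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

static double first_touch_ms(int advice)
{
        size_t huge = 2UL << 20;
        char *raw = mmap(NULL, 2 * huge, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        /* pick a 2 MiB aligned address so a whole PMD lies inside */
        char *p = (char *)(((uintptr_t)raw + huge - 1) & ~(huge - 1));
        struct timespec a, b;

        madvise(raw, 2 * huge, advice);
        clock_gettime(CLOCK_MONOTONIC, &a);
        *(volatile char *)p = 1;                /* first touch */
        clock_gettime(CLOCK_MONOTONIC, &b);
        munmap(raw, 2 * huge);
        return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
        printf("first touch, 4K pages: %.4f ms\n",
               first_touch_ms(MADV_NOHUGEPAGE));
        printf("first touch, 2M THP:   %.4f ms\n",
               first_touch_ms(MADV_HUGEPAGE));
        return 0;
}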
Regards
Yin, Fengwei
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-21 18:11 ` Yang Shi
@ 2023-12-22 1:13 ` Yin, Fengwei
2024-01-04 1:32 ` Yang Shi
0 siblings, 1 reply; 24+ messages in thread
From: Yin, Fengwei @ 2023-12-22 1:13 UTC (permalink / raw)
To: Yang Shi
Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Christopher Lameter, ying.huang, feng.tang
On 12/22/2023 2:11 AM, Yang Shi wrote:
> On Thu, Dec 21, 2023 at 5:40 AM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>
>>
>>
>> On 12/21/2023 8:58 AM, Yin Fengwei wrote:
>>> But what I am not sure was whether it's worthy to do such kind of change
>>> as the regression only is seen obviously in micro-benchmark. No evidence
>>> showed the other regressionsin this report is related with madvise. At
>>> least from the perf statstics. Need to check more on stream/ramspeed.
>>> Thanks.
>>
>> With debugging patch (filter out the stack mapping from THP aligned),
>> the result of stream can be restored to around 2%:
>>
>> commit:
>> 30749e6fbb3d391a7939ac347e9612afe8c26e94
>> 1111d46b5cbad57486e7a3fab75888accac2f072
>> 89f60532d82b9ecd39303a74589f76e4758f176f -> 1111d46b5cbad with
>> debugging patch
>>
>> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589
>> ---------------- --------------------------- ---------------------------
>> 350993 -15.6% 296081 ± 2% -1.5% 345689
>> stream.add_bandwidth_MBps
>> 349830 -16.1% 293492 ± 2% -2.3% 341860 ±
>> 2% stream.add_bandwidth_MBps_harmonicMean
>> 333973 -20.5% 265439 ± 3% -1.7% 328403
>> stream.copy_bandwidth_MBps
>> 332930 -21.7% 260548 ± 3% -2.5% 324711 ±
>> 2% stream.copy_bandwidth_MBps_harmonicMean
>> 302788 -16.2% 253817 ± 2% -1.4% 298421
>> stream.scale_bandwidth_MBps
>> 302157 -17.1% 250577 ± 2% -2.0% 296054
>> stream.scale_bandwidth_MBps_harmonicMean
>> 339047 -12.1% 298061 -1.4% 334206
>> stream.triad_bandwidth_MBps
>> 338186 -12.4% 296218 -2.0% 331469
>> stream.triad_bandwidth_MBps_harmonicMean
>>
>>
>> The regression of ramspeed is still there.
>
> Thanks for the debugging patch and the test. If no one has objection
> to honor MAP_STACK, I'm going to come up with a more formal patch.
> Even though thp_get_unmapped_area() is not called for MAP_STACK, stack
> area still may be allocated at 2M aligned address theoretically. And
> it may be worse with multi-sized THP, for 1M.
Right. Filtering out MAP_STACK can't guarantee that no THP is used for
the stack; it just reduces the likelihood of THP being used for the stack.
>
> Do you have any instructions regarding how to run ramspeed? Anyway I
> may not have time debug it until after holidays.
0Day leverages phoronix-test-suite to run ramspeed, so I don't have a
direct answer here.
I suppose we could check the configuration of ramspeed in
phoronix-test-suite to understand the build options and command-line
options used to run ramspeed:
https://openbenchmarking.org/test/pts/ramspeed
Regards
Yin, Fengwei
>
>>
>>
>> Regards
>> Yin, Fengwei
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-22 1:06 ` Yin, Fengwei
@ 2023-12-22 2:23 ` Huang, Ying
0 siblings, 0 replies; 24+ messages in thread
From: Huang, Ying @ 2023-12-22 2:23 UTC (permalink / raw)
To: Yin, Fengwei
Cc: Matthew Wilcox, Yang Shi, kernel test robot, Rik van Riel,
oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
Christopher Lameter, feng.tang
"Yin, Fengwei" <fengwei.yin@intel.com> writes:
> On 12/22/2023 2:14 AM, Matthew Wilcox wrote:
>> On Thu, Dec 21, 2023 at 10:07:09AM -0800, Yang Shi wrote:
>>> On Wed, Dec 20, 2023 at 8:49 PM Matthew Wilcox <willy@infradead.org> wrote:
>>>>
>>>> On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote:
>>>>> Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to
>>>>> filter out of the MAP_STACK mapping based on this patch. The regression
>>>>> in stress-ng.pthread was gone. I suppose this is kind of safe because
>>>>> the madvise call is only applied to glibc allocated stack.
>>>>>
>>>>>
>>>>> But what I am not sure was whether it's worthy to do such kind of change
>>>>> as the regression only is seen obviously in micro-benchmark. No evidence
>>>>> showed the other regressionsin this report is related with madvise. At
>>>>> least from the perf statstics. Need to check more on stream/ramspeed.
>>>>
>>>> FWIW, we had a customer report a significant performance problem when
>>>> inadvertently using 2MB pages for stacks. They were able to avoid it by
>>>> using 2044KiB sized stacks ...
>>>
>>> Thanks for the report. This provided more justification regarding
>>> honoring MAP_STACK on Linux. Some applications, for example, pthread,
>>> just allocate a fixed size area for stack. This confuses kernel
>>> because kernel tell stack by VM_GROWSDOWN | VM_GROWSUP.
>>>
>>> But I'm still a little confused by why THP for stack could result in
>>> significant performance problems. Unless the applications resize the
>>> stack quite often.
>> We didn't delve into what was causing the problem, only that it was
>> happening. The application had many threads, so it could have been as
>> simple as consuming all the available THP and leaving fewer available
>> for other uses. Or it could have been a memory consumption problem;
>> maybe the app would only have been using 16-32kB per thread but was
>> now using 2MB per thread and if there were, say, 100 threads, that's an
>> extra 199MB of memory in use.
> One thing I know is related with the memory zeroing. This is from
> the perf data in this report:
>
> 0.00 +16.7 16.69 ± 7%
> perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
>
> Zeroing 2M memory costs much more CPU than zeroing 16-32KB memory if
> there are many threads.
Using a 2M stack may hurt the performance of short-lived threads with
shallow stack depth. Imagine a network server which creates a new thread
for each incoming connection. I understand that the performance will not
be great that way anyway; IIUC we just should not make it too much worse.
But whether this is important depends on whether the use case is
important. TBH, I don't know that.
--
Best Regards,
Huang, Ying
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-22 1:13 ` Yin, Fengwei
@ 2024-01-04 1:32 ` Yang Shi
2024-01-04 8:18 ` Yin Fengwei
0 siblings, 1 reply; 24+ messages in thread
From: Yang Shi @ 2024-01-04 1:32 UTC (permalink / raw)
To: Yin, Fengwei
Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Christopher Lameter, ying.huang, feng.tang
On Thu, Dec 21, 2023 at 5:13 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>
>
>
> On 12/22/2023 2:11 AM, Yang Shi wrote:
> > On Thu, Dec 21, 2023 at 5:40 AM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> >>
> >>
> >>
> >> On 12/21/2023 8:58 AM, Yin Fengwei wrote:
> >>> But what I am not sure was whether it's worthy to do such kind of change
> >>> as the regression only is seen obviously in micro-benchmark. No evidence
> >>> showed the other regressionsin this report is related with madvise. At
> >>> least from the perf statstics. Need to check more on stream/ramspeed.
> >>> Thanks.
> >>
> >> With debugging patch (filter out the stack mapping from THP aligned),
> >> the result of stream can be restored to around 2%:
> >>
> >> commit:
> >> 30749e6fbb3d391a7939ac347e9612afe8c26e94
> >> 1111d46b5cbad57486e7a3fab75888accac2f072
> >> 89f60532d82b9ecd39303a74589f76e4758f176f -> 1111d46b5cbad with
> >> debugging patch
> >>
> >> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589
> >> ---------------- --------------------------- ---------------------------
> >> 350993 -15.6% 296081 ± 2% -1.5% 345689
> >> stream.add_bandwidth_MBps
> >> 349830 -16.1% 293492 ± 2% -2.3% 341860 ±
> >> 2% stream.add_bandwidth_MBps_harmonicMean
> >> 333973 -20.5% 265439 ± 3% -1.7% 328403
> >> stream.copy_bandwidth_MBps
> >> 332930 -21.7% 260548 ± 3% -2.5% 324711 ±
> >> 2% stream.copy_bandwidth_MBps_harmonicMean
> >> 302788 -16.2% 253817 ± 2% -1.4% 298421
> >> stream.scale_bandwidth_MBps
> >> 302157 -17.1% 250577 ± 2% -2.0% 296054
> >> stream.scale_bandwidth_MBps_harmonicMean
> >> 339047 -12.1% 298061 -1.4% 334206
> >> stream.triad_bandwidth_MBps
> >> 338186 -12.4% 296218 -2.0% 331469
> >> stream.triad_bandwidth_MBps_harmonicMean
> >>
> >>
> >> The regression of ramspeed is still there.
> >
> > Thanks for the debugging patch and the test. If no one has objection
> > to honor MAP_STACK, I'm going to come up with a more formal patch.
> > Even though thp_get_unmapped_area() is not called for MAP_STACK, stack
> > area still may be allocated at 2M aligned address theoretically. And
> > it may be worse with multi-sized THP, for 1M.
> Right. Filtering out MAP_STACK can't make sure no THP for stack. Just
> reduce the possibility of using THP for stack.
Can you please help test the below patch?
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 40d94411d492..dc7048824be8 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
_calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
+ _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
arch_calc_vm_flag_bits(flags);
}
But I can't reproduce the pthread regression on my aarch64 VM. It
might be due to the stack guard (the 64K guard region is 2M-aligned and
the 8M stack sits right next to it, starting at 2M + 64K). But I can
see that the stack area is no longer THP-eligible with this patch. See:
fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
Size: 8192 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 12 kB
Pss: 12 kB
Pss_Dirty: 12 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 12 kB
Referenced: 12 kB
Anonymous: 12 kB
KSM: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 0
VmFlags: rd wr mr mw me ac nh
The "nh" flag is set.
>
> >
> > Do you have any instructions regarding how to run ramspeed? Anyway I
> > may not have time debug it until after holidays.
> 0Day leverages phoronix-test-suite to run ramspeed. So I don't have
> direct answer here.
>
> I suppose we could check the configuration of ramspeed in phoronix-test-
> suite to understand what's the build options and command options to run
> ramspeed:
> https://openbenchmarking.org/test/pts/ramspeed
Downloaded the test suite. It looks like phoronix just runs test 3 (int)
and test 6 (float). They basically run 4 sub-tests to benchmark memory
bandwidth:
* copy
* scale copy
* add copy
* triad copy
The source buffer is initialized (page faults are triggered), but the
destination area is not, so the page fault + page clear time is
accounted to the result. Clearing a huge page may take a little more
time. But I didn't see a noticeable regression on my aarch64 VM either.
Anyway, I suppose such a test should be run with THP off.
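For reference, the sub-tests have roughly the shape sketched below (an
assumed simplification, not the actual ramspeed source): the source
arrays are faulted in before timing starts, but the destination array is
first written inside the timed loop, so its page faults, and the page
clearing they trigger, are charged to the reported bandwidth.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (32UL << 20)          /* 32M doubles, 256 MiB per array */

int main(void)
{
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        double *c = malloc(N * sizeof(double)); /* untouched before the loop */
        struct timespec t0, t1;
        double s = 3.0, secs;
        size_t i;

        for (i = 0; i < N; i++) {               /* source buffers faulted in here */
                a[i] = 1.0;
                b[i] = 2.0;
        }

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < N; i++)                 /* triad: c = a + s * b */
                c[i] = a[i] + s * b[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);

        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("triad: %.1f MB/s\n", 3.0 * N * sizeof(double) / secs / 1e6);
        return 0;
}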
>
>
> Regards
> Yin, Fengwei
>
> >
> >>
> >>
> >> Regards
> >> Yin, Fengwei
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2024-01-04 1:32 ` Yang Shi
@ 2024-01-04 8:18 ` Yin Fengwei
2024-01-04 8:39 ` Oliver Sang
0 siblings, 1 reply; 24+ messages in thread
From: Yin Fengwei @ 2024-01-04 8:18 UTC (permalink / raw)
To: Yang Shi
Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Christopher Lameter, ying.huang, feng.tang
On 2024/1/4 09:32, Yang Shi wrote:
> On Thu, Dec 21, 2023 at 5:13 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>
>>
>>
>> On 12/22/2023 2:11 AM, Yang Shi wrote:
>>> On Thu, Dec 21, 2023 at 5:40 AM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>
>>>>
>>>>
>>>> On 12/21/2023 8:58 AM, Yin Fengwei wrote:
>>>>> But what I am not sure was whether it's worthy to do such kind of change
>>>>> as the regression only is seen obviously in micro-benchmark. No evidence
>>>>> showed the other regressionsin this report is related with madvise. At
>>>>> least from the perf statstics. Need to check more on stream/ramspeed.
>>>>> Thanks.
>>>>
>>>> With debugging patch (filter out the stack mapping from THP aligned),
>>>> the result of stream can be restored to around 2%:
>>>>
>>>> commit:
>>>> 30749e6fbb3d391a7939ac347e9612afe8c26e94
>>>> 1111d46b5cbad57486e7a3fab75888accac2f072
>>>> 89f60532d82b9ecd39303a74589f76e4758f176f -> 1111d46b5cbad with
>>>> debugging patch
>>>>
>>>> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589
>>>> ---------------- --------------------------- ---------------------------
>>>> 350993 -15.6% 296081 ± 2% -1.5% 345689
>>>> stream.add_bandwidth_MBps
>>>> 349830 -16.1% 293492 ± 2% -2.3% 341860 ±
>>>> 2% stream.add_bandwidth_MBps_harmonicMean
>>>> 333973 -20.5% 265439 ± 3% -1.7% 328403
>>>> stream.copy_bandwidth_MBps
>>>> 332930 -21.7% 260548 ± 3% -2.5% 324711 ±
>>>> 2% stream.copy_bandwidth_MBps_harmonicMean
>>>> 302788 -16.2% 253817 ± 2% -1.4% 298421
>>>> stream.scale_bandwidth_MBps
>>>> 302157 -17.1% 250577 ± 2% -2.0% 296054
>>>> stream.scale_bandwidth_MBps_harmonicMean
>>>> 339047 -12.1% 298061 -1.4% 334206
>>>> stream.triad_bandwidth_MBps
>>>> 338186 -12.4% 296218 -2.0% 331469
>>>> stream.triad_bandwidth_MBps_harmonicMean
>>>>
>>>>
>>>> The regression of ramspeed is still there.
>>>
>>> Thanks for the debugging patch and the test. If no one has objection
>>> to honor MAP_STACK, I'm going to come up with a more formal patch.
>>> Even though thp_get_unmapped_area() is not called for MAP_STACK, stack
>>> area still may be allocated at 2M aligned address theoretically. And
>>> it may be worse with multi-sized THP, for 1M.
>> Right. Filtering out MAP_STACK can't make sure no THP for stack. Just
>> reduce the possibility of using THP for stack.
>
> Can you please help test the below patch?
I can't access the testing box now. Oliver will help to test your patch.
Regards
Yin, Fengwei
>
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 40d94411d492..dc7048824be8 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
> _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
> _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
> + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
> arch_calc_vm_flag_bits(flags);
> }
>
> But I can't reproduce the pthread regression on my aarch64 VM. It
> might be due to the guard stack (the 64K guard stack is at 2M aligned,
> the 8M stack is right next to it which starts at 2M + 64K). But I can
> see the stack area is not THP eligible anymore with this patch. See:
>
> fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
> Size: 8192 kB
> KernelPageSize: 4 kB
> MMUPageSize: 4 kB
> Rss: 12 kB
> Pss: 12 kB
> Pss_Dirty: 12 kB
> Shared_Clean: 0 kB
> Shared_Dirty: 0 kB
> Private_Clean: 0 kB
> Private_Dirty: 12 kB
> Referenced: 12 kB
> Anonymous: 12 kB
> KSM: 0 kB
> LazyFree: 0 kB
> AnonHugePages: 0 kB
> ShmemPmdMapped: 0 kB
> FilePmdMapped: 0 kB
> Shared_Hugetlb: 0 kB
> Private_Hugetlb: 0 kB
> Swap: 0 kB
> SwapPss: 0 kB
> Locked: 0 kB
> THPeligible: 0
> VmFlags: rd wr mr mw me ac nh
>
> The "nh" flag is set.
>
>>
>>>
>>> Do you have any instructions regarding how to run ramspeed? Anyway I
>>> may not have time debug it until after holidays.
>> 0Day leverages phoronix-test-suite to run ramspeed. So I don't have
>> direct answer here.
>>
>> I suppose we could check the configuration of ramspeed in phoronix-test-
>> suite to understand what's the build options and command options to run
>> ramspeed:
>> https://openbenchmarking.org/test/pts/ramspeed
>
> Downloaded the test suite. It looks phronix just runs test 3 (int) and
> 6 (float). They basically does 4 sub tests to benchmark memory
> bandwidth:
>
> * copy
> * scale copy
> * add copy
> * triad copy
>
> The source buffer is initialized (page fault is triggered), but the
> destination area is not. So the page fault + page clear time is
> accounted to the result. Clearing huge page may take a little bit more
> time. But I didn't see noticeable regression on my aarch64 VM either.
> Anyway I'm supposed such test should be run with THP off.
>
>>
>>
>> Regards
>> Yin, Fengwei
>>
>>>
>>>>
>>>>
>>>> Regards
>>>> Yin, Fengwei
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2024-01-04 8:18 ` Yin Fengwei
@ 2024-01-04 8:39 ` Oliver Sang
2024-01-05 9:29 ` Oliver Sang
0 siblings, 1 reply; 24+ messages in thread
From: Oliver Sang @ 2024-01-04 8:39 UTC (permalink / raw)
To: Yin Fengwei
Cc: Yang Shi, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Christopher Lameter, ying.huang, feng.tang, oliver.sang
hi, Fengwei, hi, Yang Shi,
On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote:
>
> On 2024/1/4 09:32, Yang Shi wrote:
...
> > Can you please help test the below patch?
> I can't access the testing box now. Oliver will help to test your patch.
>
Since the commit-id of
'mm: align larger anonymous mappings on THP boundaries'
in linux-next/master is now efa7df3e3bb5d,
I applied the patch like below:
* d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
* efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries
* 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi
Our auto-bisect has captured the new efa7df3e3b as the fbc (first bad commit) for quite a number of regressions
so far; I will test d8d7b1dae6f03 for all these tests. Thanks
commit d8d7b1dae6f0311d528b289cda7b317520f9a984
Author: 0day robot <lkp@intel.com>
Date: Thu Jan 4 12:51:10 2024 +0800
fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 40d94411d4920..91197bd387730 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
_calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
+ _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
arch_calc_vm_flag_bits(flags);
}
>
> Regards
> Yin, Fengwei
>
> >
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 40d94411d492..dc7048824be8 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
> > _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
> > _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
> > + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
> > arch_calc_vm_flag_bits(flags);
> > }
> >
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2024-01-04 8:39 ` Oliver Sang
@ 2024-01-05 9:29 ` Oliver Sang
2024-01-05 14:52 ` Yin, Fengwei
2024-01-05 18:49 ` Yang Shi
0 siblings, 2 replies; 24+ messages in thread
From: Oliver Sang @ 2024-01-05 9:29 UTC (permalink / raw)
To: Yang Shi
Cc: Yin Fengwei, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Christopher Lameter, ying.huang, feng.tang, oliver.sang
[-- Attachment #1: Type: text/plain, Size: 16841 bytes --]
hi, Yang Shi,
On Thu, Jan 04, 2024 at 04:39:50PM +0800, Oliver Sang wrote:
> hi, Fengwei, hi, Yang Shi,
>
> On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote:
> >
> > On 2024/1/4 09:32, Yang Shi wrote:
>
> ...
>
> > > Can you please help test the below patch?
> > I can't access the testing box now. Oliver will help to test your patch.
> >
>
> since now the commit-id of
> 'mm: align larger anonymous mappings on THP boundaries'
> in linux-next/master is efa7df3e3bb5d
> I applied the patch like below:
>
> * d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> * efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries
> * 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi
>
> our auto-bisect captured new efa7df3e3b as fbc for quite a number of regression
> so far, I will test d8d7b1dae6f03 for all these tests. Thanks
>
We have gotten 12 regression and 1 improvement results for efa7df3e3b so far
(4 of the regressions are just similar to what we reported for 1111d46b5c).
With your patch, 6 of those regressions are fixed; the others are not impacted.
Below is a summary:
No.  testsuite       test                            status-on-efa7df3e3b  fix-by-d8d7b1dae6 ?
===  =========       ====                            ====================  ===================
(1)  stress-ng       numa                            regression            NO
(2)                  pthread                         regression            yes (on an Ice Lake server)
(3)                  pthread                         regression            yes (on a Cascade Lake desktop)
(4)  will-it-scale   malloc1                         regression            NO
(5)                  page_fault1                     improvement           no (so still an improvement)
(6)  vm-scalability  anon-w-seq-mt                   regression            yes
(7)  stream          nr_threads=25%                  regression            yes
(8)                  nr_threads=50%                  regression            yes
(9)  phoronix        osbench.CreateThreads           regression            yes (on a Cascade Lake server)
(10)                 ramspeed.Add.Integer            regression            NO (and the 3 below, on a Coffee Lake desktop)
(11)                 ramspeed.Average.FloatingPoint  regression            NO
(12)                 ramspeed.Triad.Integer          regression            NO
(13)                 ramspeed.Average.Integer        regression            NO
Below are the details; for the regressions not fixed by d8d7b1dae6, the
full comparisons are attached.
(1) The detailed comparison is attached as 'stress-ng-regression'
Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
cpu/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/numa/stress-ng/60s
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
251.12 -48.2% 130.00 -47.9% 130.75 stress-ng.numa.ops
4.10 -49.4% 2.08 -49.2% 2.09 stress-ng.numa.ops_per_sec
(2)
Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/pthread/stress-ng/60s
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
3272223 -87.8% 400430 +0.5% 3287322 stress-ng.pthread.ops
54516 -87.8% 6664 +0.5% 54772 stress-ng.pthread.ops_per_sec
(3)
Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with memory: 128G
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
2250845 -85.2% 332370 ± 6% -0.8% 2232820 stress-ng.pthread.ops
37510 -85.2% 5538 ± 6% -0.8% 37209 stress-ng.pthread.ops_per_sec
(4) full comparison attached as 'will-it-scale-regression'
Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/malloc1/will-it-scale
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
10994 -86.7% 1466 -86.7% 1460 will-it-scale.per_process_ops
1231431 -86.7% 164315 -86.7% 163624 will-it-scale.workload
(5)
Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/page_fault1/will-it-scale
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
18858970 +44.8% 27298921 +44.9% 27330479 will-it-scale.224.threads
56.06 +13.3% 63.53 +13.8% 63.81 will-it-scale.224.threads_idle
84191 +44.8% 121869 +44.9% 122010 will-it-scale.per_thread_ops
18858970 +44.8% 27298921 +44.9% 27330479 will-it-scale.workload
(6)
Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/8T/lkp-cpl-4sp2/anon-w-seq-mt/vm-scalability
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
345968 -6.5% 323566 +0.1% 346304 vm-scalability.median
1.91 ± 10% -0.5 1.38 ± 20% -0.2 1.75 ± 13% vm-scalability.median_stddev%
79708409 -7.4% 73839640 -0.1% 79613742 vm-scalability.throughput
(7)
Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G
=========================================================================================
array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
349414 -16.2% 292854 ± 2% -0.4% 348048 stream.add_bandwidth_MBps
347727 ± 2% -16.5% 290470 ± 2% -0.6% 345750 ± 2% stream.add_bandwidth_MBps_harmonicMean
332206 -21.6% 260428 ± 3% -0.4% 330838 stream.copy_bandwidth_MBps
330746 ± 2% -22.6% 255915 ± 3% -0.6% 328725 ± 2% stream.copy_bandwidth_MBps_harmonicMean
301178 -16.9% 250209 ± 2% -0.4% 299920 stream.scale_bandwidth_MBps
300262 -17.7% 247151 ± 2% -0.6% 298586 ± 2% stream.scale_bandwidth_MBps_harmonicMean
337408 -12.5% 295287 ± 2% -0.3% 336304 stream.triad_bandwidth_MBps
336153 -12.7% 293621 -0.5% 334624 ± 2% stream.triad_bandwidth_MBps_harmonicMean
(8)
Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G
=========================================================================================
array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/50%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
345632 -19.7% 277550 ± 3% +0.4% 347067 ± 2% stream.add_bandwidth_MBps
342263 ± 2% -19.7% 274704 ± 2% +0.4% 343609 ± 2% stream.add_bandwidth_MBps_harmonicMean
343820 -17.3% 284428 ± 3% +0.1% 344248 stream.copy_bandwidth_MBps
341759 ± 2% -17.8% 280934 ± 3% +0.1% 342025 ± 2% stream.copy_bandwidth_MBps_harmonicMean
343270 -17.8% 282330 ± 3% +0.3% 344276 ± 2% stream.scale_bandwidth_MBps
340812 ± 2% -18.3% 278284 ± 3% +0.3% 341672 ± 2% stream.scale_bandwidth_MBps_harmonicMean
364596 -19.7% 292831 ± 3% +0.4% 366145 ± 2% stream.triad_bandwidth_MBps
360643 ± 2% -19.9% 289034 ± 3% +0.4% 362004 ± 2% stream.triad_bandwidth_MBps_harmonicMean
(9)
Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with memory: 512G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Create Threads/debian-x86_64-phoronix/lkp-csl-2sp7/osbench-1.0.2/phoronix-test-suite
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
26.82 +1348.4% 388.43 +4.0% 27.88 phoronix-test-suite.osbench.CreateThreads.us_per_event
**** For (10) - (13) below, the full comparison is attached as phoronix-regressions
(they all happen on a Coffee Lake desktop)
(10)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Add/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
20115 -4.5% 19211 -4.5% 19217 phoronix-test-suite.ramspeed.Add.Integer.mb_s
(11)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
19960 -2.9% 19378 -3.0% 19366 phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s
(12)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Triad/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
19667 -6.4% 18399 -6.4% 18413 phoronix-test-suite.ramspeed.Triad.Integer.mb_s
(13)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
19799 -3.5% 19106 -3.4% 19117 phoronix-test-suite.ramspeed.Average.Integer.mb_s
>
>
> commit d8d7b1dae6f0311d528b289cda7b317520f9a984
> Author: 0day robot <lkp@intel.com>
> Date: Thu Jan 4 12:51:10 2024 +0800
>
> fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
>
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 40d94411d4920..91197bd387730 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
> _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
> _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
> + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
> arch_calc_vm_flag_bits(flags);
> }
>
>
> >
> > Regards
> > Yin, Fengwei
> >
> > >
> > > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > index 40d94411d492..dc7048824be8 100644
> > > --- a/include/linux/mman.h
> > > +++ b/include/linux/mman.h
> > > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > > return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
> > > _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
> > > _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
> > > + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
> > > arch_calc_vm_flag_bits(flags);
> > > }
> > >
[-- Attachment #2: stress-ng-regression --]
[-- Type: text/plain, Size: 15787 bytes --]
(1)
Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
cpu/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/numa/stress-ng/60s
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
55848 ± 28% +236.5% 187927 ± 3% +259.4% 200733 ± 2% meminfo.AnonHugePages
1.80 ± 5% -0.2 1.60 ± 5% -0.2 1.60 ± 7% mpstat.cpu.all.usr%
8077 ± 7% +11.8% 9030 ± 5% +4.6% 8451 ± 7% numa-vmstat.node0.nr_kernel_stack
120605 ± 3% -10.0% 108597 ± 3% -10.5% 107928 ± 3% vmstat.system.in
1868 ± 32% +75.1% 3271 ± 14% +87.1% 3495 ± 20% turbostat.C1
9123408 ± 5% -13.8% 7863298 ± 7% -14.0% 7846843 ± 6% turbostat.IRQ
59.62 ± 49% +125.4% 134.38 ± 88% +267.9% 219.38 ± 85% turbostat.POLL
24.33 ± 43% +69.1% 41.14 ± 35% +9.0% 26.51 ± 53% sched_debug.cfs_rq:/.removed.load_avg.avg
104.44 ± 21% +29.2% 134.94 ± 17% +3.2% 107.78 ± 26% sched_debug.cfs_rq:/.removed.load_avg.stddev
106.26 ± 16% -17.6% 87.53 ± 21% -24.6% 80.11 ± 21% sched_debug.cfs_rq:/.util_est_enqueued.stddev
35387 ± 59% +127.7% 80580 ± 53% +249.2% 123565 ± 57% sched_debug.cpu.avg_idle.min
1156 ± 7% -21.9% 903.06 ± 5% -23.2% 888.25 ± 15% sched_debug.cpu.nr_switches.min
20719 ±111% -51.1% 10123 ± 71% -56.6% 8996 ± 29% numa-meminfo.node0.Active
20639 ±111% -51.5% 10001 ± 72% -56.8% 8916 ± 29% numa-meminfo.node0.Active(anon)
31253 ± 70% +142.7% 75839 ± 20% +214.1% 98180 ± 22% numa-meminfo.node0.AnonHugePages
8076 ± 7% +11.8% 9029 ± 5% +4.7% 8451 ± 7% numa-meminfo.node0.KernelStack
24260 ± 62% +360.8% 111783 ± 17% +321.2% 102184 ± 21% numa-meminfo.node1.AnonHugePages
283702 ± 16% +40.9% 399633 ± 18% +35.9% 385485 ± 11% numa-meminfo.node1.AnonPages.max
251.12 -48.2% 130.00 -47.9% 130.75 stress-ng.numa.ops
4.10 -49.4% 2.08 -49.2% 2.09 stress-ng.numa.ops_per_sec
61658 -53.5% 28697 -53.3% 28768 stress-ng.time.minor_page_faults
3727 +2.8% 3832 +2.9% 3833 stress-ng.time.system_time
10.41 -48.6% 5.35 -48.7% 5.34 stress-ng.time.user_time
4313 ± 4% -47.0% 2285 ± 8% -48.3% 2230 ± 7% stress-ng.time.voluntary_context_switches
63.61 +2.5% 65.20 +2.7% 65.30 time.elapsed_time
63.61 +2.5% 65.20 +2.7% 65.30 time.elapsed_time.max
61658 -53.5% 28697 -53.3% 28768 time.minor_page_faults
3727 +2.8% 3832 +2.9% 3833 time.system_time
10.41 -48.6% 5.35 -48.7% 5.34 time.user_time
4313 ± 4% -47.0% 2285 ± 8% -48.3% 2230 ± 7% time.voluntary_context_switches
120325 +6.1% 127672 ± 6% +0.9% 121431 proc-vmstat.nr_anon_pages
27.33 ± 29% +236.0% 91.83 ± 3% +258.6% 98.02 ± 2% proc-vmstat.nr_anon_transparent_hugepages
148677 +6.2% 157844 ± 4% +0.7% 149763 proc-vmstat.nr_inactive_anon
98.10 ± 25% -52.8% 46.30 ± 69% -55.3% 43.82 ± 64% proc-vmstat.nr_isolated_file
2809 +9.0% 3063 ± 28% -3.9% 2698 ± 2% proc-vmstat.nr_page_table_pages
148670 +6.2% 157837 ± 4% +0.7% 149765 proc-vmstat.nr_zone_inactive_anon
2580003 -5.8% 2431297 -5.8% 2431173 proc-vmstat.numa_hit
1488693 -5.8% 1402808 -5.8% 1401633 proc-vmstat.numa_local
1091291 -5.8% 1028489 -5.7% 1029540 proc-vmstat.numa_other
9.56e+08 +2.1% 9.757e+08 +2.1% 9.761e+08 proc-vmstat.pgalloc_normal
469554 -7.6% 433894 -7.3% 435076 proc-vmstat.pgfault
9.559e+08 +2.1% 9.756e+08 +2.1% 9.76e+08 proc-vmstat.pgfree
17127 ± 21% -55.4% 7647 ± 64% -55.0% 7700 ± 52% proc-vmstat.pgmigrate_fail
9.554e+08 +2.1% 9.751e+08 +2.1% 9.754e+08 proc-vmstat.pgmigrate_success
1865641 +2.1% 1904388 +2.1% 1905158 proc-vmstat.thp_migration_success
0.43 ± 8% -0.1 0.30 ± 10% -0.2 0.28 ± 12% perf-profile.children.cycles-pp.queue_pages_range
0.43 ± 8% -0.1 0.30 ± 10% -0.2 0.28 ± 12% perf-profile.children.cycles-pp.walk_page_range
0.32 ± 8% -0.1 0.21 ± 11% -0.1 0.19 ± 13% perf-profile.children.cycles-pp.__walk_page_range
0.30 ± 8% -0.1 0.19 ± 12% -0.1 0.17 ± 13% perf-profile.children.cycles-pp.walk_pud_range
0.31 ± 9% -0.1 0.20 ± 12% -0.1 0.19 ± 12% perf-profile.children.cycles-pp.walk_pgd_range
0.30 ± 8% -0.1 0.20 ± 11% -0.1 0.18 ± 13% perf-profile.children.cycles-pp.walk_p4d_range
0.29 ± 8% -0.1 0.18 ± 11% -0.1 0.17 ± 13% perf-profile.children.cycles-pp.walk_pmd_range
0.28 ± 8% -0.1 0.17 ± 11% -0.1 0.16 ± 13% perf-profile.children.cycles-pp.queue_folios_pte_range
0.13 ± 12% -0.1 0.07 ± 11% -0.1 0.06 ± 17% perf-profile.children.cycles-pp.vm_normal_folio
0.18 ± 4% -0.0 0.15 ± 3% -0.0 0.16 ± 3% perf-profile.children.cycles-pp.add_page_for_migration
0.12 ± 4% -0.0 0.12 ± 5% -0.0 0.11 ± 4% perf-profile.children.cycles-pp.__cond_resched
98.65 +0.2 98.82 +0.2 98.88 perf-profile.children.cycles-pp.migrate_pages_batch
98.66 +0.2 98.83 +0.2 98.89 perf-profile.children.cycles-pp.migrate_pages_sync
98.68 +0.2 98.85 +0.2 98.91 perf-profile.children.cycles-pp.migrate_pages
0.10 ± 11% -0.0 0.05 ± 12% -0.1 0.04 ± 79% perf-profile.self.cycles-pp.vm_normal_folio
0.13 ± 8% -0.0 0.08 ± 14% -0.0 0.08 ± 14% perf-profile.self.cycles-pp.queue_folios_pte_range
0.17 ± 89% -100.0% 0.00 -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages
0.45 ± 59% +124.4% 1.01 ± 81% +1094.5% 5.40 ±120% perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
27.27 ± 95% -75.2% 6.77 ± 83% -48.4% 14.08 ± 77% perf-sched.sch_delay.max.ms.__cond_resched.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch
2.00 ± 88% -100.0% 0.00 -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages
4.30 ± 86% -50.9% 2.11 ± 67% -90.0% 0.43 ±261% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
3.31 ± 53% -55.8% 1.46 ±218% -81.0% 0.63 ±182% perf-sched.sch_delay.max.ms.synchronize_rcu_expedited.lru_cache_disable.do_pages_move.kernel_move_pages
190.22 ± 41% +125.2% 428.42 ± 60% +72.7% 328.46 ± 21% perf-sched.wait_and_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
294.56 ± 10% +44.0% 424.28 ± 16% +62.5% 478.70 ± 13% perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.__flush_work.isra.0
322.33 ± 5% +46.1% 470.78 ± 10% +40.8% 453.90 ± 10% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
117.25 ± 11% -13.3% 101.62 ± 34% -24.6% 88.38 ± 17% perf-sched.wait_and_delay.count.__cond_resched.down_read.add_page_for_migration.do_pages_move.kernel_move_pages
307.25 ± 7% -54.6% 139.62 ± 4% -55.2% 137.62 ± 5% perf-sched.wait_and_delay.count.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_pages_move.kernel_move_pages
406.25 ± 3% -57.7% 171.88 ± 10% -59.0% 166.75 ± 3% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.__flush_work.isra.0
142.50 ± 33% -76.8% 33.00 ±139% -65.8% 48.75 ± 83% perf-sched.wait_and_delay.count.synchronize_rcu_expedited.lru_cache_disable.do_pages_move.kernel_move_pages
1196 ± 3% -37.9% 743.38 ± 10% -38.5% 736.00 ± 9% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1749 ± 19% +45.1% 2537 ± 6% +76.0% 3078 ± 18% perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.__flush_work.isra.0
2691 ± 15% +48.8% 4003 ± 6% +44.6% 3892 ± 11% perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2.82 ± 14% -100.0% 0.00 -81.1% 0.53 ±264% perf-sched.wait_time.avg.ms.__cond_resched.down_read.migrate_to_node.do_migrate_pages.kernel_migrate_pages
199.40 ± 29% +114.8% 428.41 ± 60% +64.7% 328.44 ± 21% perf-sched.wait_time.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
3.09 ± 16% -100.0% 0.00 -84.4% 0.48 ±264% perf-sched.wait_time.avg.ms.__cond_resched.queue_folios_pte_range.walk_pmd_range.isra.0
1.94 ± 50% -100.0% 0.00 -74.2% 0.50 ±264% perf-sched.wait_time.avg.ms.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages
294.30 ± 10% +44.1% 424.17 ± 16% +62.6% 478.57 ± 13% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.__flush_work.isra.0
0.98 ±107% -100.0% 0.00 -95.8% 0.04 ±264% perf-sched.wait_time.avg.ms.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages
321.84 ± 5% +46.1% 470.35 ± 10% +40.8% 453.02 ± 10% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
7.31 ± 53% -100.0% 0.00 -87.7% 0.90 ±264% perf-sched.wait_time.max.ms.__cond_resched.down_read.migrate_to_node.do_migrate_pages.kernel_migrate_pages
6.45 ± 16% -100.0% 0.00 -84.5% 1.00 ±264% perf-sched.wait_time.max.ms.__cond_resched.queue_folios_pte_range.walk_pmd_range.isra.0
6.17 ± 45% -100.0% 0.00 -91.9% 0.50 ±264% perf-sched.wait_time.max.ms.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages
11.63 ±118% -93.3% 0.78 ±178% -89.3% 1.24 ±245% perf-sched.wait_time.max.ms.exp_funnel_lock.synchronize_rcu_expedited.lru_cache_disable.do_pages_move
1749 ± 19% +45.1% 2537 ± 6% +76.0% 3078 ± 18% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.__flush_work.isra.0
2.49 ± 88% -100.0% 0.00 -98.4% 0.04 ±264% perf-sched.wait_time.max.ms.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages
2691 ± 15% +48.8% 4003 ± 6% +44.6% 3892 ± 11% perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
340.81 +38.9% 473.47 +38.4% 471.58 perf-stat.i.MPKI
1.131e+09 -25.0% 8.485e+08 -25.2% 8.465e+08 ± 2% perf-stat.i.branch-instructions
68.31 +1.1 69.37 +1.1 69.37 perf-stat.i.cache-miss-rate%
46.16 +38.1% 63.73 +37.5% 63.45 perf-stat.i.cpi
157.48 -7.7% 145.30 ± 2% -8.1% 144.76 ± 2% perf-stat.i.cpu-migrations
0.02 ± 2% +0.0 0.02 ± 16% +0.0 0.02 perf-stat.i.dTLB-load-miss-rate%
165432 ± 2% -2.9% 160583 ± 12% -8.3% 151664 perf-stat.i.dTLB-load-misses
1.133e+09 -21.9% 8.846e+08 -22.1% 8.823e+08 ± 2% perf-stat.i.dTLB-loads
0.02 -0.0 0.01 ± 3% -0.0 0.01 perf-stat.i.dTLB-store-miss-rate%
98452 -31.8% 67127 ± 2% -32.2% 66739 ± 2% perf-stat.i.dTLB-store-misses
5.668e+08 -13.7% 4.891e+08 -13.9% 4.879e+08 perf-stat.i.dTLB-stores
5.684e+09 -24.5% 4.292e+09 -24.7% 4.282e+09 ± 2% perf-stat.i.instructions
0.07 ± 2% -14.5% 0.06 ± 3% -14.6% 0.06 ± 5% perf-stat.i.ipc
88.20 -10.7% 78.73 -11.0% 78.53 perf-stat.i.metric.M/sec
1.242e+08 +0.9% 1.254e+08 +1.0% 1.255e+08 perf-stat.i.node-load-misses
76214273 +1.0% 76999051 +1.2% 77103845 perf-stat.i.node-loads
247.93 +32.1% 327.57 ± 2% +32.1% 327.56 ± 2% perf-stat.overall.MPKI
0.92 ± 4% +0.2 1.13 ± 5% +0.2 1.12 ± 5% perf-stat.overall.branch-miss-rate%
69.51 +0.9 70.45 +1.0 70.50 perf-stat.overall.cache-miss-rate%
33.77 +31.3% 44.35 ± 2% +31.3% 44.35 ± 2% perf-stat.overall.cpi
0.01 ± 2% +0.0 0.02 ± 13% +0.0 0.02 ± 2% perf-stat.overall.dTLB-load-miss-rate%
0.02 -0.0 0.01 ± 2% -0.0 0.01 perf-stat.overall.dTLB-store-miss-rate%
0.03 -23.9% 0.02 ± 2% -23.9% 0.02 perf-stat.overall.ipc
1.084e+09 -24.2% 8.217e+08 ± 2% -24.2% 8.216e+08 ± 2% perf-stat.ps.branch-instructions
154.44 -8.0% 142.02 ± 2% -8.6% 141.20 ± 2% perf-stat.ps.cpu-migrations
163178 ± 3% -3.1% 158185 ± 12% -8.0% 150107 ± 2% perf-stat.ps.dTLB-load-misses
1.089e+09 -21.1% 8.585e+08 -21.2% 8.581e+08 perf-stat.ps.dTLB-loads
96861 -31.9% 65975 ± 2% -32.1% 65796 ± 2% perf-stat.ps.dTLB-store-misses
5.503e+08 -13.1% 4.781e+08 -13.2% 4.776e+08 perf-stat.ps.dTLB-stores
5.447e+09 -23.7% 4.157e+09 -23.7% 4.157e+09 perf-stat.ps.instructions
1.223e+08 +1.0% 1.235e+08 +1.0% 1.235e+08 perf-stat.ps.node-load-misses
75118302 +1.1% 75929311 +1.1% 75927016 perf-stat.ps.node-loads
3.496e+11 -21.7% 2.737e+11 -21.7% 2.739e+11 ± 2% perf-stat.total.instructions
[-- Attachment #3: will-it-scale-regression --]
[-- Type: text/plain, Size: 57536 bytes --]
(4)
Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/malloc1/will-it-scale
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
3161 +46.4% 4627 +47.5% 4662 vmstat.system.cs
0.58 ± 2% +0.7 1.27 +0.7 1.26 mpstat.cpu.all.irq%
0.55 ± 3% -0.5 0.09 ± 2% -0.5 0.09 ± 2% mpstat.cpu.all.soft%
1.00 ± 13% -0.7 0.29 -0.7 0.28 mpstat.cpu.all.usr%
1231431 -86.7% 164315 -86.7% 163624 will-it-scale.112.processes
10994 -86.7% 1466 -86.7% 1460 will-it-scale.per_process_ops
1231431 -86.7% 164315 -86.7% 163624 will-it-scale.workload
0.03 -66.7% 0.01 -66.7% 0.01 turbostat.IPC
81.38 -2.8% 79.12 -2.2% 79.62 turbostat.PkgTmp
764.02 +17.1% 894.78 +17.0% 893.81 turbostat.PkgWatt
19.80 +135.4% 46.59 +135.1% 46.53 turbostat.RAMWatt
771.38 ± 5% +249.5% 2696 ± 14% +231.9% 2560 ± 10% perf-c2c.DRAM.local
3050 ± 5% -69.8% 922.75 ± 6% -71.5% 869.88 ± 8% perf-c2c.DRAM.remote
11348 ± 4% -90.2% 1107 ± 5% -90.6% 1065 ± 3% perf-c2c.HITM.local
357.50 ± 21% -44.0% 200.38 ± 7% -48.2% 185.25 ± 13% perf-c2c.HITM.remote
11706 ± 4% -88.8% 1307 ± 4% -89.3% 1250 ± 3% perf-c2c.HITM.total
1.717e+08 ± 9% -85.5% 24955542 -85.5% 24880885 numa-numastat.node0.local_node
1.718e+08 ± 9% -85.4% 25046901 -85.5% 24972867 numa-numastat.node0.numa_hit
1.945e+08 ± 7% -87.0% 25203631 -87.1% 25104844 numa-numastat.node1.local_node
1.946e+08 ± 7% -87.0% 25300536 -87.1% 25180465 numa-numastat.node1.numa_hit
2.001e+08 ± 2% -87.5% 25098699 -87.5% 25011079 numa-numastat.node2.local_node
2.002e+08 ± 2% -87.4% 25173132 -87.5% 25119438 numa-numastat.node2.numa_hit
1.956e+08 ± 6% -87.3% 24922332 -87.3% 24784408 numa-numastat.node3.local_node
1.957e+08 ± 6% -87.2% 25008002 -87.3% 24874399 numa-numastat.node3.numa_hit
766959 -45.9% 414816 -46.2% 412898 meminfo.Active
766881 -45.9% 414742 -46.2% 412824 meminfo.Active(anon)
391581 +12.1% 438946 +8.4% 424669 meminfo.AnonPages
421982 +20.7% 509155 +14.8% 484430 meminfo.Inactive
421800 +20.7% 508969 +14.8% 484244 meminfo.Inactive(anon)
68496 ± 7% +88.9% 129357 ± 2% +82.9% 125252 ± 2% meminfo.Mapped
569270 -24.0% 432709 -24.1% 431884 meminfo.SUnreclaim
797185 -40.2% 476420 -40.8% 471912 meminfo.Shmem
730111 -18.8% 593041 -18.9% 592400 meminfo.Slab
148082 ± 2% -20.3% 118055 ± 4% -21.7% 115994 ± 6% numa-meminfo.node0.SUnreclaim
197311 ± 16% -22.5% 152829 ± 19% -29.8% 138546 ± 9% numa-meminfo.node0.Slab
144635 ± 5% -25.8% 107254 ± 4% -25.3% 107973 ± 6% numa-meminfo.node1.SUnreclaim
137974 ± 2% -24.5% 104205 ± 6% -25.7% 102563 ± 4% numa-meminfo.node2.SUnreclaim
167889 ± 13% -26.1% 124127 ± 9% -15.0% 142771 ± 18% numa-meminfo.node2.Slab
607639 ± 20% -46.2% 326998 ± 15% -46.8% 323458 ± 13% numa-meminfo.node3.Active
607611 ± 20% -46.2% 326968 ± 15% -46.8% 323438 ± 13% numa-meminfo.node3.Active(anon)
679476 ± 21% -31.3% 466619 ± 19% -38.5% 418074 ± 16% numa-meminfo.node3.FilePages
20150 ± 22% +128.4% 46020 ± 11% +123.0% 44932 ± 8% numa-meminfo.node3.Mapped
138148 ± 2% -25.3% 103148 ± 4% -23.8% 105326 ± 7% numa-meminfo.node3.SUnreclaim
631930 ± 20% -40.9% 373456 ± 15% -41.5% 369883 ± 13% numa-meminfo.node3.Shmem
166777 ± 7% -19.6% 134013 ± 9% -20.7% 132332 ± 7% numa-meminfo.node3.Slab
37030 ± 2% -20.3% 29511 ± 4% -21.7% 28993 ± 6% numa-vmstat.node0.nr_slab_unreclaimable
1.718e+08 ± 9% -85.4% 25047066 -85.5% 24973455 numa-vmstat.node0.numa_hit
1.717e+08 ± 9% -85.5% 24955707 -85.5% 24881472 numa-vmstat.node0.numa_local
36158 ± 5% -25.8% 26811 ± 4% -25.4% 26990 ± 6% numa-vmstat.node1.nr_slab_unreclaimable
1.946e+08 ± 7% -87.0% 25300606 -87.1% 25181038 numa-vmstat.node1.numa_hit
1.945e+08 ± 7% -87.0% 25203699 -87.1% 25105417 numa-vmstat.node1.numa_local
34499 ± 2% -24.5% 26050 ± 6% -25.7% 25638 ± 4% numa-vmstat.node2.nr_slab_unreclaimable
2.002e+08 ± 2% -87.4% 25173363 -87.5% 25119830 numa-vmstat.node2.numa_hit
2.001e+08 ± 2% -87.5% 25098930 -87.5% 25011471 numa-vmstat.node2.numa_local
151851 ± 20% -46.2% 81720 ± 15% -46.8% 80848 ± 13% numa-vmstat.node3.nr_active_anon
169827 ± 21% -31.3% 116645 ± 19% -38.5% 104502 ± 16% numa-vmstat.node3.nr_file_pages
4991 ± 23% +131.5% 11555 ± 11% +125.4% 11249 ± 8% numa-vmstat.node3.nr_mapped
157941 ± 20% -40.9% 93355 ± 15% -41.5% 92454 ± 13% numa-vmstat.node3.nr_shmem
34570 ± 2% -25.4% 25780 ± 4% -23.8% 26327 ± 7% numa-vmstat.node3.nr_slab_unreclaimable
151851 ± 20% -46.2% 81720 ± 15% -46.8% 80848 ± 13% numa-vmstat.node3.nr_zone_active_anon
1.957e+08 ± 6% -87.2% 25008117 -87.3% 24874649 numa-vmstat.node3.numa_hit
1.956e+08 ± 6% -87.3% 24922447 -87.3% 24784657 numa-vmstat.node3.numa_local
191746 -45.9% 103734 -46.2% 103228 proc-vmstat.nr_active_anon
97888 +12.1% 109757 +8.5% 106185 proc-vmstat.nr_anon_pages
947825 -8.5% 867659 -8.6% 866533 proc-vmstat.nr_file_pages
105444 +20.7% 127227 +14.9% 121113 proc-vmstat.nr_inactive_anon
17130 ± 7% +88.9% 32365 ± 2% +83.4% 31420 ± 2% proc-vmstat.nr_mapped
4007 +4.2% 4176 +4.1% 4170 proc-vmstat.nr_page_table_pages
199322 -40.2% 119155 -40.8% 118031 proc-vmstat.nr_shmem
142294 -24.0% 108161 -24.1% 107954 proc-vmstat.nr_slab_unreclaimable
191746 -45.9% 103734 -46.2% 103228 proc-vmstat.nr_zone_active_anon
105444 +20.7% 127223 +14.9% 121106 proc-vmstat.nr_zone_inactive_anon
40186 ± 13% +65.0% 66320 ± 5% +60.2% 64374 ± 13% proc-vmstat.numa_hint_faults
20248 ± 39% +108.3% 42185 ± 12% +102.6% 41033 ± 10% proc-vmstat.numa_hint_faults_local
7.623e+08 -86.8% 1.005e+08 -86.9% 1.002e+08 proc-vmstat.numa_hit
7.62e+08 -86.9% 1.002e+08 -86.9% 99786408 proc-vmstat.numa_local
181538 ± 6% +49.5% 271428 ± 3% +48.9% 270328 ± 6% proc-vmstat.numa_pte_updates
152652 ± 7% -28.6% 108996 -29.6% 107396 proc-vmstat.pgactivate
7.993e+08 +3068.4% 2.533e+10 +3055.6% 2.522e+10 proc-vmstat.pgalloc_normal
3.72e+08 -86.4% 50632612 -86.4% 50429200 proc-vmstat.pgfault
7.99e+08 +3069.7% 2.533e+10 +3056.9% 2.522e+10 proc-vmstat.pgfree
48.75 ± 2% +1e+08% 49362627 +1e+08% 49162408 proc-vmstat.thp_fault_alloc
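
side note on the proc-vmstat block above: thp_fault_alloc goes from ~49 to
49,362,627 while pgfault drops ~86%, and 49,362,627 THP faults x 512 4K pages
per 2MB THP is ~2.53e10 pages, which matches the pgalloc_normal jump, i.e. the
workload's anonymous mappings are now faulted in as PMD-sized THPs instead of
4K pages. As a minimal sketch (not part of this report; the 2MB PMD size and
4MB length are assumptions of the example), the kind of mapping whose THP
eligibility depends on mmap() returning a PMD-aligned address looks like:

/*
 * Minimal sketch: map a >2MB anonymous region, report whether the address
 * returned by mmap() is PMD (2MB) aligned, and touch it.  With the commit
 * under test, larger anonymous mappings are aligned to a PMD boundary, so
 * the first touch can be served by one huge-page fault (clear_huge_page)
 * instead of many 4K faults.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define PMD_SIZE  (2UL << 20)   /* assumed 2MB PMD mapping size (x86_64) */
#define MAP_LEN   (4UL << 20)   /* request larger than one PMD */

int main(void)
{
	char *p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	printf("mapping at %p, PMD-aligned: %s\n", (void *)p,
	       ((unsigned long)p & (PMD_SIZE - 1)) ? "no" : "yes");

	memset(p, 0, MAP_LEN);   /* first touch; THP-eligible if aligned */

	munmap(p, MAP_LEN);
	return 0;
}
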
21789703 ± 10% -20.1% 17410551 ± 7% -18.9% 17673460 ± 4% sched_debug.cfs_rq:/.avg_vruntime.max
427573 ± 99% +1126.7% 5245182 ± 17% +1104.4% 5149659 ± 13% sched_debug.cfs_rq:/.avg_vruntime.min
4757464 ± 10% -48.3% 2458136 ± 19% -46.6% 2539001 ± 11% sched_debug.cfs_rq:/.avg_vruntime.stddev
0.44 ± 2% -15.9% 0.37 ± 2% -16.6% 0.37 ± 3% sched_debug.cfs_rq:/.h_nr_running.stddev
299205 ± 38% +59.3% 476493 ± 27% +50.6% 450561 ± 42% sched_debug.cfs_rq:/.load.max
21789703 ± 10% -20.1% 17410551 ± 7% -18.9% 17673460 ± 4% sched_debug.cfs_rq:/.min_vruntime.max
427573 ± 99% +1126.7% 5245182 ± 17% +1104.4% 5149659 ± 13% sched_debug.cfs_rq:/.min_vruntime.min
4757464 ± 10% -48.3% 2458136 ± 19% -46.6% 2539001 ± 11% sched_debug.cfs_rq:/.min_vruntime.stddev
0.44 ± 2% -16.0% 0.37 ± 2% -17.2% 0.36 ± 2% sched_debug.cfs_rq:/.nr_running.stddev
446.75 ± 2% -18.4% 364.71 ± 2% -19.3% 360.46 ± 2% sched_debug.cfs_rq:/.runnable_avg.stddev
445.25 ± 2% -18.4% 363.46 ± 2% -19.3% 359.33 ± 2% sched_debug.cfs_rq:/.util_avg.stddev
946.71 ± 3% -14.7% 807.54 ± 4% -15.4% 800.58 ± 7% sched_debug.cfs_rq:/.util_est_enqueued.max
281.39 ± 7% -31.2% 193.63 ± 4% -32.0% 191.24 ± 7% sched_debug.cfs_rq:/.util_est_enqueued.stddev
1131635 ± 7% +73.7% 1965577 ± 6% +76.5% 1997455 ± 7% sched_debug.cpu.avg_idle.max
223539 ± 16% +165.4% 593172 ± 7% +146.0% 549906 ± 11% sched_debug.cpu.avg_idle.min
83325 ± 4% +64.3% 136927 ± 9% +69.7% 141399 ± 11% sched_debug.cpu.avg_idle.stddev
17.57 ± 6% +594.5% 122.01 ± 3% +588.0% 120.88 ± 3% sched_debug.cpu.clock.stddev
873.33 -11.1% 776.19 -11.8% 770.20 sched_debug.cpu.clock_task.stddev
2870 -18.1% 2351 -17.4% 2371 sched_debug.cpu.curr->pid.avg
3003 -12.5% 2627 -12.4% 2630 sched_debug.cpu.curr->pid.stddev
550902 ± 6% +74.4% 960871 ± 6% +79.8% 990291 ± 8% sched_debug.cpu.max_idle_balance_cost.max
4451 ± 59% +1043.9% 50917 ± 15% +1129.4% 54721 ± 15% sched_debug.cpu.max_idle_balance_cost.stddev
0.00 ± 17% +385.8% 0.00 ± 34% +315.7% 0.00 ± 3% sched_debug.cpu.next_balance.stddev
0.43 -17.5% 0.35 -16.8% 0.35 sched_debug.cpu.nr_running.avg
1.15 ± 8% +25.0% 1.44 ± 8% +30.4% 1.50 ± 13% sched_debug.cpu.nr_running.max
0.45 -14.4% 0.39 -14.2% 0.39 ± 2% sched_debug.cpu.nr_running.stddev
3280 ± 5% +32.5% 4345 +34.5% 4412 sched_debug.cpu.nr_switches.avg
846.82 ± 11% +109.9% 1777 ± 12% +112.4% 1799 ± 4% sched_debug.cpu.nr_switches.min
0.03 ±173% +887.2% 0.30 ± 73% +521.1% 0.19 ± 35% sched_debug.rt_rq:.rt_time.avg
6.79 ±173% +887.2% 67.01 ± 73% +521.1% 42.16 ± 35% sched_debug.rt_rq:.rt_time.max
0.45 ±173% +887.2% 4.47 ± 73% +521.1% 2.81 ± 35% sched_debug.rt_rq:.rt_time.stddev
4.65 +28.0% 5.96 +28.5% 5.98 perf-stat.i.MPKI
8.721e+09 -71.0% 2.532e+09 -71.1% 2.523e+09 perf-stat.i.branch-instructions
0.34 +0.1 0.48 +0.1 0.48 perf-stat.i.branch-miss-rate%
30145441 -58.6% 12471062 -58.6% 12487542 perf-stat.i.branch-misses
33.52 -15.3 18.20 -15.2 18.27 perf-stat.i.cache-miss-rate%
1.819e+08 -58.8% 74947458 -58.8% 74903072 perf-stat.i.cache-misses
5.429e+08 ± 2% -24.1% 4.123e+08 -24.4% 4.103e+08 perf-stat.i.cache-references
3041 +48.6% 4518 +49.7% 4552 perf-stat.i.context-switches
10.96 +212.9% 34.28 +214.1% 34.41 perf-stat.i.cpi
309.29 -11.2% 274.59 -11.3% 274.20 perf-stat.i.cpu-migrations
2354 +144.6% 5758 +144.7% 5761 perf-stat.i.cycles-between-cache-misses
0.13 -0.1 0.01 ± 3% -0.1 0.01 ± 3% perf-stat.i.dTLB-load-miss-rate%
12852209 ± 2% -98.0% 261197 ± 3% -97.9% 263864 ± 3% perf-stat.i.dTLB-load-misses
9.56e+09 -69.3% 2.932e+09 -69.4% 2.922e+09 perf-stat.i.dTLB-loads
0.12 -0.1 0.03 -0.1 0.03 perf-stat.i.dTLB-store-miss-rate%
5083186 -86.3% 693971 -86.4% 690328 perf-stat.i.dTLB-store-misses
4.209e+09 -44.9% 2.317e+09 -45.2% 2.308e+09 perf-stat.i.dTLB-stores
76.33 -39.7 36.61 -39.7 36.59 perf-stat.i.iTLB-load-miss-rate%
18717931 -80.1% 3715941 -80.2% 3698121 perf-stat.i.iTLB-load-misses
5758034 +7.7% 6202790 +7.4% 6183041 perf-stat.i.iTLB-loads
3.914e+10 -67.8% 1.261e+10 -67.9% 1.256e+10 perf-stat.i.instructions
2107 +73.9% 3663 +73.6% 3658 perf-stat.i.instructions-per-iTLB-miss
0.09 -67.9% 0.03 -68.1% 0.03 perf-stat.i.ipc
269.39 +10.6% 297.91 +10.7% 298.33 perf-stat.i.metric.K/sec
102.78 -64.5% 36.54 -64.6% 36.40 perf-stat.i.metric.M/sec
1234832 -86.4% 167556 -86.5% 166848 perf-stat.i.minor-faults
87.25 -41.9 45.32 -42.2 45.09 perf-stat.i.node-load-miss-rate%
25443233 -83.0% 4326696 ± 3% -83.4% 4227985 ± 2% perf-stat.i.node-load-misses
3723342 ± 3% +45.4% 5414430 +44.3% 5372545 perf-stat.i.node-loads
79.20 -74.4 4.78 -74.5 4.74 perf-stat.i.node-store-miss-rate%
14161911 ± 2% -83.1% 2394469 -83.2% 2382317 perf-stat.i.node-store-misses
3727955 ± 3% +1181.6% 47776544 +1188.5% 48035797 perf-stat.i.node-stores
1234832 -86.4% 167556 -86.5% 166849 perf-stat.i.page-faults
4.65 +28.0% 5.95 +28.4% 5.97 perf-stat.overall.MPKI
0.35 +0.1 0.49 +0.1 0.49 perf-stat.overall.branch-miss-rate%
33.51 -15.3 18.19 -15.3 18.26 perf-stat.overall.cache-miss-rate%
10.94 +212.3% 34.16 +213.4% 34.28 perf-stat.overall.cpi
2354 +143.9% 5741 +144.1% 5746 perf-stat.overall.cycles-between-cache-misses
0.13 -0.1 0.01 ± 3% -0.1 0.01 ± 5% perf-stat.overall.dTLB-load-miss-rate%
0.12 -0.1 0.03 -0.1 0.03 perf-stat.overall.dTLB-store-miss-rate%
76.49 -39.2 37.31 -39.2 37.29 perf-stat.overall.iTLB-load-miss-rate%
2090 +63.4% 3416 +63.5% 3417 perf-stat.overall.instructions-per-iTLB-miss
0.09 -68.0% 0.03 -68.1% 0.03 perf-stat.overall.ipc
87.22 -43.1 44.12 ± 2% -43.5 43.76 perf-stat.overall.node-load-miss-rate%
79.16 -74.4 4.77 -74.4 4.72 perf-stat.overall.node-store-miss-rate%
9549728 +140.9% 23005172 +141.1% 23022843 perf-stat.overall.path-length
8.691e+09 -71.0% 2.519e+09 -71.1% 2.51e+09 perf-stat.ps.branch-instructions
30118940 -59.1% 12319517 -59.1% 12327993 perf-stat.ps.branch-misses
1.813e+08 -58.8% 74623919 -58.9% 74563289 perf-stat.ps.cache-misses
5.41e+08 ± 2% -24.2% 4.103e+08 -24.5% 4.085e+08 perf-stat.ps.cache-references
3031 +47.9% 4485 +49.1% 4519 perf-stat.ps.context-switches
307.72 -12.7% 268.59 -12.7% 268.66 perf-stat.ps.cpu-migrations
12806734 ± 2% -98.0% 260740 ± 4% -97.9% 267782 ± 5% perf-stat.ps.dTLB-load-misses
9.528e+09 -69.4% 2.917e+09 -69.5% 2.907e+09 perf-stat.ps.dTLB-loads
5063992 -86.4% 690720 -86.4% 687415 perf-stat.ps.dTLB-store-misses
4.195e+09 -45.0% 2.306e+09 -45.2% 2.297e+09 perf-stat.ps.dTLB-stores
18661026 -80.3% 3672024 -80.4% 3658006 perf-stat.ps.iTLB-load-misses
5735379 +7.6% 6169096 +7.3% 6151755 perf-stat.ps.iTLB-loads
3.901e+10 -67.8% 1.254e+10 -68.0% 1.25e+10 perf-stat.ps.instructions
1230175 -86.4% 166708 -86.5% 166045 perf-stat.ps.minor-faults
25346347 -83.0% 4299946 ± 2% -83.4% 4203636 ± 2% perf-stat.ps.node-load-misses
3713652 ± 3% +46.6% 5444481 +45.5% 5401831 perf-stat.ps.node-loads
14107969 ± 2% -83.1% 2381707 -83.2% 2368146 perf-stat.ps.node-store-misses
3716359 ± 3% +1179.6% 47556224 +1186.1% 47797289 perf-stat.ps.node-stores
1230175 -86.4% 166708 -86.5% 166046 perf-stat.ps.page-faults
1.176e+13 -67.9% 3.78e+12 -68.0% 3.767e+12 perf-stat.total.instructions
0.01 ± 42% +385.1% 0.03 ± 8% +566.0% 0.04 ± 42% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
0.01 ± 17% +354.3% 0.05 ± 8% +402.1% 0.06 ± 8% perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.01 ± 19% +323.1% 0.06 ± 27% +347.1% 0.06 ± 17% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
0.01 ± 14% +2.9e+05% 25.06 ±172% +1.6e+05% 13.94 ±263% perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
0.00 ±129% +7133.3% 0.03 ± 7% +7200.0% 0.03 ± 4% perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
0.01 ± 8% +396.8% 0.06 ± 2% +402.1% 0.06 ± 2% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.01 ± 9% +256.9% 0.03 ± 10% +232.8% 0.02 ± 13% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
0.01 ± 15% +324.0% 0.05 ± 17% +320.8% 0.05 ± 17% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
0.01 ± 19% +338.6% 0.06 ± 7% +305.0% 0.05 ± 8% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
0.01 ± 9% +298.4% 0.03 ± 2% +304.8% 0.03 perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.01 ± 7% +265.8% 0.03 ± 5% +17282.9% 1.65 ±258% perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
0.19 ± 11% -89.3% 0.02 ± 10% -89.4% 0.02 ± 10% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.01 ± 28% +319.8% 0.05 ± 19% +303.0% 0.05 ± 18% perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
0.01 ± 14% +338.9% 0.03 ± 9% +318.5% 0.03 ± 4% perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
0.02 ± 20% +674.2% 0.12 ±137% +267.5% 0.06 ± 15% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.01 ± 46% +256.9% 0.03 ± 11% +1095.8% 0.11 ±112% perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
0.02 ± 28% +324.6% 0.07 ± 8% +353.2% 0.07 ± 9% perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.02 ± 21% +318.4% 0.07 ± 25% +389.6% 0.08 ± 26% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
0.01 ± 26% +1.9e+06% 250.13 ±173% +9.7e+05% 125.09 ±264% perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
0.02 ± 25% +585.6% 0.11 ± 63% +454.5% 0.09 ± 31% perf-sched.sch_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
0.04 ± 39% +159.0% 0.11 ± 6% +190.0% 0.13 ± 10% perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.01 ± 29% +312.9% 0.06 ± 19% +401.7% 0.07 ± 13% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
0.02 ± 25% +216.8% 0.06 ± 36% +166.4% 0.05 ± 7% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
0.01 ± 21% +345.8% 0.07 ± 26% +298.3% 0.06 ± 18% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
0.03 ± 35% +190.2% 0.07 ± 16% +187.8% 0.07 ± 11% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
0.02 ± 19% +220.8% 0.07 ± 23% +2.9e+05% 63.06 ±263% perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
4.60 ± 5% -10.7% 4.11 ± 8% -13.4% 3.99 perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.02 ± 32% +368.0% 0.07 ± 25% +346.9% 0.07 ± 20% perf-sched.sch_delay.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
189.60 -32.9% 127.16 -33.0% 126.98 perf-sched.total_wait_and_delay.average.ms
11265 ± 3% +73.7% 19568 ± 3% +71.1% 19274 perf-sched.total_wait_and_delay.count.ms
189.18 -32.9% 126.97 -33.0% 126.81 perf-sched.total_wait_time.average.ms
0.50 ± 20% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
0.50 ± 11% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop
0.43 ± 16% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.unmap_vmas.unmap_region.constprop.0
52.33 ± 31% +223.4% 169.23 ± 7% +226.5% 170.86 ± 2% perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
0.51 ± 18% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
28.05 ± 4% +27.8% 35.84 ± 4% +26.0% 35.34 ± 8% perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
2.08 ± 3% +33.2% 2.76 +32.9% 2.76 ± 2% perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
491.80 -53.6% 227.96 ± 3% -53.5% 228.58 ± 2% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
222.00 ± 9% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
8.75 ± 33% -84.3% 1.38 ±140% -82.9% 1.50 ± 57% perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1065 ± 3% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop
538.25 ± 9% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.unmap_vmas.unmap_region.constprop.0
307.75 ± 6% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
2458 ± 3% -20.9% 1944 ± 4% -20.5% 1954 ± 7% perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
2577 ± 5% +168.6% 6921 ± 4% +165.0% 6829 ± 2% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
7.07 ±172% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
1730 ± 24% -77.9% 382.66 ±117% -50.1% 862.68 ± 89% perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
34.78 ± 43% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop
8.04 ±179% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.unmap_vmas.unmap_region.constprop.0
9.47 ±134% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
3.96 ± 6% +60.6% 6.36 ± 5% +58.3% 6.27 ± 6% perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.42 ± 27% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
0.50 ± 20% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
0.51 ± 17% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault
0.59 ± 17% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault
0.46 ± 31% -63.3% 0.17 ± 18% -67.7% 0.15 ± 15% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_alloc.mmap_region.do_mmap
0.50 ± 11% -67.8% 0.16 ± 8% -67.6% 0.16 ± 4% perf-sched.wait_time.avg.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop
0.43 ± 16% -63.5% 0.16 ± 10% -62.6% 0.16 ± 4% perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.constprop.0
0.50 ± 19% -67.0% 0.17 ± 5% -69.0% 0.16 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
1.71 ± 5% +55.9% 2.66 ± 3% +47.3% 2.52 ± 6% perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
52.33 ± 31% +223.4% 169.20 ± 7% +226.5% 170.83 ± 2% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
0.51 ± 18% -67.7% 0.16 ± 5% -68.0% 0.16 ± 6% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
0.53 ± 17% -65.4% 0.18 ± 56% -66.5% 0.18 ± 10% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
27.63 ± 4% +29.7% 35.83 ± 4% +27.6% 35.27 ± 8% perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
2.07 ± 3% +32.1% 2.73 +31.9% 2.73 ± 2% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
491.61 -53.6% 227.94 ± 3% -53.5% 228.56 ± 2% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1.72 ± 5% +58.1% 2.73 ± 3% +50.4% 2.59 ± 7% perf-sched.wait_time.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
1.42 ± 21% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
7.07 ±172% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
1.66 ± 27% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault
2.05 ± 57% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault
1.69 ± 20% -84.6% 0.26 ± 25% -86.0% 0.24 ± 6% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.vm_area_alloc.mmap_region.do_mmap
1730 ± 24% -76.3% 409.21 ±104% -50.1% 862.65 ± 89% perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
34.78 ± 43% -98.9% 0.38 ± 12% -98.8% 0.41 ± 10% perf-sched.wait_time.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop
8.04 ±179% -96.0% 0.32 ± 18% -95.7% 0.35 ± 19% perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.unmap_region.constprop.0
4.68 ±155% -93.4% 0.31 ± 24% -93.9% 0.28 ± 21% perf-sched.wait_time.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
3.42 ± 5% +55.9% 5.33 ± 3% +47.3% 5.03 ± 6% perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
9.47 ±134% -96.3% 0.35 ± 17% -96.1% 0.37 ± 8% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
1.87 ± 10% -60.9% 0.73 ±164% -85.3% 0.28 ± 24% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
2.39 ±185% -97.8% 0.05 ±165% -98.0% 0.05 ±177% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
3.95 ± 6% +59.9% 6.32 ± 5% +57.6% 6.23 ± 6% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
3.45 ± 5% +58.1% 5.45 ± 3% +50.4% 5.19 ± 7% perf-sched.wait_time.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
56.55 ± 2% -55.1 1.45 ± 2% -55.1 1.44 ± 2% perf-profile.calltrace.cycles-pp.__munmap
56.06 ± 2% -55.1 0.96 ± 2% -55.1 0.96 ± 2% perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
56.50 ± 2% -55.1 1.44 -55.1 1.44 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
56.50 ± 2% -55.1 1.44 ± 2% -55.1 1.43 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
56.47 ± 2% -55.0 1.43 -55.0 1.42 ± 2% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
56.48 ± 2% -55.0 1.44 ± 2% -55.0 1.43 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
56.45 ± 2% -55.0 1.42 -55.0 1.42 ± 2% perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
56.40 ± 2% -55.0 1.40 ± 2% -55.0 1.39 ± 2% perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
35.28 -34.6 0.66 -34.6 0.66 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
35.17 -34.6 0.57 -34.6 0.57 ± 2% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
35.11 -34.5 0.57 -34.5 0.56 perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
18.40 ± 7% -18.4 0.00 -18.4 0.00 perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
17.42 ± 7% -17.4 0.00 -17.4 0.00 perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
17.42 ± 7% -17.4 0.00 -17.4 0.00 perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.do_vmi_align_munmap.do_vmi_munmap
17.41 ± 7% -17.4 0.00 -17.4 0.00 perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region.do_vmi_align_munmap
17.23 ± 6% -17.2 0.00 -17.2 0.00 perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
16.09 ± 8% -16.1 0.00 -16.1 0.00 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
16.02 ± 8% -16.0 0.00 -16.0 0.00 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region
15.95 ± 8% -16.0 0.00 -16.0 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
15.89 ± 8% -15.9 0.00 -15.9 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush
15.86 ± 8% -15.9 0.00 -15.9 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
15.82 ± 8% -15.8 0.00 -15.8 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
9.32 ± 9% -9.3 0.00 -9.3 0.00 perf-profile.calltrace.cycles-pp.uncharge_folio.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
8.52 ± 8% -8.5 0.00 -8.5 0.00 perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
7.90 ± 4% -7.9 0.00 -7.9 0.00 perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
7.56 ± 6% -7.6 0.00 -7.6 0.00 perf-profile.calltrace.cycles-pp.__pte_alloc.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
7.55 ± 6% -7.6 0.00 -7.6 0.00 perf-profile.calltrace.cycles-pp.pte_alloc_one.__pte_alloc.do_anonymous_page.__handle_mm_fault.handle_mm_fault
6.51 ± 8% -6.5 0.00 -6.5 0.00 perf-profile.calltrace.cycles-pp.alloc_pages_mpol.pte_alloc_one.__pte_alloc.do_anonymous_page.__handle_mm_fault
6.51 ± 8% -6.5 0.00 -6.5 0.00 perf-profile.calltrace.cycles-pp.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc.do_anonymous_page
6.41 ± 8% -6.4 0.00 -6.4 0.00 perf-profile.calltrace.cycles-pp.__memcg_kmem_charge_page.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
0.00 +0.5 0.54 ± 4% +0.6 0.55 ± 3% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page
0.00 +0.7 0.70 ± 3% +0.7 0.71 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault
0.00 +1.4 1.39 +1.4 1.38 ± 3% perf-profile.calltrace.cycles-pp.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
19.16 ± 6% +57.0 76.21 +57.5 76.66 perf-profile.calltrace.cycles-pp.asm_exc_page_fault
19.09 ± 6% +57.1 76.16 +57.5 76.61 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
19.10 ± 6% +57.1 76.17 +57.5 76.61 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
18.99 ± 6% +57.1 76.14 +57.6 76.58 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
18.43 ± 7% +57.7 76.11 +58.1 76.56 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00 +73.0 73.00 +73.5 73.46 perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
0.00 +75.1 75.15 +75.6 75.60 perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.00 +75.9 75.92 +76.4 76.37 perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
58.03 ± 2% -56.0 2.05 -56.0 2.03 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
58.02 ± 2% -56.0 2.04 -56.0 2.02 perf-profile.children.cycles-pp.do_syscall_64
56.57 ± 2% -55.1 1.45 ± 2% -55.1 1.45 ± 2% perf-profile.children.cycles-pp.__munmap
56.06 ± 2% -55.1 0.97 -55.1 0.96 perf-profile.children.cycles-pp.unmap_region
56.51 ± 2% -55.1 1.43 -55.1 1.42 ± 2% perf-profile.children.cycles-pp.do_vmi_munmap
56.48 ± 2% -55.0 1.43 ± 2% -55.0 1.43 ± 2% perf-profile.children.cycles-pp.__vm_munmap
56.48 ± 2% -55.0 1.44 ± 2% -55.0 1.43 ± 2% perf-profile.children.cycles-pp.__x64_sys_munmap
56.40 ± 2% -55.0 1.40 -55.0 1.39 ± 2% perf-profile.children.cycles-pp.do_vmi_align_munmap
35.28 -34.6 0.66 -34.6 0.66 perf-profile.children.cycles-pp.tlb_finish_mmu
35.18 -34.6 0.58 -34.6 0.57 perf-profile.children.cycles-pp.tlb_batch_pages_flush
35.16 -34.6 0.57 -34.6 0.57 perf-profile.children.cycles-pp.release_pages
32.12 ± 8% -32.1 0.05 -32.1 0.04 ± 37% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
31.85 ± 8% -31.8 0.06 -31.8 0.06 ± 5% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
31.74 ± 8% -31.7 0.00 -31.7 0.00 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
18.40 ± 7% -18.4 0.00 -18.4 0.00 perf-profile.children.cycles-pp.do_anonymous_page
17.43 ± 7% -17.4 0.00 -17.4 0.00 perf-profile.children.cycles-pp.lru_add_drain
17.43 ± 7% -17.4 0.00 -17.4 0.00 perf-profile.children.cycles-pp.lru_add_drain_cpu
17.43 ± 7% -17.3 0.10 ± 5% -17.3 0.10 ± 3% perf-profile.children.cycles-pp.folio_batch_move_lru
17.23 ± 6% -17.2 0.00 -17.2 0.00 perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
9.32 ± 9% -9.3 0.00 -9.3 0.00 perf-profile.children.cycles-pp.uncharge_folio
8.57 ± 8% -8.4 0.16 ± 4% -8.4 0.15 ± 4% perf-profile.children.cycles-pp.__mem_cgroup_charge
7.90 ± 4% -7.8 0.14 ± 5% -7.8 0.14 ± 4% perf-profile.children.cycles-pp.uncharge_batch
7.57 ± 6% -7.6 0.00 -7.6 0.00 perf-profile.children.cycles-pp.__pte_alloc
7.55 ± 6% -7.4 0.16 ± 3% -7.4 0.16 ± 3% perf-profile.children.cycles-pp.pte_alloc_one
6.54 ± 2% -6.5 0.00 -6.5 0.00 perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
6.59 ± 8% -6.4 0.22 ± 2% -6.4 0.22 ± 3% perf-profile.children.cycles-pp.alloc_pages_mpol
6.58 ± 8% -6.4 0.21 ± 2% -6.4 0.22 ± 2% perf-profile.children.cycles-pp.__alloc_pages
6.41 ± 8% -6.3 0.07 ± 5% -6.3 0.07 ± 5% perf-profile.children.cycles-pp.__memcg_kmem_charge_page
4.48 ± 2% -4.3 0.18 ± 4% -4.3 0.18 ± 3% perf-profile.children.cycles-pp.__mod_lruvec_page_state
3.08 ± 4% -3.0 0.09 ± 7% -3.0 0.09 ± 6% perf-profile.children.cycles-pp.page_counter_uncharge
1.74 ± 8% -1.6 0.10 -1.6 0.10 ± 4% perf-profile.children.cycles-pp.kmem_cache_alloc
1.72 ± 2% -1.5 0.23 ± 2% -1.5 0.23 ± 4% perf-profile.children.cycles-pp.unmap_vmas
1.71 ± 2% -1.5 0.22 ± 3% -1.5 0.22 ± 4% perf-profile.children.cycles-pp.unmap_page_range
1.70 ± 2% -1.5 0.21 ± 3% -1.5 0.21 ± 4% perf-profile.children.cycles-pp.zap_pmd_range
1.36 ± 16% -1.3 0.09 ± 4% -1.3 0.09 ± 4% perf-profile.children.cycles-pp.native_irq_return_iret
1.18 ± 2% -1.1 0.08 ± 7% -1.1 0.08 ± 5% perf-profile.children.cycles-pp.page_remove_rmap
1.16 ± 2% -1.1 0.08 ± 4% -1.1 0.07 ± 6% perf-profile.children.cycles-pp.folio_add_new_anon_rmap
1.45 ± 6% -1.0 0.44 ± 2% -1.0 0.44 ± 2% perf-profile.children.cycles-pp.__mmap
1.05 -1.0 0.06 ± 7% -1.0 0.06 ± 7% perf-profile.children.cycles-pp.lru_add_fn
1.03 ± 7% -1.0 0.04 ± 37% -1.0 0.04 ± 37% perf-profile.children.cycles-pp.__anon_vma_prepare
1.38 ± 6% -1.0 0.42 ± 3% -1.0 0.42 ± 2% perf-profile.children.cycles-pp.vm_mmap_pgoff
1.33 ± 6% -0.9 0.40 ± 2% -0.9 0.40 ± 2% perf-profile.children.cycles-pp.do_mmap
0.93 ± 11% -0.9 0.03 ± 77% -0.9 0.02 ±100% perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
1.17 ± 7% -0.8 0.34 ± 2% -0.8 0.34 ± 2% perf-profile.children.cycles-pp.mmap_region
0.87 ± 5% -0.8 0.06 ± 5% -0.8 0.06 ± 9% perf-profile.children.cycles-pp.kmem_cache_free
0.89 ± 5% -0.7 0.19 ± 4% -0.7 0.20 ± 2% perf-profile.children.cycles-pp.rcu_do_batch
0.89 ± 5% -0.7 0.20 ± 4% -0.7 0.20 ± 3% perf-profile.children.cycles-pp.rcu_core
0.90 ± 5% -0.7 0.21 ± 4% -0.7 0.21 ± 2% perf-profile.children.cycles-pp.__do_softirq
0.74 ± 6% -0.7 0.06 ± 5% -0.7 0.06 ± 8% perf-profile.children.cycles-pp.irq_exit_rcu
0.72 ± 10% -0.7 0.06 ± 5% -0.7 0.06 ± 7% perf-profile.children.cycles-pp.vm_area_alloc
1.01 ± 4% -0.4 0.61 ± 4% -0.4 0.61 ± 2% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.14 ± 5% -0.1 0.02 ±100% -0.1 0.02 ±100% perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
0.16 ± 9% -0.1 0.07 ± 7% -0.1 0.07 perf-profile.children.cycles-pp.__slab_free
0.15 ± 3% -0.1 0.06 ± 5% -0.1 0.06 ± 5% perf-profile.children.cycles-pp.get_unmapped_area
0.08 ± 22% -0.0 0.05 ± 41% -0.0 0.04 ± 37% perf-profile.children.cycles-pp.generic_perform_write
0.08 ± 22% -0.0 0.05 ± 41% -0.0 0.04 ± 38% perf-profile.children.cycles-pp.shmem_file_write_iter
0.09 ± 22% -0.0 0.05 ± 43% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.record__pushfn
0.09 ± 22% -0.0 0.05 ± 43% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.writen
0.09 ± 22% -0.0 0.05 ± 43% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.__libc_write
0.11 ± 8% -0.0 0.07 ± 6% -0.0 0.08 ± 6% perf-profile.children.cycles-pp.rcu_cblist_dequeue
0.16 ± 7% -0.0 0.13 ± 4% -0.0 0.13 ± 3% perf-profile.children.cycles-pp.try_charge_memcg
0.09 ± 22% -0.0 0.07 ± 18% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.vfs_write
0.09 ± 22% -0.0 0.07 ± 18% -0.0 0.06 ± 11% perf-profile.children.cycles-pp.ksys_write
0.15 ± 4% -0.0 0.13 ± 3% -0.0 0.13 ± 2% perf-profile.children.cycles-pp.get_page_from_freelist
0.09 -0.0 0.08 ± 4% -0.0 0.08 perf-profile.children.cycles-pp.flush_tlb_mm_range
0.06 +0.0 0.09 ± 4% +0.0 0.08 ± 5% perf-profile.children.cycles-pp.rcu_all_qs
0.17 ± 6% +0.0 0.20 ± 4% +0.0 0.20 ± 3% perf-profile.children.cycles-pp.kthread
0.17 ± 6% +0.0 0.20 ± 4% +0.0 0.20 ± 3% perf-profile.children.cycles-pp.ret_from_fork_asm
0.17 ± 6% +0.0 0.20 ± 4% +0.0 0.20 ± 3% perf-profile.children.cycles-pp.ret_from_fork
0.12 ± 4% +0.0 0.16 ± 3% +0.0 0.16 ± 2% perf-profile.children.cycles-pp.mas_store_prealloc
0.08 ± 6% +0.0 0.12 ± 2% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.vma_alloc_folio
0.00 +0.0 0.04 ± 37% +0.1 0.05 perf-profile.children.cycles-pp.memcg_check_events
0.00 +0.0 0.04 ± 37% +0.1 0.05 perf-profile.children.cycles-pp.thp_get_unmapped_area
0.00 +0.1 0.05 +0.0 0.04 ± 57% perf-profile.children.cycles-pp.free_tail_page_prepare
0.00 +0.1 0.05 +0.1 0.05 perf-profile.children.cycles-pp.mas_destroy
0.00 +0.1 0.05 ± 9% +0.1 0.05 ± 9% perf-profile.children.cycles-pp.update_load_avg
0.00 +0.1 0.06 ± 7% +0.1 0.07 ± 7% perf-profile.children.cycles-pp.native_flush_tlb_one_user
0.00 +0.1 0.07 ± 7% +0.1 0.07 ± 6% perf-profile.children.cycles-pp.__page_cache_release
0.00 +0.1 0.07 ± 4% +0.1 0.07 ± 5% perf-profile.children.cycles-pp.mas_topiary_replace
0.08 ± 5% +0.1 0.16 ± 3% +0.1 0.15 ± 3% perf-profile.children.cycles-pp.mas_alloc_nodes
0.00 +0.1 0.08 ± 4% +0.1 0.08 ± 6% perf-profile.children.cycles-pp.prep_compound_page
0.08 ± 6% +0.1 0.17 ± 5% +0.1 0.18 ± 5% perf-profile.children.cycles-pp.task_tick_fair
0.00 +0.1 0.10 ± 5% +0.1 0.10 ± 4% perf-profile.children.cycles-pp.folio_add_lru_vma
0.00 +0.1 0.11 ± 4% +0.1 0.11 ± 5% perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
0.00 +0.1 0.12 ± 2% +0.1 0.12 ± 3% perf-profile.children.cycles-pp.kmem_cache_alloc_bulk
0.00 +0.1 0.13 ± 3% +0.1 0.13 ± 2% perf-profile.children.cycles-pp.mas_split
0.00 +0.1 0.13 +0.1 0.13 ± 3% perf-profile.children.cycles-pp._raw_spin_lock
0.11 ± 4% +0.1 0.24 ± 3% +0.1 0.25 ± 4% perf-profile.children.cycles-pp.scheduler_tick
0.00 +0.1 0.14 ± 4% +0.1 0.14 ± 5% perf-profile.children.cycles-pp.__mem_cgroup_uncharge
0.00 +0.1 0.14 ± 3% +0.1 0.14 ± 3% perf-profile.children.cycles-pp.mas_wr_bnode
0.00 +0.1 0.14 ± 5% +0.1 0.14 ± 3% perf-profile.children.cycles-pp.destroy_large_folio
0.00 +0.1 0.15 ± 4% +0.1 0.15 ± 4% perf-profile.children.cycles-pp.mas_spanning_rebalance
0.00 +0.1 0.15 ± 2% +0.2 0.15 ± 4% perf-profile.children.cycles-pp.zap_huge_pmd
0.00 +0.2 0.17 ± 3% +0.2 0.17 ± 3% perf-profile.children.cycles-pp.do_huge_pmd_anonymous_page
0.19 ± 3% +0.2 0.38 +0.2 0.38 ± 2% perf-profile.children.cycles-pp.mas_store_gfp
0.00 +0.2 0.19 ± 3% +0.2 0.18 ± 4% perf-profile.children.cycles-pp.__mod_node_page_state
0.00 +0.2 0.20 ± 3% +0.2 0.20 ± 4% perf-profile.children.cycles-pp.__mod_lruvec_state
0.12 ± 3% +0.2 0.35 +0.2 0.36 ± 3% perf-profile.children.cycles-pp.update_process_times
0.12 ± 3% +0.2 0.36 ± 2% +0.2 0.36 ± 2% perf-profile.children.cycles-pp.tick_sched_handle
0.14 ± 3% +0.2 0.39 +0.3 0.40 ± 4% perf-profile.children.cycles-pp.tick_nohz_highres_handler
0.27 ± 2% +0.3 0.52 ± 3% +0.3 0.52 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt
0.27 ± 2% +0.3 0.52 ± 4% +0.3 0.53 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.21 ± 4% +0.3 0.48 ± 3% +0.3 0.48 ± 2% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.00 +0.3 0.31 ± 2% +0.3 0.31 ± 3% perf-profile.children.cycles-pp.mas_wr_spanning_store
0.00 +0.4 0.38 +0.4 0.38 ± 2% perf-profile.children.cycles-pp.free_unref_page_prepare
0.00 +0.4 0.39 +0.4 0.40 perf-profile.children.cycles-pp.free_unref_page
0.13 ± 4% +1.3 1.42 +1.3 1.41 ± 3% perf-profile.children.cycles-pp.__cond_resched
19.19 ± 6% +57.0 76.23 +57.5 76.68 perf-profile.children.cycles-pp.asm_exc_page_fault
19.11 ± 6% +57.1 76.18 +57.5 76.63 perf-profile.children.cycles-pp.exc_page_fault
19.10 ± 6% +57.1 76.18 +57.5 76.62 perf-profile.children.cycles-pp.do_user_addr_fault
19.00 ± 6% +57.1 76.15 +57.6 76.59 perf-profile.children.cycles-pp.handle_mm_fault
18.44 ± 7% +57.7 76.12 +58.1 76.57 perf-profile.children.cycles-pp.__handle_mm_fault
0.06 ± 9% +73.3 73.38 +73.8 73.84 perf-profile.children.cycles-pp.clear_page_erms
0.00 +75.2 75.25 +75.7 75.70 perf-profile.children.cycles-pp.clear_huge_page
0.00 +75.9 75.92 +76.4 76.37 perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
31.74 ± 8% -31.7 0.00 -31.7 0.00 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
9.22 ± 9% -9.2 0.00 -9.2 0.00 perf-profile.self.cycles-pp.uncharge_folio
6.50 ± 2% -6.5 0.00 -6.5 0.00 perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
5.56 ± 9% -5.6 0.00 -5.6 0.00 perf-profile.self.cycles-pp.__memcg_kmem_charge_page
1.94 ± 4% -1.9 0.08 ± 8% -1.9 0.08 ± 7% perf-profile.self.cycles-pp.page_counter_uncharge
1.36 ± 16% -1.3 0.09 ± 4% -1.3 0.09 ± 4% perf-profile.self.cycles-pp.native_irq_return_iret
0.16 ± 9% -0.1 0.07 ± 7% -0.1 0.07 perf-profile.self.cycles-pp.__slab_free
0.10 ± 8% -0.0 0.07 ± 6% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.rcu_cblist_dequeue
0.07 ± 7% +0.0 0.08 ± 5% +0.0 0.08 ± 7% perf-profile.self.cycles-pp.page_counter_try_charge
0.00 +0.1 0.06 ± 7% +0.1 0.07 ± 7% perf-profile.self.cycles-pp.native_flush_tlb_one_user
0.01 ±264% +0.1 0.07 ± 4% +0.1 0.07 perf-profile.self.cycles-pp.rcu_all_qs
0.00 +0.1 0.07 ± 4% +0.1 0.07 ± 4% perf-profile.self.cycles-pp.__do_huge_pmd_anonymous_page
0.00 +0.1 0.08 ± 6% +0.1 0.08 ± 6% perf-profile.self.cycles-pp.prep_compound_page
0.00 +0.1 0.08 ± 5% +0.1 0.08 ± 6% perf-profile.self.cycles-pp.__kmem_cache_alloc_bulk
0.00 +0.1 0.13 ± 2% +0.1 0.13 ± 2% perf-profile.self.cycles-pp._raw_spin_lock
0.00 +0.2 0.18 ± 3% +0.2 0.18 ± 4% perf-profile.self.cycles-pp.__mod_node_page_state
0.00 +0.3 0.30 ± 2% +0.3 0.30 perf-profile.self.cycles-pp.free_unref_page_prepare
0.00 +0.6 0.58 ± 3% +0.6 0.58 ± 5% perf-profile.self.cycles-pp.clear_huge_page
0.08 ± 4% +1.2 1.25 +1.2 1.24 ± 4% perf-profile.self.cycles-pp.__cond_resched
0.05 ± 9% +72.8 72.81 +73.2 73.26 perf-profile.self.cycles-pp.clear_page_erms
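
side note on the profiles above: the cycles previously spent under the munmap
path (__munmap 56.57% -> 1.45%, lru-lock contention in
native_queued_spin_lock_slowpath 31.74% -> 0, memcg uncharge) are gone, and
~73% of cycles now sit in clear_page_erms under clear_huge_page, i.e. per-op
work moved from many small faults plus teardown to fewer but much heavier 2MB
faults. If one wanted to confirm on the same kernel that the huge-page path is
what moves the numbers, a quick A/B is to opt the mapping out of THP with
madvise(); a minimal sketch (the 4MB size and the two-run comparison are
arbitrary choices for illustration, not the robot's reproducer):

/*
 * Minimal A/B sketch: the same anonymous mapping with and without
 * MADV_NOHUGEPAGE.  Faulting the NOHUGEPAGE copy goes through the 4K path;
 * the default copy may be faulted as a 2MB THP when the mapping is
 * PMD-aligned.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define MAP_LEN (4UL << 20)

static void touch(const char *tag, int nohuge)
{
	char *p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}
	if (nohuge && madvise(p, MAP_LEN, MADV_NOHUGEPAGE))
		perror("madvise(MADV_NOHUGEPAGE)");

	memset(p, 0, MAP_LEN);   /* fault the whole range in */
	printf("%s: touched %lu bytes at %p\n", tag, MAP_LEN, (void *)p);
	munmap(p, MAP_LEN);
}

int main(void)
{
	touch("default (THP allowed)", 0);
	touch("MADV_NOHUGEPAGE", 1);
	return 0;
}
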
[-- Attachment #4: phoronix-regressions --]
[-- Type: text/plain, Size: 37812 bytes --]
(10)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Add/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
6787 -2.9% 6592 -2.9% 6589 vmstat.system.cs
0.18 ± 23% -0.0 0.15 ± 44% -0.1 0.12 ± 23% perf-profile.children.cycles-pp.get_next_timer_interrupt
0.08 ± 49% +0.1 0.15 ± 16% +0.0 0.08 ± 61% perf-profile.self.cycles-pp.ct_kernel_enter
352936 +42.1% 501525 +6.9% 377117 meminfo.AnonHugePages
518885 +26.2% 654716 -2.1% 508198 meminfo.AnonPages
1334861 +11.4% 1486492 -0.9% 1322775 meminfo.Inactive(anon)
1.51 -0.1 1.45 -0.1 1.46 turbostat.C1E%
24.23 -1.2% 23.93 -0.7% 24.05 turbostat.CorWatt
2.64 -4.4% 2.52 -4.3% 2.53 turbostat.Pkg%pc2
25.40 -1.3% 25.06 -0.9% 25.18 turbostat.PkgWatt
3.30 -2.8% 3.20 -2.9% 3.20 turbostat.RAMWatt
20115 -4.5% 19211 -4.5% 19217 phoronix-test-suite.ramspeed.Add.Integer.mb_s
284.00 +3.5% 293.95 +3.5% 293.96 phoronix-test-suite.time.elapsed_time
284.00 +3.5% 293.95 +3.5% 293.96 phoronix-test-suite.time.elapsed_time.max
120322 +1.6% 122291 -0.2% 120098 phoronix-test-suite.time.maximum_resident_set_size
281626 -54.7% 127627 -54.7% 127530 phoronix-test-suite.time.minor_page_faults
259.16 +4.2% 270.02 +4.1% 269.86 phoronix-test-suite.time.user_time
284.00 +3.5% 293.95 +3.5% 293.96 time.elapsed_time
284.00 +3.5% 293.95 +3.5% 293.96 time.elapsed_time.max
120322 +1.6% 122291 -0.2% 120098 time.maximum_resident_set_size
281626 -54.7% 127627 -54.7% 127530 time.minor_page_faults
1.72 -7.6% 1.59 -7.2% 1.60 time.system_time
259.16 +4.2% 270.02 +4.1% 269.86 time.user_time
129720 +26.2% 163681 -2.1% 127047 proc-vmstat.nr_anon_pages
172.33 +42.1% 244.89 +6.8% 184.14 proc-vmstat.nr_anon_transparent_hugepages
360027 -1.0% 356428 +0.1% 360507 proc-vmstat.nr_dirty_background_threshold
720935 -1.0% 713729 +0.1% 721897 proc-vmstat.nr_dirty_threshold
3328684 -1.1% 3292559 +0.1% 3333390 proc-vmstat.nr_free_pages
333715 +11.4% 371625 -0.9% 330692 proc-vmstat.nr_inactive_anon
1732 +5.1% 1820 +4.8% 1816 proc-vmstat.nr_page_table_pages
333715 +11.4% 371625 -0.9% 330692 proc-vmstat.nr_zone_inactive_anon
855883 -34.6% 560138 -34.9% 557459 proc-vmstat.numa_hit
855859 -34.6% 560157 -34.9% 557429 proc-vmstat.numa_local
5552895 +1.1% 5611662 +0.1% 5559236 proc-vmstat.pgalloc_normal
1080638 -26.7% 792254 -27.0% 788881 proc-vmstat.pgfault
109646 +3.0% 112918 +2.6% 112483 proc-vmstat.pgreuse
9026 +7.6% 9714 +6.6% 9619 proc-vmstat.thp_fault_alloc
1.165e+08 -3.6% 1.123e+08 -3.3% 1.126e+08 perf-stat.i.branch-instructions
3.38 +0.1 3.45 +0.1 3.49 perf-stat.i.branch-miss-rate%
4.13e+08 -2.7% 4.018e+08 -2.9% 4.011e+08 perf-stat.i.cache-misses
5.336e+08 -2.3% 5.212e+08 -2.4% 5.206e+08 perf-stat.i.cache-references
6824 -2.9% 6629 -2.9% 6624 perf-stat.i.context-switches
4.05 +3.8% 4.20 +3.7% 4.20 perf-stat.i.cpi
447744 ± 3% -17.3% 370369 ± 3% -15.0% 380580 perf-stat.i.dTLB-load-misses
1.119e+09 -3.3% 1.082e+09 -3.4% 1.081e+09 perf-stat.i.dTLB-loads
0.02 ± 10% -0.0 0.01 ± 14% -0.0 0.01 ± 3% perf-stat.i.dTLB-store-miss-rate%
84207 ± 7% -58.4% 35034 ± 13% -55.8% 37210 ± 2% perf-stat.i.dTLB-store-misses
7.312e+08 -3.3% 7.069e+08 -3.4% 7.065e+08 perf-stat.i.dTLB-stores
127863 -2.8% 124330 -3.6% 123263 perf-stat.i.iTLB-load-misses
145042 -2.5% 141459 -3.0% 140719 perf-stat.i.iTLB-loads
2.393e+09 -3.3% 2.313e+09 -3.4% 2.313e+09 perf-stat.i.instructions
0.28 -3.9% 0.27 -3.7% 0.27 perf-stat.i.ipc
220.56 -3.0% 213.92 -3.1% 213.80 perf-stat.i.metric.M/sec
3580 -31.0% 2470 -30.9% 2476 perf-stat.i.minor-faults
49017829 +2.1% 50065997 +2.1% 50037948 perf-stat.i.node-loads
98043570 -2.7% 95377592 -2.9% 95180579 perf-stat.i.node-stores
3585 -31.0% 2474 -30.8% 2480 perf-stat.i.page-faults
3.64 +3.8% 3.78 +3.8% 3.78 perf-stat.overall.cpi
21.10 +3.2% 21.77 +3.3% 21.79 perf-stat.overall.cycles-between-cache-misses
0.04 ± 3% -0.0 0.03 ± 3% -0.0 0.04 perf-stat.overall.dTLB-load-miss-rate%
0.01 ± 7% -0.0 0.00 ± 13% -0.0 0.01 ± 2% perf-stat.overall.dTLB-store-miss-rate%
0.27 -3.7% 0.26 -3.7% 0.26 perf-stat.overall.ipc
1.16e+08 -3.6% 1.119e+08 -3.3% 1.121e+08 perf-stat.ps.branch-instructions
4.117e+08 -2.7% 4.006e+08 -2.9% 3.999e+08 perf-stat.ps.cache-misses
5.319e+08 -2.3% 5.195e+08 -2.4% 5.19e+08 perf-stat.ps.cache-references
6798 -2.8% 6605 -2.9% 6600 perf-stat.ps.context-switches
446139 ± 3% -17.3% 369055 ± 3% -15.0% 379224 perf-stat.ps.dTLB-load-misses
1.115e+09 -3.3% 1.078e+09 -3.4% 1.078e+09 perf-stat.ps.dTLB-loads
83922 ± 7% -58.4% 34908 ± 13% -55.8% 37075 ± 2% perf-stat.ps.dTLB-store-misses
7.288e+08 -3.3% 7.047e+08 -3.4% 7.042e+08 perf-stat.ps.dTLB-stores
127384 -2.7% 123884 -3.6% 122817 perf-stat.ps.iTLB-load-misses
144399 -2.4% 140903 -2.9% 140152 perf-stat.ps.iTLB-loads
2.385e+09 -3.3% 2.306e+09 -3.4% 2.305e+09 perf-stat.ps.instructions
3566 -31.0% 2460 -30.9% 2465 perf-stat.ps.minor-faults
48864755 +2.1% 49912372 +2.1% 49884745 perf-stat.ps.node-loads
97730481 -2.7% 95083043 -2.9% 94887981 perf-stat.ps.node-stores
3571 -31.0% 2465 -30.8% 2470 perf-stat.ps.page-faults
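
side note on the ramspeed run above: the commit shows up as AnonHugePages
+42.1% and minor page faults -54.7% while Add.Integer throughput drops 4.5%.
One way to see how much of a given process ends up backed by THPs is the
AnonHugePages field of /proc/<pid>/smaps_rollup; a minimal sketch reading it
for the current process (only what procfs already exposes is used, the example
itself maps nothing so it will typically print 0 kB):

/*
 * Minimal sketch: report this process's THP-backed anonymous memory by
 * reading the AnonHugePages line from /proc/self/smaps_rollup.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/self/smaps_rollup", "r");
	char line[256];

	if (!f) {
		perror("fopen");
		return 1;
	}
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "AnonHugePages:", 14))
			fputs(line, stdout);   /* e.g. "AnonHugePages:  4096 kB" */
	fclose(f);
	return 0;
}
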
(11)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
6853 -2.6% 6678 -2.7% 6668 vmstat.system.cs
353760 +40.0% 495232 +6.4% 376514 meminfo.AnonHugePages
519691 +25.5% 652412 -2.1% 508766 meminfo.AnonPages
1335612 +11.1% 1484265 -0.9% 1323541 meminfo.Inactive(anon)
1.52 -0.0 1.48 -0.0 1.48 turbostat.C1E%
2.65 -3.0% 2.57 -2.8% 2.58 turbostat.Pkg%pc2
3.32 -2.6% 3.23 -2.6% 3.23 turbostat.RAMWatt
19960 -2.9% 19378 -3.0% 19366 phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s
281.37 +3.0% 289.87 +3.1% 290.12 phoronix-test-suite.time.elapsed_time
281.37 +3.0% 289.87 +3.1% 290.12 phoronix-test-suite.time.elapsed_time.max
120220 +1.6% 122163 -0.1% 120158 phoronix-test-suite.time.maximum_resident_set_size
281853 -54.7% 127777 -54.7% 127780 phoronix-test-suite.time.minor_page_faults
257.32 +3.4% 265.97 +3.4% 265.99 phoronix-test-suite.time.user_time
281.37 +3.0% 289.87 +3.1% 290.12 time.elapsed_time
281.37 +3.0% 289.87 +3.1% 290.12 time.elapsed_time.max
120220 +1.6% 122163 -0.1% 120158 time.maximum_resident_set_size
281853 -54.7% 127777 -54.7% 127780 time.minor_page_faults
1.74 -8.5% 1.59 -9.1% 1.58 time.system_time
257.32 +3.4% 265.97 +3.4% 265.99 time.user_time
0.80 ± 23% -0.4 0.41 ± 78% -0.3 0.54 ± 40% perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.79 ± 21% -0.4 0.40 ± 77% -0.3 0.54 ± 39% perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.77 ± 20% -0.4 0.40 ± 77% -0.3 0.52 ± 39% perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
1.39 ± 15% -0.3 1.04 ± 22% -0.2 1.20 ± 14% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
1.39 ± 15% -0.3 1.04 ± 21% -0.2 1.20 ± 14% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.80 ± 23% -0.3 0.55 ± 29% -0.2 0.60 ± 16% perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
0.79 ± 21% -0.3 0.54 ± 28% -0.2 0.60 ± 16% perf-profile.children.cycles-pp.clear_huge_page
0.79 ± 20% -0.2 0.58 ± 31% -0.2 0.58 ± 17% perf-profile.children.cycles-pp.clear_page_erms
0.78 ± 20% -0.2 0.58 ± 31% -0.2 0.58 ± 17% perf-profile.self.cycles-pp.clear_page_erms
129919 +25.5% 163102 -2.1% 127191 proc-vmstat.nr_anon_pages
172.73 +40.0% 241.81 +6.4% 183.84 proc-vmstat.nr_anon_transparent_hugepages
3328013 -1.1% 3291433 +0.1% 3332863 proc-vmstat.nr_free_pages
333903 +11.1% 371065 -0.9% 330885 proc-vmstat.nr_inactive_anon
1740 +4.5% 1819 +4.4% 1817 proc-vmstat.nr_page_table_pages
333903 +11.1% 371065 -0.9% 330885 proc-vmstat.nr_zone_inactive_anon
853676 -34.9% 556019 -34.7% 557219 proc-vmstat.numa_hit
853653 -34.9% 555977 -34.7% 557192 proc-vmstat.numa_local
5551461 +1.0% 5607022 +0.1% 5559594 proc-vmstat.pgalloc_normal
1075659 -27.0% 785124 -26.9% 786363 proc-vmstat.pgfault
108727 +2.6% 111582 +2.6% 111546 proc-vmstat.pgreuse
9027 +7.6% 9714 +6.6% 9619 proc-vmstat.thp_fault_alloc
1.184e+08 -3.3% 1.145e+08 -3.2% 1.146e+08 perf-stat.i.branch-instructions
5500836 -2.4% 5367239 -2.4% 5368946 perf-stat.i.branch-misses
4.139e+08 -2.5% 4.036e+08 -2.6% 4.034e+08 perf-stat.i.cache-misses
5.246e+08 -2.5% 5.114e+08 -2.5% 5.117e+08 perf-stat.i.cache-references
6889 -2.6% 6710 -2.6% 6710 perf-stat.i.context-switches
4.31 +2.6% 4.42 +2.7% 4.43 perf-stat.i.cpi
0.10 ± 2% -0.0 0.09 ± 2% -0.0 0.08 ± 3% perf-stat.i.dTLB-load-miss-rate%
454444 -16.1% 381426 -18.4% 370782 ± 3% perf-stat.i.dTLB-load-misses
8.087e+08 -3.0% 7.841e+08 -3.1% 7.839e+08 perf-stat.i.dTLB-loads
0.02 -0.0 0.01 ± 2% -0.0 0.01 ± 14% perf-stat.i.dTLB-store-miss-rate%
86294 -57.1% 36992 ± 2% -59.7% 34809 ± 13% perf-stat.i.dTLB-store-misses
5.311e+08 -3.0% 5.151e+08 -3.1% 5.149e+08 perf-stat.i.dTLB-stores
129929 -4.0% 124682 -3.3% 125639 perf-stat.i.iTLB-load-misses
146749 -3.3% 141975 -3.7% 141337 perf-stat.i.iTLB-loads
2.249e+09 -3.1% 2.18e+09 -3.1% 2.179e+09 perf-stat.i.instructions
0.26 -3.0% 0.25 -2.9% 0.25 perf-stat.i.ipc
179.65 -2.7% 174.83 -2.7% 174.79 perf-stat.i.metric.M/sec
3614 -31.4% 2478 -31.1% 2490 perf-stat.i.minor-faults
65665882 -0.5% 65367211 -0.8% 65111743 perf-stat.i.node-loads
3618 -31.4% 2483 -31.1% 2494 perf-stat.i.page-faults
3.88 +3.3% 4.01 +3.3% 4.01 perf-stat.overall.cpi
21.10 +2.7% 21.67 +2.7% 21.67 perf-stat.overall.cycles-between-cache-misses
0.06 -0.0 0.05 -0.0 0.05 ± 3% perf-stat.overall.dTLB-load-miss-rate%
0.02 -0.0 0.01 ± 2% -0.0 0.01 ± 13% perf-stat.overall.dTLB-store-miss-rate%
0.26 -3.2% 0.25 -3.2% 0.25 perf-stat.overall.ipc
1.179e+08 -3.3% 1.14e+08 -3.2% 1.141e+08 perf-stat.ps.branch-instructions
5473781 -2.4% 5340720 -2.4% 5344770 perf-stat.ps.branch-misses
4.126e+08 -2.5% 4.023e+08 -2.5% 4.021e+08 perf-stat.ps.cache-misses
5.229e+08 -2.5% 5.098e+08 -2.5% 5.1e+08 perf-stat.ps.cache-references
6864 -2.6% 6687 -2.6% 6687 perf-stat.ps.context-switches
452799 -16.1% 380049 -18.4% 369456 ± 3% perf-stat.ps.dTLB-load-misses
8.06e+08 -3.0% 7.815e+08 -3.1% 7.814e+08 perf-stat.ps.dTLB-loads
85997 -57.1% 36856 ± 2% -59.7% 34683 ± 13% perf-stat.ps.dTLB-store-misses
5.294e+08 -3.0% 5.135e+08 -3.0% 5.133e+08 perf-stat.ps.dTLB-stores
129440 -4.0% 124225 -3.3% 125181 perf-stat.ps.iTLB-load-misses
146145 -3.2% 141400 -3.7% 140780 perf-stat.ps.iTLB-loads
2.241e+09 -3.1% 2.172e+09 -3.1% 2.172e+09 perf-stat.ps.instructions
3599 -31.4% 2468 -31.1% 2479 perf-stat.ps.minor-faults
65457458 -0.5% 65162312 -0.8% 64909293 perf-stat.ps.node-loads
3604 -31.4% 2472 -31.1% 2484 perf-stat.ps.page-faults
(12)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Triad/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
607.38 ± 15% -24.4% 459.12 ± 24% -6.0% 570.75 ± 5% perf-c2c.DRAM.local
6801 -3.4% 6570 -3.1% 6587 vmstat.system.cs
15155 -0.9% 15024 -0.7% 15046 vmstat.system.in
353771 +43.0% 505977 ± 3% +7.1% 378972 meminfo.AnonHugePages
518698 +26.5% 656280 -1.7% 509920 meminfo.AnonPages
1334737 +11.5% 1487919 -0.8% 1324549 meminfo.Inactive(anon)
1.50 -0.1 1.45 -0.1 1.45 turbostat.C1E%
2.64 -4.0% 2.54 -2.8% 2.57 turbostat.Pkg%pc2
25.32 -1.1% 25.06 -0.6% 25.17 turbostat.PkgWatt
3.30 -3.0% 3.20 -2.8% 3.20 turbostat.RAMWatt
1.25 ± 8% -0.3 0.96 ± 16% -0.1 1.15 ± 22% perf-profile.children.cycles-pp.do_user_addr_fault
1.25 ± 8% -0.3 0.96 ± 16% -0.1 1.15 ± 22% perf-profile.children.cycles-pp.exc_page_fault
1.15 ± 9% -0.3 0.88 ± 16% -0.1 1.02 ± 22% perf-profile.children.cycles-pp.__handle_mm_fault
1.18 ± 9% -0.3 0.91 ± 15% -0.1 1.06 ± 21% perf-profile.children.cycles-pp.handle_mm_fault
0.23 ± 19% +0.1 0.32 ± 18% +0.1 0.33 ± 20% perf-profile.children.cycles-pp.exit_mmap
0.23 ± 19% +0.1 0.32 ± 18% +0.1 0.33 ± 20% perf-profile.children.cycles-pp.__mmput
19667 -6.4% 18399 -6.4% 18413 phoronix-test-suite.ramspeed.Triad.Integer.mb_s
284.07 +3.7% 294.53 +3.4% 293.86 phoronix-test-suite.time.elapsed_time
284.07 +3.7% 294.53 +3.4% 293.86 phoronix-test-suite.time.elapsed_time.max
120102 +1.8% 122256 +0.1% 120265 phoronix-test-suite.time.maximum_resident_set_size
281737 -54.7% 127624 -54.7% 127574 phoronix-test-suite.time.minor_page_faults
259.49 +4.1% 270.20 +4.1% 270.14 phoronix-test-suite.time.user_time
284.07 +3.7% 294.53 +3.4% 293.86 time.elapsed_time
284.07 +3.7% 294.53 +3.4% 293.86 time.elapsed_time.max
120102 +1.8% 122256 +0.1% 120265 time.maximum_resident_set_size
281737 -54.7% 127624 -54.7% 127574 time.minor_page_faults
1.72 -8.1% 1.58 -8.4% 1.58 time.system_time
259.49 +4.1% 270.20 +4.1% 270.14 time.user_time
129673 +26.5% 164074 -1.7% 127482 proc-vmstat.nr_anon_pages
172.74 +43.0% 247.07 ± 3% +7.1% 185.05 proc-vmstat.nr_anon_transparent_hugepages
360059 -1.0% 356437 +0.1% 360424 proc-vmstat.nr_dirty_background_threshold
720999 -1.0% 713747 +0.1% 721730 proc-vmstat.nr_dirty_threshold
3328170 -1.1% 3291542 +0.1% 3330837 proc-vmstat.nr_free_pages
333684 +11.5% 371981 -0.8% 331138 proc-vmstat.nr_inactive_anon
1735 +5.0% 1822 +4.9% 1819 proc-vmstat.nr_page_table_pages
333684 +11.5% 371981 -0.8% 331138 proc-vmstat.nr_zone_inactive_anon
857533 -34.7% 559940 -34.6% 560503 proc-vmstat.numa_hit
857463 -34.7% 560233 -34.6% 560504 proc-vmstat.numa_local
1082386 -26.7% 793742 -26.9% 791272 proc-vmstat.pgfault
109917 +2.8% 113044 +2.4% 112517 proc-vmstat.pgreuse
9028 +7.5% 9707 +6.5% 9619 proc-vmstat.thp_fault_alloc
1.168e+08 -6.9% 1.087e+08 ± 9% -3.5% 1.127e+08 perf-stat.i.branch-instructions
3.39 +0.1 3.47 +0.1 3.47 perf-stat.i.branch-miss-rate%
5431805 -8.1% 4990354 ± 15% -2.7% 5285279 perf-stat.i.branch-misses
4.13e+08 -3.1% 4.004e+08 -2.8% 4.015e+08 perf-stat.i.cache-misses
5.338e+08 -2.6% 5.196e+08 -2.4% 5.211e+08 perf-stat.i.cache-references
6835 -3.4% 6604 -3.1% 6623 perf-stat.i.context-switches
4.05 +3.8% 4.21 +3.6% 4.20 perf-stat.i.cpi
60.96 ± 7% +0.4% 61.20 ± 12% -7.7% 56.27 ± 3% perf-stat.i.cycles-between-cache-misses
0.08 ± 3% -0.0 0.08 ± 6% -0.0 0.08 ± 4% perf-stat.i.dTLB-load-miss-rate%
455317 -16.9% 378574 -16.7% 379148 perf-stat.i.dTLB-load-misses
1.118e+09 -3.8% 1.076e+09 -3.3% 1.082e+09 perf-stat.i.dTLB-loads
0.02 -0.0 0.01 ± 6% -0.0 0.01 ± 2% perf-stat.i.dTLB-store-miss-rate%
86796 -57.3% 37100 ± 2% -57.3% 37097 ± 2% perf-stat.i.dTLB-store-misses
7.31e+08 -3.7% 7.04e+08 -3.3% 7.068e+08 perf-stat.i.dTLB-stores
128995 -3.1% 125030 ± 2% -4.4% 123280 perf-stat.i.iTLB-load-misses
145739 -4.0% 139945 -3.7% 140348 perf-stat.i.iTLB-loads
2.395e+09 -4.3% 2.291e+09 ± 2% -3.4% 2.314e+09 perf-stat.i.instructions
0.28 -4.2% 0.27 -3.9% 0.27 perf-stat.i.ipc
30.30 ± 6% -11.5% 26.81 ± 6% -21.3% 23.84 ± 12% perf-stat.i.metric.K/sec
220.55 -3.5% 212.73 -3.0% 213.94 perf-stat.i.metric.M/sec
3598 -31.3% 2473 -31.5% 2466 perf-stat.i.minor-faults
49026239 +1.9% 49938429 +2.0% 50024868 perf-stat.i.node-loads
98013334 -3.0% 95053521 -2.8% 95291354 perf-stat.i.node-stores
3602 -31.2% 2477 -31.4% 2470 perf-stat.i.page-faults
3.64 +4.6% 3.81 +3.9% 3.78 perf-stat.overall.cpi
21.09 +3.2% 21.76 +3.3% 21.78 perf-stat.overall.cycles-between-cache-misses
0.04 -0.0 0.04 -0.0 0.04 perf-stat.overall.dTLB-load-miss-rate%
0.01 -0.0 0.01 ± 2% -0.0 0.01 ± 2% perf-stat.overall.dTLB-store-miss-rate%
0.27 -4.3% 0.26 -3.7% 0.26 perf-stat.overall.ipc
1.163e+08 -6.9% 1.083e+08 ± 9% -3.5% 1.122e+08 perf-stat.ps.branch-instructions
5405065 -8.1% 4967211 ± 15% -2.7% 5259197 perf-stat.ps.branch-misses
4.117e+08 -3.0% 3.992e+08 -2.8% 4.003e+08 perf-stat.ps.cache-misses
5.321e+08 -2.6% 5.18e+08 -2.4% 5.195e+08 perf-stat.ps.cache-references
6810 -3.4% 6579 -3.1% 6599 perf-stat.ps.context-switches
453677 -16.9% 377215 -16.7% 377792 perf-stat.ps.dTLB-load-misses
1.115e+09 -3.8% 1.072e+09 -3.3% 1.078e+09 perf-stat.ps.dTLB-loads
86500 -57.3% 36965 ± 2% -57.3% 36962 ± 2% perf-stat.ps.dTLB-store-misses
7.286e+08 -3.7% 7.019e+08 -3.3% 7.045e+08 perf-stat.ps.dTLB-stores
128515 -3.1% 124573 ± 2% -4.4% 122831 perf-stat.ps.iTLB-load-misses
145145 -4.0% 139336 -3.7% 139772 perf-stat.ps.iTLB-loads
2.386e+09 -4.3% 2.283e+09 ± 2% -3.4% 2.306e+09 perf-stat.ps.instructions
3583 -31.3% 2462 -31.5% 2455 perf-stat.ps.minor-faults
48873391 +1.9% 49781212 +2.0% 49874192 perf-stat.ps.node-loads
97704914 -3.0% 94765417 -2.8% 94999974 perf-stat.ps.node-stores
3588 -31.2% 2467 -31.4% 2460 perf-stat.ps.page-faults
(13)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
6786 -2.9% 6587 -2.9% 6586 vmstat.system.cs
355264 ± 2% +41.1% 501244 +6.5% 378393 meminfo.AnonHugePages
520377 +25.7% 654330 -2.1% 509644 meminfo.AnonPages
1336461 +11.2% 1486141 -0.9% 1324302 meminfo.Inactive(anon)
1.50 -0.0 1.46 -0.1 1.45 turbostat.C1E%
24.20 -1.2% 23.90 -0.9% 23.98 turbostat.CorWatt
2.62 -2.4% 2.56 -3.7% 2.53 turbostat.Pkg%pc2
25.37 -1.3% 25.03 -1.0% 25.12 turbostat.PkgWatt
3.30 -3.1% 3.20 -3.0% 3.20 turbostat.RAMWatt
19799 -3.5% 19106 -3.4% 19117 phoronix-test-suite.ramspeed.Average.Integer.mb_s
283.91 +3.7% 294.40 +3.6% 294.12 phoronix-test-suite.time.elapsed_time
283.91 +3.7% 294.40 +3.6% 294.12 phoronix-test-suite.time.elapsed_time.max
120150 +1.7% 122196 +0.2% 120373 phoronix-test-suite.time.maximum_resident_set_size
281692 -54.7% 127689 -54.7% 127587 phoronix-test-suite.time.minor_page_faults
259.47 +4.1% 270.04 +4.0% 269.86 phoronix-test-suite.time.user_time
283.91 +3.7% 294.40 +3.6% 294.12 time.elapsed_time
283.91 +3.7% 294.40 +3.6% 294.12 time.elapsed_time.max
120150 +1.7% 122196 +0.2% 120373 time.maximum_resident_set_size
281692 -54.7% 127689 -54.7% 127587 time.minor_page_faults
1.72 -7.9% 1.58 -8.4% 1.58 time.system_time
259.47 +4.1% 270.04 +4.0% 269.86 time.user_time
130092 +25.7% 163578 -2.1% 127411 proc-vmstat.nr_anon_pages
173.47 ± 2% +41.1% 244.74 +6.5% 184.76 proc-vmstat.nr_anon_transparent_hugepages
3328419 -1.1% 3292662 +0.1% 3332791 proc-vmstat.nr_free_pages
334114 +11.2% 371530 -0.9% 331076 proc-vmstat.nr_inactive_anon
1732 +4.7% 1814 +5.2% 1823 proc-vmstat.nr_page_table_pages
334114 +11.2% 371530 -0.9% 331076 proc-vmstat.nr_zone_inactive_anon
853734 -34.6% 558669 -34.2% 562087 proc-vmstat.numa_hit
853524 -34.6% 558628 -34.1% 562074 proc-vmstat.numa_local
5551673 +1.0% 5609595 +0.2% 5564708 proc-vmstat.pgalloc_normal
1077693 -26.6% 791019 -26.3% 794706 proc-vmstat.pgfault
109591 +3.1% 112941 +2.9% 112795 proc-vmstat.pgreuse
9027 +7.6% 9714 +6.6% 9619 proc-vmstat.thp_fault_alloc
1.58 ± 16% -0.5 1.08 ± 8% -0.4 1.16 ± 24% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
1.42 ± 14% -0.4 0.97 ± 9% -0.4 1.05 ± 24% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
1.42 ± 14% -0.4 0.98 ± 8% -0.4 1.05 ± 24% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
1.32 ± 14% -0.4 0.91 ± 12% -0.3 0.98 ± 26% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
1.30 ± 13% -0.4 0.88 ± 13% -0.4 0.94 ± 26% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
1.64 ± 16% -0.5 1.12 ± 9% -0.4 1.24 ± 22% perf-profile.children.cycles-pp.asm_exc_page_fault
1.48 ± 15% -0.5 1.01 ± 10% -0.4 1.12 ± 21% perf-profile.children.cycles-pp.do_user_addr_fault
1.49 ± 14% -0.5 1.02 ± 9% -0.4 1.12 ± 21% perf-profile.children.cycles-pp.exc_page_fault
1.37 ± 14% -0.4 0.94 ± 12% -0.3 1.05 ± 22% perf-profile.children.cycles-pp.handle_mm_fault
1.34 ± 13% -0.4 0.91 ± 13% -0.3 1.00 ± 23% perf-profile.children.cycles-pp.__handle_mm_fault
0.78 ± 20% -0.3 0.50 ± 20% -0.2 0.54 ± 33% perf-profile.children.cycles-pp.clear_page_erms
0.76 ± 20% -0.3 0.50 ± 22% -0.2 0.53 ± 34% perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
0.75 ± 20% -0.2 0.50 ± 23% -0.2 0.53 ± 33% perf-profile.children.cycles-pp.clear_huge_page
0.25 ± 28% +0.0 0.28 ± 77% -0.1 0.11 ± 52% perf-profile.children.cycles-pp.ret_from_fork_asm
0.24 ± 28% +0.0 0.28 ± 77% -0.1 0.11 ± 52% perf-profile.children.cycles-pp.ret_from_fork
0.23 ± 31% +0.0 0.28 ± 78% -0.1 0.09 ± 59% perf-profile.children.cycles-pp.kthread
0.77 ± 20% -0.3 0.50 ± 18% -0.2 0.54 ± 33% perf-profile.self.cycles-pp.clear_page_erms
1.166e+08 -3.3% 1.127e+08 -3.0% 1.131e+08 perf-stat.i.branch-instructions
3.39 +0.1 3.49 +0.1 3.46 perf-stat.i.branch-miss-rate%
5415570 -2.0% 5304890 -2.0% 5306531 perf-stat.i.branch-misses
4.133e+08 -3.1% 4.005e+08 -2.9% 4.014e+08 perf-stat.i.cache-misses
5.335e+08 -2.5% 5.203e+08 -2.4% 5.209e+08 perf-stat.i.cache-references
6825 -3.1% 6616 -3.1% 6614 perf-stat.i.context-switches
4.06 +3.5% 4.20 +3.3% 4.19 perf-stat.i.cpi
0.08 ± 3% -0.0 0.08 ± 2% -0.0 0.08 ± 2% perf-stat.i.dTLB-load-miss-rate%
451852 -17.2% 374167 ± 4% -16.1% 378935 perf-stat.i.dTLB-load-misses
1.12e+09 -3.7% 1.079e+09 -3.5% 1.081e+09 perf-stat.i.dTLB-loads
0.02 -0.0 0.01 ± 13% -0.0 0.01 perf-stat.i.dTLB-store-miss-rate%
86119 -59.0% 35274 ± 13% -57.5% 36598 perf-stat.i.dTLB-store-misses
7.319e+08 -3.7% 7.049e+08 -3.5% 7.066e+08 perf-stat.i.dTLB-stores
128297 -2.6% 124925 -3.6% 123631 perf-stat.i.iTLB-load-misses
2.395e+09 -3.6% 2.309e+09 -3.4% 2.315e+09 perf-stat.i.instructions
0.28 -3.4% 0.27 -3.4% 0.27 perf-stat.i.ipc
220.76 -3.3% 213.44 -3.1% 213.87 perf-stat.i.metric.M/sec
3575 -30.9% 2470 -30.4% 2487 perf-stat.i.minor-faults
49267237 +1.1% 49805411 +1.4% 49954320 perf-stat.i.node-loads
98097080 -3.1% 95014639 -2.8% 95307489 perf-stat.i.node-stores
3579 -30.9% 2475 -30.4% 2492 perf-stat.i.page-faults
4.64 +0.1 4.71 +0.0 4.69 perf-stat.overall.branch-miss-rate%
3.64 +3.8% 3.78 +3.7% 3.78 perf-stat.overall.cpi
21.10 +3.3% 21.80 +3.2% 21.78 perf-stat.overall.cycles-between-cache-misses
0.04 -0.0 0.03 ± 4% -0.0 0.04 perf-stat.overall.dTLB-load-miss-rate%
0.01 -0.0 0.01 ± 13% -0.0 0.01 perf-stat.overall.dTLB-store-miss-rate%
0.27 -3.7% 0.26 -3.6% 0.26 perf-stat.overall.ipc
1.161e+08 -3.3% 1.122e+08 -3.0% 1.126e+08 perf-stat.ps.branch-instructions
5390667 -2.1% 5280037 -2.0% 5282651 perf-stat.ps.branch-misses
4.12e+08 -3.1% 3.993e+08 -2.9% 4.001e+08 perf-stat.ps.cache-misses
5.318e+08 -2.5% 5.187e+08 -2.3% 5.193e+08 perf-stat.ps.cache-references
6801 -3.1% 6593 -3.0% 6595 perf-stat.ps.context-switches
450236 -17.2% 372836 ± 4% -16.1% 377601 perf-stat.ps.dTLB-load-misses
1.117e+09 -3.7% 1.075e+09 -3.5% 1.078e+09 perf-stat.ps.dTLB-loads
85824 -59.0% 35147 ± 13% -57.5% 36467 perf-stat.ps.dTLB-store-misses
7.295e+08 -3.7% 7.027e+08 -3.4% 7.044e+08 perf-stat.ps.dTLB-stores
127825 -2.6% 124475 -3.6% 123194 perf-stat.ps.iTLB-load-misses
2.387e+09 -3.6% 2.302e+09 -3.3% 2.307e+09 perf-stat.ps.instructions
3561 -30.9% 2460 -30.4% 2478 perf-stat.ps.minor-faults
49109319 +1.1% 49654078 +1.4% 49800339 perf-stat.ps.node-loads
97782680 -3.1% 94720369 -2.8% 95009401 perf-stat.ps.node-stores
3566 -30.9% 2465 -30.4% 2482 perf-stat.ps.page-faults
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2024-01-05 9:29 ` Oliver Sang
@ 2024-01-05 14:52 ` Yin, Fengwei
2024-01-05 18:49 ` Yang Shi
1 sibling, 0 replies; 24+ messages in thread
From: Yin, Fengwei @ 2024-01-05 14:52 UTC (permalink / raw)
To: Oliver Sang, Yang Shi
Cc: Rik van Riel, oe-lkp, lkp, Linux Memory Management List,
Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang,
feng.tang
On 1/5/2024 5:29 PM, Oliver Sang wrote:
> hi, Yang Shi,
>
> On Thu, Jan 04, 2024 at 04:39:50PM +0800, Oliver Sang wrote:
>> hi, Fengwei, hi, Yang Shi,
>>
>> On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote:
>>>
>>> On 2024/1/4 09:32, Yang Shi wrote:
>>
>> ...
>>
>>>> Can you please help test the below patch?
>>> I can't access the testing box now. Oliver will help to test your patch.
>>>
>>
>> since now the commit-id of
>> 'mm: align larger anonymous mappings on THP boundaries'
>> in linux-next/master is efa7df3e3bb5d
>> I applied the patch like below:
>>
>> * d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
>> * efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries
>> * 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi
>>
>> our auto-bisect captured the new efa7df3e3b as fbc (first bad commit) for quite a number of regressions
>> so far. I will test d8d7b1dae6f03 for all these tests. Thanks
>>
>
> we got 12 regression results and 1 improvement result for efa7df3e3b so far
> (4 of the regressions are similar to what we reported for 1111d46b5c).
> with your patch, 6 of those regressions are fixed; the others are not impacted.
>
> below is a summary:
>
> No. testsuite test status-on-efa7df3e3b fix-by-d8d7b1dae6 ?
> === ========= ==== ==================== ===================
> (1) stress-ng numa regression NO
> (2) pthread regression yes (on an Ice Lake server)
> (3) pthread regression yes (on a Cascade Lake desktop)
> (4) will-it-scale malloc1 regression NO
> (5) page_fault1 improvement no (so still improvement)
> (6) vm-scalability anon-w-seq-mt regression yes
> (7) stream nr_threads=25% regression yes
> (8) nr_threads=50% regression yes
> (9) phoronix osbench.CreateThreads regression yes (on a Cascade Lake server)
> (10) ramspeed.Add.Integer regression NO (and the 3 below, on a Coffee Lake desktop)
> (11) ramspeed.Average.FloatingPoint regression NO
> (12) ramspeed.Triad.Integer regression NO
> (13) ramspeed.Average.Integer regression NO
A hint on ramspeed, just for your reference:
I did standalone ramspeed (not phoronix) testing on an Ice Lake 48C/96T box
with 192GB of memory and didn't see the regressions on that testing box (the
box was retired at the end of last year and can't be accessed anymore).
Regards
Yin, Fengwei
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2024-01-05 9:29 ` Oliver Sang
2024-01-05 14:52 ` Yin, Fengwei
@ 2024-01-05 18:49 ` Yang Shi
1 sibling, 0 replies; 24+ messages in thread
From: Yang Shi @ 2024-01-05 18:49 UTC (permalink / raw)
To: Oliver Sang
Cc: Yin Fengwei, Rik van Riel, oe-lkp, lkp,
Linux Memory Management List, Andrew Morton, Matthew Wilcox,
Christopher Lameter, ying.huang, feng.tang
On Fri, Jan 5, 2024 at 1:29 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Yang Shi,
>
> On Thu, Jan 04, 2024 at 04:39:50PM +0800, Oliver Sang wrote:
> > hi, Fengwei, hi, Yang Shi,
> >
> > On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote:
> > >
> > > On 2024/1/4 09:32, Yang Shi wrote:
> >
> > ...
> >
> > > > Can you please help test the below patch?
> > > I can't access the testing box now. Oliver will help to test your patch.
> > >
> >
> > since now the commit-id of
> > 'mm: align larger anonymous mappings on THP boundaries'
> > in linux-next/master is efa7df3e3bb5d
> > I applied the patch like below:
> >
> > * d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> > * efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries
> > * 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi
> >
> > our auto-bisect captured the new efa7df3e3b as fbc (first bad commit) for quite a number of regressions
> > so far. I will test d8d7b1dae6f03 for all these tests. Thanks
> >
>
Hi Oliver,
Thanks for running the test. Please see the inline comments.
> we got 12 regression results and 1 improvement result for efa7df3e3b so far
> (4 of the regressions are similar to what we reported for 1111d46b5c).
> with your patch, 6 of those regressions are fixed; the others are not impacted.
>
> below is a summary:
>
> No. testsuite test status-on-efa7df3e3b fix-by-d8d7b1dae6 ?
> === ========= ==== ==================== ===================
> (1) stress-ng numa regression NO
> (2) pthread regression yes (on an Ice Lake server)
> (3) pthread regression yes (on a Cascade Lake desktop)
> (4) will-it-scale malloc1 regression NO
I think this was reported earlier when Rik submitted the patch in the
first place. IIRC, Huang Ying did some analysis on this one and
thought it could be ignored.
> (5) page_fault1 improvement no (so still improvement)
> (6) vm-scalability anon-w-seq-mt regression yes
> (7) stream nr_threads=25% regression yes
> (8) nr_threads=50% regression yes
> (9) phoronix osbench.CreateThreads regression yes (on a Cascade Lake server)
> (10) ramspeed.Add.Integer regression NO (and the 3 below, on a Coffee Lake desktop)
> (11) ramspeed.Average.FloatingPoint regression NO
> (12) ramspeed.Triad.Integer regression NO
> (13) ramspeed.Average.Integer regression NO
Not fixing the ramspeed regression is expected. But it seems that
neither Fengwei nor I can reproduce the regression when running
ramspeed alone.
>
>
> below are the details; for those regressions not fixed by d8d7b1dae6, the
> full comparison is attached.
>
>
> (1) detailed comparison is attached as 'stress-ng-regression'
>
> Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
> =========================================================================================
> class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> cpu/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/numa/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 251.12 -48.2% 130.00 -47.9% 130.75 stress-ng.numa.ops
> 4.10 -49.4% 2.08 -49.2% 2.09 stress-ng.numa.ops_per_sec
This is a new one. I did some analysis, and it seems it is not
related to the THP patch, since I can reproduce it on a kernel (on an
aarch64 VM) without the THP patch if I set THP to always.
The profiling showed the regression was caused by the move_pages()
syscall. The test actually calls a bunch of NUMA syscalls, for
example set_mempolicy(), mbind(), move_pages(), migrate_pages(), etc.,
with different parameters. When calling move_pages(), it tries to move
pages (at base page granularity) to different nodes in a circular
pattern. On my 2-node NUMA VM, it actually moves:
0th page to node #1
1st page to node #0
2nd page to node #1
3rd page to node #0
....
1023rd page to node #0
But with THP, it actually bounces the THP between the two nodes 512 times.
The pgmigrate_success counter in /proc/vmstat also reflects this:
for the base page case the delta is 1928431, but for the THP case the delta is 218466402.
The kernel already does a node check to skip the move if the page is
already on the target node, but the test case does the bounce on
purpose since it assumes base pages. So I think this case should
be run with THP disabled.
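
For illustration, below is a minimal, hypothetical sketch of that alternating-node
move_pages() pattern (this is not the stress-ng source; the buffer size, node
numbers and flags are assumptions). It maps 1024 base pages and asks to move them
to nodes 1, 0, 1, 0, ...; if the region happens to be backed by 2MB THPs, each
request that targets the node the THP is not currently on migrates the whole THP,
giving the ~512 bounces per call described above:

/* Build (assumption): gcc bounce.c -lnuma */
#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NR_PAGES 1024

int main(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	char *buf = mmap(NULL, NR_PAGES * psz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	void *pages[NR_PAGES];
	int nodes[NR_PAGES], status[NR_PAGES], i;

	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 1, NR_PAGES * psz);		/* fault the pages in */

	for (i = 0; i < NR_PAGES; i++) {
		pages[i] = buf + i * psz;
		nodes[i] = (i & 1) ? 0 : 1;	/* 0th page -> node 1, 1st -> node 0, ... */
	}

	/* One call; the kernel walks the request list page by page. */
	if (move_pages(0, NR_PAGES, pages, nodes, status, MPOL_MF_MOVE) < 0)
		perror("move_pages");

	/* Compare pgmigrate_success in /proc/vmstat before and after. */
	return 0;
}
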
>
>
> (2)
> Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/pthread/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 3272223 -87.8% 400430 +0.5% 3287322 stress-ng.pthread.ops
> 54516 -87.8% 6664 +0.5% 54772 stress-ng.pthread.ops_per_sec
>
>
> (3)
> Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with memory: 128G
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 2250845 -85.2% 332370 ± 6% -0.8% 2232820 stress-ng.pthread.ops
> 37510 -85.2% 5538 ± 6% -0.8% 37209 stress-ng.pthread.ops_per_sec
>
>
> (4) full comparison attached as 'will-it-scale-regression'
>
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/malloc1/will-it-scale
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 10994 -86.7% 1466 -86.7% 1460 will-it-scale.per_process_ops
> 1231431 -86.7% 164315 -86.7% 163624 will-it-scale.workload
>
>
> (5)
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/page_fault1/will-it-scale
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 18858970 +44.8% 27298921 +44.9% 27330479 will-it-scale.224.threads
> 56.06 +13.3% 63.53 +13.8% 63.81 will-it-scale.224.threads_idle
> 84191 +44.8% 121869 +44.9% 122010 will-it-scale.per_thread_ops
> 18858970 +44.8% 27298921 +44.9% 27330479 will-it-scale.workload
>
>
> (6)
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/8T/lkp-cpl-4sp2/anon-w-seq-mt/vm-scalability
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 345968 -6.5% 323566 +0.1% 346304 vm-scalability.median
> 1.91 ± 10% -0.5 1.38 ± 20% -0.2 1.75 ± 13% vm-scalability.median_stddev%
> 79708409 -7.4% 73839640 -0.1% 79613742 vm-scalability.throughput
>
>
> (7)
> Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G
> =========================================================================================
> array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
> 50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 349414 -16.2% 292854 ± 2% -0.4% 348048 stream.add_bandwidth_MBps
> 347727 ± 2% -16.5% 290470 ± 2% -0.6% 345750 ± 2% stream.add_bandwidth_MBps_harmonicMean
> 332206 -21.6% 260428 ± 3% -0.4% 330838 stream.copy_bandwidth_MBps
> 330746 ± 2% -22.6% 255915 ± 3% -0.6% 328725 ± 2% stream.copy_bandwidth_MBps_harmonicMean
> 301178 -16.9% 250209 ± 2% -0.4% 299920 stream.scale_bandwidth_MBps
> 300262 -17.7% 247151 ± 2% -0.6% 298586 ± 2% stream.scale_bandwidth_MBps_harmonicMean
> 337408 -12.5% 295287 ± 2% -0.3% 336304 stream.triad_bandwidth_MBps
> 336153 -12.7% 293621 -0.5% 334624 ± 2% stream.triad_bandwidth_MBps_harmonicMean
>
>
> (8)
> Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G
> =========================================================================================
> array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
> 50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/50%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 345632 -19.7% 277550 ± 3% +0.4% 347067 ± 2% stream.add_bandwidth_MBps
> 342263 ± 2% -19.7% 274704 ± 2% +0.4% 343609 ± 2% stream.add_bandwidth_MBps_harmonicMean
> 343820 -17.3% 284428 ± 3% +0.1% 344248 stream.copy_bandwidth_MBps
> 341759 ± 2% -17.8% 280934 ± 3% +0.1% 342025 ± 2% stream.copy_bandwidth_MBps_harmonicMean
> 343270 -17.8% 282330 ± 3% +0.3% 344276 ± 2% stream.scale_bandwidth_MBps
> 340812 ± 2% -18.3% 278284 ± 3% +0.3% 341672 ± 2% stream.scale_bandwidth_MBps_harmonicMean
> 364596 -19.7% 292831 ± 3% +0.4% 366145 ± 2% stream.triad_bandwidth_MBps
> 360643 ± 2% -19.9% 289034 ± 3% +0.4% 362004 ± 2% stream.triad_bandwidth_MBps_harmonicMean
>
>
> (9)
> Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with memory: 512G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/Create Threads/debian-x86_64-phoronix/lkp-csl-2sp7/osbench-1.0.2/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 26.82 +1348.4% 388.43 +4.0% 27.88 phoronix-test-suite.osbench.CreateThreads.us_per_event
>
>
> **** for (10) - (13) below, the full comparison is attached as phoronix-regressions
> (they all happen on a Coffee Lake desktop)
> (10)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/Add/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 20115 -4.5% 19211 -4.5% 19217 phoronix-test-suite.ramspeed.Add.Integer.mb_s
>
>
> (11)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 19960 -2.9% 19378 -3.0% 19366 phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s
>
>
> (12)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/Triad/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 19667 -6.4% 18399 -6.4% 18413 phoronix-test-suite.ramspeed.Triad.Integer.mb_s
>
>
> (13)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 19799 -3.5% 19106 -3.4% 19117 phoronix-test-suite.ramspeed.Average.Integer.mb_s
>
>
>
> >
> >
> > commit d8d7b1dae6f0311d528b289cda7b317520f9a984
> > Author: 0day robot <lkp@intel.com>
> > Date: Thu Jan 4 12:51:10 2024 +0800
> >
> > fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> >
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 40d94411d4920..91197bd387730 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
> > _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
> > _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
> > + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
> > arch_calc_vm_flag_bits(flags);
> > }
> >
> >
> > >
> > > Regards
> > > Yin, Fengwei
> > >
> > > >
> > > > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > > index 40d94411d492..dc7048824be8 100644
> > > > --- a/include/linux/mman.h
> > > > +++ b/include/linux/mman.h
> > > > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > > > return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
> > > > _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
> > > > _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
> > > > + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
> > > > arch_calc_vm_flag_bits(flags);
> > > > }
> > > >
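
For context, the one-liner quoted above makes mmap() translate MAP_STACK into
VM_NOHUGEPAGE, so MAP_STACK mappings (glibc maps pthread thread stacks with
MAP_STACK) are excluded from THP alignment and THP faults, which is presumably
why it addresses the pthread regressions. A rough, hypothetical way to check the
effect on a patched kernel (not something run in this thread) is to map a
MAP_STACK region and look for the "nh" bit in its VmFlags line in
/proc/self/smaps:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 8UL << 20;		/* 8MB, roughly a default pthread stack */
	char *stk = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	unsigned long start, end, target;
	char line[512];
	int in_target = 0;
	FILE *f;

	if (stk == MAP_FAILED)
		return 1;
	target = (unsigned long)stk;

	f = fopen("/proc/self/smaps", "r");
	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f)) {
		/* Address-range lines mark the start of each mapping's entry. */
		if (sscanf(line, "%lx-%lx ", &start, &end) == 2)
			in_target = target >= start && target < end;
		else if (in_target && !strncmp(line, "VmFlags:", 8))
			fputs(line, stdout);	/* expect "nh" here with the fix */
	}
	fclose(f);
	return 0;
}
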
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2024-01-05 18:50 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-19 15:41 [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression kernel test robot
2023-12-20 5:27 ` Yang Shi
2023-12-20 8:29 ` Yin Fengwei
2023-12-20 15:42 ` Christoph Lameter (Ampere)
2023-12-20 20:14 ` Yang Shi
2023-12-20 20:09 ` Yang Shi
2023-12-21 0:26 ` Yang Shi
2023-12-21 0:58 ` Yin Fengwei
2023-12-21 1:02 ` Yin Fengwei
2023-12-21 4:49 ` Matthew Wilcox
2023-12-21 4:58 ` Yin Fengwei
2023-12-21 18:07 ` Yang Shi
2023-12-21 18:14 ` Matthew Wilcox
2023-12-22 1:06 ` Yin, Fengwei
2023-12-22 2:23 ` Huang, Ying
2023-12-21 13:39 ` Yin, Fengwei
2023-12-21 18:11 ` Yang Shi
2023-12-22 1:13 ` Yin, Fengwei
2024-01-04 1:32 ` Yang Shi
2024-01-04 8:18 ` Yin Fengwei
2024-01-04 8:39 ` Oliver Sang
2024-01-05 9:29 ` Oliver Sang
2024-01-05 14:52 ` Yin, Fengwei
2024-01-05 18:49 ` Yang Shi