* [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
@ 2023-12-19 15:41 kernel test robot
From: kernel test robot @ 2023-12-19 15:41 UTC
To: Rik van Riel
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
Yang Shi, Matthew Wilcox, Christopher Lameter, ying.huang,
feng.tang, fengwei.yin, oliver.sang
Hello,
for this commit, we reported
"[mm] 96db82a66d: will-it-scale.per_process_ops -95.3% regression"
in August 2022, when it was in linux-next/master
https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/
later, we reported
"[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression"
in October 2022, when it was in linus/master
https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/
and the commit was finally reverted by
commit 0ba09b1733878afe838fe35c310715fda3d46428
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun Dec 4 12:51:59 2022 -0800
now we have noticed that it has gone into linux-next/master again.
we are not sure whether there is agreement that the benefit of this commit
already outweighs the performance drop seen in some micro benchmarks.
we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/
that
"This patch was applied to v6.1, but was reverted due to a regression
report. However it turned out the regression was not due to this patch.
I ping'ed Andrew to reapply this patch, Andrew may forget it. This
patch helps promote THP, so I rebased it onto the latest mm-unstable."
however, unfortunately, in our latest tests we still observed the regression
below with this commit. Just FYI.
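
For context, the commit makes the kernel pick PMD-aligned start addresses for
anonymous mappings that are at least huge-page sized, so the whole range can be
backed by THP. A minimal userspace sketch, not part of the original report and
assuming x86_64 with 2 MiB PMD-sized THP (file name and constants are only
illustrative), that makes the alignment visible:

/* thp_align_check.c - check whether a large anonymous mapping starts on a
 * 2 MiB (PMD) boundary.  Build with: gcc -O2 -o thp_align_check thp_align_check.c
 */
#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>

#define PMD_SIZE (2UL << 20)   /* 2 MiB PMD page size on x86_64 (assumption) */
#define MAP_LEN  (8UL << 20)   /* a "larger anonymous mapping": 8 MiB */

int main(void)
{
        void *p = mmap(NULL, MAP_LEN, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        /* With the commit applied, the returned address tends to be PMD
         * aligned, so the fault path can install huge pages directly. */
        printf("addr=%p  PMD-aligned=%s\n", p,
               ((uintptr_t)p % PMD_SIZE) ? "no" : "yes");
        munmap(p, MAP_LEN);
        return 0;
}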
kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:
commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
testcase: stress-ng
test machine: 36 threads 1 socket Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
parameters:
nr_threads: 1
disk: 1HDD
testtime: 60s
fs: ext4
class: os
test: pthread
cpufreq_governor: performance
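
As a rough illustration of what the pthread stressor measures (a simplification,
not the actual stress-ng source; the file name below is only illustrative): each
op corresponds roughly to one pthread being created and reaped, so the workload
is dominated by thread-stack setup and teardown (mmap, first-touch page faults,
madvise/munmap on exit), which are the paths touched by THP alignment of
anonymous mappings:

/* pthread_churn.c - crude approximation of the stress-ng "pthread" stressor:
 * create and join threads in a tight loop.  Build with:
 * gcc -O2 -pthread -o pthread_churn pthread_churn.c
 */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
        (void)arg;              /* the real stressor does some work per thread */
        return NULL;
}

int main(void)
{
        unsigned long ops = 0;

        for (int i = 0; i < 100000; i++) {
                pthread_t tid;

                if (pthread_create(&tid, NULL, worker, NULL))
                        break;
                pthread_join(tid, NULL);
                ops++;          /* one "op" per create/join pair */
        }
        printf("ops=%lu\n", ops);
        return 0;
}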
In addition to that, the commit also has a significant impact on the following tests:
+------------------+-----------------------------------------------------------------------------------------------+
| testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression |
| test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory |
| test parameters | array_size=50000000 |
| | cpufreq_governor=performance |
| | iterations=10x |
| | loop=100 |
| | nr_threads=25% |
| | omp=true |
+------------------+-----------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression |
| test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory |
| test parameters | cpufreq_governor=performance |
| | option_a=Average |
| | option_b=Integer |
| | test=ramspeed-1.4.3 |
+------------------+-----------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression |
| test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory |
| test parameters | cpufreq_governor=performance |
| | option_a=Average |
| | option_b=Floating Point |
| | test=ramspeed-1.4.3 |
+------------------+-----------------------------------------------------------------------------------------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
commit:
30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
---------------- ---------------------------
%stddev %change %stddev
\ | \
13405796 -65.5% 4620124 cpuidle..usage
8.00 +8.2% 8.66 ± 2% iostat.cpu.system
1.61 -60.6% 0.63 iostat.cpu.user
597.50 ± 14% -64.3% 213.50 ± 14% perf-c2c.DRAM.local
1882 ± 14% -74.7% 476.83 ± 7% perf-c2c.HITM.local
3768436 -12.9% 3283395 vmstat.memory.cache
355105 -75.7% 86344 ± 3% vmstat.system.cs
385435 -20.7% 305714 ± 3% vmstat.system.in
1.13 -0.2 0.88 mpstat.cpu.all.irq%
0.29 -0.2 0.10 ± 2% mpstat.cpu.all.soft%
6.76 ± 2% +1.1 7.88 ± 2% mpstat.cpu.all.sys%
1.62 -1.0 0.62 ± 2% mpstat.cpu.all.usr%
2234397 -84.3% 350161 ± 5% stress-ng.pthread.ops
37237 -84.3% 5834 ± 5% stress-ng.pthread.ops_per_sec
294706 ± 2% -68.0% 94191 ± 6% stress-ng.time.involuntary_context_switches
41442 ± 2% +5023.4% 2123284 stress-ng.time.maximum_resident_set_size
4466457 -83.9% 717053 ± 5% stress-ng.time.minor_page_faults
243.33 +13.5% 276.17 ± 3% stress-ng.time.percent_of_cpu_this_job_got
131.64 +27.7% 168.11 ± 3% stress-ng.time.system_time
19.73 -82.1% 3.53 ± 4% stress-ng.time.user_time
7715609 -80.2% 1530125 ± 4% stress-ng.time.voluntary_context_switches
494566 -59.5% 200338 ± 3% meminfo.Active
478287 -61.5% 184050 ± 3% meminfo.Active(anon)
58549 ± 17% +1532.8% 956006 ± 14% meminfo.AnonHugePages
424631 +194.9% 1252445 ± 10% meminfo.AnonPages
3677263 -13.0% 3197755 meminfo.Cached
5829485 ± 4% -19.0% 4724784 ± 10% meminfo.Committed_AS
692486 +108.6% 1444669 ± 8% meminfo.Inactive
662179 +113.6% 1414338 ± 9% meminfo.Inactive(anon)
182416 -50.2% 90759 meminfo.Mapped
4614466 +10.0% 5076604 ± 2% meminfo.Memused
6985 +47.6% 10307 ± 4% meminfo.PageTables
718445 -66.7% 238913 ± 3% meminfo.Shmem
35906 -20.7% 28471 ± 3% meminfo.VmallocUsed
4838522 +25.6% 6075302 meminfo.max_used_kB
488.83 -20.9% 386.67 ± 2% turbostat.Avg_MHz
12.95 -2.7 10.26 ± 2% turbostat.Busy%
7156734 -87.2% 919149 ± 4% turbostat.C1
10.59 -8.9 1.65 ± 5% turbostat.C1%
3702647 -55.1% 1663518 ± 2% turbostat.C1E
32.99 -20.6 12.36 ± 3% turbostat.C1E%
1161078 +64.5% 1909611 turbostat.C6
44.25 +31.8 76.10 turbostat.C6%
0.18 -33.3% 0.12 turbostat.IPC
74338573 ± 2% -33.9% 49159610 ± 4% turbostat.IRQ
1381661 -91.0% 124075 ± 6% turbostat.POLL
0.26 -0.2 0.04 ± 12% turbostat.POLL%
96.15 -5.4% 90.95 turbostat.PkgWatt
12.12 +19.3% 14.46 turbostat.RAMWatt
119573 -61.5% 46012 ± 3% proc-vmstat.nr_active_anon
106168 +195.8% 314047 ± 10% proc-vmstat.nr_anon_pages
28.60 ± 17% +1538.5% 468.68 ± 14% proc-vmstat.nr_anon_transparent_hugepages
923365 -13.0% 803489 proc-vmstat.nr_file_pages
165571 +113.5% 353493 ± 9% proc-vmstat.nr_inactive_anon
45605 -50.2% 22690 proc-vmstat.nr_mapped
1752 +47.1% 2578 ± 4% proc-vmstat.nr_page_table_pages
179613 -66.7% 59728 ± 3% proc-vmstat.nr_shmem
21490 -2.4% 20981 proc-vmstat.nr_slab_reclaimable
28260 -7.3% 26208 proc-vmstat.nr_slab_unreclaimable
119573 -61.5% 46012 ± 3% proc-vmstat.nr_zone_active_anon
165570 +113.5% 353492 ± 9% proc-vmstat.nr_zone_inactive_anon
17343640 -76.3% 4116748 ± 4% proc-vmstat.numa_hit
17364975 -76.3% 4118098 ± 4% proc-vmstat.numa_local
249252 -66.2% 84187 ± 2% proc-vmstat.pgactivate
27528916 +567.1% 1.836e+08 ± 5% proc-vmstat.pgalloc_normal
4912427 -79.2% 1019949 ± 3% proc-vmstat.pgfault
27227124 +574.1% 1.835e+08 ± 5% proc-vmstat.pgfree
8728 +3896.4% 348802 ± 5% proc-vmstat.thp_deferred_split_page
8730 +3895.3% 348814 ± 5% proc-vmstat.thp_fault_alloc
8728 +3896.4% 348802 ± 5% proc-vmstat.thp_split_pmd
316745 -21.5% 248756 ± 4% sched_debug.cfs_rq:/.avg_vruntime.avg
112735 ± 4% -34.3% 74061 ± 6% sched_debug.cfs_rq:/.avg_vruntime.min
0.49 ± 6% -17.2% 0.41 ± 8% sched_debug.cfs_rq:/.h_nr_running.stddev
12143 ±120% -99.9% 15.70 ±116% sched_debug.cfs_rq:/.left_vruntime.avg
414017 ±126% -99.9% 428.50 ±102% sched_debug.cfs_rq:/.left_vruntime.max
68492 ±125% -99.9% 78.15 ±106% sched_debug.cfs_rq:/.left_vruntime.stddev
41917 ± 24% -48.3% 21690 ± 57% sched_debug.cfs_rq:/.load.avg
176151 ± 30% -56.9% 75963 ± 57% sched_debug.cfs_rq:/.load.stddev
6489 ± 17% -29.0% 4608 ± 12% sched_debug.cfs_rq:/.load_avg.max
4.42 ± 45% -81.1% 0.83 ± 74% sched_debug.cfs_rq:/.load_avg.min
1112 ± 17% -31.0% 767.62 ± 11% sched_debug.cfs_rq:/.load_avg.stddev
316745 -21.5% 248756 ± 4% sched_debug.cfs_rq:/.min_vruntime.avg
112735 ± 4% -34.3% 74061 ± 6% sched_debug.cfs_rq:/.min_vruntime.min
0.49 ± 6% -17.2% 0.41 ± 8% sched_debug.cfs_rq:/.nr_running.stddev
12144 ±120% -99.9% 15.70 ±116% sched_debug.cfs_rq:/.right_vruntime.avg
414017 ±126% -99.9% 428.50 ±102% sched_debug.cfs_rq:/.right_vruntime.max
68492 ±125% -99.9% 78.15 ±106% sched_debug.cfs_rq:/.right_vruntime.stddev
14.25 ± 44% -76.6% 3.33 ± 58% sched_debug.cfs_rq:/.runnable_avg.min
11.58 ± 49% -77.7% 2.58 ± 58% sched_debug.cfs_rq:/.util_avg.min
423972 ± 23% +59.3% 675379 ± 3% sched_debug.cpu.avg_idle.avg
5720 ± 43% +439.5% 30864 sched_debug.cpu.avg_idle.min
99.79 ± 2% -23.7% 76.11 ± 2% sched_debug.cpu.clock_task.stddev
162475 ± 49% -95.8% 6813 ± 26% sched_debug.cpu.curr->pid.avg
1061268 -84.0% 170212 ± 4% sched_debug.cpu.curr->pid.max
365404 ± 20% -91.3% 31839 ± 10% sched_debug.cpu.curr->pid.stddev
0.51 ± 3% -20.1% 0.41 ± 9% sched_debug.cpu.nr_running.stddev
311923 -74.2% 80615 ± 2% sched_debug.cpu.nr_switches.avg
565973 ± 4% -77.8% 125597 ± 10% sched_debug.cpu.nr_switches.max
192666 ± 4% -70.6% 56695 ± 6% sched_debug.cpu.nr_switches.min
67485 ± 8% -79.9% 13558 ± 10% sched_debug.cpu.nr_switches.stddev
2.62 +102.1% 5.30 perf-stat.i.MPKI
2.09e+09 -47.6% 1.095e+09 ± 4% perf-stat.i.branch-instructions
1.56 -0.5 1.01 perf-stat.i.branch-miss-rate%
31951200 -60.9% 12481432 ± 2% perf-stat.i.branch-misses
19.38 +23.7 43.08 perf-stat.i.cache-miss-rate%
26413597 -5.7% 24899132 ± 4% perf-stat.i.cache-misses
1.363e+08 -58.3% 56906133 ± 4% perf-stat.i.cache-references
370628 -75.8% 89743 ± 3% perf-stat.i.context-switches
1.77 +65.1% 2.92 ± 2% perf-stat.i.cpi
1.748e+10 -21.8% 1.367e+10 ± 2% perf-stat.i.cpu-cycles
61611 -79.1% 12901 ± 6% perf-stat.i.cpu-migrations
716.97 ± 2% -17.2% 593.35 ± 2% perf-stat.i.cycles-between-cache-misses
0.12 ± 4% -0.1 0.05 perf-stat.i.dTLB-load-miss-rate%
3066100 ± 3% -81.3% 573066 ± 5% perf-stat.i.dTLB-load-misses
2.652e+09 -50.1% 1.324e+09 ± 4% perf-stat.i.dTLB-loads
0.08 ± 2% -0.0 0.03 perf-stat.i.dTLB-store-miss-rate%
1168195 ± 2% -82.9% 199438 ± 5% perf-stat.i.dTLB-store-misses
1.478e+09 -56.8% 6.384e+08 ± 3% perf-stat.i.dTLB-stores
8080423 -73.2% 2169371 ± 3% perf-stat.i.iTLB-load-misses
5601321 -74.3% 1440571 ± 2% perf-stat.i.iTLB-loads
1.028e+10 -49.7% 5.173e+09 ± 4% perf-stat.i.instructions
1450 +73.1% 2511 ± 2% perf-stat.i.instructions-per-iTLB-miss
0.61 -35.9% 0.39 perf-stat.i.ipc
0.48 -21.4% 0.38 ± 2% perf-stat.i.metric.GHz
616.28 -17.6% 507.69 ± 4% perf-stat.i.metric.K/sec
175.16 -50.8% 86.18 ± 4% perf-stat.i.metric.M/sec
76728 -80.8% 14724 ± 4% perf-stat.i.minor-faults
5600408 -61.4% 2160997 ± 5% perf-stat.i.node-loads
8873996 +52.1% 13499744 ± 5% perf-stat.i.node-stores
112409 -81.9% 20305 ± 4% perf-stat.i.page-faults
2.55 +89.6% 4.83 perf-stat.overall.MPKI
1.51 -0.4 1.13 perf-stat.overall.branch-miss-rate%
19.26 +24.5 43.71 perf-stat.overall.cache-miss-rate%
1.70 +56.4% 2.65 perf-stat.overall.cpi
665.84 -17.5% 549.51 ± 2% perf-stat.overall.cycles-between-cache-misses
0.12 ± 4% -0.1 0.04 perf-stat.overall.dTLB-load-miss-rate%
0.08 ± 2% -0.0 0.03 perf-stat.overall.dTLB-store-miss-rate%
59.16 +0.9 60.04 perf-stat.overall.iTLB-load-miss-rate%
1278 +86.1% 2379 ± 2% perf-stat.overall.instructions-per-iTLB-miss
0.59 -36.1% 0.38 perf-stat.overall.ipc
2.078e+09 -48.3% 1.074e+09 ± 4% perf-stat.ps.branch-instructions
31292687 -61.2% 12133349 ± 2% perf-stat.ps.branch-misses
26057291 -5.9% 24512034 ± 4% perf-stat.ps.cache-misses
1.353e+08 -58.6% 56072195 ± 4% perf-stat.ps.cache-references
365254 -75.8% 88464 ± 3% perf-stat.ps.context-switches
1.735e+10 -22.4% 1.346e+10 ± 2% perf-stat.ps.cpu-cycles
60838 -79.1% 12727 ± 6% perf-stat.ps.cpu-migrations
3056601 ± 4% -81.5% 565354 ± 4% perf-stat.ps.dTLB-load-misses
2.636e+09 -50.7% 1.3e+09 ± 4% perf-stat.ps.dTLB-loads
1155253 ± 2% -83.0% 196581 ± 5% perf-stat.ps.dTLB-store-misses
1.473e+09 -57.4% 6.268e+08 ± 3% perf-stat.ps.dTLB-stores
7997726 -73.3% 2131477 ± 3% perf-stat.ps.iTLB-load-misses
5521346 -74.3% 1418623 ± 2% perf-stat.ps.iTLB-loads
1.023e+10 -50.4% 5.073e+09 ± 4% perf-stat.ps.instructions
75671 -80.9% 14479 ± 4% perf-stat.ps.minor-faults
5549722 -61.4% 2141750 ± 4% perf-stat.ps.node-loads
8769156 +51.6% 13296579 ± 5% perf-stat.ps.node-stores
110795 -82.0% 19977 ± 4% perf-stat.ps.page-faults
6.482e+11 -50.7% 3.197e+11 ± 4% perf-stat.total.instructions
0.00 ± 37% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
0.01 ± 18% +8373.1% 0.73 ± 49% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
0.01 ± 16% +4600.0% 0.38 ± 24% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
0.01 ±204% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
0.01 ± 8% +3678.9% 0.36 ± 79% perf-sched.sch_delay.avg.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
0.01 ± 14% -38.5% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
0.01 ± 5% +2946.2% 0.26 ± 43% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
0.00 ± 14% +125.0% 0.01 ± 12% perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.02 ±170% -83.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 ± 69% +6578.6% 0.31 ± 4% perf-sched.sch_delay.avg.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
0.00 +100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
0.02 ± 86% +4234.4% 0.65 ± 4% perf-sched.sch_delay.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
0.01 ± 6% +6054.3% 0.47 perf-sched.sch_delay.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
0.00 ± 14% +195.2% 0.01 ± 89% perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.00 ±102% +340.0% 0.01 ± 85% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
0.00 +100.0% 0.00 perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
0.00 ± 11% +66.7% 0.01 ± 21% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.01 ± 89% +1096.1% 0.15 ± 30% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
0.00 +141.7% 0.01 ± 61% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.00 ±223% +9975.0% 0.07 ±203% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
0.00 ± 10% +789.3% 0.04 ± 69% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
0.00 ± 31% +6691.3% 0.26 ± 5% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
0.00 ± 28% +14612.5% 0.59 ± 4% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.exit_mm
0.00 ± 24% +4904.2% 0.20 ± 4% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
0.00 ± 28% +450.0% 0.01 ± 74% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.00 ± 17% +984.6% 0.02 ± 79% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.00 ± 20% +231.8% 0.01 ± 89% perf-sched.sch_delay.avg.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.submit_bio_wait
0.00 +350.0% 0.01 ± 16% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.02 ± 16% +320.2% 0.07 ± 2% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.02 ± 2% +282.1% 0.09 ± 5% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.00 ± 14% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
0.05 ± 35% +3784.5% 1.92 ± 16% perf-sched.sch_delay.max.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
0.29 ±128% +563.3% 1.92 ± 7% perf-sched.sch_delay.max.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
0.14 ±217% -99.7% 0.00 ±223% perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
0.03 ± 49% -74.0% 0.01 ± 51% perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
0.01 ± 54% -57.4% 0.00 ± 75% perf-sched.sch_delay.max.ms.__cond_resched.dput.__ns_get_path.ns_get_path.proc_ns_get_link
0.12 ± 21% +873.0% 1.19 ± 60% perf-sched.sch_delay.max.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
2.27 ±220% -99.7% 0.01 ± 19% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
0.02 ± 36% -54.4% 0.01 ± 55% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
0.04 ± 36% -77.1% 0.01 ± 31% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
0.12 ± 32% +1235.8% 1.58 ± 31% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
2.25 ±218% -99.3% 0.02 ± 52% perf-sched.sch_delay.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.01 ± 85% +19836.4% 2.56 ± 7% perf-sched.sch_delay.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
0.03 ± 70% -93.6% 0.00 ±223% perf-sched.sch_delay.max.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
0.10 ± 16% +2984.2% 3.21 ± 6% perf-sched.sch_delay.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
0.01 ± 20% +883.9% 0.05 ±177% perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.01 ± 15% +694.7% 0.08 ±123% perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.00 ±223% +6966.7% 0.07 ±199% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
0.01 ± 38% +8384.6% 0.55 ± 72% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
0.01 ± 13% +12995.7% 1.51 ±103% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
117.80 ± 56% -96.4% 4.26 ± 36% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.01 ± 68% +331.9% 0.03 perf-sched.total_sch_delay.average.ms
4.14 +242.6% 14.20 ± 4% perf-sched.total_wait_and_delay.average.ms
700841 -69.6% 212977 ± 3% perf-sched.total_wait_and_delay.count.ms
4.14 +242.4% 14.16 ± 4% perf-sched.total_wait_time.average.ms
11.68 ± 8% +213.3% 36.59 ± 28% perf-sched.wait_and_delay.avg.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
10.00 ± 2% +226.1% 32.62 ± 20% perf-sched.wait_and_delay.avg.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
10.55 ± 3% +259.8% 37.96 ± 7% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
9.80 ± 12% +196.5% 29.07 ± 32% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
9.80 ± 4% +234.9% 32.83 ± 14% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
10.32 ± 2% +223.8% 33.42 ± 6% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
8.15 ± 14% +271.3% 30.25 ± 35% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
9.60 ± 4% +240.8% 32.73 ± 16% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
10.37 ± 4% +232.0% 34.41 ± 10% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
7.32 ± 46% +269.7% 27.07 ± 49% perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
9.88 +236.2% 33.23 ± 4% perf-sched.wait_and_delay.avg.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
4.44 ± 4% +379.0% 21.27 ± 18% perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
10.05 ± 2% +235.6% 33.73 ± 11% perf-sched.wait_and_delay.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.03 +462.6% 0.15 ± 6% perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.78 ± 4% +482.1% 39.46 ± 3% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
3.17 +683.3% 24.85 ± 8% perf-sched.wait_and_delay.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
36.64 ± 13% +244.7% 126.32 ± 6% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
9.81 +302.4% 39.47 ± 4% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
1.05 +48.2% 1.56 perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
0.93 +14.2% 1.06 ± 2% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
9.93 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
12.02 ± 3% +139.8% 28.83 ± 6% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
6.09 ± 2% +403.0% 30.64 ± 5% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
23.17 ± 19% -83.5% 3.83 ±143% perf-sched.wait_and_delay.count.__cond_resched.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio
79.83 ± 9% -55.1% 35.83 ± 16% perf-sched.wait_and_delay.count.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
14.83 ± 14% -59.6% 6.00 ± 56% perf-sched.wait_and_delay.count.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
8.50 ± 17% -80.4% 1.67 ± 89% perf-sched.wait_and_delay.count.__cond_resched.dput.__ns_get_path.ns_get_path.proc_ns_get_link
114.00 ± 14% -62.4% 42.83 ± 11% perf-sched.wait_and_delay.count.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
94.67 ± 7% -48.1% 49.17 ± 13% perf-sched.wait_and_delay.count.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
59.83 ± 13% -76.0% 14.33 ± 48% perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
103.00 ± 12% -48.1% 53.50 ± 20% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
19.33 ± 16% -56.0% 8.50 ± 29% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
68.17 ± 11% -39.1% 41.50 ± 19% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
36.67 ± 22% -79.1% 7.67 ± 46% perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
465.50 ± 9% -47.4% 244.83 ± 11% perf-sched.wait_and_delay.count.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
14492 ± 3% -96.3% 533.67 ± 10% perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
128.67 ± 7% -53.5% 59.83 ± 10% perf-sched.wait_and_delay.count.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.67 ± 34% -80.4% 1.50 ±107% perf-sched.wait_and_delay.count.__cond_resched.vunmap_p4d_range.__vunmap_range_noflush.remove_vm_area.vfree
147533 -81.0% 28023 ± 5% perf-sched.wait_and_delay.count.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
4394 ± 4% -78.5% 942.83 ± 7% perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
228791 -79.3% 47383 ± 4% perf-sched.wait_and_delay.count.futex_wait_queue.__futex_wait.futex_wait.do_futex
368.50 ± 2% -67.1% 121.33 ± 3% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
147506 -81.0% 28010 ± 5% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
5387 ± 6% -16.7% 4488 ± 5% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
8303 ± 2% -56.9% 3579 ± 5% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
14.67 ± 7% -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
370.50 ±141% +221.9% 1192 ± 5% perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
24395 ± 2% -51.2% 11914 ± 6% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
31053 ± 2% -80.5% 6047 ± 5% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
16.41 ± 2% +342.7% 72.65 ± 29% perf-sched.wait_and_delay.max.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
16.49 ± 3% +463.3% 92.90 ± 27% perf-sched.wait_and_delay.max.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
17.32 ± 5% +520.9% 107.52 ± 14% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
15.38 ± 6% +325.2% 65.41 ± 22% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
16.73 ± 4% +456.2% 93.04 ± 11% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
17.14 ± 3% +510.6% 104.68 ± 14% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
15.70 ± 4% +379.4% 75.25 ± 28% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
15.70 ± 3% +422.1% 81.97 ± 19% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
16.38 +528.4% 102.91 ± 21% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
45.20 ± 48% +166.0% 120.23 ± 27% perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
17.25 +495.5% 102.71 ± 2% perf-sched.wait_and_delay.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
402.57 ± 15% -52.8% 189.90 ± 14% perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
16.96 ± 4% +521.3% 105.40 ± 15% perf-sched.wait_and_delay.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
28.45 +517.3% 175.65 ± 14% perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
22.49 +628.5% 163.83 ± 16% perf-sched.wait_and_delay.max.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
26.53 ± 30% +326.9% 113.25 ± 16% perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
15.54 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
1.67 ±141% +284.6% 6.44 ± 4% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
0.07 ± 34% -93.6% 0.00 ±105% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
10.21 ± 15% +295.8% 40.43 ± 50% perf-sched.wait_time.avg.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.89 ± 40% -99.8% 0.01 ±113% perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
11.67 ± 8% +213.5% 36.58 ± 28% perf-sched.wait_time.avg.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
9.98 ± 2% +226.8% 32.61 ± 20% perf-sched.wait_time.avg.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
1.03 +71.2% 1.77 ± 20% perf-sched.wait_time.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
0.06 ± 79% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.__split_vma.vma_modify.mprotect_fixup
0.05 ± 22% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_expand.mmap_region.do_mmap
0.08 ± 82% -98.2% 0.00 ±223% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
10.72 ± 10% +166.9% 28.61 ± 29% perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
10.53 ± 3% +260.5% 37.95 ± 7% perf-sched.wait_time.avg.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
9.80 ± 12% +196.6% 29.06 ± 32% perf-sched.wait_time.avg.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
9.80 ± 4% +235.1% 32.82 ± 14% perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
9.50 ± 12% +281.9% 36.27 ± 70% perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
10.31 ± 2% +223.9% 33.40 ± 6% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
8.04 ± 15% +276.1% 30.25 ± 35% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
9.60 ± 4% +240.9% 32.72 ± 16% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
0.06 ± 66% -98.3% 0.00 ±223% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.__split_vma
10.36 ± 4% +232.1% 34.41 ± 10% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
0.08 ± 50% -95.7% 0.00 ±100% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.vma_modify
0.01 ± 49% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range
0.03 ± 73% -87.4% 0.00 ±145% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.dup_task_struct.copy_process.kernel_clone
8.01 ± 25% +238.0% 27.07 ± 49% perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
9.86 +237.0% 33.23 ± 4% perf-sched.wait_time.avg.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
4.44 ± 4% +379.2% 21.26 ± 18% perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
10.03 +236.3% 33.73 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.97 ± 8% -87.8% 0.12 ±221% perf-sched.wait_time.avg.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
0.02 ± 13% +1846.8% 0.45 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
1.01 +64.7% 1.66 perf-sched.wait_time.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
0.75 ± 4% +852.1% 7.10 ± 5% perf-sched.wait_time.avg.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
0.03 +462.6% 0.15 ± 6% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.24 ± 4% +25.3% 0.30 ± 8% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
1.98 ± 15% +595.7% 13.80 ± 90% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
2.78 ± 14% +444.7% 15.12 ± 16% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function
6.77 ± 4% +483.0% 39.44 ± 3% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
3.17 +684.7% 24.85 ± 8% perf-sched.wait_time.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
36.64 ± 13% +244.7% 126.32 ± 6% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
9.79 +303.0% 39.45 ± 4% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
1.05 +23.8% 1.30 perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
0.86 +101.2% 1.73 ± 3% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.exit_mm
0.11 ± 21% +438.9% 0.61 ± 15% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.32 ± 4% +28.5% 0.41 ± 13% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
12.00 ± 3% +139.6% 28.76 ± 6% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
6.07 ± 2% +403.5% 30.56 ± 5% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.38 ± 41% -98.8% 0.00 ±105% perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
0.36 ± 34% -84.3% 0.06 ±200% perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.vma_alloc_folio.do_anonymous_page
0.36 ± 51% -92.9% 0.03 ±114% perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
15.98 ± 5% +361.7% 73.80 ± 23% perf-sched.wait_time.max.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.51 ± 14% -92.8% 0.04 ±196% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.__vmalloc_area_node.__vmalloc_node_range
8.56 ± 11% -99.9% 0.01 ±126% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
0.43 ± 32% -68.2% 0.14 ±119% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_node_trace.__get_vm_area_node.__vmalloc_node_range
0.46 ± 20% -89.3% 0.05 ±184% perf-sched.wait_time.max.ms.__cond_resched.__vmalloc_area_node.__vmalloc_node_range.alloc_thread_stack_node.dup_task_struct
16.40 ± 2% +342.9% 72.65 ± 29% perf-sched.wait_time.max.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
0.31 ± 63% -76.2% 0.07 ±169% perf-sched.wait_time.max.ms.__cond_resched.cgroup_css_set_fork.cgroup_can_fork.copy_process.kernel_clone
0.14 ± 93% +258.7% 0.49 ± 14% perf-sched.wait_time.max.ms.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
16.49 ± 3% +463.5% 92.89 ± 27% perf-sched.wait_time.max.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
1.09 +171.0% 2.96 ± 10% perf-sched.wait_time.max.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
1.16 ± 7% +155.1% 2.97 ± 4% perf-sched.wait_time.max.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
0.19 ± 78% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.__split_vma.vma_modify.mprotect_fixup
0.33 ± 35% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_expand.mmap_region.do_mmap
0.20 ±101% -99.3% 0.00 ±223% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
17.31 ± 5% +521.0% 107.51 ± 14% perf-sched.wait_time.max.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
15.38 ± 6% +325.3% 65.40 ± 22% perf-sched.wait_time.max.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
16.72 ± 4% +456.6% 93.04 ± 11% perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
1.16 ± 2% +88.7% 2.20 ± 33% perf-sched.wait_time.max.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
53.96 ± 32% +444.0% 293.53 ±109% perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
17.13 ± 2% +511.2% 104.68 ± 14% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
15.69 ± 4% +379.5% 75.25 ± 28% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
15.70 ± 3% +422.2% 81.97 ± 19% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
0.27 ± 80% -99.6% 0.00 ±223% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.__split_vma
16.37 +528.6% 102.90 ± 21% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
0.44 ± 33% -99.1% 0.00 ±104% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.vma_modify
0.02 ± 83% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range
0.08 ± 83% -95.4% 0.00 ±147% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.dup_task_struct.copy_process.kernel_clone
1.16 ± 2% +134.7% 2.72 ± 19% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
49.88 ± 25% +141.0% 120.23 ± 27% perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
17.24 +495.7% 102.70 ± 2% perf-sched.wait_time.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
402.56 ± 15% -52.8% 189.89 ± 14% perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
16.96 ± 4% +521.4% 105.39 ± 15% perf-sched.wait_time.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.06 +241.7% 3.61 ± 4% perf-sched.wait_time.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
1.07 -88.9% 0.12 ±221% perf-sched.wait_time.max.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
0.28 ± 27% +499.0% 1.67 ± 18% perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
1.21 ± 2% +207.2% 3.71 ± 3% perf-sched.wait_time.max.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
13.43 ± 26% +38.8% 18.64 perf-sched.wait_time.max.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
28.45 +517.3% 175.65 ± 14% perf-sched.wait_time.max.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.79 ± 10% +62.2% 1.28 ± 25% perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
13.22 ± 2% +317.2% 55.16 ± 35% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function
834.29 ± 28% -48.5% 429.53 ± 94% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
22.48 +628.6% 163.83 ± 16% perf-sched.wait_time.max.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
22.74 ± 18% +398.0% 113.25 ± 16% perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
7.72 ± 7% +80.6% 13.95 ± 2% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
0.74 ± 4% +77.2% 1.31 ± 32% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
5.01 +14.1% 5.72 ± 2% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
44.98 -19.7 25.32 ± 2% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
43.21 -19.6 23.65 ± 3% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
43.21 -19.6 23.65 ± 3% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
43.18 -19.5 23.63 ± 3% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
40.30 -17.5 22.75 ± 3% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
41.10 -17.4 23.66 ± 2% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
39.55 -17.3 22.24 ± 3% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
24.76 ± 2% -8.5 16.23 ± 3% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
8.68 ± 4% -6.5 2.22 ± 6% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
7.23 ± 4% -5.8 1.46 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
7.23 ± 4% -5.8 1.46 ± 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.11 ± 4% -5.7 1.39 ± 7% perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.09 ± 4% -5.7 1.39 ± 7% perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.59 ± 3% -5.1 1.47 ± 7% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
6.59 ± 3% -5.1 1.47 ± 7% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
6.59 ± 3% -5.1 1.47 ± 7% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
5.76 ± 2% -5.0 0.80 ± 9% perf-profile.calltrace.cycles-pp.start_thread
7.43 ± 2% -4.9 2.52 ± 7% perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
5.51 ± 3% -4.8 0.70 ± 7% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.start_thread
5.50 ± 3% -4.8 0.70 ± 7% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
5.48 ± 3% -4.8 0.69 ± 7% perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
5.42 ± 3% -4.7 0.69 ± 7% perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
5.90 ± 5% -3.9 2.01 ± 4% perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
4.18 ± 5% -3.8 0.37 ± 71% perf-profile.calltrace.cycles-pp.exit_notify.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.76 ± 5% -3.8 1.98 ± 4% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
5.04 ± 7% -3.7 1.32 ± 9% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__clone
5.03 ± 7% -3.7 1.32 ± 9% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
5.02 ± 7% -3.7 1.32 ± 9% perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
5.02 ± 7% -3.7 1.32 ± 9% perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
5.62 ± 5% -3.7 1.96 ± 3% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
4.03 ± 4% -3.1 0.92 ± 7% perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
6.03 ± 5% -3.1 2.94 ± 3% perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
3.43 ± 5% -2.8 0.67 ± 13% perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
3.43 ± 5% -2.8 0.67 ± 13% perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
3.41 ± 5% -2.7 0.66 ± 13% perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread
3.40 ± 5% -2.7 0.66 ± 13% perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn
3.67 ± 7% -2.7 0.94 ± 10% perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.92 ± 7% -2.4 0.50 ± 46% perf-profile.calltrace.cycles-pp.stress_pthread
2.54 ± 6% -2.2 0.38 ± 70% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2.46 ± 6% -1.8 0.63 ± 10% perf-profile.calltrace.cycles-pp.dup_task_struct.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
3.00 ± 6% -1.6 1.43 ± 7% perf-profile.calltrace.cycles-pp.__munmap
2.96 ± 6% -1.5 1.42 ± 7% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
2.96 ± 6% -1.5 1.42 ± 7% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
2.95 ± 6% -1.5 1.41 ± 7% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
2.95 ± 6% -1.5 1.41 ± 7% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
2.02 ± 4% -1.5 0.52 ± 46% perf-profile.calltrace.cycles-pp.__lll_lock_wait
1.78 ± 3% -1.5 0.30 ±100% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__lll_lock_wait
1.77 ± 3% -1.5 0.30 ±100% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__lll_lock_wait
1.54 ± 6% -1.3 0.26 ±100% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
2.54 ± 6% -1.2 1.38 ± 6% perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.51 ± 6% -1.1 1.37 ± 7% perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
1.13 -0.7 0.40 ± 70% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.15 ± 5% -0.7 0.46 ± 45% perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
1.58 ± 5% -0.6 0.94 ± 7% perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
0.99 ± 5% -0.5 0.51 ± 45% perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
1.01 ± 5% -0.5 0.54 ± 45% perf-profile.calltrace.cycles-pp.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
0.82 ± 4% -0.2 0.59 ± 5% perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
0.00 +0.5 0.54 ± 5% perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
0.00 +0.6 0.60 ± 5% perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
0.00 +0.6 0.61 ± 6% perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
0.00 +0.6 0.62 ± 6% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
0.53 ± 5% +0.6 1.17 ± 13% perf-profile.calltrace.cycles-pp.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
1.94 ± 2% +0.7 2.64 ± 9% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
0.00 +0.7 0.73 ± 5% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range
0.00 +0.8 0.75 ± 20% perf-profile.calltrace.cycles-pp.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
2.02 ± 2% +0.8 2.85 ± 9% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.74 ± 5% +0.8 1.57 ± 11% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.00 +0.9 0.90 ± 4% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
0.00 +0.9 0.92 ± 13% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues
0.86 ± 4% +1.0 1.82 ± 10% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
0.86 ± 4% +1.0 1.83 ± 10% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
0.00 +1.0 0.98 ± 7% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked
0.09 ±223% +1.0 1.07 ± 11% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt
0.00 +1.0 0.99 ± 6% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd
0.00 +1.0 1.00 ± 7% perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range
0.09 ±223% +1.0 1.10 ± 12% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
0.00 +1.0 1.01 ± 6% perf-profile.calltrace.cycles-pp.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range
0.00 +1.1 1.10 ± 5% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath
0.00 +1.1 1.12 ± 5% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock
0.00 +1.2 1.23 ± 4% perf-profile.calltrace.cycles-pp.page_add_anon_rmap.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range
0.00 +1.3 1.32 ± 4% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd
0.00 +1.4 1.38 ± 5% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range
0.00 +2.4 2.44 ± 10% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd.zap_pmd_range
0.00 +3.1 3.10 ± 5% perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single
0.00 +3.5 3.52 ± 5% perf-profile.calltrace.cycles-pp.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single
0.88 ± 4% +3.8 4.69 ± 4% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
6.30 ± 6% +13.5 19.85 ± 7% perf-profile.calltrace.cycles-pp.__clone
0.00 +16.7 16.69 ± 7% perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
1.19 ± 29% +17.1 18.32 ± 7% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
0.00 +17.6 17.56 ± 7% perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.63 ± 7% +17.7 18.35 ± 7% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.__clone
0.59 ± 5% +17.8 18.34 ± 7% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.__clone
0.59 ± 5% +17.8 18.34 ± 7% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__clone
0.00 +17.9 17.90 ± 7% perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.36 ± 71% +18.0 18.33 ± 7% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__clone
0.00 +32.0 32.03 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd.zap_pmd_range.unmap_page_range
0.00 +32.6 32.62 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single
0.00 +36.2 36.19 ± 2% perf-profile.calltrace.cycles-pp.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
7.97 ± 4% +36.6 44.52 ± 2% perf-profile.calltrace.cycles-pp.__madvise
7.91 ± 4% +36.6 44.46 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
7.90 ± 4% +36.6 44.46 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
7.87 ± 4% +36.6 44.44 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
7.86 ± 4% +36.6 44.44 ± 2% perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
7.32 ± 4% +36.8 44.07 ± 2% perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.25 ± 4% +36.8 44.06 ± 2% perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
1.04 ± 4% +40.0 41.08 ± 2% perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
1.00 ± 3% +40.1 41.06 ± 2% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
44.98 -19.7 25.32 ± 2% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
44.98 -19.7 25.32 ± 2% perf-profile.children.cycles-pp.cpu_startup_entry
44.96 -19.6 25.31 ± 2% perf-profile.children.cycles-pp.do_idle
43.21 -19.6 23.65 ± 3% perf-profile.children.cycles-pp.start_secondary
41.98 -17.6 24.40 ± 2% perf-profile.children.cycles-pp.cpuidle_idle_call
41.21 -17.3 23.86 ± 2% perf-profile.children.cycles-pp.cpuidle_enter
41.20 -17.3 23.86 ± 2% perf-profile.children.cycles-pp.cpuidle_enter_state
12.69 ± 3% -10.6 2.12 ± 6% perf-profile.children.cycles-pp.do_exit
12.60 ± 3% -10.5 2.08 ± 7% perf-profile.children.cycles-pp.__x64_sys_exit
24.76 ± 2% -8.5 16.31 ± 2% perf-profile.children.cycles-pp.intel_idle
12.34 ± 2% -8.4 3.90 ± 5% perf-profile.children.cycles-pp.intel_idle_irq
6.96 ± 4% -5.4 1.58 ± 7% perf-profile.children.cycles-pp.ret_from_fork_asm
6.69 ± 4% -5.2 1.51 ± 7% perf-profile.children.cycles-pp.ret_from_fork
6.59 ± 3% -5.1 1.47 ± 7% perf-profile.children.cycles-pp.kthread
5.78 ± 2% -5.0 0.80 ± 8% perf-profile.children.cycles-pp.start_thread
4.68 ± 4% -4.5 0.22 ± 10% perf-profile.children.cycles-pp._raw_spin_lock_irq
5.03 ± 7% -3.7 1.32 ± 9% perf-profile.children.cycles-pp.__do_sys_clone
5.02 ± 7% -3.7 1.32 ± 9% perf-profile.children.cycles-pp.kernel_clone
4.20 ± 5% -3.7 0.53 ± 9% perf-profile.children.cycles-pp.exit_notify
4.67 ± 5% -3.6 1.10 ± 9% perf-profile.children.cycles-pp.rcu_core
4.60 ± 4% -3.5 1.06 ± 10% perf-profile.children.cycles-pp.rcu_do_batch
4.89 ± 5% -3.4 1.44 ± 11% perf-profile.children.cycles-pp.__do_softirq
5.64 ± 3% -3.2 2.39 ± 6% perf-profile.children.cycles-pp.__schedule
6.27 ± 5% -3.2 3.03 ± 4% perf-profile.children.cycles-pp.flush_tlb_mm_range
4.03 ± 4% -3.1 0.92 ± 7% perf-profile.children.cycles-pp.smpboot_thread_fn
6.68 ± 4% -3.1 3.61 ± 3% perf-profile.children.cycles-pp.tlb_finish_mmu
6.04 ± 5% -3.1 2.99 ± 4% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
6.04 ± 5% -3.0 2.99 ± 4% perf-profile.children.cycles-pp.smp_call_function_many_cond
3.77 ± 2% -3.0 0.73 ± 16% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
7.78 -3.0 4.77 ± 5% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
3.43 ± 5% -2.8 0.67 ± 13% perf-profile.children.cycles-pp.run_ksoftirqd
3.67 ± 7% -2.7 0.94 ± 10% perf-profile.children.cycles-pp.copy_process
2.80 ± 6% -2.5 0.34 ± 15% perf-profile.children.cycles-pp.queued_write_lock_slowpath
3.41 ± 2% -2.5 0.96 ± 16% perf-profile.children.cycles-pp.do_futex
3.06 ± 5% -2.4 0.68 ± 16% perf-profile.children.cycles-pp.free_unref_page_commit
3.02 ± 5% -2.4 0.67 ± 16% perf-profile.children.cycles-pp.free_pcppages_bulk
2.92 ± 7% -2.3 0.58 ± 14% perf-profile.children.cycles-pp.stress_pthread
3.22 ± 3% -2.3 0.90 ± 18% perf-profile.children.cycles-pp.__x64_sys_futex
2.52 ± 5% -2.2 0.35 ± 7% perf-profile.children.cycles-pp.release_task
2.54 ± 6% -2.0 0.53 ± 10% perf-profile.children.cycles-pp.worker_thread
3.12 ± 5% -1.9 1.17 ± 11% perf-profile.children.cycles-pp.free_unref_page
2.31 ± 6% -1.9 0.45 ± 11% perf-profile.children.cycles-pp.process_one_work
2.47 ± 6% -1.8 0.63 ± 10% perf-profile.children.cycles-pp.dup_task_struct
2.19 ± 5% -1.8 0.41 ± 12% perf-profile.children.cycles-pp.delayed_vfree_work
2.14 ± 5% -1.7 0.40 ± 11% perf-profile.children.cycles-pp.vfree
3.19 ± 2% -1.6 1.58 ± 8% perf-profile.children.cycles-pp.schedule
2.06 ± 3% -1.6 0.46 ± 7% perf-profile.children.cycles-pp.__sigtimedwait
3.02 ± 6% -1.6 1.44 ± 7% perf-profile.children.cycles-pp.__munmap
1.94 ± 4% -1.6 0.39 ± 14% perf-profile.children.cycles-pp.__unfreeze_partials
2.95 ± 6% -1.5 1.41 ± 7% perf-profile.children.cycles-pp.__x64_sys_munmap
2.95 ± 6% -1.5 1.41 ± 7% perf-profile.children.cycles-pp.__vm_munmap
2.14 ± 3% -1.5 0.60 ± 21% perf-profile.children.cycles-pp.futex_wait
2.08 ± 4% -1.5 0.60 ± 19% perf-profile.children.cycles-pp.__lll_lock_wait
2.04 ± 3% -1.5 0.56 ± 20% perf-profile.children.cycles-pp.__futex_wait
1.77 ± 5% -1.5 0.32 ± 10% perf-profile.children.cycles-pp.remove_vm_area
1.86 ± 5% -1.4 0.46 ± 10% perf-profile.children.cycles-pp.open64
1.74 ± 4% -1.4 0.37 ± 7% perf-profile.children.cycles-pp.__x64_sys_rt_sigtimedwait
1.71 ± 4% -1.4 0.36 ± 8% perf-profile.children.cycles-pp.do_sigtimedwait
1.79 ± 5% -1.3 0.46 ± 9% perf-profile.children.cycles-pp.__x64_sys_openat
1.78 ± 5% -1.3 0.46 ± 8% perf-profile.children.cycles-pp.do_sys_openat2
1.61 ± 4% -1.3 0.32 ± 12% perf-profile.children.cycles-pp.poll_idle
1.65 ± 9% -1.3 0.37 ± 14% perf-profile.children.cycles-pp.pthread_create@@GLIBC_2.2.5
1.56 ± 8% -1.2 0.35 ± 7% perf-profile.children.cycles-pp.alloc_thread_stack_node
2.32 ± 3% -1.2 1.13 ± 8% perf-profile.children.cycles-pp.pick_next_task_fair
2.59 ± 6% -1.2 1.40 ± 7% perf-profile.children.cycles-pp.do_vmi_munmap
1.55 ± 4% -1.2 0.40 ± 19% perf-profile.children.cycles-pp.futex_wait_queue
1.37 ± 5% -1.1 0.22 ± 12% perf-profile.children.cycles-pp.find_unlink_vmap_area
2.52 ± 6% -1.1 1.38 ± 6% perf-profile.children.cycles-pp.do_vmi_align_munmap
1.53 ± 5% -1.1 0.39 ± 8% perf-profile.children.cycles-pp.do_filp_open
1.52 ± 5% -1.1 0.39 ± 7% perf-profile.children.cycles-pp.path_openat
1.25 ± 3% -1.1 0.14 ± 12% perf-profile.children.cycles-pp.sigpending
1.58 ± 5% -1.1 0.50 ± 6% perf-profile.children.cycles-pp.schedule_idle
1.29 ± 5% -1.1 0.21 ± 21% perf-profile.children.cycles-pp.__mprotect
1.40 ± 8% -1.1 0.32 ± 4% perf-profile.children.cycles-pp.__vmalloc_node_range
2.06 ± 3% -1.0 1.02 ± 9% perf-profile.children.cycles-pp.newidle_balance
1.04 ± 3% -1.0 0.08 ± 23% perf-profile.children.cycles-pp.__x64_sys_rt_sigpending
1.14 ± 6% -1.0 0.18 ± 18% perf-profile.children.cycles-pp.__x64_sys_mprotect
1.13 ± 6% -1.0 0.18 ± 17% perf-profile.children.cycles-pp.do_mprotect_pkey
1.30 ± 7% -0.9 0.36 ± 10% perf-profile.children.cycles-pp.wake_up_new_task
1.14 ± 9% -0.9 0.22 ± 16% perf-profile.children.cycles-pp.do_anonymous_page
0.95 ± 3% -0.9 0.04 ± 71% perf-profile.children.cycles-pp.do_sigpending
1.24 ± 3% -0.9 0.34 ± 9% perf-profile.children.cycles-pp.futex_wake
1.02 ± 6% -0.9 0.14 ± 15% perf-profile.children.cycles-pp.mprotect_fixup
1.91 ± 2% -0.9 1.06 ± 9% perf-profile.children.cycles-pp.load_balance
1.38 ± 5% -0.8 0.53 ± 6% perf-profile.children.cycles-pp.select_task_rq_fair
1.14 ± 4% -0.8 0.31 ± 12% perf-profile.children.cycles-pp.__pthread_mutex_unlock_usercnt
2.68 ± 3% -0.8 1.91 ± 6% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
1.00 ± 4% -0.7 0.26 ± 10% perf-profile.children.cycles-pp.flush_smp_call_function_queue
1.44 ± 3% -0.7 0.73 ± 10% perf-profile.children.cycles-pp.find_busiest_group
0.81 ± 6% -0.7 0.10 ± 18% perf-profile.children.cycles-pp.vma_modify
1.29 ± 3% -0.7 0.60 ± 8% perf-profile.children.cycles-pp.exit_mm
1.40 ± 3% -0.7 0.71 ± 10% perf-profile.children.cycles-pp.update_sd_lb_stats
0.78 ± 7% -0.7 0.10 ± 19% perf-profile.children.cycles-pp.__split_vma
0.90 ± 8% -0.7 0.22 ± 10% perf-profile.children.cycles-pp.__vmalloc_area_node
0.75 ± 4% -0.7 0.10 ± 5% perf-profile.children.cycles-pp.__exit_signal
1.49 ± 2% -0.7 0.84 ± 7% perf-profile.children.cycles-pp.try_to_wake_up
0.89 ± 7% -0.6 0.24 ± 10% perf-profile.children.cycles-pp.find_idlest_cpu
1.59 ± 5% -0.6 0.95 ± 7% perf-profile.children.cycles-pp.unmap_region
0.86 ± 3% -0.6 0.22 ± 26% perf-profile.children.cycles-pp.pthread_cond_timedwait@@GLIBC_2.3.2
1.59 ± 3% -0.6 0.95 ± 9% perf-profile.children.cycles-pp.irq_exit_rcu
1.24 ± 3% -0.6 0.61 ± 10% perf-profile.children.cycles-pp.update_sg_lb_stats
0.94 ± 5% -0.6 0.32 ± 11% perf-profile.children.cycles-pp.do_task_dead
0.87 ± 3% -0.6 0.25 ± 19% perf-profile.children.cycles-pp.perf_iterate_sb
0.82 ± 4% -0.6 0.22 ± 10% perf-profile.children.cycles-pp.sched_ttwu_pending
1.14 ± 3% -0.6 0.54 ± 10% perf-profile.children.cycles-pp.activate_task
0.84 -0.6 0.25 ± 10% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.81 ± 6% -0.6 0.22 ± 11% perf-profile.children.cycles-pp.find_idlest_group
0.75 ± 5% -0.6 0.18 ± 14% perf-profile.children.cycles-pp.step_into
0.74 ± 8% -0.6 0.18 ± 14% perf-profile.children.cycles-pp.__alloc_pages_bulk
0.74 ± 6% -0.5 0.19 ± 11% perf-profile.children.cycles-pp.update_sg_wakeup_stats
0.72 ± 5% -0.5 0.18 ± 15% perf-profile.children.cycles-pp.pick_link
1.06 ± 2% -0.5 0.52 ± 9% perf-profile.children.cycles-pp.enqueue_task_fair
0.77 ± 6% -0.5 0.23 ± 12% perf-profile.children.cycles-pp.unmap_vmas
0.76 ± 2% -0.5 0.22 ± 8% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.94 ± 2% -0.5 0.42 ± 10% perf-profile.children.cycles-pp.dequeue_task_fair
0.65 ± 5% -0.5 0.15 ± 18% perf-profile.children.cycles-pp.open_last_lookups
1.37 ± 3% -0.5 0.87 ± 4% perf-profile.children.cycles-pp.llist_add_batch
0.70 ± 4% -0.5 0.22 ± 19% perf-profile.children.cycles-pp.memcpy_orig
0.91 ± 4% -0.5 0.44 ± 7% perf-profile.children.cycles-pp.update_load_avg
0.67 -0.5 0.20 ± 8% perf-profile.children.cycles-pp.switch_fpu_return
0.88 ± 3% -0.5 0.42 ± 8% perf-profile.children.cycles-pp.enqueue_entity
0.91 ± 4% -0.5 0.45 ± 12% perf-profile.children.cycles-pp.ttwu_do_activate
0.77 ± 4% -0.5 0.32 ± 10% perf-profile.children.cycles-pp.schedule_hrtimeout_range_clock
0.63 ± 5% -0.4 0.20 ± 21% perf-profile.children.cycles-pp.arch_dup_task_struct
0.74 ± 3% -0.4 0.32 ± 15% perf-profile.children.cycles-pp.dequeue_entity
0.62 ± 5% -0.4 0.21 ± 5% perf-profile.children.cycles-pp.finish_task_switch
0.56 -0.4 0.16 ± 7% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
0.53 ± 4% -0.4 0.13 ± 9% perf-profile.children.cycles-pp.syscall
0.50 ± 9% -0.4 0.11 ± 18% perf-profile.children.cycles-pp.__get_vm_area_node
0.51 ± 3% -0.4 0.12 ± 12% perf-profile.children.cycles-pp.__slab_free
0.52 ± 2% -0.4 0.14 ± 10% perf-profile.children.cycles-pp.kmem_cache_free
0.75 ± 3% -0.4 0.37 ± 9% perf-profile.children.cycles-pp.exit_mm_release
0.50 ± 6% -0.4 0.12 ± 21% perf-profile.children.cycles-pp.do_send_specific
0.74 ± 3% -0.4 0.37 ± 8% perf-profile.children.cycles-pp.futex_exit_release
0.45 ± 10% -0.4 0.09 ± 17% perf-profile.children.cycles-pp.alloc_vmap_area
0.47 ± 3% -0.4 0.11 ± 20% perf-profile.children.cycles-pp.tgkill
0.68 ± 11% -0.4 0.32 ± 12% perf-profile.children.cycles-pp.__mmap
0.48 ± 3% -0.4 0.13 ± 6% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.76 ± 5% -0.3 0.41 ± 10% perf-profile.children.cycles-pp.wake_up_q
0.42 ± 7% -0.3 0.08 ± 22% perf-profile.children.cycles-pp.__close
0.49 ± 7% -0.3 0.14 ± 25% perf-profile.children.cycles-pp.kmem_cache_alloc
0.49 ± 9% -0.3 0.15 ± 14% perf-profile.children.cycles-pp.mas_store_gfp
0.46 ± 4% -0.3 0.12 ± 23% perf-profile.children.cycles-pp.perf_event_task_output
0.44 ± 10% -0.3 0.10 ± 28% perf-profile.children.cycles-pp.pthread_sigqueue
0.46 ± 4% -0.3 0.12 ± 15% perf-profile.children.cycles-pp.link_path_walk
0.42 ± 8% -0.3 0.10 ± 20% perf-profile.children.cycles-pp.proc_ns_get_link
0.63 ± 10% -0.3 0.32 ± 12% perf-profile.children.cycles-pp.vm_mmap_pgoff
0.45 ± 4% -0.3 0.14 ± 13% perf-profile.children.cycles-pp.sched_move_task
0.36 ± 8% -0.3 0.06 ± 49% perf-profile.children.cycles-pp.__x64_sys_close
0.46 ± 8% -0.3 0.17 ± 14% perf-profile.children.cycles-pp.prctl
0.65 ± 3% -0.3 0.35 ± 7% perf-profile.children.cycles-pp.futex_cleanup
0.42 ± 7% -0.3 0.12 ± 15% perf-profile.children.cycles-pp.mas_store_prealloc
0.49 ± 5% -0.3 0.20 ± 13% perf-profile.children.cycles-pp.__rmqueue_pcplist
0.37 ± 7% -0.3 0.08 ± 16% perf-profile.children.cycles-pp.do_tkill
0.36 ± 10% -0.3 0.08 ± 20% perf-profile.children.cycles-pp.ns_get_path
0.37 ± 4% -0.3 0.09 ± 18% perf-profile.children.cycles-pp.setns
0.67 ± 3% -0.3 0.41 ± 8% perf-profile.children.cycles-pp.hrtimer_wakeup
0.35 ± 5% -0.3 0.10 ± 16% perf-profile.children.cycles-pp.__task_pid_nr_ns
0.41 ± 5% -0.3 0.16 ± 12% perf-profile.children.cycles-pp.mas_wr_bnode
0.35 ± 4% -0.3 0.10 ± 20% perf-profile.children.cycles-pp.rcu_cblist_dequeue
0.37 ± 5% -0.2 0.12 ± 17% perf-profile.children.cycles-pp.exit_task_stack_account
0.56 ± 4% -0.2 0.31 ± 12% perf-profile.children.cycles-pp.select_task_rq
0.29 ± 6% -0.2 0.05 ± 46% perf-profile.children.cycles-pp.mas_wr_store_entry
0.34 ± 4% -0.2 0.10 ± 27% perf-profile.children.cycles-pp.perf_event_task
0.39 ± 9% -0.2 0.15 ± 12% perf-profile.children.cycles-pp.__switch_to_asm
0.35 ± 5% -0.2 0.11 ± 11% perf-profile.children.cycles-pp.account_kernel_stack
0.30 ± 7% -0.2 0.06 ± 48% perf-profile.children.cycles-pp.__ns_get_path
0.31 ± 9% -0.2 0.07 ± 17% perf-profile.children.cycles-pp.free_vmap_area_noflush
0.31 ± 5% -0.2 0.08 ± 19% perf-profile.children.cycles-pp.__do_sys_setns
0.33 ± 7% -0.2 0.10 ± 7% perf-profile.children.cycles-pp.__free_one_page
0.31 ± 11% -0.2 0.08 ± 13% perf-profile.children.cycles-pp.__pte_alloc
0.36 ± 6% -0.2 0.13 ± 12% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.27 ± 12% -0.2 0.05 ± 71% perf-profile.children.cycles-pp.__fput
0.53 ± 9% -0.2 0.31 ± 12% perf-profile.children.cycles-pp.do_mmap
0.27 ± 12% -0.2 0.05 ± 77% perf-profile.children.cycles-pp.__x64_sys_rt_tgsigqueueinfo
0.28 ± 5% -0.2 0.06 ± 50% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.34 ± 10% -0.2 0.12 ± 29% perf-profile.children.cycles-pp.futex_wait_setup
0.27 ± 6% -0.2 0.06 ± 45% perf-profile.children.cycles-pp.__x64_sys_tgkill
0.31 ± 7% -0.2 0.11 ± 18% perf-profile.children.cycles-pp.__switch_to
0.26 ± 8% -0.2 0.06 ± 21% perf-profile.children.cycles-pp.__call_rcu_common
0.33 ± 9% -0.2 0.13 ± 18% perf-profile.children.cycles-pp.__do_sys_prctl
0.28 ± 5% -0.2 0.08 ± 17% perf-profile.children.cycles-pp.mm_release
0.52 ± 2% -0.2 0.32 ± 9% perf-profile.children.cycles-pp.__get_user_8
0.24 ± 10% -0.2 0.04 ± 72% perf-profile.children.cycles-pp.dput
0.25 ± 14% -0.2 0.05 ± 46% perf-profile.children.cycles-pp.perf_event_mmap
0.24 ± 7% -0.2 0.06 ± 50% perf-profile.children.cycles-pp.mas_walk
0.28 ± 6% -0.2 0.10 ± 24% perf-profile.children.cycles-pp.rmqueue_bulk
0.23 ± 15% -0.2 0.05 ± 46% perf-profile.children.cycles-pp.perf_event_mmap_event
0.25 ± 15% -0.2 0.08 ± 45% perf-profile.children.cycles-pp.___slab_alloc
0.20 ± 14% -0.2 0.03 ±100% perf-profile.children.cycles-pp.lookup_fast
0.20 ± 10% -0.2 0.04 ± 75% perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
0.28 ± 7% -0.2 0.12 ± 24% perf-profile.children.cycles-pp.prepare_task_switch
0.22 ± 11% -0.2 0.05 ± 8% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.63 ± 5% -0.2 0.47 ± 12% perf-profile.children.cycles-pp.llist_reverse_order
0.25 ± 11% -0.2 0.09 ± 34% perf-profile.children.cycles-pp.futex_q_lock
0.21 ± 6% -0.2 0.06 ± 47% perf-profile.children.cycles-pp.kmem_cache_alloc_node
0.18 ± 11% -0.2 0.03 ±100% perf-profile.children.cycles-pp.alloc_empty_file
0.19 ± 5% -0.2 0.04 ± 71% perf-profile.children.cycles-pp.__put_task_struct
0.19 ± 15% -0.2 0.03 ± 70% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.24 ± 6% -0.2 0.09 ± 20% perf-profile.children.cycles-pp.___perf_sw_event
0.18 ± 7% -0.2 0.03 ±100% perf-profile.children.cycles-pp.perf_event_fork
0.19 ± 11% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.select_idle_core
0.30 ± 11% -0.1 0.15 ± 7% perf-profile.children.cycles-pp.pte_alloc_one
0.25 ± 6% -0.1 0.11 ± 10% perf-profile.children.cycles-pp.set_next_entity
0.20 ± 10% -0.1 0.06 ± 49% perf-profile.children.cycles-pp.__perf_event_header__init_id
0.18 ± 15% -0.1 0.03 ±101% perf-profile.children.cycles-pp.__radix_tree_lookup
0.22 ± 11% -0.1 0.08 ± 21% perf-profile.children.cycles-pp.mas_spanning_rebalance
0.20 ± 9% -0.1 0.06 ± 9% perf-profile.children.cycles-pp.stress_pthread_func
0.18 ± 12% -0.1 0.04 ± 73% perf-profile.children.cycles-pp.__getpid
0.16 ± 13% -0.1 0.02 ± 99% perf-profile.children.cycles-pp.walk_component
0.28 ± 5% -0.1 0.15 ± 13% perf-profile.children.cycles-pp.update_curr
0.25 ± 5% -0.1 0.11 ± 22% perf-profile.children.cycles-pp.balance_fair
0.16 ± 9% -0.1 0.03 ±100% perf-profile.children.cycles-pp.futex_wake_mark
0.16 ± 12% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.get_futex_key
0.17 ± 6% -0.1 0.05 ± 47% perf-profile.children.cycles-pp.memcg_account_kmem
0.25 ± 11% -0.1 0.12 ± 11% perf-profile.children.cycles-pp._find_next_bit
0.15 ± 13% -0.1 0.02 ± 99% perf-profile.children.cycles-pp.do_open
0.20 ± 8% -0.1 0.08 ± 16% perf-profile.children.cycles-pp.mas_rebalance
0.17 ± 13% -0.1 0.05 ± 45% perf-profile.children.cycles-pp.__memcg_kmem_charge_page
0.33 ± 6% -0.1 0.21 ± 10% perf-profile.children.cycles-pp.select_idle_sibling
0.14 ± 11% -0.1 0.03 ±100% perf-profile.children.cycles-pp.get_user_pages_fast
0.18 ± 7% -0.1 0.07 ± 14% perf-profile.children.cycles-pp.mas_alloc_nodes
0.14 ± 11% -0.1 0.03 ±101% perf-profile.children.cycles-pp.set_task_cpu
0.14 ± 12% -0.1 0.03 ±101% perf-profile.children.cycles-pp.vm_unmapped_area
0.38 ± 6% -0.1 0.27 ± 7% perf-profile.children.cycles-pp.native_sched_clock
0.16 ± 10% -0.1 0.05 ± 47% perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
0.36 ± 9% -0.1 0.25 ± 12% perf-profile.children.cycles-pp.mmap_region
0.23 ± 7% -0.1 0.12 ± 9% perf-profile.children.cycles-pp.available_idle_cpu
0.13 ± 11% -0.1 0.02 ± 99% perf-profile.children.cycles-pp.internal_get_user_pages_fast
0.16 ± 10% -0.1 0.06 ± 18% perf-profile.children.cycles-pp.get_unmapped_area
0.50 ± 7% -0.1 0.40 ± 6% perf-profile.children.cycles-pp.menu_select
0.24 ± 9% -0.1 0.14 ± 13% perf-profile.children.cycles-pp.rmqueue
0.17 ± 14% -0.1 0.07 ± 26% perf-profile.children.cycles-pp.perf_event_comm
0.17 ± 15% -0.1 0.07 ± 23% perf-profile.children.cycles-pp.perf_event_comm_event
0.17 ± 11% -0.1 0.07 ± 14% perf-profile.children.cycles-pp.pick_next_entity
0.13 ± 14% -0.1 0.03 ±102% perf-profile.children.cycles-pp.perf_output_begin
0.23 ± 6% -0.1 0.13 ± 21% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.14 ± 18% -0.1 0.04 ± 72% perf-profile.children.cycles-pp.perf_event_comm_output
0.21 ± 9% -0.1 0.12 ± 9% perf-profile.children.cycles-pp.update_rq_clock
0.16 ± 8% -0.1 0.06 ± 19% perf-profile.children.cycles-pp.mas_split
0.13 ± 14% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
0.13 ± 6% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.13 ± 7% -0.1 0.04 ± 72% perf-profile.children.cycles-pp.mas_topiary_replace
0.14 ± 8% -0.1 0.06 ± 9% perf-profile.children.cycles-pp.mas_preallocate
0.16 ± 11% -0.1 0.07 ± 18% perf-profile.children.cycles-pp.__pick_eevdf
0.11 ± 14% -0.1 0.02 ± 99% perf-profile.children.cycles-pp.mas_empty_area_rev
0.25 ± 7% -0.1 0.17 ± 10% perf-profile.children.cycles-pp.select_idle_cpu
0.14 ± 12% -0.1 0.06 ± 14% perf-profile.children.cycles-pp.cpu_stopper_thread
0.14 ± 10% -0.1 0.06 ± 13% perf-profile.children.cycles-pp.active_load_balance_cpu_stop
0.14 ± 14% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.os_xsave
0.18 ± 6% -0.1 0.11 ± 14% perf-profile.children.cycles-pp.idle_cpu
0.17 ± 4% -0.1 0.10 ± 15% perf-profile.children.cycles-pp.hrtimer_start_range_ns
0.11 ± 14% -0.1 0.03 ±100% perf-profile.children.cycles-pp.__pthread_mutex_lock
0.32 ± 5% -0.1 0.25 ± 5% perf-profile.children.cycles-pp.sched_clock
0.11 ± 6% -0.1 0.03 ± 70% perf-profile.children.cycles-pp.wakeup_preempt
0.23 ± 7% -0.1 0.16 ± 13% perf-profile.children.cycles-pp.update_rq_clock_task
0.13 ± 8% -0.1 0.06 ± 16% perf-profile.children.cycles-pp.local_clock_noinstr
0.11 ± 10% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.kmem_cache_alloc_bulk
0.34 ± 4% -0.1 0.27 ± 6% perf-profile.children.cycles-pp.sched_clock_cpu
0.11 ± 9% -0.1 0.04 ± 76% perf-profile.children.cycles-pp.avg_vruntime
0.15 ± 8% -0.1 0.08 ± 14% perf-profile.children.cycles-pp.update_cfs_group
0.10 ± 8% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
0.13 ± 8% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.sched_use_asym_prio
0.09 ± 12% -0.1 0.02 ± 99% perf-profile.children.cycles-pp.getname_flags
0.18 ± 9% -0.1 0.12 ± 12% perf-profile.children.cycles-pp.__update_load_avg_se
0.11 ± 8% -0.1 0.05 ± 46% perf-profile.children.cycles-pp.place_entity
0.08 ± 12% -0.0 0.02 ± 99% perf-profile.children.cycles-pp.folio_add_lru_vma
0.10 ± 7% -0.0 0.05 ± 46% perf-profile.children.cycles-pp._find_next_and_bit
0.10 ± 6% -0.0 0.06 ± 24% perf-profile.children.cycles-pp.reweight_entity
0.03 ± 70% +0.0 0.08 ± 14% perf-profile.children.cycles-pp.perf_rotate_context
0.19 ± 10% +0.1 0.25 ± 7% perf-profile.children.cycles-pp.irqtime_account_irq
0.08 ± 11% +0.1 0.14 ± 21% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.00 +0.1 0.06 ± 14% perf-profile.children.cycles-pp.rcu_pending
0.10 ± 17% +0.1 0.16 ± 13% perf-profile.children.cycles-pp.rebalance_domains
0.14 ± 16% +0.1 0.21 ± 12% perf-profile.children.cycles-pp.downgrade_write
0.14 ± 14% +0.1 0.21 ± 10% perf-profile.children.cycles-pp.down_read_killable
0.00 +0.1 0.07 ± 11% perf-profile.children.cycles-pp.free_tail_page_prepare
0.02 ±141% +0.1 0.09 ± 20% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.01 ±223% +0.1 0.08 ± 25% perf-profile.children.cycles-pp.arch_scale_freq_tick
0.55 ± 9% +0.1 0.62 ± 9% perf-profile.children.cycles-pp.__alloc_pages
0.34 ± 5% +0.1 0.41 ± 9% perf-profile.children.cycles-pp.clock_nanosleep
0.00 +0.1 0.08 ± 23% perf-profile.children.cycles-pp.tick_nohz_next_event
0.70 ± 2% +0.1 0.78 ± 5% perf-profile.children.cycles-pp.flush_tlb_func
0.14 ± 10% +0.1 0.23 ± 13% perf-profile.children.cycles-pp.__intel_pmu_enable_all
0.07 ± 19% +0.1 0.17 ± 17% perf-profile.children.cycles-pp.cgroup_rstat_updated
0.04 ± 71% +0.1 0.14 ± 11% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.25 ± 9% +0.1 0.38 ± 11% perf-profile.children.cycles-pp.down_read
0.43 ± 9% +0.1 0.56 ± 10% perf-profile.children.cycles-pp.get_page_from_freelist
0.00 +0.1 0.15 ± 6% perf-profile.children.cycles-pp.vm_normal_page
0.31 ± 7% +0.2 0.46 ± 9% perf-profile.children.cycles-pp.native_flush_tlb_local
0.00 +0.2 0.16 ± 8% perf-profile.children.cycles-pp.__tlb_remove_page_size
0.28 ± 11% +0.2 0.46 ± 13% perf-profile.children.cycles-pp.vma_alloc_folio
0.00 +0.2 0.24 ± 5% perf-profile.children.cycles-pp._compound_head
0.07 ± 16% +0.2 0.31 ± 6% perf-profile.children.cycles-pp.__mod_node_page_state
0.38 ± 5% +0.2 0.62 ± 7% perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
0.22 ± 12% +0.2 0.47 ± 10% perf-profile.children.cycles-pp.schedule_preempt_disabled
0.38 ± 5% +0.3 0.64 ± 7% perf-profile.children.cycles-pp.perf_event_task_tick
0.00 +0.3 0.27 ± 5% perf-profile.children.cycles-pp.free_swap_cache
0.30 ± 10% +0.3 0.58 ± 10% perf-profile.children.cycles-pp.rwsem_down_read_slowpath
0.00 +0.3 0.30 ± 4% perf-profile.children.cycles-pp.free_pages_and_swap_cache
0.09 ± 10% +0.3 0.42 ± 7% perf-profile.children.cycles-pp.__mod_lruvec_state
0.00 +0.3 0.34 ± 9% perf-profile.children.cycles-pp.deferred_split_folio
0.00 +0.4 0.36 ± 13% perf-profile.children.cycles-pp.prep_compound_page
0.09 ± 10% +0.4 0.50 ± 9% perf-profile.children.cycles-pp.free_unref_page_prepare
0.00 +0.4 0.42 ± 11% perf-profile.children.cycles-pp.do_huge_pmd_anonymous_page
1.67 ± 3% +0.4 2.12 ± 8% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.63 ± 3% +0.5 1.11 ± 12% perf-profile.children.cycles-pp.scheduler_tick
1.93 ± 3% +0.5 2.46 ± 8% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
1.92 ± 3% +0.5 2.45 ± 8% perf-profile.children.cycles-pp.hrtimer_interrupt
0.73 ± 3% +0.6 1.31 ± 11% perf-profile.children.cycles-pp.update_process_times
0.74 ± 3% +0.6 1.34 ± 11% perf-profile.children.cycles-pp.tick_sched_handle
0.20 ± 8% +0.6 0.83 ± 18% perf-profile.children.cycles-pp.__cond_resched
0.78 ± 4% +0.6 1.43 ± 12% perf-profile.children.cycles-pp.tick_nohz_highres_handler
0.12 ± 7% +0.7 0.81 ± 5% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.28 ± 7% +0.9 1.23 ± 4% perf-profile.children.cycles-pp.release_pages
0.00 +1.0 1.01 ± 6% perf-profile.children.cycles-pp.pmdp_invalidate
0.35 ± 6% +1.2 1.56 ± 5% perf-profile.children.cycles-pp.__mod_lruvec_page_state
0.30 ± 8% +1.2 1.53 ± 4% perf-profile.children.cycles-pp.tlb_batch_pages_flush
0.00 +1.3 1.26 ± 4% perf-profile.children.cycles-pp.page_add_anon_rmap
0.09 ± 11% +3.1 3.20 ± 5% perf-profile.children.cycles-pp.page_remove_rmap
1.60 ± 2% +3.4 5.04 ± 4% perf-profile.children.cycles-pp.zap_pte_range
0.03 ±100% +3.5 3.55 ± 5% perf-profile.children.cycles-pp.__split_huge_pmd_locked
41.36 +11.6 52.92 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
41.22 +11.7 52.88 ± 2% perf-profile.children.cycles-pp.do_syscall_64
6.42 ± 6% +13.5 19.88 ± 7% perf-profile.children.cycles-pp.__clone
0.82 ± 6% +16.2 16.98 ± 7% perf-profile.children.cycles-pp.clear_page_erms
2.62 ± 5% +16.4 19.04 ± 7% perf-profile.children.cycles-pp.asm_exc_page_fault
2.18 ± 5% +16.8 18.94 ± 7% perf-profile.children.cycles-pp.exc_page_fault
2.06 ± 6% +16.8 18.90 ± 7% perf-profile.children.cycles-pp.do_user_addr_fault
1.60 ± 8% +17.0 18.60 ± 7% perf-profile.children.cycles-pp.handle_mm_fault
1.52 ± 7% +17.1 18.58 ± 7% perf-profile.children.cycles-pp.__handle_mm_fault
0.30 ± 7% +17.4 17.72 ± 7% perf-profile.children.cycles-pp.clear_huge_page
0.31 ± 8% +17.6 17.90 ± 7% perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
11.66 ± 3% +22.2 33.89 ± 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
3.29 ± 3% +30.2 33.46 perf-profile.children.cycles-pp._raw_spin_lock
0.04 ± 71% +36.2 36.21 ± 2% perf-profile.children.cycles-pp.__split_huge_pmd
8.00 ± 4% +36.5 44.54 ± 2% perf-profile.children.cycles-pp.__madvise
7.87 ± 4% +36.6 44.44 ± 2% perf-profile.children.cycles-pp.__x64_sys_madvise
7.86 ± 4% +36.6 44.44 ± 2% perf-profile.children.cycles-pp.do_madvise
7.32 ± 4% +36.8 44.07 ± 2% perf-profile.children.cycles-pp.madvise_vma_behavior
7.26 ± 4% +36.8 44.06 ± 2% perf-profile.children.cycles-pp.zap_page_range_single
1.78 +39.5 41.30 ± 2% perf-profile.children.cycles-pp.unmap_page_range
1.72 +39.6 41.28 ± 2% perf-profile.children.cycles-pp.zap_pmd_range
24.76 ± 2% -8.5 16.31 ± 2% perf-profile.self.cycles-pp.intel_idle
11.46 ± 2% -7.8 3.65 ± 5% perf-profile.self.cycles-pp.intel_idle_irq
3.16 ± 7% -2.1 1.04 ± 6% perf-profile.self.cycles-pp.smp_call_function_many_cond
1.49 ± 4% -1.2 0.30 ± 12% perf-profile.self.cycles-pp.poll_idle
1.15 ± 3% -0.6 0.50 ± 9% perf-profile.self.cycles-pp._raw_spin_lock
0.60 ± 6% -0.6 0.03 ±100% perf-profile.self.cycles-pp.queued_write_lock_slowpath
0.69 ± 4% -0.5 0.22 ± 20% perf-profile.self.cycles-pp.memcpy_orig
0.66 ± 7% -0.5 0.18 ± 11% perf-profile.self.cycles-pp.update_sg_wakeup_stats
0.59 ± 4% -0.5 0.13 ± 8% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.86 ± 3% -0.4 0.43 ± 12% perf-profile.self.cycles-pp.update_sg_lb_stats
0.56 -0.4 0.16 ± 7% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
0.48 ± 3% -0.4 0.12 ± 10% perf-profile.self.cycles-pp.__slab_free
1.18 ± 2% -0.4 0.82 ± 3% perf-profile.self.cycles-pp.llist_add_batch
0.54 ± 5% -0.3 0.19 ± 6% perf-profile.self.cycles-pp.__schedule
0.47 ± 7% -0.3 0.18 ± 13% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.34 ± 5% -0.2 0.09 ± 18% perf-profile.self.cycles-pp.kmem_cache_free
0.43 ± 4% -0.2 0.18 ± 11% perf-profile.self.cycles-pp.update_load_avg
0.35 ± 4% -0.2 0.10 ± 23% perf-profile.self.cycles-pp.rcu_cblist_dequeue
0.38 ± 9% -0.2 0.15 ± 10% perf-profile.self.cycles-pp.__switch_to_asm
0.33 ± 5% -0.2 0.10 ± 16% perf-profile.self.cycles-pp.__task_pid_nr_ns
0.36 ± 6% -0.2 0.13 ± 14% perf-profile.self.cycles-pp.switch_mm_irqs_off
0.31 ± 6% -0.2 0.09 ± 6% perf-profile.self.cycles-pp.__free_one_page
0.28 ± 5% -0.2 0.06 ± 50% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.27 ± 13% -0.2 0.06 ± 23% perf-profile.self.cycles-pp.pthread_create@@GLIBC_2.2.5
0.30 ± 7% -0.2 0.10 ± 19% perf-profile.self.cycles-pp.__switch_to
0.27 ± 4% -0.2 0.10 ± 17% perf-profile.self.cycles-pp.finish_task_switch
0.23 ± 7% -0.2 0.06 ± 50% perf-profile.self.cycles-pp.mas_walk
0.22 ± 9% -0.2 0.05 ± 48% perf-profile.self.cycles-pp.__clone
0.63 ± 5% -0.2 0.46 ± 12% perf-profile.self.cycles-pp.llist_reverse_order
0.20 ± 4% -0.2 0.04 ± 72% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.24 ± 10% -0.1 0.09 ± 19% perf-profile.self.cycles-pp.rmqueue_bulk
0.18 ± 13% -0.1 0.03 ±101% perf-profile.self.cycles-pp.__radix_tree_lookup
0.18 ± 11% -0.1 0.04 ± 71% perf-profile.self.cycles-pp.stress_pthread_func
0.36 ± 8% -0.1 0.22 ± 11% perf-profile.self.cycles-pp.menu_select
0.22 ± 4% -0.1 0.08 ± 19% perf-profile.self.cycles-pp.___perf_sw_event
0.20 ± 13% -0.1 0.07 ± 20% perf-profile.self.cycles-pp.start_thread
0.16 ± 13% -0.1 0.03 ±101% perf-profile.self.cycles-pp.alloc_vmap_area
0.17 ± 10% -0.1 0.04 ± 73% perf-profile.self.cycles-pp.kmem_cache_alloc
0.14 ± 9% -0.1 0.03 ±100% perf-profile.self.cycles-pp.futex_wake
0.17 ± 4% -0.1 0.06 ± 11% perf-profile.self.cycles-pp.dequeue_task_fair
0.23 ± 6% -0.1 0.12 ± 11% perf-profile.self.cycles-pp.available_idle_cpu
0.22 ± 13% -0.1 0.11 ± 12% perf-profile.self.cycles-pp._find_next_bit
0.21 ± 7% -0.1 0.10 ± 6% perf-profile.self.cycles-pp.__rmqueue_pcplist
0.37 ± 7% -0.1 0.26 ± 8% perf-profile.self.cycles-pp.native_sched_clock
0.22 ± 7% -0.1 0.12 ± 21% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.19 ± 7% -0.1 0.10 ± 11% perf-profile.self.cycles-pp.enqueue_entity
0.15 ± 5% -0.1 0.06 ± 45% perf-profile.self.cycles-pp.enqueue_task_fair
0.15 ± 11% -0.1 0.06 ± 17% perf-profile.self.cycles-pp.__pick_eevdf
0.13 ± 13% -0.1 0.05 ± 72% perf-profile.self.cycles-pp.prepare_task_switch
0.17 ± 10% -0.1 0.08 ± 8% perf-profile.self.cycles-pp.update_rq_clock_task
0.54 ± 4% -0.1 0.46 ± 6% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.14 ± 14% -0.1 0.06 ± 11% perf-profile.self.cycles-pp.os_xsave
0.11 ± 10% -0.1 0.03 ± 70% perf-profile.self.cycles-pp.try_to_wake_up
0.10 ± 8% -0.1 0.03 ±100% perf-profile.self.cycles-pp.futex_wait
0.14 ± 9% -0.1 0.07 ± 10% perf-profile.self.cycles-pp.update_curr
0.18 ± 9% -0.1 0.11 ± 14% perf-profile.self.cycles-pp.idle_cpu
0.11 ± 11% -0.1 0.04 ± 76% perf-profile.self.cycles-pp.avg_vruntime
0.15 ± 10% -0.1 0.08 ± 14% perf-profile.self.cycles-pp.update_cfs_group
0.09 ± 9% -0.1 0.03 ±100% perf-profile.self.cycles-pp.reweight_entity
0.12 ± 13% -0.1 0.06 ± 8% perf-profile.self.cycles-pp.do_idle
0.18 ± 10% -0.1 0.12 ± 13% perf-profile.self.cycles-pp.__update_load_avg_se
0.09 ± 17% -0.1 0.04 ± 71% perf-profile.self.cycles-pp.cpuidle_idle_call
0.10 ± 11% -0.0 0.06 ± 45% perf-profile.self.cycles-pp.update_rq_clock
0.12 ± 15% -0.0 0.07 ± 16% perf-profile.self.cycles-pp.update_sd_lb_stats
0.09 ± 5% -0.0 0.05 ± 46% perf-profile.self.cycles-pp._find_next_and_bit
0.01 ±223% +0.1 0.08 ± 25% perf-profile.self.cycles-pp.arch_scale_freq_tick
0.78 ± 4% +0.1 0.87 ± 4% perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
0.14 ± 10% +0.1 0.23 ± 13% perf-profile.self.cycles-pp.__intel_pmu_enable_all
0.06 ± 46% +0.1 0.15 ± 19% perf-profile.self.cycles-pp.cgroup_rstat_updated
0.19 ± 3% +0.1 0.29 ± 4% perf-profile.self.cycles-pp.cpuidle_enter_state
0.00 +0.1 0.10 ± 11% perf-profile.self.cycles-pp.__mod_lruvec_state
0.00 +0.1 0.11 ± 18% perf-profile.self.cycles-pp.__tlb_remove_page_size
0.00 +0.1 0.12 ± 9% perf-profile.self.cycles-pp.vm_normal_page
0.23 ± 7% +0.1 0.36 ± 8% perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
0.20 ± 8% +0.2 0.35 ± 7% perf-profile.self.cycles-pp.__mod_lruvec_page_state
1.12 ± 2% +0.2 1.28 ± 4% perf-profile.self.cycles-pp.zap_pte_range
0.31 ± 8% +0.2 0.46 ± 9% perf-profile.self.cycles-pp.native_flush_tlb_local
0.00 +0.2 0.16 ± 5% perf-profile.self.cycles-pp._compound_head
0.06 ± 17% +0.2 0.26 ± 4% perf-profile.self.cycles-pp.__mod_node_page_state
0.00 +0.2 0.24 ± 6% perf-profile.self.cycles-pp.free_swap_cache
0.00 +0.3 0.27 ± 15% perf-profile.self.cycles-pp.clear_huge_page
0.00 +0.3 0.27 ± 11% perf-profile.self.cycles-pp.deferred_split_folio
0.00 +0.4 0.36 ± 13% perf-profile.self.cycles-pp.prep_compound_page
0.05 ± 47% +0.4 0.43 ± 9% perf-profile.self.cycles-pp.free_unref_page_prepare
0.08 ± 7% +0.5 0.57 ± 23% perf-profile.self.cycles-pp.__cond_resched
0.08 ± 12% +0.5 0.58 ± 5% perf-profile.self.cycles-pp.release_pages
0.10 ± 10% +0.5 0.63 ± 6% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.00 +1.1 1.11 ± 7% perf-profile.self.cycles-pp.__split_huge_pmd_locked
0.00 +1.2 1.18 ± 4% perf-profile.self.cycles-pp.page_add_anon_rmap
0.03 ±101% +1.3 1.35 ± 7% perf-profile.self.cycles-pp.page_remove_rmap
0.82 ± 5% +16.1 16.88 ± 7% perf-profile.self.cycles-pp.clear_page_erms
11.65 ± 3% +20.2 31.88 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
***************************************************************************************************
lkp-spr-2sp4: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
=========================================================================================
array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
commit:
30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
---------------- ---------------------------
%stddev %change %stddev
\ | \
10.50 ± 14% +55.6% 16.33 ± 16% perf-c2c.DRAM.local
6724 -11.4% 5954 ± 2% vmstat.system.cs
2.746e+09 +16.7% 3.205e+09 ± 2% cpuidle..time
2771516 +16.0% 3213723 ± 2% cpuidle..usage
0.06 ± 4% -0.0 0.05 ± 5% mpstat.cpu.all.soft%
0.47 ± 2% -0.1 0.39 ± 2% mpstat.cpu.all.sys%
0.01 ± 85% +1700.0% 0.20 ±188% perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
15.11 ± 13% -28.8% 10.76 ± 34% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
15.09 ± 13% -30.3% 10.51 ± 38% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1023952 +13.4% 1161219 meminfo.AnonHugePages
1319741 +10.8% 1461995 meminfo.AnonPages
1331039 +11.2% 1480149 meminfo.Inactive
1330865 +11.2% 1479975 meminfo.Inactive(anon)
1266202 +16.0% 1469399 ± 2% turbostat.C1E
1509871 +16.6% 1760853 ± 2% turbostat.C6
3521203 +17.4% 4134075 ± 3% turbostat.IRQ
580.32 -3.8% 558.30 turbostat.PkgWatt
77.42 -14.0% 66.60 ± 2% turbostat.RAMWatt
330416 +10.8% 366020 proc-vmstat.nr_anon_pages
500.90 +13.4% 567.99 proc-vmstat.nr_anon_transparent_hugepages
333197 +11.2% 370536 proc-vmstat.nr_inactive_anon
333197 +11.2% 370536 proc-vmstat.nr_zone_inactive_anon
129879 ± 11% -46.7% 69207 ± 12% proc-vmstat.numa_pages_migrated
3879028 +5.9% 4109180 proc-vmstat.pgalloc_normal
3403414 +6.6% 3628929 proc-vmstat.pgfree
129879 ± 11% -46.7% 69207 ± 12% proc-vmstat.pgmigrate_success
5763 +9.8% 6327 proc-vmstat.thp_fault_alloc
350993 -15.6% 296081 ± 2% stream.add_bandwidth_MBps
349830 -16.1% 293492 ± 2% stream.add_bandwidth_MBps_harmonicMean
333973 -20.5% 265439 ± 3% stream.copy_bandwidth_MBps
332930 -21.7% 260548 ± 3% stream.copy_bandwidth_MBps_harmonicMean
302788 -16.2% 253817 ± 2% stream.scale_bandwidth_MBps
302157 -17.1% 250577 ± 2% stream.scale_bandwidth_MBps_harmonicMean
1177276 +9.3% 1286614 stream.time.maximum_resident_set_size
5038 +1.1% 5095 stream.time.percent_of_cpu_this_job_got
694.19 ± 2% +19.5% 829.85 ± 2% stream.time.user_time
339047 -12.1% 298061 stream.triad_bandwidth_MBps
338186 -12.4% 296218 stream.triad_bandwidth_MBps_harmonicMean
8.42 ±100% -8.4 0.00 perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi
8.42 ±100% -8.4 0.00 perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
8.42 ±100% -8.4 0.00 perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
8.42 ±100% -8.4 0.00 perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
8.42 ±100% -8.4 0.00 perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
8.42 ±100% -8.4 0.00 perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode
0.84 ±103% +1.7 2.57 ± 59% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.84 ±103% +1.7 2.57 ± 59% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
0.31 ±223% +2.0 2.33 ± 44% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
0.31 ±223% +2.0 2.33 ± 44% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
3.07 ± 56% +2.8 5.88 ± 28% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
8.42 ±100% -8.4 0.00 perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
8.42 ±100% -8.1 0.36 ±223% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
12.32 ± 25% -6.6 5.69 ± 69% perf-profile.children.cycles-pp.vsnprintf
12.76 ± 27% -6.6 6.19 ± 67% perf-profile.children.cycles-pp.seq_printf
3.07 ± 56% +2.8 5.88 ± 28% perf-profile.children.cycles-pp.__x64_sys_exit_group
40.11 -11.0% 35.71 ± 2% perf-stat.i.MPKI
1.563e+10 -12.3% 1.371e+10 ± 2% perf-stat.i.branch-instructions
3.721e+09 ± 2% -23.2% 2.858e+09 ± 4% perf-stat.i.cache-misses
4.471e+09 ± 3% -22.7% 3.458e+09 ± 4% perf-stat.i.cache-references
5970 ± 5% -15.9% 5021 ± 4% perf-stat.i.context-switches
1.66 ± 2% +15.8% 1.92 ± 2% perf-stat.i.cpi
41.83 ± 4% +30.6% 54.63 ± 4% perf-stat.i.cycles-between-cache-misses
2.282e+10 ± 2% -14.5% 1.952e+10 ± 2% perf-stat.i.dTLB-loads
572602 ± 3% -9.2% 519922 ± 5% perf-stat.i.dTLB-store-misses
1.483e+10 ± 2% -15.7% 1.25e+10 ± 2% perf-stat.i.dTLB-stores
9.179e+10 -13.7% 7.924e+10 ± 2% perf-stat.i.instructions
0.61 -13.4% 0.52 ± 2% perf-stat.i.ipc
373.79 ± 4% -37.8% 232.60 ± 9% perf-stat.i.metric.K/sec
251.45 -13.4% 217.72 ± 2% perf-stat.i.metric.M/sec
21446 ± 3% -24.1% 16278 ± 8% perf-stat.i.minor-faults
15.07 ± 5% -6.0 9.10 ± 10% perf-stat.i.node-load-miss-rate%
68275790 ± 5% -44.9% 37626128 ± 12% perf-stat.i.node-load-misses
21448 ± 3% -24.1% 16281 ± 8% perf-stat.i.page-faults
40.71 -11.3% 36.10 ± 2% perf-stat.overall.MPKI
1.67 +15.3% 1.93 ± 2% perf-stat.overall.cpi
41.07 ± 3% +30.1% 53.42 ± 4% perf-stat.overall.cycles-between-cache-misses
0.00 ± 2% +0.0 0.00 ± 2% perf-stat.overall.dTLB-store-miss-rate%
0.60 -13.2% 0.52 ± 2% perf-stat.overall.ipc
15.19 ± 5% -6.2 9.03 ± 11% perf-stat.overall.node-load-miss-rate%
1.4e+10 -9.3% 1.269e+10 perf-stat.ps.branch-instructions
3.352e+09 ± 3% -20.9% 2.652e+09 ± 4% perf-stat.ps.cache-misses
4.026e+09 ± 3% -20.3% 3.208e+09 ± 4% perf-stat.ps.cache-references
4888 ± 4% -10.8% 4362 ± 3% perf-stat.ps.context-switches
206092 +2.1% 210375 perf-stat.ps.cpu-clock
1.375e+11 +2.8% 1.414e+11 perf-stat.ps.cpu-cycles
258.23 ± 5% +8.8% 280.85 ± 4% perf-stat.ps.cpu-migrations
2.048e+10 -11.7% 1.809e+10 ± 2% perf-stat.ps.dTLB-loads
1.333e+10 ± 2% -13.0% 1.16e+10 ± 2% perf-stat.ps.dTLB-stores
8.231e+10 -10.8% 7.342e+10 perf-stat.ps.instructions
15755 ± 3% -16.3% 13187 ± 6% perf-stat.ps.minor-faults
61706790 ± 6% -43.8% 34699716 ± 11% perf-stat.ps.node-load-misses
15757 ± 3% -16.3% 13189 ± 6% perf-stat.ps.page-faults
206092 +2.1% 210375 perf-stat.ps.task-clock
1.217e+12 +4.1% 1.267e+12 ± 2% perf-stat.total.instructions
***************************************************************************************************
lkp-cfl-d1: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
commit:
30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
---------------- ---------------------------
%stddev %change %stddev
\ | \
232.12 ± 7% -12.0% 204.18 ± 8% sched_debug.cfs_rq:/.load_avg.stddev
6797 -3.3% 6576 vmstat.system.cs
15161 -0.9% 15029 vmstat.system.in
349927 +44.3% 504820 meminfo.AnonHugePages
507807 +27.1% 645169 meminfo.AnonPages
1499332 +10.2% 1652612 meminfo.Inactive(anon)
8.67 ± 62% +184.6% 24.67 ± 25% turbostat.C10
1.50 -0.1 1.45 turbostat.C1E%
3.30 -3.2% 3.20 turbostat.RAMWatt
1.40 ± 14% -0.3 1.09 ± 13% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
1.44 ± 12% -0.3 1.12 ± 13% perf-profile.children.cycles-pp.asm_exc_page_fault
0.03 ±141% +0.1 0.10 ± 30% perf-profile.children.cycles-pp.next_uptodate_folio
0.02 ±141% +0.1 0.10 ± 22% perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup
0.02 ±143% +0.1 0.10 ± 25% perf-profile.self.cycles-pp.next_uptodate_folio
0.01 ±223% +0.1 0.09 ± 19% perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup
19806 -3.5% 19109 phoronix-test-suite.ramspeed.Average.Integer.mb_s
283.70 +3.8% 294.50 phoronix-test-suite.time.elapsed_time
283.70 +3.8% 294.50 phoronix-test-suite.time.elapsed_time.max
120454 +1.6% 122334 phoronix-test-suite.time.maximum_resident_set_size
281337 -54.8% 127194 phoronix-test-suite.time.minor_page_faults
259.13 +4.1% 269.81 phoronix-test-suite.time.user_time
126951 +27.0% 161291 proc-vmstat.nr_anon_pages
170.86 +44.3% 246.49 proc-vmstat.nr_anon_transparent_hugepages
355917 -1.0% 352250 proc-vmstat.nr_dirty_background_threshold
712705 -1.0% 705362 proc-vmstat.nr_dirty_threshold
3265201 -1.1% 3228465 proc-vmstat.nr_free_pages
374833 +10.2% 413153 proc-vmstat.nr_inactive_anon
1767 +4.8% 1853 proc-vmstat.nr_page_table_pages
374833 +10.2% 413153 proc-vmstat.nr_zone_inactive_anon
854665 -34.3% 561406 proc-vmstat.numa_hit
854632 -34.3% 561397 proc-vmstat.numa_local
5548755 +1.1% 5610598 proc-vmstat.pgalloc_normal
1083315 -26.2% 799129 proc-vmstat.pgfault
113425 +3.7% 117656 proc-vmstat.pgreuse
9025 +7.6% 9714 proc-vmstat.thp_fault_alloc
3.38 +0.1 3.45 perf-stat.i.branch-miss-rate%
4.135e+08 -3.2% 4.003e+08 perf-stat.i.cache-misses
5.341e+08 -2.7% 5.197e+08 perf-stat.i.cache-references
6832 -3.4% 6600 perf-stat.i.context-switches
4.06 +3.1% 4.19 perf-stat.i.cpi
438639 ± 5% -18.7% 356730 ± 6% perf-stat.i.dTLB-load-misses
1.119e+09 -3.8% 1.077e+09 perf-stat.i.dTLB-loads
0.02 ± 15% -0.0 0.01 ± 26% perf-stat.i.dTLB-store-miss-rate%
80407 ± 10% -63.5% 29387 ± 23% perf-stat.i.dTLB-store-misses
7.319e+08 -3.8% 7.043e+08 perf-stat.i.dTLB-stores
57.72 +0.8 58.52 perf-stat.i.iTLB-load-miss-rate%
129846 -3.8% 124973 perf-stat.i.iTLB-load-misses
144448 -5.3% 136837 perf-stat.i.iTLB-loads
2.389e+09 -3.5% 2.305e+09 perf-stat.i.instructions
0.28 -2.9% 0.27 perf-stat.i.ipc
220.59 -3.4% 213.11 perf-stat.i.metric.M/sec
3610 -31.2% 2483 perf-stat.i.minor-faults
49238342 +1.1% 49776834 perf-stat.i.node-loads
98106028 -3.1% 95018390 perf-stat.i.node-stores
3615 -31.2% 2487 perf-stat.i.page-faults
3.65 +3.7% 3.78 perf-stat.overall.cpi
21.08 +3.3% 21.79 perf-stat.overall.cycles-between-cache-misses
0.04 ± 5% -0.0 0.03 ± 6% perf-stat.overall.dTLB-load-miss-rate%
0.01 ± 10% -0.0 0.00 ± 23% perf-stat.overall.dTLB-store-miss-rate%
0.27 -3.6% 0.26 perf-stat.overall.ipc
4.122e+08 -3.2% 3.99e+08 perf-stat.ps.cache-misses
5.324e+08 -2.7% 5.181e+08 perf-stat.ps.cache-references
6809 -3.4% 6580 perf-stat.ps.context-switches
437062 ± 5% -18.7% 355481 ± 6% perf-stat.ps.dTLB-load-misses
1.115e+09 -3.8% 1.073e+09 perf-stat.ps.dTLB-loads
80134 ± 10% -63.5% 29283 ± 23% perf-stat.ps.dTLB-store-misses
7.295e+08 -3.8% 7.021e+08 perf-stat.ps.dTLB-stores
129362 -3.7% 124535 perf-stat.ps.iTLB-load-misses
143865 -5.2% 136338 perf-stat.ps.iTLB-loads
2.381e+09 -3.5% 2.297e+09 perf-stat.ps.instructions
3596 -31.2% 2473 perf-stat.ps.minor-faults
49081949 +1.1% 49621463 perf-stat.ps.node-loads
97795918 -3.1% 94724831 perf-stat.ps.node-stores
3600 -31.2% 2477 perf-stat.ps.page-faults
***************************************************************************************************
lkp-cfl-d1: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
commit:
30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
---------------- ---------------------------
%stddev %change %stddev
\ | \
167.28 ± 5% -13.1% 145.32 ± 6% sched_debug.cfs_rq:/.util_est_enqueued.avg
6845 -2.5% 6674 vmstat.system.cs
351910 ± 2% +40.2% 493341 meminfo.AnonHugePages
505908 +27.2% 643328 meminfo.AnonPages
1497656 +10.2% 1650453 meminfo.Inactive(anon)
18957 ± 13% +26.3% 23947 ± 17% turbostat.C1
1.52 -0.0 1.48 turbostat.C1E%
3.32 -2.9% 3.23 turbostat.RAMWatt
19978 -3.0% 19379 phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s
280.71 +3.3% 289.93 phoronix-test-suite.time.elapsed_time
280.71 +3.3% 289.93 phoronix-test-suite.time.elapsed_time.max
120465 +1.5% 122257 phoronix-test-suite.time.maximum_resident_set_size
281047 -54.7% 127190 phoronix-test-suite.time.minor_page_faults
257.03 +3.5% 265.95 phoronix-test-suite.time.user_time
126473 +27.2% 160831 proc-vmstat.nr_anon_pages
171.83 ± 2% +40.2% 240.89 proc-vmstat.nr_anon_transparent_hugepages
355973 -1.0% 352304 proc-vmstat.nr_dirty_background_threshold
712818 -1.0% 705471 proc-vmstat.nr_dirty_threshold
3265800 -1.1% 3228879 proc-vmstat.nr_free_pages
374410 +10.2% 412613 proc-vmstat.nr_inactive_anon
1770 +4.4% 1848 proc-vmstat.nr_page_table_pages
374410 +10.2% 412613 proc-vmstat.nr_zone_inactive_anon
852082 -34.9% 555093 proc-vmstat.numa_hit
852125 -34.9% 555018 proc-vmstat.numa_local
1078293 -26.6% 791038 proc-vmstat.pgfault
112693 +2.9% 116004 proc-vmstat.pgreuse
9025 +7.6% 9713 proc-vmstat.thp_fault_alloc
3.63 ± 6% +0.6 4.25 ± 9% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.25 ± 55% -0.2 0.08 ± 68% perf-profile.children.cycles-pp.ret_from_fork_asm
0.25 ± 55% -0.2 0.08 ± 68% perf-profile.children.cycles-pp.ret_from_fork
0.23 ± 56% -0.2 0.07 ± 69% perf-profile.children.cycles-pp.kthread
0.14 ± 36% -0.1 0.05 ±120% perf-profile.children.cycles-pp.do_anonymous_page
0.14 ± 35% -0.1 0.05 ± 76% perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
0.04 ± 72% +0.0 0.08 ± 19% perf-profile.children.cycles-pp.try_to_wake_up
0.04 ±118% +0.1 0.10 ± 36% perf-profile.children.cycles-pp.update_rq_clock
0.07 ± 79% +0.1 0.17 ± 21% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
7.99 ± 11% +1.0 9.02 ± 5% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.23 ± 28% -0.1 0.14 ± 49% perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
0.14 ± 35% -0.1 0.05 ± 76% perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
0.06 ± 79% +0.1 0.16 ± 21% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.21 ± 34% +0.2 0.36 ± 18% perf-profile.self.cycles-pp.ktime_get
1.187e+08 -4.6% 1.133e+08 perf-stat.i.branch-instructions
3.36 +0.1 3.42 perf-stat.i.branch-miss-rate%
5492420 -3.9% 5275592 perf-stat.i.branch-misses
4.148e+08 -2.8% 4.034e+08 perf-stat.i.cache-misses
5.251e+08 -2.6% 5.114e+08 perf-stat.i.cache-references
6880 -2.5% 6711 perf-stat.i.context-switches
4.30 +2.9% 4.43 perf-stat.i.cpi
0.10 ± 7% -0.0 0.09 ± 2% perf-stat.i.dTLB-load-miss-rate%
472268 ± 6% -19.9% 378489 perf-stat.i.dTLB-load-misses
8.107e+08 -3.4% 7.831e+08 perf-stat.i.dTLB-loads
0.02 ± 16% -0.0 0.01 ± 2% perf-stat.i.dTLB-store-miss-rate%
90535 ± 11% -59.8% 36371 ± 2% perf-stat.i.dTLB-store-misses
5.323e+08 -3.3% 5.145e+08 perf-stat.i.dTLB-stores
129981 -3.0% 126061 perf-stat.i.iTLB-load-misses
143662 -3.1% 139223 perf-stat.i.iTLB-loads
2.253e+09 -3.6% 2.172e+09 perf-stat.i.instructions
0.26 -3.2% 0.25 perf-stat.i.ipc
4.71 ± 2% -6.4% 4.41 ± 2% perf-stat.i.major-faults
180.03 -3.0% 174.57 perf-stat.i.metric.M/sec
3627 -30.8% 2510 ± 2% perf-stat.i.minor-faults
3632 -30.8% 2514 ± 2% perf-stat.i.page-faults
3.88 +3.6% 4.02 perf-stat.overall.cpi
21.08 +2.7% 21.65 perf-stat.overall.cycles-between-cache-misses
0.06 ± 6% -0.0 0.05 perf-stat.overall.dTLB-load-miss-rate%
0.02 ± 11% -0.0 0.01 ± 2% perf-stat.overall.dTLB-store-miss-rate%
0.26 -3.5% 0.25 perf-stat.overall.ipc
1.182e+08 -4.6% 1.128e+08 perf-stat.ps.branch-instructions
5468166 -4.0% 5251939 perf-stat.ps.branch-misses
4.135e+08 -2.7% 4.021e+08 perf-stat.ps.cache-misses
5.234e+08 -2.6% 5.098e+08 perf-stat.ps.cache-references
6859 -2.5% 6685 perf-stat.ps.context-switches
470567 ± 6% -19.9% 377127 perf-stat.ps.dTLB-load-misses
8.079e+08 -3.4% 7.805e+08 perf-stat.ps.dTLB-loads
90221 ± 11% -59.8% 36239 ± 2% perf-stat.ps.dTLB-store-misses
5.305e+08 -3.3% 5.128e+08 perf-stat.ps.dTLB-stores
129499 -3.0% 125601 perf-stat.ps.iTLB-load-misses
143121 -3.1% 138638 perf-stat.ps.iTLB-loads
2.246e+09 -3.6% 2.165e+09 perf-stat.ps.instructions
4.69 ± 2% -6.3% 4.39 ± 2% perf-stat.ps.major-faults
3613 -30.8% 2500 ± 2% perf-stat.ps.minor-faults
3617 -30.8% 2504 ± 2% perf-stat.ps.page-faults
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
2023-12-19 15:41 [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression kernel test robot
@ 2023-12-20 5:27 ` Yang Shi
2023-12-20 8:29 ` Yin Fengwei
0 siblings, 1 reply; 24+ messages in thread
From: Yang Shi @ 2023-12-20 5:27 UTC (permalink / raw)
To: kernel test robot
Cc: Rik van Riel, oe-lkp, lkp, Linux Memory Management List,
Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang,
feng.tang, fengwei.yin

On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote:
>
> [...]
>
> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/
> that
> "This patch was applied to v6.1, but was reverted due to a regression
> report. However it turned out the regression was not due to this patch.
> I ping'ed Andrew to reapply this patch, Andrew may forget it. This
> patch helps promote THP, so I rebased it onto the latest mm-unstable."

IIRC, Huang Ying's analysis showed the regression in the will-it-scale
micro benchmark was fine; the patch was actually reverted due to a kernel
build regression with LLVM reported by Nathan Chancellor. That regression
was then resolved by commit 81e506bec9be1eceaf5a2c654e28ba5176ef48d8
("mm/thp: check and bail out if page in deferred queue already"). And this
patch did improve the kernel build with GCC by ~3%, if I remember correctly.

> however, unfortunately, in our latest tests, we still observed below regression
> upon this commit. just FYI.
>
> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:

Interesting, wasn't the same regression seen last time?

And I'm a little bit confused about how pthread got regressed. I didn't
see the pthread benchmark do any intensive memory alloc/free operations.
Do the pthread APIs do any intensive memory operations? The benchmark does
allocate memory for each thread stack, but it should be just 8K per thread,
so it should not trigger what this patch does. With 1024 threads, the thread
stacks may get merged into one single VMA (8M total), but that can happen
even when the patch is not applied.
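
To make the size argument concrete, here is a minimal userspace sketch, not
taken from stress-ng or from the patch, that maps a stack-sized region and a
larger region and reports whether the kernel returned a 2MB-aligned address
(PMD_SIZE below assumes the usual 2MB huge page size on x86_64). The point is
that only anonymous mappings of at least 2MB are expected to be candidates for
the THP alignment this commit adds, so an ~8K per-thread stack allocation by
itself sits far below the threshold:

#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>

#define PMD_SIZE (2UL << 20)    /* assumed 2MB huge page size (x86_64) */

static void probe(size_t len)
{
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return;
        }
        printf("%8zu KiB at %p, 2MB-aligned: %s\n", len >> 10, p,
               ((uintptr_t)p & (PMD_SIZE - 1)) ? "no" : "yes");
        munmap(p, len);
}

int main(void)
{
        probe(8UL << 10);       /* ~8K, like a single small thread stack */
        probe(8UL << 20);       /* 8MB, like 1024 such stacks in one VMA */
        return 0;
}

Each stack is created by its own small mmap() call, so even if the resulting
VMAs later merge into one 8M region, no individual allocation crosses the 2MB
threshold, which is the point made above.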
> [...]
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com
>
> Details are as below:
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com
>
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
>
> commit:
> 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
> 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 13405796 -65.5% 4620124 cpuidle..usage
> 8.00 +8.2% 8.66 ± 2% iostat.cpu.system
> 1.61 -60.6% 0.63 iostat.cpu.user
> 597.50 ± 14% -64.3% 213.50 ± 14% perf-c2c.DRAM.local
> 1882 ± 14% -74.7% 476.83 ± 7% perf-c2c.HITM.local
> 3768436 -12.9% 3283395 vmstat.memory.cache
> 355105 -75.7% 86344 ± 3% vmstat.system.cs
> 385435 -20.7% 305714 ± 3% vmstat.system.in
> 1.13 -0.2 0.88 mpstat.cpu.all.irq%
> 0.29 -0.2 0.10 ± 2% mpstat.cpu.all.soft%
> 6.76 ± 2% +1.1 7.88 ± 2% mpstat.cpu.all.sys%
> 1.62 -1.0 0.62 ± 2% mpstat.cpu.all.usr%
> 2234397 ± 2% -84.3% 350161 ± 5% stress-ng.pthread.ops
> 37237 -84.3% 5834 ± 5% stress-ng.pthread.ops_per_sec
> 294706 ± 2% -68.0% 94191 ± 6% stress-ng.time.involuntary_context_switches
> 41442 ± 2% +5023.4% 2123284 stress-ng.time.maximum_resident_set_size
> 4466457 -83.9% 717053 ± 5% stress-ng.time.minor_page_faults

The larger RSS and fewer page faults are expected.

> 243.33 +13.5% 276.17 ± 3% stress-ng.time.percent_of_cpu_this_job_got
> 131.64 +27.7% 168.11 ± 3% stress-ng.time.system_time
> 19.73 -82.1% 3.53 ± 4% stress-ng.time.user_time

Much less user time. And it seems to match the drop of the pthread metric.
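
As a rough, hedged illustration of why THP moves RSS and minor faults in
opposite directions (hypothetical test code, not the stressor's): touch one
byte per 4K page of an 8MB anonymous mapping and count minor faults. If the
mapping is THP-backed, each fault populates a whole 2MB huge page, so the
fault count drops by roughly 512x while the resident set grows in 2MB steps,
which matches the direction of the maximum_resident_set_size and
minor_page_faults changes quoted above.

#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>

int main(void)
{
        const size_t len = 8UL << 20;   /* 8MB anonymous region */
        struct rusage before, after;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        getrusage(RUSAGE_SELF, &before);
        for (size_t off = 0; off < len; off += 4096)
                p[off] = 1;             /* one touch per 4K page */
        getrusage(RUSAGE_SELF, &after);
        printf("minor faults for 8MB: %ld\n",
               after.ru_minflt - before.ru_minflt);
        /* roughly 2048 with 4K pages, a handful when 2MB THPs are used */
        munmap(p, len);
        return 0;
}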
> 7715609 -80.2% 1530125 ą 4% stress-ng.time.voluntary_context_switches > 494566 -59.5% 200338 ą 3% meminfo.Active > 478287 -61.5% 184050 ą 3% meminfo.Active(anon) > 58549 ą 17% +1532.8% 956006 ą 14% meminfo.AnonHugePages > 424631 +194.9% 1252445 ą 10% meminfo.AnonPages > 3677263 -13.0% 3197755 meminfo.Cached > 5829485 ą 4% -19.0% 4724784 ą 10% meminfo.Committed_AS > 692486 +108.6% 1444669 ą 8% meminfo.Inactive > 662179 +113.6% 1414338 ą 9% meminfo.Inactive(anon) > 182416 -50.2% 90759 meminfo.Mapped > 4614466 +10.0% 5076604 ą 2% meminfo.Memused > 6985 +47.6% 10307 ą 4% meminfo.PageTables > 718445 -66.7% 238913 ą 3% meminfo.Shmem > 35906 -20.7% 28471 ą 3% meminfo.VmallocUsed > 4838522 +25.6% 6075302 meminfo.max_used_kB > 488.83 -20.9% 386.67 ą 2% turbostat.Avg_MHz > 12.95 -2.7 10.26 ą 2% turbostat.Busy% > 7156734 -87.2% 919149 ą 4% turbostat.C1 > 10.59 -8.9 1.65 ą 5% turbostat.C1% > 3702647 -55.1% 1663518 ą 2% turbostat.C1E > 32.99 -20.6 12.36 ą 3% turbostat.C1E% > 1161078 +64.5% 1909611 turbostat.C6 > 44.25 +31.8 76.10 turbostat.C6% > 0.18 -33.3% 0.12 turbostat.IPC > 74338573 ą 2% -33.9% 49159610 ą 4% turbostat.IRQ > 1381661 -91.0% 124075 ą 6% turbostat.POLL > 0.26 -0.2 0.04 ą 12% turbostat.POLL% > 96.15 -5.4% 90.95 turbostat.PkgWatt > 12.12 +19.3% 14.46 turbostat.RAMWatt > 119573 -61.5% 46012 ą 3% proc-vmstat.nr_active_anon > 106168 +195.8% 314047 ą 10% proc-vmstat.nr_anon_pages > 28.60 ą 17% +1538.5% 468.68 ą 14% proc-vmstat.nr_anon_transparent_hugepages > 923365 -13.0% 803489 proc-vmstat.nr_file_pages > 165571 +113.5% 353493 ą 9% proc-vmstat.nr_inactive_anon > 45605 -50.2% 22690 proc-vmstat.nr_mapped > 1752 +47.1% 2578 ą 4% proc-vmstat.nr_page_table_pages > 179613 -66.7% 59728 ą 3% proc-vmstat.nr_shmem > 21490 -2.4% 20981 proc-vmstat.nr_slab_reclaimable > 28260 -7.3% 26208 proc-vmstat.nr_slab_unreclaimable > 119573 -61.5% 46012 ą 3% proc-vmstat.nr_zone_active_anon > 165570 +113.5% 353492 ą 9% proc-vmstat.nr_zone_inactive_anon > 17343640 -76.3% 4116748 ą 4% proc-vmstat.numa_hit > 17364975 -76.3% 4118098 ą 4% proc-vmstat.numa_local > 249252 -66.2% 84187 ą 2% proc-vmstat.pgactivate > 27528916 +567.1% 1.836e+08 ą 5% proc-vmstat.pgalloc_normal > 4912427 -79.2% 1019949 ą 3% proc-vmstat.pgfault > 27227124 +574.1% 1.835e+08 ą 5% proc-vmstat.pgfree > 8728 +3896.4% 348802 ą 5% proc-vmstat.thp_deferred_split_page > 8730 +3895.3% 348814 ą 5% proc-vmstat.thp_fault_alloc > 8728 +3896.4% 348802 ą 5% proc-vmstat.thp_split_pmd > 316745 -21.5% 248756 ą 4% sched_debug.cfs_rq:/.avg_vruntime.avg > 112735 ą 4% -34.3% 74061 ą 6% sched_debug.cfs_rq:/.avg_vruntime.min > 0.49 ą 6% -17.2% 0.41 ą 8% sched_debug.cfs_rq:/.h_nr_running.stddev > 12143 ą120% -99.9% 15.70 ą116% sched_debug.cfs_rq:/.left_vruntime.avg > 414017 ą126% -99.9% 428.50 ą102% sched_debug.cfs_rq:/.left_vruntime.max > 68492 ą125% -99.9% 78.15 ą106% sched_debug.cfs_rq:/.left_vruntime.stddev > 41917 ą 24% -48.3% 21690 ą 57% sched_debug.cfs_rq:/.load.avg > 176151 ą 30% -56.9% 75963 ą 57% sched_debug.cfs_rq:/.load.stddev > 6489 ą 17% -29.0% 4608 ą 12% sched_debug.cfs_rq:/.load_avg.max > 4.42 ą 45% -81.1% 0.83 ą 74% sched_debug.cfs_rq:/.load_avg.min > 1112 ą 17% -31.0% 767.62 ą 11% sched_debug.cfs_rq:/.load_avg.stddev > 316745 -21.5% 248756 ą 4% sched_debug.cfs_rq:/.min_vruntime.avg > 112735 ą 4% -34.3% 74061 ą 6% sched_debug.cfs_rq:/.min_vruntime.min > 0.49 ą 6% -17.2% 0.41 ą 8% sched_debug.cfs_rq:/.nr_running.stddev > 12144 ą120% -99.9% 15.70 ą116% sched_debug.cfs_rq:/.right_vruntime.avg > 414017 ą126% -99.9% 428.50 ą102% 
sched_debug.cfs_rq:/.right_vruntime.max > 68492 ą125% -99.9% 78.15 ą106% sched_debug.cfs_rq:/.right_vruntime.stddev > 14.25 ą 44% -76.6% 3.33 ą 58% sched_debug.cfs_rq:/.runnable_avg.min > 11.58 ą 49% -77.7% 2.58 ą 58% sched_debug.cfs_rq:/.util_avg.min > 423972 ą 23% +59.3% 675379 ą 3% sched_debug.cpu.avg_idle.avg > 5720 ą 43% +439.5% 30864 sched_debug.cpu.avg_idle.min > 99.79 ą 2% -23.7% 76.11 ą 2% sched_debug.cpu.clock_task.stddev > 162475 ą 49% -95.8% 6813 ą 26% sched_debug.cpu.curr->pid.avg > 1061268 -84.0% 170212 ą 4% sched_debug.cpu.curr->pid.max > 365404 ą 20% -91.3% 31839 ą 10% sched_debug.cpu.curr->pid.stddev > 0.51 ą 3% -20.1% 0.41 ą 9% sched_debug.cpu.nr_running.stddev > 311923 -74.2% 80615 ą 2% sched_debug.cpu.nr_switches.avg > 565973 ą 4% -77.8% 125597 ą 10% sched_debug.cpu.nr_switches.max > 192666 ą 4% -70.6% 56695 ą 6% sched_debug.cpu.nr_switches.min > 67485 ą 8% -79.9% 13558 ą 10% sched_debug.cpu.nr_switches.stddev > 2.62 +102.1% 5.30 perf-stat.i.MPKI > 2.09e+09 -47.6% 1.095e+09 ą 4% perf-stat.i.branch-instructions > 1.56 -0.5 1.01 perf-stat.i.branch-miss-rate% > 31951200 -60.9% 12481432 ą 2% perf-stat.i.branch-misses > 19.38 +23.7 43.08 perf-stat.i.cache-miss-rate% > 26413597 -5.7% 24899132 ą 4% perf-stat.i.cache-misses > 1.363e+08 -58.3% 56906133 ą 4% perf-stat.i.cache-references > 370628 -75.8% 89743 ą 3% perf-stat.i.context-switches > 1.77 +65.1% 2.92 ą 2% perf-stat.i.cpi > 1.748e+10 -21.8% 1.367e+10 ą 2% perf-stat.i.cpu-cycles > 61611 -79.1% 12901 ą 6% perf-stat.i.cpu-migrations > 716.97 ą 2% -17.2% 593.35 ą 2% perf-stat.i.cycles-between-cache-misses > 0.12 ą 4% -0.1 0.05 perf-stat.i.dTLB-load-miss-rate% > 3066100 ą 3% -81.3% 573066 ą 5% perf-stat.i.dTLB-load-misses > 2.652e+09 -50.1% 1.324e+09 ą 4% perf-stat.i.dTLB-loads > 0.08 ą 2% -0.0 0.03 perf-stat.i.dTLB-store-miss-rate% > 1168195 ą 2% -82.9% 199438 ą 5% perf-stat.i.dTLB-store-misses > 1.478e+09 -56.8% 6.384e+08 ą 3% perf-stat.i.dTLB-stores > 8080423 -73.2% 2169371 ą 3% perf-stat.i.iTLB-load-misses > 5601321 -74.3% 1440571 ą 2% perf-stat.i.iTLB-loads > 1.028e+10 -49.7% 5.173e+09 ą 4% perf-stat.i.instructions > 1450 +73.1% 2511 ą 2% perf-stat.i.instructions-per-iTLB-miss > 0.61 -35.9% 0.39 perf-stat.i.ipc > 0.48 -21.4% 0.38 ą 2% perf-stat.i.metric.GHz > 616.28 -17.6% 507.69 ą 4% perf-stat.i.metric.K/sec > 175.16 -50.8% 86.18 ą 4% perf-stat.i.metric.M/sec > 76728 -80.8% 14724 ą 4% perf-stat.i.minor-faults > 5600408 -61.4% 2160997 ą 5% perf-stat.i.node-loads > 8873996 +52.1% 13499744 ą 5% perf-stat.i.node-stores > 112409 -81.9% 20305 ą 4% perf-stat.i.page-faults > 2.55 +89.6% 4.83 perf-stat.overall.MPKI Much more TLB misses. > 1.51 -0.4 1.13 perf-stat.overall.branch-miss-rate% > 19.26 +24.5 43.71 perf-stat.overall.cache-miss-rate% > 1.70 +56.4% 2.65 perf-stat.overall.cpi > 665.84 -17.5% 549.51 ą 2% perf-stat.overall.cycles-between-cache-misses > 0.12 ą 4% -0.1 0.04 perf-stat.overall.dTLB-load-miss-rate% > 0.08 ą 2% -0.0 0.03 perf-stat.overall.dTLB-store-miss-rate% > 59.16 +0.9 60.04 perf-stat.overall.iTLB-load-miss-rate% > 1278 +86.1% 2379 ą 2% perf-stat.overall.instructions-per-iTLB-miss > 0.59 -36.1% 0.38 perf-stat.overall.ipc Worse IPC and CPI. 
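(For readers not used to these derived rows: they are just ratios of the per-second counters quoted below. MPKI is cache-misses / (instructions / 1000), which works out to 24512034 / (5.073e9 / 1000) ≈ 4.83 on the patched kernel versus 26057291 / (1.023e10 / 1000) ≈ 2.55 on the parent, and IPC is instructions / cpu-cycles, i.e. 5.073e9 / 1.346e10 ≈ 0.38 versus 1.023e10 / 1.735e10 ≈ 0.59. So the machine retires roughly half as many instructions per second and gets noticeably less useful work done per cycle, which is consistent with the profile further down showing the time moving into the kernel PMD-split and rmap paths rather than the userspace pthread loop.)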
> 2.078e+09 -48.3% 1.074e+09 ą 4% perf-stat.ps.branch-instructions > 31292687 -61.2% 12133349 ą 2% perf-stat.ps.branch-misses > 26057291 -5.9% 24512034 ą 4% perf-stat.ps.cache-misses > 1.353e+08 -58.6% 56072195 ą 4% perf-stat.ps.cache-references > 365254 -75.8% 88464 ą 3% perf-stat.ps.context-switches > 1.735e+10 -22.4% 1.346e+10 ą 2% perf-stat.ps.cpu-cycles > 60838 -79.1% 12727 ą 6% perf-stat.ps.cpu-migrations > 3056601 ą 4% -81.5% 565354 ą 4% perf-stat.ps.dTLB-load-misses > 2.636e+09 -50.7% 1.3e+09 ą 4% perf-stat.ps.dTLB-loads > 1155253 ą 2% -83.0% 196581 ą 5% perf-stat.ps.dTLB-store-misses > 1.473e+09 -57.4% 6.268e+08 ą 3% perf-stat.ps.dTLB-stores > 7997726 -73.3% 2131477 ą 3% perf-stat.ps.iTLB-load-misses > 5521346 -74.3% 1418623 ą 2% perf-stat.ps.iTLB-loads > 1.023e+10 -50.4% 5.073e+09 ą 4% perf-stat.ps.instructions > 75671 -80.9% 14479 ą 4% perf-stat.ps.minor-faults > 5549722 -61.4% 2141750 ą 4% perf-stat.ps.node-loads > 8769156 +51.6% 13296579 ą 5% perf-stat.ps.node-stores > 110795 -82.0% 19977 ą 4% perf-stat.ps.page-faults > 6.482e+11 -50.7% 3.197e+11 ą 4% perf-stat.total.instructions > 0.00 ą 37% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab > 0.01 ą 18% +8373.1% 0.73 ą 49% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64 > 0.01 ą 16% +4600.0% 0.38 ą 24% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit More time is spent in madvise and munmap, but I'm not sure whether this is caused by tearing down the address space when the test exits. If so, it should not count toward the regression. > 0.01 ą204% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap > 0.01 ą 8% +3678.9% 0.36 ą 79% perf-sched.sch_delay.avg.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64 > 0.01 ą 14% -38.5% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file > 0.01 ą 5% +2946.2% 0.26 ą 43% perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm > 0.00 ą 14% +125.0% 0.01 ą 12% perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 0.02 ą170% -83.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.00 ą 69% +6578.6% 0.31 ą 4% perf-sched.sch_delay.avg.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior > 0.00 +100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap > 0.02 ą 86% +4234.4% 0.65 ą 4% perf-sched.sch_delay.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise > 0.01 ą 6% +6054.3% 0.47 perf-sched.sch_delay.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range > 0.00 ą 14% +195.2% 0.01 ą 89% perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown] > 0.00 ą102% +340.0% 0.01 ą 85% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64 > 0.00 +100.0% 0.00 perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep > 0.00 ą 11% +66.7% 0.01 ą 21% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64 > 0.01 ą 89% +1096.1% 0.15 ą 30%
perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi > 0.00 +141.7% 0.01 ą 61% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64 > 0.00 ą223% +9975.0% 0.07 ą203% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select > 0.00 ą 10% +789.3% 0.04 ą 69% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait > 0.00 ą 31% +6691.3% 0.26 ą 5% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise > 0.00 ą 28% +14612.5% 0.59 ą 4% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.exit_mm > 0.00 ą 24% +4904.2% 0.20 ą 4% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma > 0.00 ą 28% +450.0% 0.01 ą 74% perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap > 0.00 ą 17% +984.6% 0.02 ą 79% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone > 0.00 ą 20% +231.8% 0.01 ą 89% perf-sched.sch_delay.avg.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.submit_bio_wait > 0.00 +350.0% 0.01 ą 16% perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread > 0.02 ą 16% +320.2% 0.07 ą 2% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 0.02 ą 2% +282.1% 0.09 ą 5% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm > 0.00 ą 14% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab > 0.05 ą 35% +3784.5% 1.92 ą 16% perf-sched.sch_delay.max.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64 > 0.29 ą128% +563.3% 1.92 ą 7% perf-sched.sch_delay.max.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit > 0.14 ą217% -99.7% 0.00 ą223% perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap > 0.03 ą 49% -74.0% 0.01 ą 51% perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64 > 0.01 ą 54% -57.4% 0.00 ą 75% perf-sched.sch_delay.max.ms.__cond_resched.dput.__ns_get_path.ns_get_path.proc_ns_get_link > 0.12 ą 21% +873.0% 1.19 ą 60% perf-sched.sch_delay.max.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64 > 2.27 ą220% -99.7% 0.01 ą 19% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64 > 0.02 ą 36% -54.4% 0.01 ą 55% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0 > 0.04 ą 36% -77.1% 0.01 ą 31% perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file > 0.12 ą 32% +1235.8% 1.58 ą 31% perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm > 2.25 ą218% -99.3% 0.02 ą 52% perf-sched.sch_delay.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.01 ą 85% +19836.4% 2.56 ą 7% perf-sched.sch_delay.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior > 0.03 ą 70% -93.6% 0.00 ą223% perf-sched.sch_delay.max.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise > 0.10 ą 16% +2984.2% 3.21 ą 6% 
perf-sched.sch_delay.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range > 0.01 ą 20% +883.9% 0.05 ą177% perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown] > 0.01 ą 15% +694.7% 0.08 ą123% perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64 > 0.00 ą223% +6966.7% 0.07 ą199% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select > 0.01 ą 38% +8384.6% 0.55 ą 72% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone > 0.01 ą 13% +12995.7% 1.51 ą103% perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread > 117.80 ą 56% -96.4% 4.26 ą 36% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 0.01 ą 68% +331.9% 0.03 perf-sched.total_sch_delay.average.ms > 4.14 +242.6% 14.20 ą 4% perf-sched.total_wait_and_delay.average.ms > 700841 -69.6% 212977 ą 3% perf-sched.total_wait_and_delay.count.ms > 4.14 +242.4% 14.16 ą 4% perf-sched.total_wait_time.average.ms > 11.68 ą 8% +213.3% 36.59 ą 28% perf-sched.wait_and_delay.avg.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file > 10.00 ą 2% +226.1% 32.62 ą 20% perf-sched.wait_and_delay.avg.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close > 10.55 ą 3% +259.8% 37.96 ą 7% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link > 9.80 ą 12% +196.5% 29.07 ą 32% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups > 9.80 ą 4% +234.9% 32.83 ą 14% perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open > 10.32 ą 2% +223.8% 33.42 ą 6% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open > 8.15 ą 14% +271.3% 30.25 ą 35% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64 > 9.60 ą 4% +240.8% 32.73 ą 16% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0 > 10.37 ą 4% +232.0% 34.41 ą 10% perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file > 7.32 ą 46% +269.7% 27.07 ą 49% perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin > 9.88 +236.2% 33.23 ą 4% perf-sched.wait_and_delay.avg.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru > 4.44 ą 4% +379.0% 21.27 ą 18% perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 10.05 ą 2% +235.6% 33.73 ą 11% perf-sched.wait_and_delay.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.03 +462.6% 0.15 ą 6% perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe > 6.78 ą 4% +482.1% 39.46 ą 3% perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64 > 3.17 +683.3% 24.85 ą 8% perf-sched.wait_and_delay.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex > 36.64 ą 13% +244.7% 126.32 ą 6% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll > 9.81 +302.4% 39.47 ą 4% 
perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64 > 1.05 +48.2% 1.56 perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise > 0.93 +14.2% 1.06 ą 2% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma > 9.93 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_timeout.ext4_lazyinit_thread.part.0.kthread > 12.02 ą 3% +139.8% 28.83 ą 6% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 6.09 ą 2% +403.0% 30.64 ą 5% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm > 23.17 ą 19% -83.5% 3.83 ą143% perf-sched.wait_and_delay.count.__cond_resched.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio > 79.83 ą 9% -55.1% 35.83 ą 16% perf-sched.wait_and_delay.count.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close > 14.83 ą 14% -59.6% 6.00 ą 56% perf-sched.wait_and_delay.count.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64 > 8.50 ą 17% -80.4% 1.67 ą 89% perf-sched.wait_and_delay.count.__cond_resched.dput.__ns_get_path.ns_get_path.proc_ns_get_link > 114.00 ą 14% -62.4% 42.83 ą 11% perf-sched.wait_and_delay.count.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link > 94.67 ą 7% -48.1% 49.17 ą 13% perf-sched.wait_and_delay.count.__cond_resched.dput.terminate_walk.path_openat.do_filp_open > 59.83 ą 13% -76.0% 14.33 ą 48% perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write > 103.00 ą 12% -48.1% 53.50 ą 20% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open > 19.33 ą 16% -56.0% 8.50 ą 29% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64 > 68.17 ą 11% -39.1% 41.50 ą 19% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file > 36.67 ą 22% -79.1% 7.67 ą 46% perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.perf_poll.do_poll.constprop > 465.50 ą 9% -47.4% 244.83 ą 11% perf-sched.wait_and_delay.count.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru > 14492 ą 3% -96.3% 533.67 ą 10% perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 128.67 ą 7% -53.5% 59.83 ą 10% perf-sched.wait_and_delay.count.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe > 7.67 ą 34% -80.4% 1.50 ą107% perf-sched.wait_and_delay.count.__cond_resched.vunmap_p4d_range.__vunmap_range_noflush.remove_vm_area.vfree > 147533 -81.0% 28023 ą 5% perf-sched.wait_and_delay.count.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe > 4394 ą 4% -78.5% 942.83 ą 7% perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64 > 228791 -79.3% 47383 ą 4% perf-sched.wait_and_delay.count.futex_wait_queue.__futex_wait.futex_wait.do_futex > 368.50 ą 2% -67.1% 121.33 ą 3% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll > 147506 -81.0% 28010 ą 5% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64 > 5387 ą 6% -16.7% 4488 ą 5% 
perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise > 8303 ą 2% -56.9% 3579 ą 5% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma > 14.67 ą 7% -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_timeout.ext4_lazyinit_thread.part.0.kthread > 370.50 ą141% +221.9% 1192 ą 5% perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread > 24395 ą 2% -51.2% 11914 ą 6% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 31053 ą 2% -80.5% 6047 ą 5% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm > 16.41 ą 2% +342.7% 72.65 ą 29% perf-sched.wait_and_delay.max.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file > 16.49 ą 3% +463.3% 92.90 ą 27% perf-sched.wait_and_delay.max.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close > 17.32 ą 5% +520.9% 107.52 ą 14% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link > 15.38 ą 6% +325.2% 65.41 ą 22% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups > 16.73 ą 4% +456.2% 93.04 ą 11% perf-sched.wait_and_delay.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open > 17.14 ą 3% +510.6% 104.68 ą 14% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open > 15.70 ą 4% +379.4% 75.25 ą 28% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64 > 15.70 ą 3% +422.1% 81.97 ą 19% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0 > 16.38 +528.4% 102.91 ą 21% perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file > 45.20 ą 48% +166.0% 120.23 ą 27% perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin > 17.25 +495.5% 102.71 ą 2% perf-sched.wait_and_delay.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru > 402.57 ą 15% -52.8% 189.90 ą 14% perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 16.96 ą 4% +521.3% 105.40 ą 15% perf-sched.wait_and_delay.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe > 28.45 +517.3% 175.65 ą 14% perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe > 22.49 +628.5% 163.83 ą 16% perf-sched.wait_and_delay.max.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex > 26.53 ą 30% +326.9% 113.25 ą 16% perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64 > 15.54 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_timeout.ext4_lazyinit_thread.part.0.kthread > 1.67 ą141% +284.6% 6.44 ą 4% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread > 0.07 ą 34% -93.6% 0.00 ą105% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc > 10.21 ą 15% +295.8% 40.43 ą 50% perf-sched.wait_time.avg.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe > 3.89 ą 40% -99.8% 0.01 ą113% 
perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab > 11.67 ą 8% +213.5% 36.58 ą 28% perf-sched.wait_time.avg.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file > 9.98 ą 2% +226.8% 32.61 ą 20% perf-sched.wait_time.avg.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close > 1.03 +71.2% 1.77 ą 20% perf-sched.wait_time.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64 > 0.06 ą 79% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.__split_vma.vma_modify.mprotect_fixup > 0.05 ą 22% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_expand.mmap_region.do_mmap > 0.08 ą 82% -98.2% 0.00 ą223% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe > 10.72 ą 10% +166.9% 28.61 ą 29% perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64 > 10.53 ą 3% +260.5% 37.95 ą 7% perf-sched.wait_time.avg.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link > 9.80 ą 12% +196.6% 29.06 ą 32% perf-sched.wait_time.avg.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups > 9.80 ą 4% +235.1% 32.82 ą 14% perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open > 9.50 ą 12% +281.9% 36.27 ą 70% perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write > 10.31 ą 2% +223.9% 33.40 ą 6% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open > 8.04 ą 15% +276.1% 30.25 ą 35% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64 > 9.60 ą 4% +240.9% 32.72 ą 16% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0 > 0.06 ą 66% -98.3% 0.00 ą223% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.__split_vma > 10.36 ą 4% +232.1% 34.41 ą 10% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file > 0.08 ą 50% -95.7% 0.00 ą100% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.vma_modify > 0.01 ą 49% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range > 0.03 ą 73% -87.4% 0.00 ą145% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.dup_task_struct.copy_process.kernel_clone > 8.01 ą 25% +238.0% 27.07 ą 49% perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin > 9.86 +237.0% 33.23 ą 4% perf-sched.wait_time.avg.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru > 4.44 ą 4% +379.2% 21.26 ą 18% perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 10.03 +236.3% 33.73 ą 11% perf-sched.wait_time.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.97 ą 8% -87.8% 0.12 ą221% perf-sched.wait_time.avg.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise > 0.02 ą 13% +1846.8% 0.45 ą 11% perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap > 1.01 +64.7% 1.66 
perf-sched.wait_time.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise > 0.75 ą 4% +852.1% 7.10 ą 5% perf-sched.wait_time.avg.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown] > 0.03 +462.6% 0.15 ą 6% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.24 ą 4% +25.3% 0.30 ą 8% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64 > 1.98 ą 15% +595.7% 13.80 ą 90% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt > 2.78 ą 14% +444.7% 15.12 ą 16% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function > 6.77 ą 4% +483.0% 39.44 ą 3% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64 > 3.17 +684.7% 24.85 ą 8% perf-sched.wait_time.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex > 36.64 ą 13% +244.7% 126.32 ą 6% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll > 9.79 +303.0% 39.45 ą 4% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64 > 1.05 +23.8% 1.30 perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise > 0.86 +101.2% 1.73 ą 3% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.exit_mm > 0.11 ą 21% +438.9% 0.61 ą 15% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap > 0.32 ą 4% +28.5% 0.41 ą 13% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone > 12.00 ą 3% +139.6% 28.76 ą 6% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 6.07 ą 2% +403.5% 30.56 ą 5% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm > 0.38 ą 41% -98.8% 0.00 ą105% perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc > 0.36 ą 34% -84.3% 0.06 ą200% perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.vma_alloc_folio.do_anonymous_page > 0.36 ą 51% -92.9% 0.03 ą114% perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault > 15.98 ą 5% +361.7% 73.80 ą 23% perf-sched.wait_time.max.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.51 ą 14% -92.8% 0.04 ą196% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.__vmalloc_area_node.__vmalloc_node_range > 8.56 ą 11% -99.9% 0.01 ą126% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab > 0.43 ą 32% -68.2% 0.14 ą119% perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_node_trace.__get_vm_area_node.__vmalloc_node_range > 0.46 ą 20% -89.3% 0.05 ą184% perf-sched.wait_time.max.ms.__cond_resched.__vmalloc_area_node.__vmalloc_node_range.alloc_thread_stack_node.dup_task_struct > 16.40 ą 2% +342.9% 72.65 ą 29% perf-sched.wait_time.max.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file > 0.31 ą 63% -76.2% 0.07 ą169% perf-sched.wait_time.max.ms.__cond_resched.cgroup_css_set_fork.cgroup_can_fork.copy_process.kernel_clone > 
0.14 ą 93% +258.7% 0.49 ą 14% perf-sched.wait_time.max.ms.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault > 16.49 ą 3% +463.5% 92.89 ą 27% perf-sched.wait_time.max.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close > 1.09 +171.0% 2.96 ą 10% perf-sched.wait_time.max.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64 > 1.16 ą 7% +155.1% 2.97 ą 4% perf-sched.wait_time.max.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit > 0.19 ą 78% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.__split_vma.vma_modify.mprotect_fixup > 0.33 ą 35% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_expand.mmap_region.do_mmap > 0.20 ą101% -99.3% 0.00 ą223% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe > 17.31 ą 5% +521.0% 107.51 ą 14% perf-sched.wait_time.max.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link > 15.38 ą 6% +325.3% 65.40 ą 22% perf-sched.wait_time.max.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups > 16.72 ą 4% +456.6% 93.04 ą 11% perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open > 1.16 ą 2% +88.7% 2.20 ą 33% perf-sched.wait_time.max.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64 > 53.96 ą 32% +444.0% 293.53 ą109% perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write > 17.13 ą 2% +511.2% 104.68 ą 14% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open > 15.69 ą 4% +379.5% 75.25 ą 28% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64 > 15.70 ą 3% +422.2% 81.97 ą 19% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0 > 0.27 ą 80% -99.6% 0.00 ą223% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.__split_vma > 16.37 +528.6% 102.90 ą 21% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file > 0.44 ą 33% -99.1% 0.00 ą104% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.vma_modify > 0.02 ą 83% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range > 0.08 ą 83% -95.4% 0.00 ą147% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.dup_task_struct.copy_process.kernel_clone > 1.16 ą 2% +134.7% 2.72 ą 19% perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm > 49.88 ą 25% +141.0% 120.23 ą 27% perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin > 17.24 +495.7% 102.70 ą 2% perf-sched.wait_time.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru > 402.56 ą 15% -52.8% 189.89 ą 14% perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 16.96 ą 4% +521.4% 105.39 ą 15% perf-sched.wait_time.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe > 1.06 +241.7% 3.61 ą 4% perf-sched.wait_time.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior > 1.07 -88.9% 0.12 ą221% 
perf-sched.wait_time.max.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise > 0.28 ą 27% +499.0% 1.67 ą 18% perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap > 1.21 ą 2% +207.2% 3.71 ą 3% perf-sched.wait_time.max.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise > 13.43 ą 26% +38.8% 18.64 perf-sched.wait_time.max.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown] > 28.45 +517.3% 175.65 ą 14% perf-sched.wait_time.max.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe > 0.79 ą 10% +62.2% 1.28 ą 25% perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64 > 13.22 ą 2% +317.2% 55.16 ą 35% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function > 834.29 ą 28% -48.5% 429.53 ą 94% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi > 22.48 +628.6% 163.83 ą 16% perf-sched.wait_time.max.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex > 22.74 ą 18% +398.0% 113.25 ą 16% perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64 > 7.72 ą 7% +80.6% 13.95 ą 2% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap > 0.74 ą 4% +77.2% 1.31 ą 32% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone > 5.01 +14.1% 5.72 ą 2% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread > 44.98 -19.7 25.32 ą 2% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify > 43.21 -19.6 23.65 ą 3% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify > 43.21 -19.6 23.65 ą 3% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify > 43.18 -19.5 23.63 ą 3% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify > 40.30 -17.5 22.75 ą 3% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify > 41.10 -17.4 23.66 ą 2% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry > 39.55 -17.3 22.24 ą 3% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary > 24.76 ą 2% -8.5 16.23 ą 3% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle > 8.68 ą 4% -6.5 2.22 ą 6% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call > 7.23 ą 4% -5.8 1.46 ą 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe > 7.23 ą 4% -5.8 1.46 ą 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe > 7.11 ą 4% -5.7 1.39 ą 7% perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe > 7.09 ą 4% -5.7 1.39 ą 7% perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe > 6.59 ą 3% -5.1 1.47 ą 7% perf-profile.calltrace.cycles-pp.ret_from_fork_asm > 6.59 ą 3% -5.1 1.47 ą 7% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm > 6.59 ą 3% -5.1 1.47 ą 7% 
perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm > 5.76 ą 2% -5.0 0.80 ą 9% perf-profile.calltrace.cycles-pp.start_thread > 7.43 ą 2% -4.9 2.52 ą 7% perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle > 5.51 ą 3% -4.8 0.70 ą 7% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.start_thread > 5.50 ą 3% -4.8 0.70 ą 7% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread > 5.48 ą 3% -4.8 0.69 ą 7% perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread > 5.42 ą 3% -4.7 0.69 ą 7% perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread > 5.90 ą 5% -3.9 2.01 ą 4% perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise > 4.18 ą 5% -3.8 0.37 ą 71% perf-profile.calltrace.cycles-pp.exit_notify.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe > 5.76 ą 5% -3.8 1.98 ą 4% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior > 5.04 ą 7% -3.7 1.32 ą 9% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__clone > 5.03 ą 7% -3.7 1.32 ą 9% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone > 5.02 ą 7% -3.7 1.32 ą 9% perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone > 5.02 ą 7% -3.7 1.32 ą 9% perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone > 5.62 ą 5% -3.7 1.96 ą 3% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single > 4.03 ą 4% -3.1 0.92 ą 7% perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 6.03 ą 5% -3.1 2.94 ą 3% perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise > 3.43 ą 5% -2.8 0.67 ą 13% perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm > 3.43 ą 5% -2.8 0.67 ą 13% perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork > 3.41 ą 5% -2.7 0.66 ą 13% perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread > 3.40 ą 5% -2.7 0.66 ą 13% perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn > 3.67 ą 7% -2.7 0.94 ą 10% perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe > 2.92 ą 7% -2.4 0.50 ą 46% perf-profile.calltrace.cycles-pp.stress_pthread > 2.54 ą 6% -2.2 0.38 ą 70% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm > 2.46 ą 6% -1.8 0.63 ą 10% perf-profile.calltrace.cycles-pp.dup_task_struct.copy_process.kernel_clone.__do_sys_clone.do_syscall_64 > 3.00 ą 6% -1.6 1.43 ą 7% perf-profile.calltrace.cycles-pp.__munmap > 2.96 ą 6% -1.5 1.42 ą 7% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap > 2.96 ą 6% -1.5 1.42 ą 7% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap > 2.95 ą 6% -1.5 1.41 ą 7% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap > 2.95 ą 6% -1.5 1.41 
ą 7% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap > 2.02 ą 4% -1.5 0.52 ą 46% perf-profile.calltrace.cycles-pp.__lll_lock_wait > 1.78 ą 3% -1.5 0.30 ą100% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__lll_lock_wait > 1.77 ą 3% -1.5 0.30 ą100% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__lll_lock_wait > 1.54 ą 6% -1.3 0.26 ą100% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify > 2.54 ą 6% -1.2 1.38 ą 6% perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe > 2.51 ą 6% -1.1 1.37 ą 7% perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64 > 1.13 -0.7 0.40 ą 70% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe > 1.15 ą 5% -0.7 0.46 ą 45% perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu > 1.58 ą 5% -0.6 0.94 ą 7% perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap > 0.99 ą 5% -0.5 0.51 ą 45% perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state > 1.01 ą 5% -0.5 0.54 ą 45% perf-profile.calltrace.cycles-pp.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter > 0.82 ą 4% -0.2 0.59 ą 5% perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu > 0.00 +0.5 0.54 ą 5% perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function > 0.00 +0.6 0.60 ą 5% perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior > 0.00 +0.6 0.61 ą 6% perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap > 0.00 +0.6 0.62 ą 6% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap > 0.53 ą 5% +0.6 1.17 ą 13% perf-profile.calltrace.cycles-pp.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt > 1.94 ą 2% +0.7 2.64 ą 9% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call > 0.00 +0.7 0.73 ą 5% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range > 0.00 +0.8 0.75 ą 20% perf-profile.calltrace.cycles-pp.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault > 2.02 ą 2% +0.8 2.85 ą 9% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle > 0.74 ą 5% +0.8 1.57 ą 11% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt > 0.00 +0.9 0.90 ą 4% perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise > 0.00 
+0.9 0.92 ą 13% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues > 0.86 ą 4% +1.0 1.82 ą 10% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state > 0.86 ą 4% +1.0 1.83 ą 10% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter > 0.00 +1.0 0.98 ą 7% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked > 0.09 ą223% +1.0 1.07 ą 11% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt > 0.00 +1.0 0.99 ą 6% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd > 0.00 +1.0 1.00 ą 7% perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range > 0.09 ą223% +1.0 1.10 ą 12% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt > 0.00 +1.0 1.01 ą 6% perf-profile.calltrace.cycles-pp.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range > 0.00 +1.1 1.10 ą 5% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath > 0.00 +1.1 1.12 ą 5% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock > 0.00 +1.2 1.23 ą 4% perf-profile.calltrace.cycles-pp.page_add_anon_rmap.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range > 0.00 +1.3 1.32 ą 4% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd > 0.00 +1.4 1.38 ą 5% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range > 0.00 +2.4 2.44 ą 10% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd.zap_pmd_range > 0.00 +3.1 3.10 ą 5% perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single > 0.00 +3.5 3.52 ą 5% perf-profile.calltrace.cycles-pp.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single > 0.88 ą 4% +3.8 4.69 ą 4% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior > 6.30 ą 6% +13.5 19.85 ą 7% perf-profile.calltrace.cycles-pp.__clone > 0.00 +16.7 16.69 ą 7% perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault > 1.19 ą 29% +17.1 18.32 ą 7% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault > 0.00 +17.6 17.56 ą 7% perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault > 0.63 ą 7% +17.7 18.35 ą 7% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.__clone > 0.59 ą 5% +17.8 18.34 ą 7% 
perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.__clone > 0.59 ą 5% +17.8 18.34 ą 7% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__clone > 0.00 +17.9 17.90 ą 7% perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault > 0.36 ą 71% +18.0 18.33 ą 7% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__clone > 0.00 +32.0 32.03 ą 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd.zap_pmd_range.unmap_page_range > 0.00 +32.6 32.62 ą 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single > 0.00 +36.2 36.19 ą 2% perf-profile.calltrace.cycles-pp.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior > 7.97 ą 4% +36.6 44.52 ą 2% perf-profile.calltrace.cycles-pp.__madvise > 7.91 ą 4% +36.6 44.46 ą 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise > 7.90 ą 4% +36.6 44.46 ą 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise > 7.87 ą 4% +36.6 44.44 ą 2% perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise > 7.86 ą 4% +36.6 44.44 ą 2% perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise > 7.32 ą 4% +36.8 44.07 ą 2% perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe > 7.25 ą 4% +36.8 44.06 ą 2% perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64 > 1.04 ą 4% +40.0 41.08 ą 2% perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise > 1.00 ą 3% +40.1 41.06 ą 2% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise > 44.98 -19.7 25.32 ą 2% perf-profile.children.cycles-pp.secondary_startup_64_no_verify > 44.98 -19.7 25.32 ą 2% perf-profile.children.cycles-pp.cpu_startup_entry > 44.96 -19.6 25.31 ą 2% perf-profile.children.cycles-pp.do_idle > 43.21 -19.6 23.65 ą 3% perf-profile.children.cycles-pp.start_secondary > 41.98 -17.6 24.40 ą 2% perf-profile.children.cycles-pp.cpuidle_idle_call > 41.21 -17.3 23.86 ą 2% perf-profile.children.cycles-pp.cpuidle_enter > 41.20 -17.3 23.86 ą 2% perf-profile.children.cycles-pp.cpuidle_enter_state > 12.69 ą 3% -10.6 2.12 ą 6% perf-profile.children.cycles-pp.do_exit > 12.60 ą 3% -10.5 2.08 ą 7% perf-profile.children.cycles-pp.__x64_sys_exit > 24.76 ą 2% -8.5 16.31 ą 2% perf-profile.children.cycles-pp.intel_idle > 12.34 ą 2% -8.4 3.90 ą 5% perf-profile.children.cycles-pp.intel_idle_irq > 6.96 ą 4% -5.4 1.58 ą 7% perf-profile.children.cycles-pp.ret_from_fork_asm > 6.69 ą 4% -5.2 1.51 ą 7% perf-profile.children.cycles-pp.ret_from_fork > 6.59 ą 3% -5.1 1.47 ą 7% perf-profile.children.cycles-pp.kthread > 5.78 ą 2% -5.0 0.80 ą 8% perf-profile.children.cycles-pp.start_thread > 4.68 ą 4% -4.5 0.22 ą 10% perf-profile.children.cycles-pp._raw_spin_lock_irq > 5.03 ą 7% -3.7 1.32 ą 9% perf-profile.children.cycles-pp.__do_sys_clone > 5.02 ą 7% -3.7 1.32 ą 9% perf-profile.children.cycles-pp.kernel_clone > 4.20 ą 5% -3.7 0.53 ą 9% perf-profile.children.cycles-pp.exit_notify > 4.67 ą 5% -3.6 1.10 ą 9% 
perf-profile.children.cycles-pp.rcu_core > 4.60 ą 4% -3.5 1.06 ą 10% perf-profile.children.cycles-pp.rcu_do_batch > 4.89 ą 5% -3.4 1.44 ą 11% perf-profile.children.cycles-pp.__do_softirq > 5.64 ą 3% -3.2 2.39 ą 6% perf-profile.children.cycles-pp.__schedule > 6.27 ą 5% -3.2 3.03 ą 4% perf-profile.children.cycles-pp.flush_tlb_mm_range > 4.03 ą 4% -3.1 0.92 ą 7% perf-profile.children.cycles-pp.smpboot_thread_fn > 6.68 ą 4% -3.1 3.61 ą 3% perf-profile.children.cycles-pp.tlb_finish_mmu > 6.04 ą 5% -3.1 2.99 ą 4% perf-profile.children.cycles-pp.on_each_cpu_cond_mask > 6.04 ą 5% -3.0 2.99 ą 4% perf-profile.children.cycles-pp.smp_call_function_many_cond > 3.77 ą 2% -3.0 0.73 ą 16% perf-profile.children.cycles-pp._raw_spin_lock_irqsave > 7.78 -3.0 4.77 ą 5% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt > 3.43 ą 5% -2.8 0.67 ą 13% perf-profile.children.cycles-pp.run_ksoftirqd > 3.67 ą 7% -2.7 0.94 ą 10% perf-profile.children.cycles-pp.copy_process > 2.80 ą 6% -2.5 0.34 ą 15% perf-profile.children.cycles-pp.queued_write_lock_slowpath > 3.41 ą 2% -2.5 0.96 ą 16% perf-profile.children.cycles-pp.do_futex > 3.06 ą 5% -2.4 0.68 ą 16% perf-profile.children.cycles-pp.free_unref_page_commit > 3.02 ą 5% -2.4 0.67 ą 16% perf-profile.children.cycles-pp.free_pcppages_bulk > 2.92 ą 7% -2.3 0.58 ą 14% perf-profile.children.cycles-pp.stress_pthread > 3.22 ą 3% -2.3 0.90 ą 18% perf-profile.children.cycles-pp.__x64_sys_futex > 2.52 ą 5% -2.2 0.35 ą 7% perf-profile.children.cycles-pp.release_task > 2.54 ą 6% -2.0 0.53 ą 10% perf-profile.children.cycles-pp.worker_thread > 3.12 ą 5% -1.9 1.17 ą 11% perf-profile.children.cycles-pp.free_unref_page > 2.31 ą 6% -1.9 0.45 ą 11% perf-profile.children.cycles-pp.process_one_work > 2.47 ą 6% -1.8 0.63 ą 10% perf-profile.children.cycles-pp.dup_task_struct > 2.19 ą 5% -1.8 0.41 ą 12% perf-profile.children.cycles-pp.delayed_vfree_work > 2.14 ą 5% -1.7 0.40 ą 11% perf-profile.children.cycles-pp.vfree > 3.19 ą 2% -1.6 1.58 ą 8% perf-profile.children.cycles-pp.schedule > 2.06 ą 3% -1.6 0.46 ą 7% perf-profile.children.cycles-pp.__sigtimedwait > 3.02 ą 6% -1.6 1.44 ą 7% perf-profile.children.cycles-pp.__munmap > 1.94 ą 4% -1.6 0.39 ą 14% perf-profile.children.cycles-pp.__unfreeze_partials > 2.95 ą 6% -1.5 1.41 ą 7% perf-profile.children.cycles-pp.__x64_sys_munmap > 2.95 ą 6% -1.5 1.41 ą 7% perf-profile.children.cycles-pp.__vm_munmap > 2.14 ą 3% -1.5 0.60 ą 21% perf-profile.children.cycles-pp.futex_wait > 2.08 ą 4% -1.5 0.60 ą 19% perf-profile.children.cycles-pp.__lll_lock_wait > 2.04 ą 3% -1.5 0.56 ą 20% perf-profile.children.cycles-pp.__futex_wait > 1.77 ą 5% -1.5 0.32 ą 10% perf-profile.children.cycles-pp.remove_vm_area > 1.86 ą 5% -1.4 0.46 ą 10% perf-profile.children.cycles-pp.open64 > 1.74 ą 4% -1.4 0.37 ą 7% perf-profile.children.cycles-pp.__x64_sys_rt_sigtimedwait > 1.71 ą 4% -1.4 0.36 ą 8% perf-profile.children.cycles-pp.do_sigtimedwait > 1.79 ą 5% -1.3 0.46 ą 9% perf-profile.children.cycles-pp.__x64_sys_openat > 1.78 ą 5% -1.3 0.46 ą 8% perf-profile.children.cycles-pp.do_sys_openat2 > 1.61 ą 4% -1.3 0.32 ą 12% perf-profile.children.cycles-pp.poll_idle > 1.65 ą 9% -1.3 0.37 ą 14% perf-profile.children.cycles-pp.pthread_create@@GLIBC_2.2.5 > 1.56 ą 8% -1.2 0.35 ą 7% perf-profile.children.cycles-pp.alloc_thread_stack_node > 2.32 ą 3% -1.2 1.13 ą 8% perf-profile.children.cycles-pp.pick_next_task_fair > 2.59 ą 6% -1.2 1.40 ą 7% perf-profile.children.cycles-pp.do_vmi_munmap > 1.55 ą 4% -1.2 0.40 ą 19% perf-profile.children.cycles-pp.futex_wait_queue > 1.37 ą 5% -1.1 
0.22 ą 12% perf-profile.children.cycles-pp.find_unlink_vmap_area > 2.52 ą 6% -1.1 1.38 ą 6% perf-profile.children.cycles-pp.do_vmi_align_munmap > 1.53 ą 5% -1.1 0.39 ą 8% perf-profile.children.cycles-pp.do_filp_open > 1.52 ą 5% -1.1 0.39 ą 7% perf-profile.children.cycles-pp.path_openat > 1.25 ą 3% -1.1 0.14 ą 12% perf-profile.children.cycles-pp.sigpending > 1.58 ą 5% -1.1 0.50 ą 6% perf-profile.children.cycles-pp.schedule_idle > 1.29 ą 5% -1.1 0.21 ą 21% perf-profile.children.cycles-pp.__mprotect > 1.40 ą 8% -1.1 0.32 ą 4% perf-profile.children.cycles-pp.__vmalloc_node_range > 2.06 ą 3% -1.0 1.02 ą 9% perf-profile.children.cycles-pp.newidle_balance > 1.04 ą 3% -1.0 0.08 ą 23% perf-profile.children.cycles-pp.__x64_sys_rt_sigpending > 1.14 ą 6% -1.0 0.18 ą 18% perf-profile.children.cycles-pp.__x64_sys_mprotect > 1.13 ą 6% -1.0 0.18 ą 17% perf-profile.children.cycles-pp.do_mprotect_pkey > 1.30 ą 7% -0.9 0.36 ą 10% perf-profile.children.cycles-pp.wake_up_new_task > 1.14 ą 9% -0.9 0.22 ą 16% perf-profile.children.cycles-pp.do_anonymous_page > 0.95 ą 3% -0.9 0.04 ą 71% perf-profile.children.cycles-pp.do_sigpending > 1.24 ą 3% -0.9 0.34 ą 9% perf-profile.children.cycles-pp.futex_wake > 1.02 ą 6% -0.9 0.14 ą 15% perf-profile.children.cycles-pp.mprotect_fixup > 1.91 ą 2% -0.9 1.06 ą 9% perf-profile.children.cycles-pp.load_balance > 1.38 ą 5% -0.8 0.53 ą 6% perf-profile.children.cycles-pp.select_task_rq_fair > 1.14 ą 4% -0.8 0.31 ą 12% perf-profile.children.cycles-pp.__pthread_mutex_unlock_usercnt > 2.68 ą 3% -0.8 1.91 ą 6% perf-profile.children.cycles-pp.__flush_smp_call_function_queue > 1.00 ą 4% -0.7 0.26 ą 10% perf-profile.children.cycles-pp.flush_smp_call_function_queue > 1.44 ą 3% -0.7 0.73 ą 10% perf-profile.children.cycles-pp.find_busiest_group > 0.81 ą 6% -0.7 0.10 ą 18% perf-profile.children.cycles-pp.vma_modify > 1.29 ą 3% -0.7 0.60 ą 8% perf-profile.children.cycles-pp.exit_mm > 1.40 ą 3% -0.7 0.71 ą 10% perf-profile.children.cycles-pp.update_sd_lb_stats > 0.78 ą 7% -0.7 0.10 ą 19% perf-profile.children.cycles-pp.__split_vma > 0.90 ą 8% -0.7 0.22 ą 10% perf-profile.children.cycles-pp.__vmalloc_area_node > 0.75 ą 4% -0.7 0.10 ą 5% perf-profile.children.cycles-pp.__exit_signal > 1.49 ą 2% -0.7 0.84 ą 7% perf-profile.children.cycles-pp.try_to_wake_up > 0.89 ą 7% -0.6 0.24 ą 10% perf-profile.children.cycles-pp.find_idlest_cpu > 1.59 ą 5% -0.6 0.95 ą 7% perf-profile.children.cycles-pp.unmap_region > 0.86 ą 3% -0.6 0.22 ą 26% perf-profile.children.cycles-pp.pthread_cond_timedwait@@GLIBC_2.3.2 > 1.59 ą 3% -0.6 0.95 ą 9% perf-profile.children.cycles-pp.irq_exit_rcu > 1.24 ą 3% -0.6 0.61 ą 10% perf-profile.children.cycles-pp.update_sg_lb_stats > 0.94 ą 5% -0.6 0.32 ą 11% perf-profile.children.cycles-pp.do_task_dead > 0.87 ą 3% -0.6 0.25 ą 19% perf-profile.children.cycles-pp.perf_iterate_sb > 0.82 ą 4% -0.6 0.22 ą 10% perf-profile.children.cycles-pp.sched_ttwu_pending > 1.14 ą 3% -0.6 0.54 ą 10% perf-profile.children.cycles-pp.activate_task > 0.84 -0.6 0.25 ą 10% perf-profile.children.cycles-pp.syscall_exit_to_user_mode > 0.81 ą 6% -0.6 0.22 ą 11% perf-profile.children.cycles-pp.find_idlest_group > 0.75 ą 5% -0.6 0.18 ą 14% perf-profile.children.cycles-pp.step_into > 0.74 ą 8% -0.6 0.18 ą 14% perf-profile.children.cycles-pp.__alloc_pages_bulk > 0.74 ą 6% -0.5 0.19 ą 11% perf-profile.children.cycles-pp.update_sg_wakeup_stats > 0.72 ą 5% -0.5 0.18 ą 15% perf-profile.children.cycles-pp.pick_link > 1.06 ą 2% -0.5 0.52 ą 9% perf-profile.children.cycles-pp.enqueue_task_fair > 0.77 ą 6% -0.5 0.23 ą 12% 
perf-profile.children.cycles-pp.unmap_vmas > 0.76 ą 2% -0.5 0.22 ą 8% perf-profile.children.cycles-pp.exit_to_user_mode_prepare > 0.94 ą 2% -0.5 0.42 ą 10% perf-profile.children.cycles-pp.dequeue_task_fair > 0.65 ą 5% -0.5 0.15 ą 18% perf-profile.children.cycles-pp.open_last_lookups > 1.37 ą 3% -0.5 0.87 ą 4% perf-profile.children.cycles-pp.llist_add_batch > 0.70 ą 4% -0.5 0.22 ą 19% perf-profile.children.cycles-pp.memcpy_orig > 0.91 ą 4% -0.5 0.44 ą 7% perf-profile.children.cycles-pp.update_load_avg > 0.67 -0.5 0.20 ą 8% perf-profile.children.cycles-pp.switch_fpu_return > 0.88 ą 3% -0.5 0.42 ą 8% perf-profile.children.cycles-pp.enqueue_entity > 0.91 ą 4% -0.5 0.45 ą 12% perf-profile.children.cycles-pp.ttwu_do_activate > 0.77 ą 4% -0.5 0.32 ą 10% perf-profile.children.cycles-pp.schedule_hrtimeout_range_clock > 0.63 ą 5% -0.4 0.20 ą 21% perf-profile.children.cycles-pp.arch_dup_task_struct > 0.74 ą 3% -0.4 0.32 ą 15% perf-profile.children.cycles-pp.dequeue_entity > 0.62 ą 5% -0.4 0.21 ą 5% perf-profile.children.cycles-pp.finish_task_switch > 0.56 -0.4 0.16 ą 7% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate > 0.53 ą 4% -0.4 0.13 ą 9% perf-profile.children.cycles-pp.syscall > 0.50 ą 9% -0.4 0.11 ą 18% perf-profile.children.cycles-pp.__get_vm_area_node > 0.51 ą 3% -0.4 0.12 ą 12% perf-profile.children.cycles-pp.__slab_free > 0.52 ą 2% -0.4 0.14 ą 10% perf-profile.children.cycles-pp.kmem_cache_free > 0.75 ą 3% -0.4 0.37 ą 9% perf-profile.children.cycles-pp.exit_mm_release > 0.50 ą 6% -0.4 0.12 ą 21% perf-profile.children.cycles-pp.do_send_specific > 0.74 ą 3% -0.4 0.37 ą 8% perf-profile.children.cycles-pp.futex_exit_release > 0.45 ą 10% -0.4 0.09 ą 17% perf-profile.children.cycles-pp.alloc_vmap_area > 0.47 ą 3% -0.4 0.11 ą 20% perf-profile.children.cycles-pp.tgkill > 0.68 ą 11% -0.4 0.32 ą 12% perf-profile.children.cycles-pp.__mmap > 0.48 ą 3% -0.4 0.13 ą 6% perf-profile.children.cycles-pp.entry_SYSCALL_64 > 0.76 ą 5% -0.3 0.41 ą 10% perf-profile.children.cycles-pp.wake_up_q > 0.42 ą 7% -0.3 0.08 ą 22% perf-profile.children.cycles-pp.__close > 0.49 ą 7% -0.3 0.14 ą 25% perf-profile.children.cycles-pp.kmem_cache_alloc > 0.49 ą 9% -0.3 0.15 ą 14% perf-profile.children.cycles-pp.mas_store_gfp > 0.46 ą 4% -0.3 0.12 ą 23% perf-profile.children.cycles-pp.perf_event_task_output > 0.44 ą 10% -0.3 0.10 ą 28% perf-profile.children.cycles-pp.pthread_sigqueue > 0.46 ą 4% -0.3 0.12 ą 15% perf-profile.children.cycles-pp.link_path_walk > 0.42 ą 8% -0.3 0.10 ą 20% perf-profile.children.cycles-pp.proc_ns_get_link > 0.63 ą 10% -0.3 0.32 ą 12% perf-profile.children.cycles-pp.vm_mmap_pgoff > 0.45 ą 4% -0.3 0.14 ą 13% perf-profile.children.cycles-pp.sched_move_task > 0.36 ą 8% -0.3 0.06 ą 49% perf-profile.children.cycles-pp.__x64_sys_close > 0.46 ą 8% -0.3 0.17 ą 14% perf-profile.children.cycles-pp.prctl > 0.65 ą 3% -0.3 0.35 ą 7% perf-profile.children.cycles-pp.futex_cleanup > 0.42 ą 7% -0.3 0.12 ą 15% perf-profile.children.cycles-pp.mas_store_prealloc > 0.49 ą 5% -0.3 0.20 ą 13% perf-profile.children.cycles-pp.__rmqueue_pcplist > 0.37 ą 7% -0.3 0.08 ą 16% perf-profile.children.cycles-pp.do_tkill > 0.36 ą 10% -0.3 0.08 ą 20% perf-profile.children.cycles-pp.ns_get_path > 0.37 ą 4% -0.3 0.09 ą 18% perf-profile.children.cycles-pp.setns > 0.67 ą 3% -0.3 0.41 ą 8% perf-profile.children.cycles-pp.hrtimer_wakeup > 0.35 ą 5% -0.3 0.10 ą 16% perf-profile.children.cycles-pp.__task_pid_nr_ns > 0.41 ą 5% -0.3 0.16 ą 12% perf-profile.children.cycles-pp.mas_wr_bnode > 0.35 ą 4% -0.3 0.10 ą 20% 
perf-profile.children.cycles-pp.rcu_cblist_dequeue > 0.37 ą 5% -0.2 0.12 ą 17% perf-profile.children.cycles-pp.exit_task_stack_account > 0.56 ą 4% -0.2 0.31 ą 12% perf-profile.children.cycles-pp.select_task_rq > 0.29 ą 6% -0.2 0.05 ą 46% perf-profile.children.cycles-pp.mas_wr_store_entry > 0.34 ą 4% -0.2 0.10 ą 27% perf-profile.children.cycles-pp.perf_event_task > 0.39 ą 9% -0.2 0.15 ą 12% perf-profile.children.cycles-pp.__switch_to_asm > 0.35 ą 5% -0.2 0.11 ą 11% perf-profile.children.cycles-pp.account_kernel_stack > 0.30 ą 7% -0.2 0.06 ą 48% perf-profile.children.cycles-pp.__ns_get_path > 0.31 ą 9% -0.2 0.07 ą 17% perf-profile.children.cycles-pp.free_vmap_area_noflush > 0.31 ą 5% -0.2 0.08 ą 19% perf-profile.children.cycles-pp.__do_sys_setns > 0.33 ą 7% -0.2 0.10 ą 7% perf-profile.children.cycles-pp.__free_one_page > 0.31 ą 11% -0.2 0.08 ą 13% perf-profile.children.cycles-pp.__pte_alloc > 0.36 ą 6% -0.2 0.13 ą 12% perf-profile.children.cycles-pp.switch_mm_irqs_off > 0.27 ą 12% -0.2 0.05 ą 71% perf-profile.children.cycles-pp.__fput > 0.53 ą 9% -0.2 0.31 ą 12% perf-profile.children.cycles-pp.do_mmap > 0.27 ą 12% -0.2 0.05 ą 77% perf-profile.children.cycles-pp.__x64_sys_rt_tgsigqueueinfo > 0.28 ą 5% -0.2 0.06 ą 50% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack > 0.34 ą 10% -0.2 0.12 ą 29% perf-profile.children.cycles-pp.futex_wait_setup > 0.27 ą 6% -0.2 0.06 ą 45% perf-profile.children.cycles-pp.__x64_sys_tgkill > 0.31 ą 7% -0.2 0.11 ą 18% perf-profile.children.cycles-pp.__switch_to > 0.26 ą 8% -0.2 0.06 ą 21% perf-profile.children.cycles-pp.__call_rcu_common > 0.33 ą 9% -0.2 0.13 ą 18% perf-profile.children.cycles-pp.__do_sys_prctl > 0.28 ą 5% -0.2 0.08 ą 17% perf-profile.children.cycles-pp.mm_release > 0.52 ą 2% -0.2 0.32 ą 9% perf-profile.children.cycles-pp.__get_user_8 > 0.24 ą 10% -0.2 0.04 ą 72% perf-profile.children.cycles-pp.dput > 0.25 ą 14% -0.2 0.05 ą 46% perf-profile.children.cycles-pp.perf_event_mmap > 0.24 ą 7% -0.2 0.06 ą 50% perf-profile.children.cycles-pp.mas_walk > 0.28 ą 6% -0.2 0.10 ą 24% perf-profile.children.cycles-pp.rmqueue_bulk > 0.23 ą 15% -0.2 0.05 ą 46% perf-profile.children.cycles-pp.perf_event_mmap_event > 0.25 ą 15% -0.2 0.08 ą 45% perf-profile.children.cycles-pp.___slab_alloc > 0.20 ą 14% -0.2 0.03 ą100% perf-profile.children.cycles-pp.lookup_fast > 0.20 ą 10% -0.2 0.04 ą 75% perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook > 0.28 ą 7% -0.2 0.12 ą 24% perf-profile.children.cycles-pp.prepare_task_switch > 0.22 ą 11% -0.2 0.05 ą 8% perf-profile.children.cycles-pp.ttwu_queue_wakelist > 0.63 ą 5% -0.2 0.47 ą 12% perf-profile.children.cycles-pp.llist_reverse_order > 0.25 ą 11% -0.2 0.09 ą 34% perf-profile.children.cycles-pp.futex_q_lock > 0.21 ą 6% -0.2 0.06 ą 47% perf-profile.children.cycles-pp.kmem_cache_alloc_node > 0.18 ą 11% -0.2 0.03 ą100% perf-profile.children.cycles-pp.alloc_empty_file > 0.19 ą 5% -0.2 0.04 ą 71% perf-profile.children.cycles-pp.__put_task_struct > 0.19 ą 15% -0.2 0.03 ą 70% perf-profile.children.cycles-pp.asm_sysvec_call_function_single > 0.24 ą 6% -0.2 0.09 ą 20% perf-profile.children.cycles-pp.___perf_sw_event > 0.18 ą 7% -0.2 0.03 ą100% perf-profile.children.cycles-pp.perf_event_fork > 0.19 ą 11% -0.1 0.04 ą 71% perf-profile.children.cycles-pp.select_idle_core > 0.30 ą 11% -0.1 0.15 ą 7% perf-profile.children.cycles-pp.pte_alloc_one > 0.25 ą 6% -0.1 0.11 ą 10% perf-profile.children.cycles-pp.set_next_entity > 0.20 ą 10% -0.1 0.06 ą 49% perf-profile.children.cycles-pp.__perf_event_header__init_id > 0.18 ą 15% -0.1 
0.03 ą101% perf-profile.children.cycles-pp.__radix_tree_lookup > 0.22 ą 11% -0.1 0.08 ą 21% perf-profile.children.cycles-pp.mas_spanning_rebalance > 0.20 ą 9% -0.1 0.06 ą 9% perf-profile.children.cycles-pp.stress_pthread_func > 0.18 ą 12% -0.1 0.04 ą 73% perf-profile.children.cycles-pp.__getpid > 0.16 ą 13% -0.1 0.02 ą 99% perf-profile.children.cycles-pp.walk_component > 0.28 ą 5% -0.1 0.15 ą 13% perf-profile.children.cycles-pp.update_curr > 0.25 ą 5% -0.1 0.11 ą 22% perf-profile.children.cycles-pp.balance_fair > 0.16 ą 9% -0.1 0.03 ą100% perf-profile.children.cycles-pp.futex_wake_mark > 0.16 ą 12% -0.1 0.04 ą 71% perf-profile.children.cycles-pp.get_futex_key > 0.17 ą 6% -0.1 0.05 ą 47% perf-profile.children.cycles-pp.memcg_account_kmem > 0.25 ą 11% -0.1 0.12 ą 11% perf-profile.children.cycles-pp._find_next_bit > 0.15 ą 13% -0.1 0.02 ą 99% perf-profile.children.cycles-pp.do_open > 0.20 ą 8% -0.1 0.08 ą 16% perf-profile.children.cycles-pp.mas_rebalance > 0.17 ą 13% -0.1 0.05 ą 45% perf-profile.children.cycles-pp.__memcg_kmem_charge_page > 0.33 ą 6% -0.1 0.21 ą 10% perf-profile.children.cycles-pp.select_idle_sibling > 0.14 ą 11% -0.1 0.03 ą100% perf-profile.children.cycles-pp.get_user_pages_fast > 0.18 ą 7% -0.1 0.07 ą 14% perf-profile.children.cycles-pp.mas_alloc_nodes > 0.14 ą 11% -0.1 0.03 ą101% perf-profile.children.cycles-pp.set_task_cpu > 0.14 ą 12% -0.1 0.03 ą101% perf-profile.children.cycles-pp.vm_unmapped_area > 0.38 ą 6% -0.1 0.27 ą 7% perf-profile.children.cycles-pp.native_sched_clock > 0.16 ą 10% -0.1 0.05 ą 47% perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown > 0.36 ą 9% -0.1 0.25 ą 12% perf-profile.children.cycles-pp.mmap_region > 0.23 ą 7% -0.1 0.12 ą 9% perf-profile.children.cycles-pp.available_idle_cpu > 0.13 ą 11% -0.1 0.02 ą 99% perf-profile.children.cycles-pp.internal_get_user_pages_fast > 0.16 ą 10% -0.1 0.06 ą 18% perf-profile.children.cycles-pp.get_unmapped_area > 0.50 ą 7% -0.1 0.40 ą 6% perf-profile.children.cycles-pp.menu_select > 0.24 ą 9% -0.1 0.14 ą 13% perf-profile.children.cycles-pp.rmqueue > 0.17 ą 14% -0.1 0.07 ą 26% perf-profile.children.cycles-pp.perf_event_comm > 0.17 ą 15% -0.1 0.07 ą 23% perf-profile.children.cycles-pp.perf_event_comm_event > 0.17 ą 11% -0.1 0.07 ą 14% perf-profile.children.cycles-pp.pick_next_entity > 0.13 ą 14% -0.1 0.03 ą102% perf-profile.children.cycles-pp.perf_output_begin > 0.23 ą 6% -0.1 0.13 ą 21% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq > 0.14 ą 18% -0.1 0.04 ą 72% perf-profile.children.cycles-pp.perf_event_comm_output > 0.21 ą 9% -0.1 0.12 ą 9% perf-profile.children.cycles-pp.update_rq_clock > 0.16 ą 8% -0.1 0.06 ą 19% perf-profile.children.cycles-pp.mas_split > 0.13 ą 14% -0.1 0.04 ą 71% perf-profile.children.cycles-pp.raw_spin_rq_lock_nested > 0.13 ą 6% -0.1 0.04 ą 71% perf-profile.children.cycles-pp.syscall_return_via_sysret > 0.13 ą 7% -0.1 0.04 ą 72% perf-profile.children.cycles-pp.mas_topiary_replace > 0.14 ą 8% -0.1 0.06 ą 9% perf-profile.children.cycles-pp.mas_preallocate > 0.16 ą 11% -0.1 0.07 ą 18% perf-profile.children.cycles-pp.__pick_eevdf > 0.11 ą 14% -0.1 0.02 ą 99% perf-profile.children.cycles-pp.mas_empty_area_rev > 0.25 ą 7% -0.1 0.17 ą 10% perf-profile.children.cycles-pp.select_idle_cpu > 0.14 ą 12% -0.1 0.06 ą 14% perf-profile.children.cycles-pp.cpu_stopper_thread > 0.14 ą 10% -0.1 0.06 ą 13% perf-profile.children.cycles-pp.active_load_balance_cpu_stop > 0.14 ą 14% -0.1 0.06 ą 11% perf-profile.children.cycles-pp.os_xsave > 0.18 ą 6% -0.1 0.11 ą 14% 
perf-profile.children.cycles-pp.idle_cpu > 0.17 ą 4% -0.1 0.10 ą 15% perf-profile.children.cycles-pp.hrtimer_start_range_ns > 0.11 ą 14% -0.1 0.03 ą100% perf-profile.children.cycles-pp.__pthread_mutex_lock > 0.32 ą 5% -0.1 0.25 ą 5% perf-profile.children.cycles-pp.sched_clock > 0.11 ą 6% -0.1 0.03 ą 70% perf-profile.children.cycles-pp.wakeup_preempt > 0.23 ą 7% -0.1 0.16 ą 13% perf-profile.children.cycles-pp.update_rq_clock_task > 0.13 ą 8% -0.1 0.06 ą 16% perf-profile.children.cycles-pp.local_clock_noinstr > 0.11 ą 10% -0.1 0.04 ą 71% perf-profile.children.cycles-pp.kmem_cache_alloc_bulk > 0.34 ą 4% -0.1 0.27 ą 6% perf-profile.children.cycles-pp.sched_clock_cpu > 0.11 ą 9% -0.1 0.04 ą 76% perf-profile.children.cycles-pp.avg_vruntime > 0.15 ą 8% -0.1 0.08 ą 14% perf-profile.children.cycles-pp.update_cfs_group > 0.10 ą 8% -0.1 0.04 ą 71% perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk > 0.13 ą 8% -0.1 0.06 ą 11% perf-profile.children.cycles-pp.sched_use_asym_prio > 0.09 ą 12% -0.1 0.02 ą 99% perf-profile.children.cycles-pp.getname_flags > 0.18 ą 9% -0.1 0.12 ą 12% perf-profile.children.cycles-pp.__update_load_avg_se > 0.11 ą 8% -0.1 0.05 ą 46% perf-profile.children.cycles-pp.place_entity > 0.08 ą 12% -0.0 0.02 ą 99% perf-profile.children.cycles-pp.folio_add_lru_vma > 0.10 ą 7% -0.0 0.05 ą 46% perf-profile.children.cycles-pp._find_next_and_bit > 0.10 ą 6% -0.0 0.06 ą 24% perf-profile.children.cycles-pp.reweight_entity > 0.03 ą 70% +0.0 0.08 ą 14% perf-profile.children.cycles-pp.perf_rotate_context > 0.19 ą 10% +0.1 0.25 ą 7% perf-profile.children.cycles-pp.irqtime_account_irq > 0.08 ą 11% +0.1 0.14 ą 21% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler > 0.00 +0.1 0.06 ą 14% perf-profile.children.cycles-pp.rcu_pending > 0.10 ą 17% +0.1 0.16 ą 13% perf-profile.children.cycles-pp.rebalance_domains > 0.14 ą 16% +0.1 0.21 ą 12% perf-profile.children.cycles-pp.downgrade_write > 0.14 ą 14% +0.1 0.21 ą 10% perf-profile.children.cycles-pp.down_read_killable > 0.00 +0.1 0.07 ą 11% perf-profile.children.cycles-pp.free_tail_page_prepare > 0.02 ą141% +0.1 0.09 ą 20% perf-profile.children.cycles-pp.rcu_sched_clock_irq > 0.01 ą223% +0.1 0.08 ą 25% perf-profile.children.cycles-pp.arch_scale_freq_tick > 0.55 ą 9% +0.1 0.62 ą 9% perf-profile.children.cycles-pp.__alloc_pages > 0.34 ą 5% +0.1 0.41 ą 9% perf-profile.children.cycles-pp.clock_nanosleep > 0.00 +0.1 0.08 ą 23% perf-profile.children.cycles-pp.tick_nohz_next_event > 0.70 ą 2% +0.1 0.78 ą 5% perf-profile.children.cycles-pp.flush_tlb_func > 0.14 ą 10% +0.1 0.23 ą 13% perf-profile.children.cycles-pp.__intel_pmu_enable_all > 0.07 ą 19% +0.1 0.17 ą 17% perf-profile.children.cycles-pp.cgroup_rstat_updated > 0.04 ą 71% +0.1 0.14 ą 11% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length > 0.25 ą 9% +0.1 0.38 ą 11% perf-profile.children.cycles-pp.down_read > 0.43 ą 9% +0.1 0.56 ą 10% perf-profile.children.cycles-pp.get_page_from_freelist > 0.00 +0.1 0.15 ą 6% perf-profile.children.cycles-pp.vm_normal_page > 0.31 ą 7% +0.2 0.46 ą 9% perf-profile.children.cycles-pp.native_flush_tlb_local > 0.00 +0.2 0.16 ą 8% perf-profile.children.cycles-pp.__tlb_remove_page_size > 0.28 ą 11% +0.2 0.46 ą 13% perf-profile.children.cycles-pp.vma_alloc_folio > 0.00 +0.2 0.24 ą 5% perf-profile.children.cycles-pp._compound_head > 0.07 ą 16% +0.2 0.31 ą 6% perf-profile.children.cycles-pp.__mod_node_page_state > 0.38 ą 5% +0.2 0.62 ą 7% perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context > 0.22 ą 12% +0.2 0.47 ą 10% 
perf-profile.children.cycles-pp.schedule_preempt_disabled > 0.38 ą 5% +0.3 0.64 ą 7% perf-profile.children.cycles-pp.perf_event_task_tick > 0.00 +0.3 0.27 ą 5% perf-profile.children.cycles-pp.free_swap_cache > 0.30 ą 10% +0.3 0.58 ą 10% perf-profile.children.cycles-pp.rwsem_down_read_slowpath > 0.00 +0.3 0.30 ą 4% perf-profile.children.cycles-pp.free_pages_and_swap_cache > 0.09 ą 10% +0.3 0.42 ą 7% perf-profile.children.cycles-pp.__mod_lruvec_state > 0.00 +0.3 0.34 ą 9% perf-profile.children.cycles-pp.deferred_split_folio > 0.00 +0.4 0.36 ą 13% perf-profile.children.cycles-pp.prep_compound_page > 0.09 ą 10% +0.4 0.50 ą 9% perf-profile.children.cycles-pp.free_unref_page_prepare > 0.00 +0.4 0.42 ą 11% perf-profile.children.cycles-pp.do_huge_pmd_anonymous_page > 1.67 ą 3% +0.4 2.12 ą 8% perf-profile.children.cycles-pp.__hrtimer_run_queues > 0.63 ą 3% +0.5 1.11 ą 12% perf-profile.children.cycles-pp.scheduler_tick > 1.93 ą 3% +0.5 2.46 ą 8% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt > 1.92 ą 3% +0.5 2.45 ą 8% perf-profile.children.cycles-pp.hrtimer_interrupt > 0.73 ą 3% +0.6 1.31 ą 11% perf-profile.children.cycles-pp.update_process_times > 0.74 ą 3% +0.6 1.34 ą 11% perf-profile.children.cycles-pp.tick_sched_handle > 0.20 ą 8% +0.6 0.83 ą 18% perf-profile.children.cycles-pp.__cond_resched > 0.78 ą 4% +0.6 1.43 ą 12% perf-profile.children.cycles-pp.tick_nohz_highres_handler > 0.12 ą 7% +0.7 0.81 ą 5% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state > 0.28 ą 7% +0.9 1.23 ą 4% perf-profile.children.cycles-pp.release_pages > 0.00 +1.0 1.01 ą 6% perf-profile.children.cycles-pp.pmdp_invalidate > 0.35 ą 6% +1.2 1.56 ą 5% perf-profile.children.cycles-pp.__mod_lruvec_page_state > 0.30 ą 8% +1.2 1.53 ą 4% perf-profile.children.cycles-pp.tlb_batch_pages_flush > 0.00 +1.3 1.26 ą 4% perf-profile.children.cycles-pp.page_add_anon_rmap > 0.09 ą 11% +3.1 3.20 ą 5% perf-profile.children.cycles-pp.page_remove_rmap > 1.60 ą 2% +3.4 5.04 ą 4% perf-profile.children.cycles-pp.zap_pte_range > 0.03 ą100% +3.5 3.55 ą 5% perf-profile.children.cycles-pp.__split_huge_pmd_locked > 41.36 +11.6 52.92 ą 2% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe > 41.22 +11.7 52.88 ą 2% perf-profile.children.cycles-pp.do_syscall_64 > 6.42 ą 6% +13.5 19.88 ą 7% perf-profile.children.cycles-pp.__clone > 0.82 ą 6% +16.2 16.98 ą 7% perf-profile.children.cycles-pp.clear_page_erms > 2.62 ą 5% +16.4 19.04 ą 7% perf-profile.children.cycles-pp.asm_exc_page_fault > 2.18 ą 5% +16.8 18.94 ą 7% perf-profile.children.cycles-pp.exc_page_fault > 2.06 ą 6% +16.8 18.90 ą 7% perf-profile.children.cycles-pp.do_user_addr_fault > 1.60 ą 8% +17.0 18.60 ą 7% perf-profile.children.cycles-pp.handle_mm_fault > 1.52 ą 7% +17.1 18.58 ą 7% perf-profile.children.cycles-pp.__handle_mm_fault > 0.30 ą 7% +17.4 17.72 ą 7% perf-profile.children.cycles-pp.clear_huge_page > 0.31 ą 8% +17.6 17.90 ą 7% perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page > 11.66 ą 3% +22.2 33.89 ą 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath > 3.29 ą 3% +30.2 33.46 perf-profile.children.cycles-pp._raw_spin_lock > 0.04 ą 71% +36.2 36.21 ą 2% perf-profile.children.cycles-pp.__split_huge_pmd > 8.00 ą 4% +36.5 44.54 ą 2% perf-profile.children.cycles-pp.__madvise > 7.87 ą 4% +36.6 44.44 ą 2% perf-profile.children.cycles-pp.__x64_sys_madvise > 7.86 ą 4% +36.6 44.44 ą 2% perf-profile.children.cycles-pp.do_madvise > 7.32 ą 4% +36.8 44.07 ą 2% perf-profile.children.cycles-pp.madvise_vma_behavior > 7.26 ą 4% +36.8 44.06 ą 2% 
perf-profile.children.cycles-pp.zap_page_range_single > 1.78 +39.5 41.30 ą 2% perf-profile.children.cycles-pp.unmap_page_range > 1.72 +39.6 41.28 ą 2% perf-profile.children.cycles-pp.zap_pmd_range > 24.76 ą 2% -8.5 16.31 ą 2% perf-profile.self.cycles-pp.intel_idle > 11.46 ą 2% -7.8 3.65 ą 5% perf-profile.self.cycles-pp.intel_idle_irq > 3.16 ą 7% -2.1 1.04 ą 6% perf-profile.self.cycles-pp.smp_call_function_many_cond > 1.49 ą 4% -1.2 0.30 ą 12% perf-profile.self.cycles-pp.poll_idle > 1.15 ą 3% -0.6 0.50 ą 9% perf-profile.self.cycles-pp._raw_spin_lock > 0.60 ą 6% -0.6 0.03 ą100% perf-profile.self.cycles-pp.queued_write_lock_slowpath > 0.69 ą 4% -0.5 0.22 ą 20% perf-profile.self.cycles-pp.memcpy_orig > 0.66 ą 7% -0.5 0.18 ą 11% perf-profile.self.cycles-pp.update_sg_wakeup_stats > 0.59 ą 4% -0.5 0.13 ą 8% perf-profile.self.cycles-pp._raw_spin_lock_irq > 0.86 ą 3% -0.4 0.43 ą 12% perf-profile.self.cycles-pp.update_sg_lb_stats > 0.56 -0.4 0.16 ą 7% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate > 0.48 ą 3% -0.4 0.12 ą 10% perf-profile.self.cycles-pp.__slab_free > 1.18 ą 2% -0.4 0.82 ą 3% perf-profile.self.cycles-pp.llist_add_batch > 0.54 ą 5% -0.3 0.19 ą 6% perf-profile.self.cycles-pp.__schedule > 0.47 ą 7% -0.3 0.18 ą 13% perf-profile.self.cycles-pp._raw_spin_lock_irqsave > 0.34 ą 5% -0.2 0.09 ą 18% perf-profile.self.cycles-pp.kmem_cache_free > 0.43 ą 4% -0.2 0.18 ą 11% perf-profile.self.cycles-pp.update_load_avg > 0.35 ą 4% -0.2 0.10 ą 23% perf-profile.self.cycles-pp.rcu_cblist_dequeue > 0.38 ą 9% -0.2 0.15 ą 10% perf-profile.self.cycles-pp.__switch_to_asm > 0.33 ą 5% -0.2 0.10 ą 16% perf-profile.self.cycles-pp.__task_pid_nr_ns > 0.36 ą 6% -0.2 0.13 ą 14% perf-profile.self.cycles-pp.switch_mm_irqs_off > 0.31 ą 6% -0.2 0.09 ą 6% perf-profile.self.cycles-pp.__free_one_page > 0.28 ą 5% -0.2 0.06 ą 50% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack > 0.27 ą 13% -0.2 0.06 ą 23% perf-profile.self.cycles-pp.pthread_create@@GLIBC_2.2.5 > 0.30 ą 7% -0.2 0.10 ą 19% perf-profile.self.cycles-pp.__switch_to > 0.27 ą 4% -0.2 0.10 ą 17% perf-profile.self.cycles-pp.finish_task_switch > 0.23 ą 7% -0.2 0.06 ą 50% perf-profile.self.cycles-pp.mas_walk > 0.22 ą 9% -0.2 0.05 ą 48% perf-profile.self.cycles-pp.__clone > 0.63 ą 5% -0.2 0.46 ą 12% perf-profile.self.cycles-pp.llist_reverse_order > 0.20 ą 4% -0.2 0.04 ą 72% perf-profile.self.cycles-pp.entry_SYSCALL_64 > 0.24 ą 10% -0.1 0.09 ą 19% perf-profile.self.cycles-pp.rmqueue_bulk > 0.18 ą 13% -0.1 0.03 ą101% perf-profile.self.cycles-pp.__radix_tree_lookup > 0.18 ą 11% -0.1 0.04 ą 71% perf-profile.self.cycles-pp.stress_pthread_func > 0.36 ą 8% -0.1 0.22 ą 11% perf-profile.self.cycles-pp.menu_select > 0.22 ą 4% -0.1 0.08 ą 19% perf-profile.self.cycles-pp.___perf_sw_event > 0.20 ą 13% -0.1 0.07 ą 20% perf-profile.self.cycles-pp.start_thread > 0.16 ą 13% -0.1 0.03 ą101% perf-profile.self.cycles-pp.alloc_vmap_area > 0.17 ą 10% -0.1 0.04 ą 73% perf-profile.self.cycles-pp.kmem_cache_alloc > 0.14 ą 9% -0.1 0.03 ą100% perf-profile.self.cycles-pp.futex_wake > 0.17 ą 4% -0.1 0.06 ą 11% perf-profile.self.cycles-pp.dequeue_task_fair > 0.23 ą 6% -0.1 0.12 ą 11% perf-profile.self.cycles-pp.available_idle_cpu > 0.22 ą 13% -0.1 0.11 ą 12% perf-profile.self.cycles-pp._find_next_bit > 0.21 ą 7% -0.1 0.10 ą 6% perf-profile.self.cycles-pp.__rmqueue_pcplist > 0.37 ą 7% -0.1 0.26 ą 8% perf-profile.self.cycles-pp.native_sched_clock > 0.22 ą 7% -0.1 0.12 ą 21% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq > 0.19 ą 7% -0.1 0.10 ą 11% 
perf-profile.self.cycles-pp.enqueue_entity > 0.15 ą 5% -0.1 0.06 ą 45% perf-profile.self.cycles-pp.enqueue_task_fair > 0.15 ą 11% -0.1 0.06 ą 17% perf-profile.self.cycles-pp.__pick_eevdf > 0.13 ą 13% -0.1 0.05 ą 72% perf-profile.self.cycles-pp.prepare_task_switch > 0.17 ą 10% -0.1 0.08 ą 8% perf-profile.self.cycles-pp.update_rq_clock_task > 0.54 ą 4% -0.1 0.46 ą 6% perf-profile.self.cycles-pp.__flush_smp_call_function_queue > 0.14 ą 14% -0.1 0.06 ą 11% perf-profile.self.cycles-pp.os_xsave > 0.11 ą 10% -0.1 0.03 ą 70% perf-profile.self.cycles-pp.try_to_wake_up > 0.10 ą 8% -0.1 0.03 ą100% perf-profile.self.cycles-pp.futex_wait > 0.14 ą 9% -0.1 0.07 ą 10% perf-profile.self.cycles-pp.update_curr > 0.18 ą 9% -0.1 0.11 ą 14% perf-profile.self.cycles-pp.idle_cpu > 0.11 ą 11% -0.1 0.04 ą 76% perf-profile.self.cycles-pp.avg_vruntime > 0.15 ą 10% -0.1 0.08 ą 14% perf-profile.self.cycles-pp.update_cfs_group > 0.09 ą 9% -0.1 0.03 ą100% perf-profile.self.cycles-pp.reweight_entity > 0.12 ą 13% -0.1 0.06 ą 8% perf-profile.self.cycles-pp.do_idle > 0.18 ą 10% -0.1 0.12 ą 13% perf-profile.self.cycles-pp.__update_load_avg_se > 0.09 ą 17% -0.1 0.04 ą 71% perf-profile.self.cycles-pp.cpuidle_idle_call > 0.10 ą 11% -0.0 0.06 ą 45% perf-profile.self.cycles-pp.update_rq_clock > 0.12 ą 15% -0.0 0.07 ą 16% perf-profile.self.cycles-pp.update_sd_lb_stats > 0.09 ą 5% -0.0 0.05 ą 46% perf-profile.self.cycles-pp._find_next_and_bit > 0.01 ą223% +0.1 0.08 ą 25% perf-profile.self.cycles-pp.arch_scale_freq_tick > 0.78 ą 4% +0.1 0.87 ą 4% perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys > 0.14 ą 10% +0.1 0.23 ą 13% perf-profile.self.cycles-pp.__intel_pmu_enable_all > 0.06 ą 46% +0.1 0.15 ą 19% perf-profile.self.cycles-pp.cgroup_rstat_updated > 0.19 ą 3% +0.1 0.29 ą 4% perf-profile.self.cycles-pp.cpuidle_enter_state > 0.00 +0.1 0.10 ą 11% perf-profile.self.cycles-pp.__mod_lruvec_state > 0.00 +0.1 0.11 ą 18% perf-profile.self.cycles-pp.__tlb_remove_page_size > 0.00 +0.1 0.12 ą 9% perf-profile.self.cycles-pp.vm_normal_page > 0.23 ą 7% +0.1 0.36 ą 8% perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context > 0.20 ą 8% +0.2 0.35 ą 7% perf-profile.self.cycles-pp.__mod_lruvec_page_state > 1.12 ą 2% +0.2 1.28 ą 4% perf-profile.self.cycles-pp.zap_pte_range > 0.31 ą 8% +0.2 0.46 ą 9% perf-profile.self.cycles-pp.native_flush_tlb_local > 0.00 +0.2 0.16 ą 5% perf-profile.self.cycles-pp._compound_head > 0.06 ą 17% +0.2 0.26 ą 4% perf-profile.self.cycles-pp.__mod_node_page_state > 0.00 +0.2 0.24 ą 6% perf-profile.self.cycles-pp.free_swap_cache > 0.00 +0.3 0.27 ą 15% perf-profile.self.cycles-pp.clear_huge_page > 0.00 +0.3 0.27 ą 11% perf-profile.self.cycles-pp.deferred_split_folio > 0.00 +0.4 0.36 ą 13% perf-profile.self.cycles-pp.prep_compound_page > 0.05 ą 47% +0.4 0.43 ą 9% perf-profile.self.cycles-pp.free_unref_page_prepare > 0.08 ą 7% +0.5 0.57 ą 23% perf-profile.self.cycles-pp.__cond_resched > 0.08 ą 12% +0.5 0.58 ą 5% perf-profile.self.cycles-pp.release_pages > 0.10 ą 10% +0.5 0.63 ą 6% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state > 0.00 +1.1 1.11 ą 7% perf-profile.self.cycles-pp.__split_huge_pmd_locked > 0.00 +1.2 1.18 ą 4% perf-profile.self.cycles-pp.page_add_anon_rmap > 0.03 ą101% +1.3 1.35 ą 7% perf-profile.self.cycles-pp.page_remove_rmap > 0.82 ą 5% +16.1 16.88 ą 7% perf-profile.self.cycles-pp.clear_page_erms > 11.65 ą 3% +20.2 31.88 ą 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath > > > *************************************************************************************************** > 
lkp-spr-2sp4: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory > ========================================================================================= > array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase: > 50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream > > commit: > 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()") > 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries") > > 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 > ---------------- --------------------------- > %stddev %change %stddev > \ | \ > 10.50 ą 14% +55.6% 16.33 ą 16% perf-c2c.DRAM.local > 6724 -11.4% 5954 ą 2% vmstat.system.cs > 2.746e+09 +16.7% 3.205e+09 ą 2% cpuidle..time > 2771516 +16.0% 3213723 ą 2% cpuidle..usage > 0.06 ą 4% -0.0 0.05 ą 5% mpstat.cpu.all.soft% > 0.47 ą 2% -0.1 0.39 ą 2% mpstat.cpu.all.sys% > 0.01 ą 85% +1700.0% 0.20 ą188% perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read > 15.11 ą 13% -28.8% 10.76 ą 34% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm > 15.09 ą 13% -30.3% 10.51 ą 38% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm > 1023952 +13.4% 1161219 meminfo.AnonHugePages > 1319741 +10.8% 1461995 meminfo.AnonPages > 1331039 +11.2% 1480149 meminfo.Inactive > 1330865 +11.2% 1479975 meminfo.Inactive(anon) > 1266202 +16.0% 1469399 ą 2% turbostat.C1E > 1509871 +16.6% 1760853 ą 2% turbostat.C6 > 3521203 +17.4% 4134075 ą 3% turbostat.IRQ > 580.32 -3.8% 558.30 turbostat.PkgWatt > 77.42 -14.0% 66.60 ą 2% turbostat.RAMWatt > 330416 +10.8% 366020 proc-vmstat.nr_anon_pages > 500.90 +13.4% 567.99 proc-vmstat.nr_anon_transparent_hugepages > 333197 +11.2% 370536 proc-vmstat.nr_inactive_anon > 333197 +11.2% 370536 proc-vmstat.nr_zone_inactive_anon > 129879 ą 11% -46.7% 69207 ą 12% proc-vmstat.numa_pages_migrated > 3879028 +5.9% 4109180 proc-vmstat.pgalloc_normal > 3403414 +6.6% 3628929 proc-vmstat.pgfree > 129879 ą 11% -46.7% 69207 ą 12% proc-vmstat.pgmigrate_success > 5763 +9.8% 6327 proc-vmstat.thp_fault_alloc > 350993 -15.6% 296081 ą 2% stream.add_bandwidth_MBps > 349830 -16.1% 293492 ą 2% stream.add_bandwidth_MBps_harmonicMean > 333973 -20.5% 265439 ą 3% stream.copy_bandwidth_MBps > 332930 -21.7% 260548 ą 3% stream.copy_bandwidth_MBps_harmonicMean > 302788 -16.2% 253817 ą 2% stream.scale_bandwidth_MBps > 302157 -17.1% 250577 ą 2% stream.scale_bandwidth_MBps_harmonicMean > 1177276 +9.3% 1286614 stream.time.maximum_resident_set_size > 5038 +1.1% 5095 stream.time.percent_of_cpu_this_job_got > 694.19 ą 2% +19.5% 829.85 ą 2% stream.time.user_time > 339047 -12.1% 298061 stream.triad_bandwidth_MBps > 338186 -12.4% 296218 stream.triad_bandwidth_MBps_harmonicMean > 8.42 ą100% -8.4 0.00 perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi > 8.42 ą100% -8.4 0.00 perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi > 8.42 ą100% -8.4 0.00 perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi > 8.42 ą100% -8.4 0.00 perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi > 8.42 ą100% -8.4 0.00 perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi > 8.42 ą100% -8.4 
0.00 perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode > 0.84 ą103% +1.7 2.57 ą 59% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle > 0.84 ą103% +1.7 2.57 ą 59% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call > 0.31 ą223% +2.0 2.33 ą 44% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter > 0.31 ą223% +2.0 2.33 ą 44% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state > 3.07 ą 56% +2.8 5.88 ą 28% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe > 8.42 ą100% -8.4 0.00 perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi > 8.42 ą100% -8.1 0.36 ą223% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode > 12.32 ą 25% -6.6 5.69 ą 69% perf-profile.children.cycles-pp.vsnprintf > 12.76 ą 27% -6.6 6.19 ą 67% perf-profile.children.cycles-pp.seq_printf > 3.07 ą 56% +2.8 5.88 ą 28% perf-profile.children.cycles-pp.__x64_sys_exit_group > 40.11 -11.0% 35.71 ą 2% perf-stat.i.MPKI > 1.563e+10 -12.3% 1.371e+10 ą 2% perf-stat.i.branch-instructions > 3.721e+09 ą 2% -23.2% 2.858e+09 ą 4% perf-stat.i.cache-misses > 4.471e+09 ą 3% -22.7% 3.458e+09 ą 4% perf-stat.i.cache-references > 5970 ą 5% -15.9% 5021 ą 4% perf-stat.i.context-switches > 1.66 ą 2% +15.8% 1.92 ą 2% perf-stat.i.cpi > 41.83 ą 4% +30.6% 54.63 ą 4% perf-stat.i.cycles-between-cache-misses > 2.282e+10 ą 2% -14.5% 1.952e+10 ą 2% perf-stat.i.dTLB-loads > 572602 ą 3% -9.2% 519922 ą 5% perf-stat.i.dTLB-store-misses > 1.483e+10 ą 2% -15.7% 1.25e+10 ą 2% perf-stat.i.dTLB-stores > 9.179e+10 -13.7% 7.924e+10 ą 2% perf-stat.i.instructions > 0.61 -13.4% 0.52 ą 2% perf-stat.i.ipc > 373.79 ą 4% -37.8% 232.60 ą 9% perf-stat.i.metric.K/sec > 251.45 -13.4% 217.72 ą 2% perf-stat.i.metric.M/sec > 21446 ą 3% -24.1% 16278 ą 8% perf-stat.i.minor-faults > 15.07 ą 5% -6.0 9.10 ą 10% perf-stat.i.node-load-miss-rate% > 68275790 ą 5% -44.9% 37626128 ą 12% perf-stat.i.node-load-misses > 21448 ą 3% -24.1% 16281 ą 8% perf-stat.i.page-faults > 40.71 -11.3% 36.10 ą 2% perf-stat.overall.MPKI > 1.67 +15.3% 1.93 ą 2% perf-stat.overall.cpi > 41.07 ą 3% +30.1% 53.42 ą 4% perf-stat.overall.cycles-between-cache-misses > 0.00 ą 2% +0.0 0.00 ą 2% perf-stat.overall.dTLB-store-miss-rate% > 0.60 -13.2% 0.52 ą 2% perf-stat.overall.ipc > 15.19 ą 5% -6.2 9.03 ą 11% perf-stat.overall.node-load-miss-rate% > 1.4e+10 -9.3% 1.269e+10 perf-stat.ps.branch-instructions > 3.352e+09 ą 3% -20.9% 2.652e+09 ą 4% perf-stat.ps.cache-misses > 4.026e+09 ą 3% -20.3% 3.208e+09 ą 4% perf-stat.ps.cache-references > 4888 ą 4% -10.8% 4362 ą 3% perf-stat.ps.context-switches > 206092 +2.1% 210375 perf-stat.ps.cpu-clock > 1.375e+11 +2.8% 1.414e+11 perf-stat.ps.cpu-cycles > 258.23 ą 5% +8.8% 280.85 ą 4% perf-stat.ps.cpu-migrations > 2.048e+10 -11.7% 1.809e+10 ą 2% perf-stat.ps.dTLB-loads > 1.333e+10 ą 2% -13.0% 1.16e+10 ą 2% perf-stat.ps.dTLB-stores > 8.231e+10 -10.8% 7.342e+10 perf-stat.ps.instructions > 15755 ą 3% -16.3% 13187 ą 6% perf-stat.ps.minor-faults > 61706790 ą 6% -43.8% 34699716 ą 11% perf-stat.ps.node-load-misses > 15757 ą 3% -16.3% 13189 ą 6% 
perf-stat.ps.page-faults > 206092 +2.1% 210375 perf-stat.ps.task-clock > 1.217e+12 +4.1% 1.267e+12 ą 2% perf-stat.total.instructions > > > > *************************************************************************************************** > lkp-cfl-d1: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory > ========================================================================================= > compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase: > gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite > > commit: > 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()") > 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries") > > 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 > ---------------- --------------------------- > %stddev %change %stddev > \ | \ > 232.12 ą 7% -12.0% 204.18 ą 8% sched_debug.cfs_rq:/.load_avg.stddev > 6797 -3.3% 6576 vmstat.system.cs > 15161 -0.9% 15029 vmstat.system.in > 349927 +44.3% 504820 meminfo.AnonHugePages > 507807 +27.1% 645169 meminfo.AnonPages > 1499332 +10.2% 1652612 meminfo.Inactive(anon) > 8.67 ą 62% +184.6% 24.67 ą 25% turbostat.C10 > 1.50 -0.1 1.45 turbostat.C1E% > 3.30 -3.2% 3.20 turbostat.RAMWatt > 1.40 ą 14% -0.3 1.09 ą 13% perf-profile.calltrace.cycles-pp.asm_exc_page_fault > 1.44 ą 12% -0.3 1.12 ą 13% perf-profile.children.cycles-pp.asm_exc_page_fault > 0.03 ą141% +0.1 0.10 ą 30% perf-profile.children.cycles-pp.next_uptodate_folio > 0.02 ą141% +0.1 0.10 ą 22% perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup > 0.02 ą143% +0.1 0.10 ą 25% perf-profile.self.cycles-pp.next_uptodate_folio > 0.01 ą223% +0.1 0.09 ą 19% perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup > 19806 -3.5% 19109 phoronix-test-suite.ramspeed.Average.Integer.mb_s > 283.70 +3.8% 294.50 phoronix-test-suite.time.elapsed_time > 283.70 +3.8% 294.50 phoronix-test-suite.time.elapsed_time.max > 120454 +1.6% 122334 phoronix-test-suite.time.maximum_resident_set_size > 281337 -54.8% 127194 phoronix-test-suite.time.minor_page_faults > 259.13 +4.1% 269.81 phoronix-test-suite.time.user_time > 126951 +27.0% 161291 proc-vmstat.nr_anon_pages > 170.86 +44.3% 246.49 proc-vmstat.nr_anon_transparent_hugepages > 355917 -1.0% 352250 proc-vmstat.nr_dirty_background_threshold > 712705 -1.0% 705362 proc-vmstat.nr_dirty_threshold > 3265201 -1.1% 3228465 proc-vmstat.nr_free_pages > 374833 +10.2% 413153 proc-vmstat.nr_inactive_anon > 1767 +4.8% 1853 proc-vmstat.nr_page_table_pages > 374833 +10.2% 413153 proc-vmstat.nr_zone_inactive_anon > 854665 -34.3% 561406 proc-vmstat.numa_hit > 854632 -34.3% 561397 proc-vmstat.numa_local > 5548755 +1.1% 5610598 proc-vmstat.pgalloc_normal > 1083315 -26.2% 799129 proc-vmstat.pgfault > 113425 +3.7% 117656 proc-vmstat.pgreuse > 9025 +7.6% 9714 proc-vmstat.thp_fault_alloc > 3.38 +0.1 3.45 perf-stat.i.branch-miss-rate% > 4.135e+08 -3.2% 4.003e+08 perf-stat.i.cache-misses > 5.341e+08 -2.7% 5.197e+08 perf-stat.i.cache-references > 6832 -3.4% 6600 perf-stat.i.context-switches > 4.06 +3.1% 4.19 perf-stat.i.cpi > 438639 ą 5% -18.7% 356730 ą 6% perf-stat.i.dTLB-load-misses > 1.119e+09 -3.8% 1.077e+09 perf-stat.i.dTLB-loads > 0.02 ą 15% -0.0 0.01 ą 26% perf-stat.i.dTLB-store-miss-rate% > 80407 ą 10% -63.5% 29387 ą 23% perf-stat.i.dTLB-store-misses > 7.319e+08 -3.8% 7.043e+08 perf-stat.i.dTLB-stores > 57.72 +0.8 58.52 perf-stat.i.iTLB-load-miss-rate% > 129846 -3.8% 124973 perf-stat.i.iTLB-load-misses > 
144448 -5.3% 136837 perf-stat.i.iTLB-loads > 2.389e+09 -3.5% 2.305e+09 perf-stat.i.instructions > 0.28 -2.9% 0.27 perf-stat.i.ipc > 220.59 -3.4% 213.11 perf-stat.i.metric.M/sec > 3610 -31.2% 2483 perf-stat.i.minor-faults > 49238342 +1.1% 49776834 perf-stat.i.node-loads > 98106028 -3.1% 95018390 perf-stat.i.node-stores > 3615 -31.2% 2487 perf-stat.i.page-faults > 3.65 +3.7% 3.78 perf-stat.overall.cpi > 21.08 +3.3% 21.79 perf-stat.overall.cycles-between-cache-misses > 0.04 ą 5% -0.0 0.03 ą 6% perf-stat.overall.dTLB-load-miss-rate% > 0.01 ą 10% -0.0 0.00 ą 23% perf-stat.overall.dTLB-store-miss-rate% > 0.27 -3.6% 0.26 perf-stat.overall.ipc > 4.122e+08 -3.2% 3.99e+08 perf-stat.ps.cache-misses > 5.324e+08 -2.7% 5.181e+08 perf-stat.ps.cache-references > 6809 -3.4% 6580 perf-stat.ps.context-switches > 437062 ą 5% -18.7% 355481 ą 6% perf-stat.ps.dTLB-load-misses > 1.115e+09 -3.8% 1.073e+09 perf-stat.ps.dTLB-loads > 80134 ą 10% -63.5% 29283 ą 23% perf-stat.ps.dTLB-store-misses > 7.295e+08 -3.8% 7.021e+08 perf-stat.ps.dTLB-stores > 129362 -3.7% 124535 perf-stat.ps.iTLB-load-misses > 143865 -5.2% 136338 perf-stat.ps.iTLB-loads > 2.381e+09 -3.5% 2.297e+09 perf-stat.ps.instructions > 3596 -31.2% 2473 perf-stat.ps.minor-faults > 49081949 +1.1% 49621463 perf-stat.ps.node-loads > 97795918 -3.1% 94724831 perf-stat.ps.node-stores > 3600 -31.2% 2477 perf-stat.ps.page-faults > > > > *************************************************************************************************** > lkp-cfl-d1: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory > ========================================================================================= > compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase: > gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite > > commit: > 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()") > 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries") > > 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 > ---------------- --------------------------- > %stddev %change %stddev > \ | \ > 167.28 ą 5% -13.1% 145.32 ą 6% sched_debug.cfs_rq:/.util_est_enqueued.avg > 6845 -2.5% 6674 vmstat.system.cs > 351910 ą 2% +40.2% 493341 meminfo.AnonHugePages > 505908 +27.2% 643328 meminfo.AnonPages > 1497656 +10.2% 1650453 meminfo.Inactive(anon) > 18957 ą 13% +26.3% 23947 ą 17% turbostat.C1 > 1.52 -0.0 1.48 turbostat.C1E% > 3.32 -2.9% 3.23 turbostat.RAMWatt > 19978 -3.0% 19379 phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s > 280.71 +3.3% 289.93 phoronix-test-suite.time.elapsed_time > 280.71 +3.3% 289.93 phoronix-test-suite.time.elapsed_time.max > 120465 +1.5% 122257 phoronix-test-suite.time.maximum_resident_set_size > 281047 -54.7% 127190 phoronix-test-suite.time.minor_page_faults > 257.03 +3.5% 265.95 phoronix-test-suite.time.user_time > 126473 +27.2% 160831 proc-vmstat.nr_anon_pages > 171.83 ą 2% +40.2% 240.89 proc-vmstat.nr_anon_transparent_hugepages > 355973 -1.0% 352304 proc-vmstat.nr_dirty_background_threshold > 712818 -1.0% 705471 proc-vmstat.nr_dirty_threshold > 3265800 -1.1% 3228879 proc-vmstat.nr_free_pages > 374410 +10.2% 412613 proc-vmstat.nr_inactive_anon > 1770 +4.4% 1848 proc-vmstat.nr_page_table_pages > 374410 +10.2% 412613 proc-vmstat.nr_zone_inactive_anon > 852082 -34.9% 555093 proc-vmstat.numa_hit > 852125 -34.9% 555018 proc-vmstat.numa_local > 1078293 -26.6% 791038 proc-vmstat.pgfault > 112693 +2.9% 116004 
proc-vmstat.pgreuse > 9025 +7.6% 9713 proc-vmstat.thp_fault_alloc > 3.63 ą 6% +0.6 4.25 ą 9% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt > 0.25 ą 55% -0.2 0.08 ą 68% perf-profile.children.cycles-pp.ret_from_fork_asm > 0.25 ą 55% -0.2 0.08 ą 68% perf-profile.children.cycles-pp.ret_from_fork > 0.23 ą 56% -0.2 0.07 ą 69% perf-profile.children.cycles-pp.kthread > 0.14 ą 36% -0.1 0.05 ą120% perf-profile.children.cycles-pp.do_anonymous_page > 0.14 ą 35% -0.1 0.05 ą 76% perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string > 0.04 ą 72% +0.0 0.08 ą 19% perf-profile.children.cycles-pp.try_to_wake_up > 0.04 ą118% +0.1 0.10 ą 36% perf-profile.children.cycles-pp.update_rq_clock > 0.07 ą 79% +0.1 0.17 ą 21% perf-profile.children.cycles-pp._raw_spin_lock_irqsave > 7.99 ą 11% +1.0 9.02 ą 5% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt > 0.23 ą 28% -0.1 0.14 ą 49% perf-profile.self.cycles-pp.irqentry_exit_to_user_mode > 0.14 ą 35% -0.1 0.05 ą 76% perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string > 0.06 ą 79% +0.1 0.16 ą 21% perf-profile.self.cycles-pp._raw_spin_lock_irqsave > 0.21 ą 34% +0.2 0.36 ą 18% perf-profile.self.cycles-pp.ktime_get > 1.187e+08 -4.6% 1.133e+08 perf-stat.i.branch-instructions > 3.36 +0.1 3.42 perf-stat.i.branch-miss-rate% > 5492420 -3.9% 5275592 perf-stat.i.branch-misses > 4.148e+08 -2.8% 4.034e+08 perf-stat.i.cache-misses > 5.251e+08 -2.6% 5.114e+08 perf-stat.i.cache-references > 6880 -2.5% 6711 perf-stat.i.context-switches > 4.30 +2.9% 4.43 perf-stat.i.cpi > 0.10 ą 7% -0.0 0.09 ą 2% perf-stat.i.dTLB-load-miss-rate% > 472268 ą 6% -19.9% 378489 perf-stat.i.dTLB-load-misses > 8.107e+08 -3.4% 7.831e+08 perf-stat.i.dTLB-loads > 0.02 ą 16% -0.0 0.01 ą 2% perf-stat.i.dTLB-store-miss-rate% > 90535 ą 11% -59.8% 36371 ą 2% perf-stat.i.dTLB-store-misses > 5.323e+08 -3.3% 5.145e+08 perf-stat.i.dTLB-stores > 129981 -3.0% 126061 perf-stat.i.iTLB-load-misses > 143662 -3.1% 139223 perf-stat.i.iTLB-loads > 2.253e+09 -3.6% 2.172e+09 perf-stat.i.instructions > 0.26 -3.2% 0.25 perf-stat.i.ipc > 4.71 ą 2% -6.4% 4.41 ą 2% perf-stat.i.major-faults > 180.03 -3.0% 174.57 perf-stat.i.metric.M/sec > 3627 -30.8% 2510 ą 2% perf-stat.i.minor-faults > 3632 -30.8% 2514 ą 2% perf-stat.i.page-faults > 3.88 +3.6% 4.02 perf-stat.overall.cpi > 21.08 +2.7% 21.65 perf-stat.overall.cycles-between-cache-misses > 0.06 ą 6% -0.0 0.05 perf-stat.overall.dTLB-load-miss-rate% > 0.02 ą 11% -0.0 0.01 ą 2% perf-stat.overall.dTLB-store-miss-rate% > 0.26 -3.5% 0.25 perf-stat.overall.ipc > 1.182e+08 -4.6% 1.128e+08 perf-stat.ps.branch-instructions > 5468166 -4.0% 5251939 perf-stat.ps.branch-misses > 4.135e+08 -2.7% 4.021e+08 perf-stat.ps.cache-misses > 5.234e+08 -2.6% 5.098e+08 perf-stat.ps.cache-references > 6859 -2.5% 6685 perf-stat.ps.context-switches > 470567 ą 6% -19.9% 377127 perf-stat.ps.dTLB-load-misses > 8.079e+08 -3.4% 7.805e+08 perf-stat.ps.dTLB-loads > 90221 ą 11% -59.8% 36239 ą 2% perf-stat.ps.dTLB-store-misses > 5.305e+08 -3.3% 5.128e+08 perf-stat.ps.dTLB-stores > 129499 -3.0% 125601 perf-stat.ps.iTLB-load-misses > 143121 -3.1% 138638 perf-stat.ps.iTLB-loads > 2.246e+09 -3.6% 2.165e+09 perf-stat.ps.instructions > 4.69 ą 2% -6.3% 4.39 ą 2% perf-stat.ps.major-faults > 3613 -30.8% 2500 ą 2% perf-stat.ps.minor-faults > 3617 -30.8% 2504 ą 2% perf-stat.ps.page-faults > > > > > > Disclaimer: > Results have been estimated based on internal Intel analysis and are provided > for 
informational purposes only. Any difference in system hardware or software > design or configuration may affect actual performance. > > > -- > 0-DAY CI Kernel Test Service > https://github.com/intel/lkp-tests/wiki > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-20 5:27 ` Yang Shi @ 2023-12-20 8:29 ` Yin Fengwei 2023-12-20 15:42 ` Christoph Lameter (Ampere) 2023-12-20 20:09 ` Yang Shi 0 siblings, 2 replies; 24+ messages in thread From: Yin Fengwei @ 2023-12-20 8:29 UTC (permalink / raw) To: Yang Shi, kernel test robot Cc: Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang, feng.tang On 2023/12/20 13:27, Yang Shi wrote: > On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote: >> >> >> >> Hello, >> >> for this commit, we reported >> "[mm] 96db82a66d: will-it-scale.per_process_ops -95.3% regression" >> in Aug, 2022 when it's in linux-next/master >> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/ >> >> later, we reported >> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression" >> in Oct, 2022 when it's in linus/master >> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/ >> >> and the commit was reverted finally by >> commit 0ba09b1733878afe838fe35c310715fda3d46428 >> Author: Linus Torvalds <torvalds@linux-foundation.org> >> Date: Sun Dec 4 12:51:59 2022 -0800 >> >> now we noticed it goes into linux-next/master again. >> >> we are not sure if there is an agreement that the benefit of this commit >> has already overweight performance drop in some mirco benchmark. >> >> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/ >> that >> "This patch was applied to v6.1, but was reverted due to a regression >> report. However it turned out the regression was not due to this patch. >> I ping'ed Andrew to reapply this patch, Andrew may forget it. This >> patch helps promote THP, so I rebased it onto the latest mm-unstable." > > IIRC, Huang Ying's analysis showed the regression for will-it-scale > micro benchmark is fine, it was actually reverted due to kernel build > regression with LLVM reported by Nathan Chancellor. Then the > regression was resolved by commit > 81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out > if page in deferred queue already"). And this patch did improve kernel > build with GCC by ~3% if I remember correctly. > >> >> however, unfortunately, in our latest tests, we still observed below regression >> upon this commit. just FYI. >> >> >> >> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on: > > Interesting, wasn't the same regression seen last time? And I'm a > little bit confused about how pthread got regressed. I didn't see the > pthread benchmark do any intensive memory alloc/free operations. Do > the pthread APIs do any intensive memory operations? I saw the > benchmark does allocate memory for thread stack, but it should be just > 8K per thread, so it should not trigger what this patch does. With > 1024 threads, the thread stacks may get merged into one single VMA (8M > total), but it may do so even though the patch is not applied. stress-ng.pthread test code is strange here: https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573 Even it allocates its own stack, but that attr is not passed to pthread_create. So it's still glibc to allocate stack for pthread which is 8M size. This is why this patch can impact the stress-ng.pthread testing. My understanding is this is different regression (if it's a valid regression). 
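To make that concrete, here is a minimal, self-contained sketch (this is not the stress-ng source; the stack size and names are illustrative only): a caller-allocated stack buffer only replaces the glibc default if it is attached with pthread_attr_setstack() and that attr is actually passed to pthread_create(). Otherwise glibc maps its usual 8 MB stack with a plain anonymous mmap(), which is large enough to be affected by the THP alignment change. Compile with -pthread.

#include <pthread.h>
#include <stdlib.h>

#define OWN_STACK_SIZE	(64 * 1024)	/* arbitrary; must be >= PTHREAD_STACK_MIN */

static void *worker(void *arg)
{
	(void)arg;
	return NULL;
}

int main(void)
{
	pthread_t tid;
	pthread_attr_t attr;
	void *stack = NULL;

	pthread_attr_init(&attr);

	/*
	 * Case 1 (what the test effectively does): the attr is never
	 * passed, so glibc allocates its default stack (typically 8 MB)
	 * with an anonymous mmap(); that mapping is big enough to be
	 * THP-aligned by the patch under discussion.
	 */
	pthread_create(&tid, NULL, worker, NULL);
	pthread_join(tid, NULL);

	/*
	 * Case 2: only when the caller's buffer is attached to the attr
	 * and the attr is passed does the thread run on that buffer.
	 */
	if (posix_memalign(&stack, 4096, OWN_STACK_SIZE) == 0) {
		pthread_attr_setstack(&attr, stack, OWN_STACK_SIZE);
		pthread_create(&tid, &attr, worker, NULL);
		pthread_join(tid, NULL);
		free(stack);
	}

	pthread_attr_destroy(&attr);
	return 0;
}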
The previous hotspot was in: deferred_split_huge_page deferred_split_huge_page deferred_split_huge_page spin_lock while this time, the hotspot is in (pmd_lock from do_madvise I suppose): - 55.02% zap_pmd_range.isra.0 - 53.42% __split_huge_pmd - 51.74% _raw_spin_lock - 51.73% native_queued_spin_lock_slowpath + 3.03% asm_sysvec_call_function - 1.67% __split_huge_pmd_locked - 0.87% pmdp_invalidate + 0.86% flush_tlb_mm_range - 1.60% zap_pte_range - 1.04% page_remove_rmap 0.55% __mod_lruvec_page_state > >> >> >> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries") >> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master >> >> testcase: stress-ng >> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory >> parameters: >> >> nr_threads: 1 >> disk: 1HDD >> testtime: 60s >> fs: ext4 >> class: os >> test: pthread >> cpufreq_governor: performance >> >> >> In addition to that, the commit also has significant impact on the following tests: >> >> +------------------+-----------------------------------------------------------------------------------------------+ >> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression | >> | test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory | >> | test parameters | array_size=50000000 | >> | | cpufreq_governor=performance | >> | | iterations=10x | >> | | loop=100 | >> | | nr_threads=25% | >> | | omp=true | >> +------------------+-----------------------------------------------------------------------------------------------+ >> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression | >> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory | >> | test parameters | cpufreq_governor=performance | >> | | option_a=Average | >> | | option_b=Integer | >> | | test=ramspeed-1.4.3 | >> +------------------+-----------------------------------------------------------------------------------------------+ >> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression | >> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory | >> | test parameters | cpufreq_governor=performance | >> | | option_a=Average | >> | | option_b=Floating Point | >> | | test=ramspeed-1.4.3 | >> +------------------+-----------------------------------------------------------------------------------------------+ >> >> >> If you fix the issue in a separate patch/commit (i.e. 
not just a new version of >> the same patch/commit), kindly add following tags >> | Reported-by: kernel test robot <oliver.sang@intel.com> >> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com >> >> >> Details are as below: >> --------------------------------------------------------------------------------------------------> >> >> >> The kernel config and materials to reproduce are available at: >> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com >> >> ========================================================================================= >> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime: >> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s >> >> commit: >> 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()") >> 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries") >> >> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 >> ---------------- --------------------------- >> %stddev %change %stddev >> \ | \ >> 13405796 -65.5% 4620124 cpuidle..usage >> 8.00 +8.2% 8.66 ą 2% iostat.cpu.system >> 1.61 -60.6% 0.63 iostat.cpu.user >> 597.50 ą 14% -64.3% 213.50 ą 14% perf-c2c.DRAM.local >> 1882 ą 14% -74.7% 476.83 ą 7% perf-c2c.HITM.local >> 3768436 -12.9% 3283395 vmstat.memory.cache >> 355105 -75.7% 86344 ą 3% vmstat.system.cs >> 385435 -20.7% 305714 ą 3% vmstat.system.in >> 1.13 -0.2 0.88 mpstat.cpu.all.irq% >> 0.29 -0.2 0.10 ą 2% mpstat.cpu.all.soft% >> 6.76 ą 2% +1.1 7.88 ą 2% mpstat.cpu.all.sys% >> 1.62 -1.0 0.62 ą 2% mpstat.cpu.all.usr% >> 2234397 -84.3% 350161 ą 5% stress-ng.pthread.ops >> 37237 -84.3% 5834 ą 5% stress-ng.pthread.ops_per_sec >> 294706 ą 2% -68.0% 94191 ą 6% stress-ng.time.involuntary_context_switches >> 41442 ą 2% +5023.4% 2123284 stress-ng.time.maximum_resident_set_size >> 4466457 -83.9% 717053 ą 5% stress-ng.time.minor_page_faults > > The larger RSS and fewer page faults are expected. > >> 243.33 +13.5% 276.17 ą 3% stress-ng.time.percent_of_cpu_this_job_got >> 131.64 +27.7% 168.11 ą 3% stress-ng.time.system_time >> 19.73 -82.1% 3.53 ą 4% stress-ng.time.user_time > > Much less user time. And it seems to match the drop of the pthread metric. > >> 7715609 -80.2% 1530125 ą 4% stress-ng.time.voluntary_context_switches >> 76728 -80.8% 14724 ą 4% perf-stat.i.minor-faults >> 5600408 -61.4% 2160997 ą 5% perf-stat.i.node-loads >> 8873996 +52.1% 13499744 ą 5% perf-stat.i.node-stores >> 112409 -81.9% 20305 ą 4% perf-stat.i.page-faults >> 2.55 +89.6% 4.83 perf-stat.overall.MPKI > > Much more TLB misses. > >> 1.51 -0.4 1.13 perf-stat.overall.branch-miss-rate% >> 19.26 +24.5 43.71 perf-stat.overall.cache-miss-rate% >> 1.70 +56.4% 2.65 perf-stat.overall.cpi >> 665.84 -17.5% 549.51 ą 2% perf-stat.overall.cycles-between-cache-misses >> 0.12 ą 4% -0.1 0.04 perf-stat.overall.dTLB-load-miss-rate% >> 0.08 ą 2% -0.0 0.03 perf-stat.overall.dTLB-store-miss-rate% >> 59.16 +0.9 60.04 perf-stat.overall.iTLB-load-miss-rate% >> 1278 +86.1% 2379 ą 2% perf-stat.overall.instructions-per-iTLB-miss >> 0.59 -36.1% 0.38 perf-stat.overall.ipc > > Worse IPC and CPI. 
> >> 2.078e+09 -48.3% 1.074e+09 ą 4% perf-stat.ps.branch-instructions >> 31292687 -61.2% 12133349 ą 2% perf-stat.ps.branch-misses >> 26057291 -5.9% 24512034 ą 4% perf-stat.ps.cache-misses >> 1.353e+08 -58.6% 56072195 ą 4% perf-stat.ps.cache-references >> 365254 -75.8% 88464 ą 3% perf-stat.ps.context-switches >> 1.735e+10 -22.4% 1.346e+10 ą 2% perf-stat.ps.cpu-cycles >> 60838 -79.1% 12727 ą 6% perf-stat.ps.cpu-migrations >> 3056601 ą 4% -81.5% 565354 ą 4% perf-stat.ps.dTLB-load-misses >> 2.636e+09 -50.7% 1.3e+09 ą 4% perf-stat.ps.dTLB-loads >> 1155253 ą 2% -83.0% 196581 ą 5% perf-stat.ps.dTLB-store-misses >> 1.473e+09 -57.4% 6.268e+08 ą 3% perf-stat.ps.dTLB-stores >> 7997726 -73.3% 2131477 ą 3% perf-stat.ps.iTLB-load-misses >> 5521346 -74.3% 1418623 ą 2% perf-stat.ps.iTLB-loads >> 1.023e+10 -50.4% 5.073e+09 ą 4% perf-stat.ps.instructions >> 75671 -80.9% 14479 ą 4% perf-stat.ps.minor-faults >> 5549722 -61.4% 2141750 ą 4% perf-stat.ps.node-loads >> 8769156 +51.6% 13296579 ą 5% perf-stat.ps.node-stores >> 110795 -82.0% 19977 ą 4% perf-stat.ps.page-faults >> 6.482e+11 -50.7% 3.197e+11 ą 4% perf-stat.total.instructions >> 0.00 ą 37% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab >> 0.01 ą 18% +8373.1% 0.73 ą 49% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64 >> 0.01 ą 16% +4600.0% 0.38 ą 24% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit > > More time spent in madvise and munmap. but I'm not sure whether this > is caused by tearing down the address space when exiting the test. If > so it should not count in the regression. It's not for the whole address space tearing down. It's for pthread stack tearing down when pthread exit (can be treated as address space tearing down? I suppose so). https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384 https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576 Another thing is whether it's worthy to make stack use THP? It may be useful for some apps which need large stack size? Regards Yin, Fengwei ^ permalink raw reply [flat|nested] 24+ messages in thread
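For readers following the glibc links above, the behaviour being described is roughly the following (a hedged sketch; the function name and parameters are made up, see allocatestack.c for the real code): when a thread exits and its stack is kept in glibc's stack cache, the unused low part of the mapping is trimmed with madvise(MADV_DONTNEED). Once the 8 MB default stack is THP-backed, that madvise() forces the kernel to split a huge PMD, which is where the __split_huge_pmd/pmd-lock time in the profile comes from.

#include <stddef.h>
#include <sys/mman.h>

/*
 * Rough sketch of the glibc stack-cache trimming referenced above;
 * the name and parameters are invented for illustration.  The thread
 * stack grows down, so the unused portion of a mostly-idle stack sits
 * at the low end of the mapping.
 */
void trim_cached_stack(void *stack_base, size_t stack_size,
		       size_t bytes_still_in_use)
{
	size_t reclaim = stack_size - bytes_still_in_use;

	/*
	 * MADV_DONTNEED zaps the given range.  If that range covers
	 * only part of a THP-backed region, the kernel must first split
	 * the huge PMD (__split_huge_pmd), taking the PMD locks that
	 * dominate the profile above.
	 */
	if (reclaim)
		madvise(stack_base, reclaim, MADV_DONTNEED);
}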
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-20 8:29 ` Yin Fengwei @ 2023-12-20 15:42 ` Christoph Lameter (Ampere) 2023-12-20 20:14 ` Yang Shi 2023-12-20 20:09 ` Yang Shi 1 sibling, 1 reply; 24+ messages in thread From: Christoph Lameter (Ampere) @ 2023-12-20 15:42 UTC (permalink / raw) To: Yin Fengwei Cc: Yang Shi, kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Matthew Wilcox, ying.huang, feng.tang On Wed, 20 Dec 2023, Yin Fengwei wrote: >> Interesting, wasn't the same regression seen last time? And I'm a >> little bit confused about how pthread got regressed. I didn't see the >> pthread benchmark do any intensive memory alloc/free operations. Do >> the pthread APIs do any intensive memory operations? I saw the >> benchmark does allocate memory for thread stack, but it should be just >> 8K per thread, so it should not trigger what this patch does. With >> 1024 threads, the thread stacks may get merged into one single VMA (8M >> total), but it may do so even though the patch is not applied. > stress-ng.pthread test code is strange here: > > https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573 > > Even it allocates its own stack, but that attr is not passed > to pthread_create. So it's still glibc to allocate stack for > pthread which is 8M size. This is why this patch can impact > the stress-ng.pthread testing. Hmmm... The use of calloc() for 8M triggers an mmap I guess. Why is that memory slower if we align the adress to a 2M boundary? Because THP can act faster and creates more overhead? > while this time, the hotspot is in (pmd_lock from do_madvise I suppose): > - 55.02% zap_pmd_range.isra.0 > - 53.42% __split_huge_pmd > - 51.74% _raw_spin_lock > - 51.73% native_queued_spin_lock_slowpath > + 3.03% asm_sysvec_call_function > - 1.67% __split_huge_pmd_locked > - 0.87% pmdp_invalidate > + 0.86% flush_tlb_mm_range > - 1.60% zap_pte_range > - 1.04% page_remove_rmap > 0.55% __mod_lruvec_page_state Ok so we have 2M mappings and they are split because of some action on 4K segments? Guess because of the guard pages? >> More time spent in madvise and munmap. but I'm not sure whether this >> is caused by tearing down the address space when exiting the test. If >> so it should not count in the regression. > It's not for the whole address space tearing down. It's for pthread > stack tearing down when pthread exit (can be treated as address space > tearing down? I suppose so). > > https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384 > https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576 > > Another thing is whether it's worthy to make stack use THP? It may be > useful for some apps which need large stack size? No can do since a calloc is used to allocate the stack. How can the kernel distinguish the allocation? ^ permalink raw reply [flat|nested] 24+ messages in thread
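Christoph's guess that an 8 MB calloc() is served by mmap() is easy to check empirically. The program below is only a diagnostic sketch (the exact behaviour depends on glibc's mmap threshold and on the kernel in use): it allocates 8 MB the way the glibc stack path effectively does, faults it in, and reports whether the region starts on a 2 MB boundary; AnonHugePages in /proc/<pid>/smaps then shows whether THP actually backed it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	size_t sz = 8UL << 20;			/* 8 MB, like the glibc default stack */
	unsigned char *p = calloc(1, sz);	/* large enough to hit glibc's mmap path */

	if (!p)
		return 1;
	memset(p, 1, sz);			/* fault the pages in */

	printf("start=%p, 2MB aligned: %s\n", (void *)p,
	       ((unsigned long)p & ((2UL << 20) - 1)) ? "no" : "yes");
	printf("check AnonHugePages in /proc/%d/smaps\n", (int)getpid());

	pause();				/* keep the mapping alive for inspection */
	return 0;
}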
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-20 15:42 ` Christoph Lameter (Ampere) @ 2023-12-20 20:14 ` Yang Shi 0 siblings, 0 replies; 24+ messages in thread From: Yang Shi @ 2023-12-20 20:14 UTC (permalink / raw) To: Christoph Lameter (Ampere) Cc: Yin Fengwei, kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Matthew Wilcox, ying.huang, feng.tang On Wed, Dec 20, 2023 at 7:42 AM Christoph Lameter (Ampere) <cl@linux.com> wrote: > > On Wed, 20 Dec 2023, Yin Fengwei wrote: > > >> Interesting, wasn't the same regression seen last time? And I'm a > >> little bit confused about how pthread got regressed. I didn't see the > >> pthread benchmark do any intensive memory alloc/free operations. Do > >> the pthread APIs do any intensive memory operations? I saw the > >> benchmark does allocate memory for thread stack, but it should be just > >> 8K per thread, so it should not trigger what this patch does. With > >> 1024 threads, the thread stacks may get merged into one single VMA (8M > >> total), but it may do so even though the patch is not applied. > > stress-ng.pthread test code is strange here: > > > > https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573 > > > > Even it allocates its own stack, but that attr is not passed > > to pthread_create. So it's still glibc to allocate stack for > > pthread which is 8M size. This is why this patch can impact > > the stress-ng.pthread testing. > > Hmmm... The use of calloc() for 8M triggers an mmap I guess. > > Why is that memory slower if we align the adress to a 2M boundary? Because > THP can act faster and creates more overhead? glibc calls madvise() to free unused stack, that may have higher cost due to THP (splitting pmd, deferred split queue, etc). > > > while this time, the hotspot is in (pmd_lock from do_madvise I suppose): > > - 55.02% zap_pmd_range.isra.0 > > - 53.42% __split_huge_pmd > > - 51.74% _raw_spin_lock > > - 51.73% native_queued_spin_lock_slowpath > > + 3.03% asm_sysvec_call_function > > - 1.67% __split_huge_pmd_locked > > - 0.87% pmdp_invalidate > > + 0.86% flush_tlb_mm_range > > - 1.60% zap_pte_range > > - 1.04% page_remove_rmap > > 0.55% __mod_lruvec_page_state > > Ok so we have 2M mappings and they are split because of some action on 4K > segments? Guess because of the guard pages? It should not relate to guard pages, just due to free unused stack which may be partial 2M. > > >> More time spent in madvise and munmap. but I'm not sure whether this > >> is caused by tearing down the address space when exiting the test. If > >> so it should not count in the regression. > > It's not for the whole address space tearing down. It's for pthread > > stack tearing down when pthread exit (can be treated as address space > > tearing down? I suppose so). > > > > https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384 > > https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576 > > > > Another thing is whether it's worthy to make stack use THP? It may be > > useful for some apps which need large stack size? > > No can do since a calloc is used to allocate the stack. How can the kernel > distinguish the allocation? Just by VM_GROWSDOWN | VM_GROWSUP. The user space needs to tell kernel this area is stack by setting proper flags. 
For example,

ffffca1df000-ffffca200000 rw-p 00000000 00:00 0                          [stack]
Size:                132 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  60 kB
Pss:                  60 kB
Pss_Dirty:            60 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        60 kB
Referenced:           60 kB
Anonymous:            60 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me gd ac

The "gd" flag means GROWSDOWN. But whether it is set depends entirely on how
glibc sets up what it considers the "stack", and glibc just uses calloc() to
allocate the stack area.

>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-20 8:29 ` Yin Fengwei 2023-12-20 15:42 ` Christoph Lameter (Ampere) @ 2023-12-20 20:09 ` Yang Shi 2023-12-21 0:26 ` Yang Shi 1 sibling, 1 reply; 24+ messages in thread From: Yang Shi @ 2023-12-20 20:09 UTC (permalink / raw) To: Yin Fengwei Cc: kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang, feng.tang On Wed, Dec 20, 2023 at 12:34 AM Yin Fengwei <fengwei.yin@intel.com> wrote: > > > > On 2023/12/20 13:27, Yang Shi wrote: > > On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote: > >> > >> > >> > >> Hello, > >> > >> for this commit, we reported > >> "[mm] 96db82a66d: will-it-scale.per_process_ops -95.3% regression" > >> in Aug, 2022 when it's in linux-next/master > >> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/ > >> > >> later, we reported > >> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression" > >> in Oct, 2022 when it's in linus/master > >> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/ > >> > >> and the commit was reverted finally by > >> commit 0ba09b1733878afe838fe35c310715fda3d46428 > >> Author: Linus Torvalds <torvalds@linux-foundation.org> > >> Date: Sun Dec 4 12:51:59 2022 -0800 > >> > >> now we noticed it goes into linux-next/master again. > >> > >> we are not sure if there is an agreement that the benefit of this commit > >> has already overweight performance drop in some mirco benchmark. > >> > >> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/ > >> that > >> "This patch was applied to v6.1, but was reverted due to a regression > >> report. However it turned out the regression was not due to this patch. > >> I ping'ed Andrew to reapply this patch, Andrew may forget it. This > >> patch helps promote THP, so I rebased it onto the latest mm-unstable." > > > > IIRC, Huang Ying's analysis showed the regression for will-it-scale > > micro benchmark is fine, it was actually reverted due to kernel build > > regression with LLVM reported by Nathan Chancellor. Then the > > regression was resolved by commit > > 81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out > > if page in deferred queue already"). And this patch did improve kernel > > build with GCC by ~3% if I remember correctly. > > > >> > >> however, unfortunately, in our latest tests, we still observed below regression > >> upon this commit. just FYI. > >> > >> > >> > >> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on: > > > > Interesting, wasn't the same regression seen last time? And I'm a > > little bit confused about how pthread got regressed. I didn't see the > > pthread benchmark do any intensive memory alloc/free operations. Do > > the pthread APIs do any intensive memory operations? I saw the > > benchmark does allocate memory for thread stack, but it should be just > > 8K per thread, so it should not trigger what this patch does. With > > 1024 threads, the thread stacks may get merged into one single VMA (8M > > total), but it may do so even though the patch is not applied. > stress-ng.pthread test code is strange here: > > https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573 > > Even it allocates its own stack, but that attr is not passed > to pthread_create. 
So it's still glibc to allocate stack for > pthread which is 8M size. This is why this patch can impact > the stress-ng.pthread testing. Aha, nice catch, I overlooked that. > > > My understanding is this is different regression (if it's a valid > regression). The previous hotspot was in: > deferred_split_huge_page > deferred_split_huge_page > deferred_split_huge_page > spin_lock > > while this time, the hotspot is in (pmd_lock from do_madvise I suppose): > - 55.02% zap_pmd_range.isra.0 > - 53.42% __split_huge_pmd > - 51.74% _raw_spin_lock > - 51.73% native_queued_spin_lock_slowpath > + 3.03% asm_sysvec_call_function > - 1.67% __split_huge_pmd_locked > - 0.87% pmdp_invalidate > + 0.86% flush_tlb_mm_range > - 1.60% zap_pte_range > - 1.04% page_remove_rmap > 0.55% __mod_lruvec_page_state > > > > > >> > >> > >> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries") > >> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master > >> > >> testcase: stress-ng > >> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory > >> parameters: > >> > >> nr_threads: 1 > >> disk: 1HDD > >> testtime: 60s > >> fs: ext4 > >> class: os > >> test: pthread > >> cpufreq_governor: performance > >> > >> > >> In addition to that, the commit also has significant impact on the following tests: > >> > >> +------------------+-----------------------------------------------------------------------------------------------+ > >> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression | > >> | test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory | > >> | test parameters | array_size=50000000 | > >> | | cpufreq_governor=performance | > >> | | iterations=10x | > >> | | loop=100 | > >> | | nr_threads=25% | > >> | | omp=true | > >> +------------------+-----------------------------------------------------------------------------------------------+ > >> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression | > >> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory | > >> | test parameters | cpufreq_governor=performance | > >> | | option_a=Average | > >> | | option_b=Integer | > >> | | test=ramspeed-1.4.3 | > >> +------------------+-----------------------------------------------------------------------------------------------+ > >> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression | > >> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory | > >> | test parameters | cpufreq_governor=performance | > >> | | option_a=Average | > >> | | option_b=Floating Point | > >> | | test=ramspeed-1.4.3 | > >> +------------------+-----------------------------------------------------------------------------------------------+ > >> > >> > >> If you fix the issue in a separate patch/commit (i.e. 
not just a new version of > >> the same patch/commit), kindly add following tags > >> | Reported-by: kernel test robot <oliver.sang@intel.com> > >> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com > >> > >> > >> Details are as below: > >> --------------------------------------------------------------------------------------------------> > >> > >> > >> The kernel config and materials to reproduce are available at: > >> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com > >> > >> ========================================================================================= > >> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime: > >> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s > >> > >> commit: > >> 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()") > >> 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries") > >> > >> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 > >> ---------------- --------------------------- > >> %stddev %change %stddev > >> \ | \ > >> 13405796 -65.5% 4620124 cpuidle..usage > >> 8.00 +8.2% 8.66 ą 2% iostat.cpu.system > >> 1.61 -60.6% 0.63 iostat.cpu.user > >> 597.50 ą 14% -64.3% 213.50 ą 14% perf-c2c.DRAM.local > >> 1882 ą 14% -74.7% 476.83 ą 7% perf-c2c.HITM.local > >> 3768436 -12.9% 3283395 vmstat.memory.cache > >> 355105 -75.7% 86344 ą 3% vmstat.system.cs > >> 385435 -20.7% 305714 ą 3% vmstat.system.in > >> 1.13 -0.2 0.88 mpstat.cpu.all.irq% > >> 0.29 -0.2 0.10 ą 2% mpstat.cpu.all.soft% > >> 6.76 ą 2% +1.1 7.88 ą 2% mpstat.cpu.all.sys% > >> 1.62 -1.0 0.62 ą 2% mpstat.cpu.all.usr% > >> 2234397 -84.3% 350161 ą 5% stress-ng.pthread.ops > >> 37237 -84.3% 5834 ą 5% stress-ng.pthread.ops_per_sec > >> 294706 ą 2% -68.0% 94191 ą 6% stress-ng.time.involuntary_context_switches > >> 41442 ą 2% +5023.4% 2123284 stress-ng.time.maximum_resident_set_size > >> 4466457 -83.9% 717053 ą 5% stress-ng.time.minor_page_faults > > > > The larger RSS and fewer page faults are expected. > > > >> 243.33 +13.5% 276.17 ą 3% stress-ng.time.percent_of_cpu_this_job_got > >> 131.64 +27.7% 168.11 ą 3% stress-ng.time.system_time > >> 19.73 -82.1% 3.53 ą 4% stress-ng.time.user_time > > > > Much less user time. And it seems to match the drop of the pthread metric. > > > >> 7715609 -80.2% 1530125 ą 4% stress-ng.time.voluntary_context_switches > >> 76728 -80.8% 14724 ą 4% perf-stat.i.minor-faults > >> 5600408 -61.4% 2160997 ą 5% perf-stat.i.node-loads > >> 8873996 +52.1% 13499744 ą 5% perf-stat.i.node-stores > >> 112409 -81.9% 20305 ą 4% perf-stat.i.page-faults > >> 2.55 +89.6% 4.83 perf-stat.overall.MPKI > > > > Much more TLB misses. > > > >> 1.51 -0.4 1.13 perf-stat.overall.branch-miss-rate% > >> 19.26 +24.5 43.71 perf-stat.overall.cache-miss-rate% > >> 1.70 +56.4% 2.65 perf-stat.overall.cpi > >> 665.84 -17.5% 549.51 ą 2% perf-stat.overall.cycles-between-cache-misses > >> 0.12 ą 4% -0.1 0.04 perf-stat.overall.dTLB-load-miss-rate% > >> 0.08 ą 2% -0.0 0.03 perf-stat.overall.dTLB-store-miss-rate% > >> 59.16 +0.9 60.04 perf-stat.overall.iTLB-load-miss-rate% > >> 1278 +86.1% 2379 ą 2% perf-stat.overall.instructions-per-iTLB-miss > >> 0.59 -36.1% 0.38 perf-stat.overall.ipc > > > > Worse IPC and CPI. 
> > > >> 2.078e+09 -48.3% 1.074e+09 ą 4% perf-stat.ps.branch-instructions > >> 31292687 -61.2% 12133349 ą 2% perf-stat.ps.branch-misses > >> 26057291 -5.9% 24512034 ą 4% perf-stat.ps.cache-misses > >> 1.353e+08 -58.6% 56072195 ą 4% perf-stat.ps.cache-references > >> 365254 -75.8% 88464 ą 3% perf-stat.ps.context-switches > >> 1.735e+10 -22.4% 1.346e+10 ą 2% perf-stat.ps.cpu-cycles > >> 60838 -79.1% 12727 ą 6% perf-stat.ps.cpu-migrations > >> 3056601 ą 4% -81.5% 565354 ą 4% perf-stat.ps.dTLB-load-misses > >> 2.636e+09 -50.7% 1.3e+09 ą 4% perf-stat.ps.dTLB-loads > >> 1155253 ą 2% -83.0% 196581 ą 5% perf-stat.ps.dTLB-store-misses > >> 1.473e+09 -57.4% 6.268e+08 ą 3% perf-stat.ps.dTLB-stores > >> 7997726 -73.3% 2131477 ą 3% perf-stat.ps.iTLB-load-misses > >> 5521346 -74.3% 1418623 ą 2% perf-stat.ps.iTLB-loads > >> 1.023e+10 -50.4% 5.073e+09 ą 4% perf-stat.ps.instructions > >> 75671 -80.9% 14479 ą 4% perf-stat.ps.minor-faults > >> 5549722 -61.4% 2141750 ą 4% perf-stat.ps.node-loads > >> 8769156 +51.6% 13296579 ą 5% perf-stat.ps.node-stores > >> 110795 -82.0% 19977 ą 4% perf-stat.ps.page-faults > >> 6.482e+11 -50.7% 3.197e+11 ą 4% perf-stat.total.instructions > >> 0.00 ą 37% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab > >> 0.01 ą 18% +8373.1% 0.73 ą 49% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64 > >> 0.01 ą 16% +4600.0% 0.38 ą 24% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit > > > > More time spent in madvise and munmap. but I'm not sure whether this > > is caused by tearing down the address space when exiting the test. If > > so it should not count in the regression. > It's not for the whole address space tearing down. It's for pthread > stack tearing down when pthread exit (can be treated as address space > tearing down? I suppose so). > > https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384 > https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576 It explains the problem. The madvise() does have some extra overhead for handling THP (splitting pmd, deferred split queue, etc). > > Another thing is whether it's worthy to make stack use THP? It may be > useful for some apps which need large stack size? Kernel actually doesn't apply THP to stack (see vma_is_temporary_stack()). But kernel can't know whether the VMA is stack or not by checking VM_GROWSDOWN | VM_GROWSUP flags. So if glibc doesn't set the proper flags to tell kernel the area is stack, kernel just treats it as normal anonymous area. So glibc should set up stack properly IMHO. > > > Regards > Yin, Fengwei ^ permalink raw reply [flat|nested] 24+ messages in thread
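A minimal sketch of the pattern being described, assuming THP is enabled and
the 8M mapping ends up 2M-aligned (which is exactly what the patch under
discussion arranges): first-touch the region, then madvise() an unaligned
sub-range the way glibc trims an exiting thread's stack. This is illustrative
test code, not glibc's:

#include <sys/mman.h>
#include <string.h>
#include <stdio.h>

#define SZ (8UL << 20)	/* 8M, like a default glibc thread stack */

int main(void)
{
	/*
	 * Plain anonymous mapping; with the alignment patch a request this
	 * large lands on a 2M boundary and can be PMD-mapped by THP.
	 */
	char *p = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	memset(p, 1, SZ);	/* fault everything in, possibly as huge pages */

	/*
	 * glibc releases the unused part of a dead thread's stack with a
	 * similar madvise() call; a sub-range that is not 2M-aligned has to
	 * go through __split_huge_pmd() before the pages can be dropped,
	 * which is the hotspot shown in the profile above.
	 */
	if (madvise(p + 4096, SZ - 8192, MADV_DONTNEED))
		perror("madvise");

	munmap(p, SZ);
	return 0;
}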
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-20 20:09 ` Yang Shi @ 2023-12-21 0:26 ` Yang Shi 2023-12-21 0:58 ` Yin Fengwei 0 siblings, 1 reply; 24+ messages in thread From: Yang Shi @ 2023-12-21 0:26 UTC (permalink / raw) To: Yin Fengwei Cc: kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang, feng.tang On Wed, Dec 20, 2023 at 12:09 PM Yang Shi <shy828301@gmail.com> wrote: > > On Wed, Dec 20, 2023 at 12:34 AM Yin Fengwei <fengwei.yin@intel.com> wrote: > > > > > > > > On 2023/12/20 13:27, Yang Shi wrote: > > > On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote: > > >> > > >> > > >> > > >> Hello, > > >> > > >> for this commit, we reported > > >> "[mm] 96db82a66d: will-it-scale.per_process_ops -95.3% regression" > > >> in Aug, 2022 when it's in linux-next/master > > >> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/ > > >> > > >> later, we reported > > >> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression" > > >> in Oct, 2022 when it's in linus/master > > >> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/ > > >> > > >> and the commit was reverted finally by > > >> commit 0ba09b1733878afe838fe35c310715fda3d46428 > > >> Author: Linus Torvalds <torvalds@linux-foundation.org> > > >> Date: Sun Dec 4 12:51:59 2022 -0800 > > >> > > >> now we noticed it goes into linux-next/master again. > > >> > > >> we are not sure if there is an agreement that the benefit of this commit > > >> has already overweight performance drop in some mirco benchmark. > > >> > > >> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/ > > >> that > > >> "This patch was applied to v6.1, but was reverted due to a regression > > >> report. However it turned out the regression was not due to this patch. > > >> I ping'ed Andrew to reapply this patch, Andrew may forget it. This > > >> patch helps promote THP, so I rebased it onto the latest mm-unstable." > > > > > > IIRC, Huang Ying's analysis showed the regression for will-it-scale > > > micro benchmark is fine, it was actually reverted due to kernel build > > > regression with LLVM reported by Nathan Chancellor. Then the > > > regression was resolved by commit > > > 81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out > > > if page in deferred queue already"). And this patch did improve kernel > > > build with GCC by ~3% if I remember correctly. > > > > > >> > > >> however, unfortunately, in our latest tests, we still observed below regression > > >> upon this commit. just FYI. > > >> > > >> > > >> > > >> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on: > > > > > > Interesting, wasn't the same regression seen last time? And I'm a > > > little bit confused about how pthread got regressed. I didn't see the > > > pthread benchmark do any intensive memory alloc/free operations. Do > > > the pthread APIs do any intensive memory operations? I saw the > > > benchmark does allocate memory for thread stack, but it should be just > > > 8K per thread, so it should not trigger what this patch does. With > > > 1024 threads, the thread stacks may get merged into one single VMA (8M > > > total), but it may do so even though the patch is not applied. 
> > stress-ng.pthread test code is strange here: > > > > https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573 > > > > Even it allocates its own stack, but that attr is not passed > > to pthread_create. So it's still glibc to allocate stack for > > pthread which is 8M size. This is why this patch can impact > > the stress-ng.pthread testing. > > Aha, nice catch, I overlooked that. > > > > > > > My understanding is this is different regression (if it's a valid > > regression). The previous hotspot was in: > > deferred_split_huge_page > > deferred_split_huge_page > > deferred_split_huge_page > > spin_lock > > > > while this time, the hotspot is in (pmd_lock from do_madvise I suppose): > > - 55.02% zap_pmd_range.isra.0 > > - 53.42% __split_huge_pmd > > - 51.74% _raw_spin_lock > > - 51.73% native_queued_spin_lock_slowpath > > + 3.03% asm_sysvec_call_function > > - 1.67% __split_huge_pmd_locked > > - 0.87% pmdp_invalidate > > + 0.86% flush_tlb_mm_range > > - 1.60% zap_pte_range > > - 1.04% page_remove_rmap > > 0.55% __mod_lruvec_page_state > > > > > > > > > >> > > >> > > >> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries") > > >> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master > > >> > > >> testcase: stress-ng > > >> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory > > >> parameters: > > >> > > >> nr_threads: 1 > > >> disk: 1HDD > > >> testtime: 60s > > >> fs: ext4 > > >> class: os > > >> test: pthread > > >> cpufreq_governor: performance > > >> > > >> > > >> In addition to that, the commit also has significant impact on the following tests: > > >> > > >> +------------------+-----------------------------------------------------------------------------------------------+ > > >> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression | > > >> | test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory | > > >> | test parameters | array_size=50000000 | > > >> | | cpufreq_governor=performance | > > >> | | iterations=10x | > > >> | | loop=100 | > > >> | | nr_threads=25% | > > >> | | omp=true | > > >> +------------------+-----------------------------------------------------------------------------------------------+ > > >> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression | > > >> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory | > > >> | test parameters | cpufreq_governor=performance | > > >> | | option_a=Average | > > >> | | option_b=Integer | > > >> | | test=ramspeed-1.4.3 | > > >> +------------------+-----------------------------------------------------------------------------------------------+ > > >> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression | > > >> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory | > > >> | test parameters | cpufreq_governor=performance | > > >> | | option_a=Average | > > >> | | option_b=Floating Point | > > >> | | test=ramspeed-1.4.3 | > > >> +------------------+-----------------------------------------------------------------------------------------------+ > > >> > > >> > > >> If you fix the issue in a separate patch/commit (i.e. 
not just a new version of > > >> the same patch/commit), kindly add following tags > > >> | Reported-by: kernel test robot <oliver.sang@intel.com> > > >> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com > > >> > > >> > > >> Details are as below: > > >> --------------------------------------------------------------------------------------------------> > > >> > > >> > > >> The kernel config and materials to reproduce are available at: > > >> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com > > >> > > >> ========================================================================================= > > >> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime: > > >> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s > > >> > > >> commit: > > >> 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()") > > >> 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries") > > >> > > >> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 > > >> ---------------- --------------------------- > > >> %stddev %change %stddev > > >> \ | \ > > >> 13405796 -65.5% 4620124 cpuidle..usage > > >> 8.00 +8.2% 8.66 ą 2% iostat.cpu.system > > >> 1.61 -60.6% 0.63 iostat.cpu.user > > >> 597.50 ą 14% -64.3% 213.50 ą 14% perf-c2c.DRAM.local > > >> 1882 ą 14% -74.7% 476.83 ą 7% perf-c2c.HITM.local > > >> 3768436 -12.9% 3283395 vmstat.memory.cache > > >> 355105 -75.7% 86344 ą 3% vmstat.system.cs > > >> 385435 -20.7% 305714 ą 3% vmstat.system.in > > >> 1.13 -0.2 0.88 mpstat.cpu.all.irq% > > >> 0.29 -0.2 0.10 ą 2% mpstat.cpu.all.soft% > > >> 6.76 ą 2% +1.1 7.88 ą 2% mpstat.cpu.all.sys% > > >> 1.62 -1.0 0.62 ą 2% mpstat.cpu.all.usr% > > >> 2234397 -84.3% 350161 ą 5% stress-ng.pthread.ops > > >> 37237 -84.3% 5834 ą 5% stress-ng.pthread.ops_per_sec > > >> 294706 ą 2% -68.0% 94191 ą 6% stress-ng.time.involuntary_context_switches > > >> 41442 ą 2% +5023.4% 2123284 stress-ng.time.maximum_resident_set_size > > >> 4466457 -83.9% 717053 ą 5% stress-ng.time.minor_page_faults > > > > > > The larger RSS and fewer page faults are expected. > > > > > >> 243.33 +13.5% 276.17 ą 3% stress-ng.time.percent_of_cpu_this_job_got > > >> 131.64 +27.7% 168.11 ą 3% stress-ng.time.system_time > > >> 19.73 -82.1% 3.53 ą 4% stress-ng.time.user_time > > > > > > Much less user time. And it seems to match the drop of the pthread metric. > > > > > >> 7715609 -80.2% 1530125 ą 4% stress-ng.time.voluntary_context_switches > > >> 76728 -80.8% 14724 ą 4% perf-stat.i.minor-faults > > >> 5600408 -61.4% 2160997 ą 5% perf-stat.i.node-loads > > >> 8873996 +52.1% 13499744 ą 5% perf-stat.i.node-stores > > >> 112409 -81.9% 20305 ą 4% perf-stat.i.page-faults > > >> 2.55 +89.6% 4.83 perf-stat.overall.MPKI > > > > > > Much more TLB misses. > > > > > >> 1.51 -0.4 1.13 perf-stat.overall.branch-miss-rate% > > >> 19.26 +24.5 43.71 perf-stat.overall.cache-miss-rate% > > >> 1.70 +56.4% 2.65 perf-stat.overall.cpi > > >> 665.84 -17.5% 549.51 ą 2% perf-stat.overall.cycles-between-cache-misses > > >> 0.12 ą 4% -0.1 0.04 perf-stat.overall.dTLB-load-miss-rate% > > >> 0.08 ą 2% -0.0 0.03 perf-stat.overall.dTLB-store-miss-rate% > > >> 59.16 +0.9 60.04 perf-stat.overall.iTLB-load-miss-rate% > > >> 1278 +86.1% 2379 ą 2% perf-stat.overall.instructions-per-iTLB-miss > > >> 0.59 -36.1% 0.38 perf-stat.overall.ipc > > > > > > Worse IPC and CPI. 
> > > > > >> 2.078e+09 -48.3% 1.074e+09 ą 4% perf-stat.ps.branch-instructions > > >> 31292687 -61.2% 12133349 ą 2% perf-stat.ps.branch-misses > > >> 26057291 -5.9% 24512034 ą 4% perf-stat.ps.cache-misses > > >> 1.353e+08 -58.6% 56072195 ą 4% perf-stat.ps.cache-references > > >> 365254 -75.8% 88464 ą 3% perf-stat.ps.context-switches > > >> 1.735e+10 -22.4% 1.346e+10 ą 2% perf-stat.ps.cpu-cycles > > >> 60838 -79.1% 12727 ą 6% perf-stat.ps.cpu-migrations > > >> 3056601 ą 4% -81.5% 565354 ą 4% perf-stat.ps.dTLB-load-misses > > >> 2.636e+09 -50.7% 1.3e+09 ą 4% perf-stat.ps.dTLB-loads > > >> 1155253 ą 2% -83.0% 196581 ą 5% perf-stat.ps.dTLB-store-misses > > >> 1.473e+09 -57.4% 6.268e+08 ą 3% perf-stat.ps.dTLB-stores > > >> 7997726 -73.3% 2131477 ą 3% perf-stat.ps.iTLB-load-misses > > >> 5521346 -74.3% 1418623 ą 2% perf-stat.ps.iTLB-loads > > >> 1.023e+10 -50.4% 5.073e+09 ą 4% perf-stat.ps.instructions > > >> 75671 -80.9% 14479 ą 4% perf-stat.ps.minor-faults > > >> 5549722 -61.4% 2141750 ą 4% perf-stat.ps.node-loads > > >> 8769156 +51.6% 13296579 ą 5% perf-stat.ps.node-stores > > >> 110795 -82.0% 19977 ą 4% perf-stat.ps.page-faults > > >> 6.482e+11 -50.7% 3.197e+11 ą 4% perf-stat.total.instructions > > >> 0.00 ą 37% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab > > >> 0.01 ą 18% +8373.1% 0.73 ą 49% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64 > > >> 0.01 ą 16% +4600.0% 0.38 ą 24% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit > > > > > > More time spent in madvise and munmap. but I'm not sure whether this > > > is caused by tearing down the address space when exiting the test. If > > > so it should not count in the regression. > > It's not for the whole address space tearing down. It's for pthread > > stack tearing down when pthread exit (can be treated as address space > > tearing down? I suppose so). > > > > https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384 > > https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576 > > It explains the problem. The madvise() does have some extra overhead > for handling THP (splitting pmd, deferred split queue, etc). > > > > > Another thing is whether it's worthy to make stack use THP? It may be > > useful for some apps which need large stack size? > > Kernel actually doesn't apply THP to stack (see > vma_is_temporary_stack()). But kernel can't know whether the VMA is > stack or not by checking VM_GROWSDOWN | VM_GROWSUP flags. So if glibc > doesn't set the proper flags to tell kernel the area is stack, kernel > just treats it as normal anonymous area. So glibc should set up stack > properly IMHO. If I read the code correctly, nptl allocates stack by the below code: mem = __mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); See https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L563 The MAP_STACK is used, but it is a no-op on Linux. So the alternative is to make MAP_STACK useful on Linux instead of changing glibc. But the blast radius seems much wider. > > > > > > > Regards > > Yin, Fengwei ^ permalink raw reply [flat|nested] 24+ messages in thread
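For reference, a mapping owner can already opt a single VMA out of THP with
madvise(MADV_NOHUGEPAGE), without any kernel change. A hypothetical sketch of
doing that for a stack-like allocation follows; glibc does not do this today,
the helper name is made up, and the guard-page handling glibc performs is
omitted:

#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: allocate a thread-stack-like region the way glibc
 * does, then explicitly opt it out of THP. MADV_NOHUGEPAGE affects only
 * this VMA, so the blast radius is limited to the stack itself.
 */
static void *alloc_stack(size_t size)
{
	void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (mem == MAP_FAILED)
		return NULL;

	if (madvise(mem, size, MADV_NOHUGEPAGE)) {
		munmap(mem, size);
		return NULL;
	}
	return mem;
}

Honoring MAP_STACK in the kernel, as suggested above, would make an extra
call like this unnecessary.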
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-21 0:26 ` Yang Shi @ 2023-12-21 0:58 ` Yin Fengwei 2023-12-21 1:02 ` Yin Fengwei ` (2 more replies) 0 siblings, 3 replies; 24+ messages in thread From: Yin Fengwei @ 2023-12-21 0:58 UTC (permalink / raw) To: Yang Shi Cc: kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang, feng.tang On 2023/12/21 08:26, Yang Shi wrote: > On Wed, Dec 20, 2023 at 12:09 PM Yang Shi <shy828301@gmail.com> wrote: >> >> On Wed, Dec 20, 2023 at 12:34 AM Yin Fengwei <fengwei.yin@intel.com> wrote: >>> >>> >>> >>> On 2023/12/20 13:27, Yang Shi wrote: >>>> On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote: >>>>> >>>>> >>>>> >>>>> Hello, >>>>> >>>>> for this commit, we reported >>>>> "[mm] 96db82a66d: will-it-scale.per_process_ops -95.3% regression" >>>>> in Aug, 2022 when it's in linux-next/master >>>>> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/ >>>>> >>>>> later, we reported >>>>> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression" >>>>> in Oct, 2022 when it's in linus/master >>>>> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/ >>>>> >>>>> and the commit was reverted finally by >>>>> commit 0ba09b1733878afe838fe35c310715fda3d46428 >>>>> Author: Linus Torvalds <torvalds@linux-foundation.org> >>>>> Date: Sun Dec 4 12:51:59 2022 -0800 >>>>> >>>>> now we noticed it goes into linux-next/master again. >>>>> >>>>> we are not sure if there is an agreement that the benefit of this commit >>>>> has already overweight performance drop in some mirco benchmark. >>>>> >>>>> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/ >>>>> that >>>>> "This patch was applied to v6.1, but was reverted due to a regression >>>>> report. However it turned out the regression was not due to this patch. >>>>> I ping'ed Andrew to reapply this patch, Andrew may forget it. This >>>>> patch helps promote THP, so I rebased it onto the latest mm-unstable." >>>> >>>> IIRC, Huang Ying's analysis showed the regression for will-it-scale >>>> micro benchmark is fine, it was actually reverted due to kernel build >>>> regression with LLVM reported by Nathan Chancellor. Then the >>>> regression was resolved by commit >>>> 81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out >>>> if page in deferred queue already"). And this patch did improve kernel >>>> build with GCC by ~3% if I remember correctly. >>>> >>>>> >>>>> however, unfortunately, in our latest tests, we still observed below regression >>>>> upon this commit. just FYI. >>>>> >>>>> >>>>> >>>>> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on: >>>> >>>> Interesting, wasn't the same regression seen last time? And I'm a >>>> little bit confused about how pthread got regressed. I didn't see the >>>> pthread benchmark do any intensive memory alloc/free operations. Do >>>> the pthread APIs do any intensive memory operations? I saw the >>>> benchmark does allocate memory for thread stack, but it should be just >>>> 8K per thread, so it should not trigger what this patch does. With >>>> 1024 threads, the thread stacks may get merged into one single VMA (8M >>>> total), but it may do so even though the patch is not applied. 
>>> stress-ng.pthread test code is strange here: >>> >>> https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573 >>> >>> Even it allocates its own stack, but that attr is not passed >>> to pthread_create. So it's still glibc to allocate stack for >>> pthread which is 8M size. This is why this patch can impact >>> the stress-ng.pthread testing. >> >> Aha, nice catch, I overlooked that. >> >>> >>> >>> My understanding is this is different regression (if it's a valid >>> regression). The previous hotspot was in: >>> deferred_split_huge_page >>> deferred_split_huge_page >>> deferred_split_huge_page >>> spin_lock >>> >>> while this time, the hotspot is in (pmd_lock from do_madvise I suppose): >>> - 55.02% zap_pmd_range.isra.0 >>> - 53.42% __split_huge_pmd >>> - 51.74% _raw_spin_lock >>> - 51.73% native_queued_spin_lock_slowpath >>> + 3.03% asm_sysvec_call_function >>> - 1.67% __split_huge_pmd_locked >>> - 0.87% pmdp_invalidate >>> + 0.86% flush_tlb_mm_range >>> - 1.60% zap_pte_range >>> - 1.04% page_remove_rmap >>> 0.55% __mod_lruvec_page_state >>> >>> >>>> >>>>> >>>>> >>>>> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries") >>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master >>>>> >>>>> testcase: stress-ng >>>>> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory >>>>> parameters: >>>>> >>>>> nr_threads: 1 >>>>> disk: 1HDD >>>>> testtime: 60s >>>>> fs: ext4 >>>>> class: os >>>>> test: pthread >>>>> cpufreq_governor: performance >>>>> >>>>> >>>>> In addition to that, the commit also has significant impact on the following tests: >>>>> >>>>> +------------------+-----------------------------------------------------------------------------------------------+ >>>>> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression | >>>>> | test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory | >>>>> | test parameters | array_size=50000000 | >>>>> | | cpufreq_governor=performance | >>>>> | | iterations=10x | >>>>> | | loop=100 | >>>>> | | nr_threads=25% | >>>>> | | omp=true | >>>>> +------------------+-----------------------------------------------------------------------------------------------+ >>>>> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression | >>>>> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory | >>>>> | test parameters | cpufreq_governor=performance | >>>>> | | option_a=Average | >>>>> | | option_b=Integer | >>>>> | | test=ramspeed-1.4.3 | >>>>> +------------------+-----------------------------------------------------------------------------------------------+ >>>>> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression | >>>>> | test machine | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory | >>>>> | test parameters | cpufreq_governor=performance | >>>>> | | option_a=Average | >>>>> | | option_b=Floating Point | >>>>> | | test=ramspeed-1.4.3 | >>>>> +------------------+-----------------------------------------------------------------------------------------------+ >>>>> >>>>> >>>>> If you fix the issue in a separate patch/commit (i.e. 
not just a new version of >>>>> the same patch/commit), kindly add following tags >>>>> | Reported-by: kernel test robot <oliver.sang@intel.com> >>>>> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com >>>>> >>>>> >>>>> Details are as below: >>>>> --------------------------------------------------------------------------------------------------> >>>>> >>>>> >>>>> The kernel config and materials to reproduce are available at: >>>>> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com >>>>> >>>>> ========================================================================================= >>>>> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime: >>>>> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s >>>>> >>>>> commit: >>>>> 30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()") >>>>> 1111d46b5c ("mm: align larger anonymous mappings on THP boundaries") >>>>> >>>>> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 >>>>> ---------------- --------------------------- >>>>> %stddev %change %stddev >>>>> \ | \ >>>>> 13405796 -65.5% 4620124 cpuidle..usage >>>>> 8.00 +8.2% 8.66 ą 2% iostat.cpu.system >>>>> 1.61 -60.6% 0.63 iostat.cpu.user >>>>> 597.50 ą 14% -64.3% 213.50 ą 14% perf-c2c.DRAM.local >>>>> 1882 ą 14% -74.7% 476.83 ą 7% perf-c2c.HITM.local >>>>> 3768436 -12.9% 3283395 vmstat.memory.cache >>>>> 355105 -75.7% 86344 ą 3% vmstat.system.cs >>>>> 385435 -20.7% 305714 ą 3% vmstat.system.in >>>>> 1.13 -0.2 0.88 mpstat.cpu.all.irq% >>>>> 0.29 -0.2 0.10 ą 2% mpstat.cpu.all.soft% >>>>> 6.76 ą 2% +1.1 7.88 ą 2% mpstat.cpu.all.sys% >>>>> 1.62 -1.0 0.62 ą 2% mpstat.cpu.all.usr% >>>>> 2234397 -84.3% 350161 ą 5% stress-ng.pthread.ops >>>>> 37237 -84.3% 5834 ą 5% stress-ng.pthread.ops_per_sec >>>>> 294706 ą 2% -68.0% 94191 ą 6% stress-ng.time.involuntary_context_switches >>>>> 41442 ą 2% +5023.4% 2123284 stress-ng.time.maximum_resident_set_size >>>>> 4466457 -83.9% 717053 ą 5% stress-ng.time.minor_page_faults >>>> >>>> The larger RSS and fewer page faults are expected. >>>> >>>>> 243.33 +13.5% 276.17 ą 3% stress-ng.time.percent_of_cpu_this_job_got >>>>> 131.64 +27.7% 168.11 ą 3% stress-ng.time.system_time >>>>> 19.73 -82.1% 3.53 ą 4% stress-ng.time.user_time >>>> >>>> Much less user time. And it seems to match the drop of the pthread metric. >>>> >>>>> 7715609 -80.2% 1530125 ą 4% stress-ng.time.voluntary_context_switches >>>>> 76728 -80.8% 14724 ą 4% perf-stat.i.minor-faults >>>>> 5600408 -61.4% 2160997 ą 5% perf-stat.i.node-loads >>>>> 8873996 +52.1% 13499744 ą 5% perf-stat.i.node-stores >>>>> 112409 -81.9% 20305 ą 4% perf-stat.i.page-faults >>>>> 2.55 +89.6% 4.83 perf-stat.overall.MPKI >>>> >>>> Much more TLB misses. >>>> >>>>> 1.51 -0.4 1.13 perf-stat.overall.branch-miss-rate% >>>>> 19.26 +24.5 43.71 perf-stat.overall.cache-miss-rate% >>>>> 1.70 +56.4% 2.65 perf-stat.overall.cpi >>>>> 665.84 -17.5% 549.51 ą 2% perf-stat.overall.cycles-between-cache-misses >>>>> 0.12 ą 4% -0.1 0.04 perf-stat.overall.dTLB-load-miss-rate% >>>>> 0.08 ą 2% -0.0 0.03 perf-stat.overall.dTLB-store-miss-rate% >>>>> 59.16 +0.9 60.04 perf-stat.overall.iTLB-load-miss-rate% >>>>> 1278 +86.1% 2379 ą 2% perf-stat.overall.instructions-per-iTLB-miss >>>>> 0.59 -36.1% 0.38 perf-stat.overall.ipc >>>> >>>> Worse IPC and CPI. 
>>>> >>>>> 2.078e+09 -48.3% 1.074e+09 ą 4% perf-stat.ps.branch-instructions >>>>> 31292687 -61.2% 12133349 ą 2% perf-stat.ps.branch-misses >>>>> 26057291 -5.9% 24512034 ą 4% perf-stat.ps.cache-misses >>>>> 1.353e+08 -58.6% 56072195 ą 4% perf-stat.ps.cache-references >>>>> 365254 -75.8% 88464 ą 3% perf-stat.ps.context-switches >>>>> 1.735e+10 -22.4% 1.346e+10 ą 2% perf-stat.ps.cpu-cycles >>>>> 60838 -79.1% 12727 ą 6% perf-stat.ps.cpu-migrations >>>>> 3056601 ą 4% -81.5% 565354 ą 4% perf-stat.ps.dTLB-load-misses >>>>> 2.636e+09 -50.7% 1.3e+09 ą 4% perf-stat.ps.dTLB-loads >>>>> 1155253 ą 2% -83.0% 196581 ą 5% perf-stat.ps.dTLB-store-misses >>>>> 1.473e+09 -57.4% 6.268e+08 ą 3% perf-stat.ps.dTLB-stores >>>>> 7997726 -73.3% 2131477 ą 3% perf-stat.ps.iTLB-load-misses >>>>> 5521346 -74.3% 1418623 ą 2% perf-stat.ps.iTLB-loads >>>>> 1.023e+10 -50.4% 5.073e+09 ą 4% perf-stat.ps.instructions >>>>> 75671 -80.9% 14479 ą 4% perf-stat.ps.minor-faults >>>>> 5549722 -61.4% 2141750 ą 4% perf-stat.ps.node-loads >>>>> 8769156 +51.6% 13296579 ą 5% perf-stat.ps.node-stores >>>>> 110795 -82.0% 19977 ą 4% perf-stat.ps.page-faults >>>>> 6.482e+11 -50.7% 3.197e+11 ą 4% perf-stat.total.instructions >>>>> 0.00 ą 37% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab >>>>> 0.01 ą 18% +8373.1% 0.73 ą 49% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64 >>>>> 0.01 ą 16% +4600.0% 0.38 ą 24% perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit >>>> >>>> More time spent in madvise and munmap. but I'm not sure whether this >>>> is caused by tearing down the address space when exiting the test. If >>>> so it should not count in the regression. >>> It's not for the whole address space tearing down. It's for pthread >>> stack tearing down when pthread exit (can be treated as address space >>> tearing down? I suppose so). >>> >>> https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384 >>> https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576 >> >> It explains the problem. The madvise() does have some extra overhead >> for handling THP (splitting pmd, deferred split queue, etc). >> >>> >>> Another thing is whether it's worthy to make stack use THP? It may be >>> useful for some apps which need large stack size? >> >> Kernel actually doesn't apply THP to stack (see >> vma_is_temporary_stack()). But kernel can't know whether the VMA is >> stack or not by checking VM_GROWSDOWN | VM_GROWSUP flags. So if glibc >> doesn't set the proper flags to tell kernel the area is stack, kernel >> just treats it as normal anonymous area. So glibc should set up stack >> properly IMHO. > > If I read the code correctly, nptl allocates stack by the below code: > > mem = __mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE, > MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); > > See https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L563 > > The MAP_STACK is used, but it is a no-op on Linux. So the alternative > is to make MAP_STACK useful on Linux instead of changing glibc. But > the blast radius seems much wider. Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to filter out of the MAP_STACK mapping based on this patch. The regression in stress-ng.pthread was gone. I suppose this is kind of safe because the madvise call is only applied to glibc allocated stack. 
But what I am not sure about is whether it is worth making such a change,
as the regression is only clearly visible in a micro-benchmark. No evidence
showed the other regressions in this report are related to madvise, at
least from the perf statistics. Need to check more on stream/ramspeed.
Thanks.


Regards
Yin, Fengwei

> >> >>> >>> >>> Regards >>> Yin, Fengwei

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-21 0:58 ` Yin Fengwei @ 2023-12-21 1:02 ` Yin Fengwei 2023-12-21 4:49 ` Matthew Wilcox 2023-12-21 13:39 ` Yin, Fengwei 2 siblings, 0 replies; 24+ messages in thread From: Yin Fengwei @ 2023-12-21 1:02 UTC (permalink / raw) To: Yang Shi Cc: kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang, feng.tang >>>> >>>> https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384 >>>> https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576 >>> >>> It explains the problem. The madvise() does have some extra overhead >>> for handling THP (splitting pmd, deferred split queue, etc). >>> >>>> >>>> Another thing is whether it's worthy to make stack use THP? It may be >>>> useful for some apps which need large stack size? >>> >>> Kernel actually doesn't apply THP to stack (see >>> vma_is_temporary_stack()). But kernel can't know whether the VMA is >>> stack or not by checking VM_GROWSDOWN | VM_GROWSUP flags. So if glibc >>> doesn't set the proper flags to tell kernel the area is stack, kernel >>> just treats it as normal anonymous area. So glibc should set up stack >>> properly IMHO. >> >> If I read the code correctly, nptl allocates stack by the below code: >> >> mem = __mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE, >> MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); >> >> See >> https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L563 >> >> The MAP_STACK is used, but it is a no-op on Linux. So the alternative >> is to make MAP_STACK useful on Linux instead of changing glibc. But >> the blast radius seems much wider. > Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to > filter out of the MAP_STACK mapping based on this patch. The regression > in stress-ng.pthread was gone. I suppose this is kind of safe because > the madvise call is only applied to glibc allocated stack. The patch I tested against stress-ng.pthread: diff --git a/mm/mmap.c b/mm/mmap.c index b78e83d351d2..1fd510aef82e 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1829,7 +1829,8 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len, */ pgoff = 0; get_area = shmem_get_unmapped_area; - } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) { + } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && + !(flags & MAP_STACK)) { /* Ensures that larger anonymous mappings are THP aligned. */ get_area = thp_get_unmapped_area; } > > > But what I am not sure was whether it's worthy to do such kind of change > as the regression only is seen obviously in micro-benchmark. No evidence > showed the other regressionsin this report is related with madvise. At > least from the perf statstics. Need to check more on stream/ramspeed. > Thanks. > > > Regards > Yin, Fengwei > >> >>> >>>> >>>> >>>> Regards >>>> Yin, Fengwei ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-21 0:58 ` Yin Fengwei 2023-12-21 1:02 ` Yin Fengwei @ 2023-12-21 4:49 ` Matthew Wilcox 2023-12-21 4:58 ` Yin Fengwei 2023-12-21 18:07 ` Yang Shi 2023-12-21 13:39 ` Yin, Fengwei 2 siblings, 2 replies; 24+ messages in thread From: Matthew Wilcox @ 2023-12-21 4:49 UTC (permalink / raw) To: Yin Fengwei Cc: Yang Shi, kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Christopher Lameter, ying.huang, feng.tang On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote: > Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to > filter out of the MAP_STACK mapping based on this patch. The regression > in stress-ng.pthread was gone. I suppose this is kind of safe because > the madvise call is only applied to glibc allocated stack. > > > But what I am not sure was whether it's worthy to do such kind of change > as the regression only is seen obviously in micro-benchmark. No evidence > showed the other regressionsin this report is related with madvise. At > least from the perf statstics. Need to check more on stream/ramspeed. FWIW, we had a customer report a significant performance problem when inadvertently using 2MB pages for stacks. They were able to avoid it by using 2044KiB sized stacks ... ^ permalink raw reply [flat|nested] 24+ messages in thread
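A sketch of that user-side workaround with pthreads. The 2044K figure comes
from the report above; the only point is that the usable stack is smaller
than 2M, so it can never be backed by a PMD-sized huge page:

#include <pthread.h>

/*
 * Request a stack just under 2M. A region smaller than 2M cannot contain
 * an aligned 2M range, so THP never applies to it regardless of how the
 * mapping itself is aligned.
 */
static int create_small_stack_thread(pthread_t *tid,
				     void *(*fn)(void *), void *arg)
{
	pthread_attr_t attr;
	int ret;

	pthread_attr_init(&attr);
	pthread_attr_setstacksize(&attr, 2044 * 1024);
	ret = pthread_create(tid, &attr, fn, arg);
	pthread_attr_destroy(&attr);
	return ret;
}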
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-21 4:49 ` Matthew Wilcox @ 2023-12-21 4:58 ` Yin Fengwei 2023-12-21 18:07 ` Yang Shi 1 sibling, 0 replies; 24+ messages in thread From: Yin Fengwei @ 2023-12-21 4:58 UTC (permalink / raw) To: Matthew Wilcox Cc: Yang Shi, kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Christopher Lameter, ying.huang, feng.tang On 2023/12/21 12:49, Matthew Wilcox wrote: > On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote: >> Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to >> filter out of the MAP_STACK mapping based on this patch. The regression >> in stress-ng.pthread was gone. I suppose this is kind of safe because >> the madvise call is only applied to glibc allocated stack. >> >> >> But what I am not sure was whether it's worthy to do such kind of change >> as the regression only is seen obviously in micro-benchmark. No evidence >> showed the other regressionsin this report is related with madvise. At >> least from the perf statstics. Need to check more on stream/ramspeed. > > FWIW, we had a customer report a significant performance problem when > inadvertently using 2MB pages for stacks. They were able to avoid it by > using 2044KiB sized stacks ... Looks like related with this regression. So we may need to consider avoiding THP for stack. Regards Yin, Fengwei ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-21 4:49 ` Matthew Wilcox 2023-12-21 4:58 ` Yin Fengwei @ 2023-12-21 18:07 ` Yang Shi 2023-12-21 18:14 ` Matthew Wilcox 1 sibling, 1 reply; 24+ messages in thread From: Yang Shi @ 2023-12-21 18:07 UTC (permalink / raw) To: Matthew Wilcox Cc: Yin Fengwei, kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Christopher Lameter, ying.huang, feng.tang On Wed, Dec 20, 2023 at 8:49 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote: > > Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to > > filter out of the MAP_STACK mapping based on this patch. The regression > > in stress-ng.pthread was gone. I suppose this is kind of safe because > > the madvise call is only applied to glibc allocated stack. > > > > > > But what I am not sure was whether it's worthy to do such kind of change > > as the regression only is seen obviously in micro-benchmark. No evidence > > showed the other regressionsin this report is related with madvise. At > > least from the perf statstics. Need to check more on stream/ramspeed. > > FWIW, we had a customer report a significant performance problem when > inadvertently using 2MB pages for stacks. They were able to avoid it by > using 2044KiB sized stacks ... Thanks for the report. This provided more justification regarding honoring MAP_STACK on Linux. Some applications, for example, pthread, just allocate a fixed size area for stack. This confuses kernel because kernel tell stack by VM_GROWSDOWN | VM_GROWSUP. But I'm still a little confused by why THP for stack could result in significant performance problems. Unless the applications resize the stack quite often. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-21 18:07 ` Yang Shi @ 2023-12-21 18:14 ` Matthew Wilcox 2023-12-22 1:06 ` Yin, Fengwei 0 siblings, 1 reply; 24+ messages in thread From: Matthew Wilcox @ 2023-12-21 18:14 UTC (permalink / raw) To: Yang Shi Cc: Yin Fengwei, kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Christopher Lameter, ying.huang, feng.tang On Thu, Dec 21, 2023 at 10:07:09AM -0800, Yang Shi wrote: > On Wed, Dec 20, 2023 at 8:49 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote: > > > Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to > > > filter out of the MAP_STACK mapping based on this patch. The regression > > > in stress-ng.pthread was gone. I suppose this is kind of safe because > > > the madvise call is only applied to glibc allocated stack. > > > > > > > > > But what I am not sure was whether it's worthy to do such kind of change > > > as the regression only is seen obviously in micro-benchmark. No evidence > > > showed the other regressionsin this report is related with madvise. At > > > least from the perf statstics. Need to check more on stream/ramspeed. > > > > FWIW, we had a customer report a significant performance problem when > > inadvertently using 2MB pages for stacks. They were able to avoid it by > > using 2044KiB sized stacks ... > > Thanks for the report. This provided more justification regarding > honoring MAP_STACK on Linux. Some applications, for example, pthread, > just allocate a fixed size area for stack. This confuses kernel > because kernel tell stack by VM_GROWSDOWN | VM_GROWSUP. > > But I'm still a little confused by why THP for stack could result in > significant performance problems. Unless the applications resize the > stack quite often. We didn't delve into what was causing the problem, only that it was happening. The application had many threads, so it could have been as simple as consuming all the available THP and leaving fewer available for other uses. Or it could have been a memory consumption problem; maybe the app would only have been using 16-32kB per thread but was now using 2MB per thread and if there were, say, 100 threads, that's an extra 199MB of memory in use. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-21 18:14 ` Matthew Wilcox @ 2023-12-22 1:06 ` Yin, Fengwei 2023-12-22 2:23 ` Huang, Ying 0 siblings, 1 reply; 24+ messages in thread From: Yin, Fengwei @ 2023-12-22 1:06 UTC (permalink / raw) To: Matthew Wilcox, Yang Shi Cc: kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Christopher Lameter, ying.huang, feng.tang On 12/22/2023 2:14 AM, Matthew Wilcox wrote: > On Thu, Dec 21, 2023 at 10:07:09AM -0800, Yang Shi wrote: >> On Wed, Dec 20, 2023 at 8:49 PM Matthew Wilcox <willy@infradead.org> wrote: >>> >>> On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote: >>>> Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to >>>> filter out of the MAP_STACK mapping based on this patch. The regression >>>> in stress-ng.pthread was gone. I suppose this is kind of safe because >>>> the madvise call is only applied to glibc allocated stack. >>>> >>>> >>>> But what I am not sure was whether it's worthy to do such kind of change >>>> as the regression only is seen obviously in micro-benchmark. No evidence >>>> showed the other regressionsin this report is related with madvise. At >>>> least from the perf statstics. Need to check more on stream/ramspeed. >>> >>> FWIW, we had a customer report a significant performance problem when >>> inadvertently using 2MB pages for stacks. They were able to avoid it by >>> using 2044KiB sized stacks ... >> >> Thanks for the report. This provided more justification regarding >> honoring MAP_STACK on Linux. Some applications, for example, pthread, >> just allocate a fixed size area for stack. This confuses kernel >> because kernel tell stack by VM_GROWSDOWN | VM_GROWSUP. >> >> But I'm still a little confused by why THP for stack could result in >> significant performance problems. Unless the applications resize the >> stack quite often. > > We didn't delve into what was causing the problem, only that it was > happening. The application had many threads, so it could have been as > simple as consuming all the available THP and leaving fewer available > for other uses. Or it could have been a memory consumption problem; > maybe the app would only have been using 16-32kB per thread but was > now using 2MB per thread and if there were, say, 100 threads, that's an > extra 199MB of memory in use. One thing I know is related with the memory zeroing. This is from the perf data in this report: 0.00 +16.7 16.69 ± 7% perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault Zeroing 2M memory costs much more CPU than zeroing 16-32KB memory if there are many threads. Regards Yin, Fengwei ^ permalink raw reply [flat|nested] 24+ messages in thread
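One rough way to see the zeroing cost is to time the first write fault into a
2M-aligned region with MADV_HUGEPAGE versus MADV_NOHUGEPAGE. This is a sketch
that assumes THP is enabled ("always" or "madvise" mode); absolute numbers
will vary by machine:

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

#define SZ (2UL << 20)	/* one PMD-sized page */

/*
 * Time a single write fault at a 2M-aligned address: with MADV_HUGEPAGE the
 * kernel can clear a whole 2M page (clear_huge_page) at the fault, with
 * MADV_NOHUGEPAGE it clears one 4K page.
 */
static long first_touch_ns(int advice)
{
	struct timespec a, b;
	char *raw, *p;

	raw = mmap(NULL, 2 * SZ, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return -1;
	/* pick a 2M-aligned address inside the mapping */
	p = (char *)(((uintptr_t)raw + SZ - 1) & ~(SZ - 1));
	madvise(p, SZ, advice);

	clock_gettime(CLOCK_MONOTONIC, &a);
	p[0] = 1;			/* the faulting store being measured */
	clock_gettime(CLOCK_MONOTONIC, &b);

	munmap(raw, 2 * SZ);
	return (b.tv_sec - a.tv_sec) * 1000000000L + (b.tv_nsec - a.tv_nsec);
}

int main(void)
{
	printf("huge page fault: %ld ns, base page fault: %ld ns\n",
	       first_touch_ns(MADV_HUGEPAGE), first_touch_ns(MADV_NOHUGEPAGE));
	return 0;
}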
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-22 1:06 ` Yin, Fengwei @ 2023-12-22 2:23 ` Huang, Ying 0 siblings, 0 replies; 24+ messages in thread From: Huang, Ying @ 2023-12-22 2:23 UTC (permalink / raw) To: Yin, Fengwei Cc: Matthew Wilcox, Yang Shi, kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Christopher Lameter, feng.tang "Yin, Fengwei" <fengwei.yin@intel.com> writes: > On 12/22/2023 2:14 AM, Matthew Wilcox wrote: >> On Thu, Dec 21, 2023 at 10:07:09AM -0800, Yang Shi wrote: >>> On Wed, Dec 20, 2023 at 8:49 PM Matthew Wilcox <willy@infradead.org> wrote: >>>> >>>> On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote: >>>>> Yes. MAP_STACK is also mentioned in manpage of mmap. I did test to >>>>> filter out of the MAP_STACK mapping based on this patch. The regression >>>>> in stress-ng.pthread was gone. I suppose this is kind of safe because >>>>> the madvise call is only applied to glibc allocated stack. >>>>> >>>>> >>>>> But what I am not sure was whether it's worthy to do such kind of change >>>>> as the regression only is seen obviously in micro-benchmark. No evidence >>>>> showed the other regressionsin this report is related with madvise. At >>>>> least from the perf statstics. Need to check more on stream/ramspeed. >>>> >>>> FWIW, we had a customer report a significant performance problem when >>>> inadvertently using 2MB pages for stacks. They were able to avoid it by >>>> using 2044KiB sized stacks ... >>> >>> Thanks for the report. This provided more justification regarding >>> honoring MAP_STACK on Linux. Some applications, for example, pthread, >>> just allocate a fixed size area for stack. This confuses kernel >>> because kernel tell stack by VM_GROWSDOWN | VM_GROWSUP. >>> >>> But I'm still a little confused by why THP for stack could result in >>> significant performance problems. Unless the applications resize the >>> stack quite often. >> We didn't delve into what was causing the problem, only that it was >> happening. The application had many threads, so it could have been as >> simple as consuming all the available THP and leaving fewer available >> for other uses. Or it could have been a memory consumption problem; >> maybe the app would only have been using 16-32kB per thread but was >> now using 2MB per thread and if there were, say, 100 threads, that's an >> extra 199MB of memory in use. > One thing I know is related with the memory zeroing. This is from > the perf data in this report: > > 0.00 +16.7 16.69 ± 7% > perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault > > Zeroing 2M memory costs much more CPU than zeroing 16-32KB memory if > there are many threads. Using 2M stack may hurt performance of short-live threads with shallow stack depth. Imagine a network server which creates a new thread for each incoming connection. I understand that the performance will not be great in this way anyway. IIUC we should not make it too bad. But, whether this is import depends on whether the use case is important. TBH, I don't know that. -- Best Regards, Huang, Ying ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-21 0:58 ` Yin Fengwei 2023-12-21 1:02 ` Yin Fengwei 2023-12-21 4:49 ` Matthew Wilcox @ 2023-12-21 13:39 ` Yin, Fengwei 2023-12-21 18:11 ` Yang Shi 2 siblings, 1 reply; 24+ messages in thread From: Yin, Fengwei @ 2023-12-21 13:39 UTC (permalink / raw) To: Yang Shi Cc: kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang, feng.tang On 12/21/2023 8:58 AM, Yin Fengwei wrote: > But what I am not sure was whether it's worthy to do such kind of change > as the regression only is seen obviously in micro-benchmark. No evidence > showed the other regressionsin this report is related with madvise. At > least from the perf statstics. Need to check more on stream/ramspeed. > Thanks. With debugging patch (filter out the stack mapping from THP aligned), the result of stream can be restored to around 2%: commit: 30749e6fbb3d391a7939ac347e9612afe8c26e94 1111d46b5cbad57486e7a3fab75888accac2f072 89f60532d82b9ecd39303a74589f76e4758f176f -> 1111d46b5cbad with debugging patch 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589 ---------------- --------------------------- --------------------------- 350993 -15.6% 296081 ± 2% -1.5% 345689 stream.add_bandwidth_MBps 349830 -16.1% 293492 ± 2% -2.3% 341860 ± 2% stream.add_bandwidth_MBps_harmonicMean 333973 -20.5% 265439 ± 3% -1.7% 328403 stream.copy_bandwidth_MBps 332930 -21.7% 260548 ± 3% -2.5% 324711 ± 2% stream.copy_bandwidth_MBps_harmonicMean 302788 -16.2% 253817 ± 2% -1.4% 298421 stream.scale_bandwidth_MBps 302157 -17.1% 250577 ± 2% -2.0% 296054 stream.scale_bandwidth_MBps_harmonicMean 339047 -12.1% 298061 -1.4% 334206 stream.triad_bandwidth_MBps 338186 -12.4% 296218 -2.0% 331469 stream.triad_bandwidth_MBps_harmonicMean The regression of ramspeed is still there. Regards Yin, Fengwei ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-21 13:39 ` Yin, Fengwei @ 2023-12-21 18:11 ` Yang Shi 2023-12-22 1:13 ` Yin, Fengwei 0 siblings, 1 reply; 24+ messages in thread From: Yang Shi @ 2023-12-21 18:11 UTC (permalink / raw) To: Yin, Fengwei Cc: kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang, feng.tang On Thu, Dec 21, 2023 at 5:40 AM Yin, Fengwei <fengwei.yin@intel.com> wrote: > > > > On 12/21/2023 8:58 AM, Yin Fengwei wrote: > > But what I am not sure was whether it's worthy to do such kind of change > > as the regression only is seen obviously in micro-benchmark. No evidence > > showed the other regressionsin this report is related with madvise. At > > least from the perf statstics. Need to check more on stream/ramspeed. > > Thanks. > > With debugging patch (filter out the stack mapping from THP aligned), > the result of stream can be restored to around 2%: > > commit: > 30749e6fbb3d391a7939ac347e9612afe8c26e94 > 1111d46b5cbad57486e7a3fab75888accac2f072 > 89f60532d82b9ecd39303a74589f76e4758f176f -> 1111d46b5cbad with > debugging patch > > 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589 > ---------------- --------------------------- --------------------------- > 350993 -15.6% 296081 ± 2% -1.5% 345689 > stream.add_bandwidth_MBps > 349830 -16.1% 293492 ± 2% -2.3% 341860 ± > 2% stream.add_bandwidth_MBps_harmonicMean > 333973 -20.5% 265439 ± 3% -1.7% 328403 > stream.copy_bandwidth_MBps > 332930 -21.7% 260548 ± 3% -2.5% 324711 ± > 2% stream.copy_bandwidth_MBps_harmonicMean > 302788 -16.2% 253817 ± 2% -1.4% 298421 > stream.scale_bandwidth_MBps > 302157 -17.1% 250577 ± 2% -2.0% 296054 > stream.scale_bandwidth_MBps_harmonicMean > 339047 -12.1% 298061 -1.4% 334206 > stream.triad_bandwidth_MBps > 338186 -12.4% 296218 -2.0% 331469 > stream.triad_bandwidth_MBps_harmonicMean > > > The regression of ramspeed is still there. Thanks for the debugging patch and the test. If no one has objection to honor MAP_STACK, I'm going to come up with a more formal patch. Even though thp_get_unmapped_area() is not called for MAP_STACK, stack area still may be allocated at 2M aligned address theoretically. And it may be worse with multi-sized THP, for 1M. Do you have any instructions regarding how to run ramspeed? Anyway I may not have time debug it until after holidays. > > > Regards > Yin, Fengwei ^ permalink raw reply [flat|nested] 24+ messages in thread
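For readers following the MAP_STACK discussion, the mapping in question is just a fixed-size anonymous area that the threading library tags with MAP_STACK, which is why the kernel cannot recognize it as a stack via VM_GROWSDOWN/VM_GROWSUP. A rough userspace illustration; the 8M size is just a common default, and real glibc stacks also include a guard region:

/* Sketch of a glibc-style fixed-size thread stack: an anonymous mmap()
 * tagged with MAP_STACK. Whether it ends up 2M-aligned (and so eligible
 * for a PMD-sized first fault) depends on the surrounding layout. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	size_t sz = 8UL << 20;	/* 8M */
	void *stack = mmap(NULL, sz, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);

	if (stack == MAP_FAILED)
		return 1;
	printf("stack at %p, 2M-aligned: %s\n", stack,
	       ((unsigned long)stack & ((2UL << 20) - 1)) ? "no" : "yes");
	munmap(stack, sz);
	return 0;
}

Because the alignment depends on neighbouring mappings, skipping THP alignment for MAP_STACK reduces, but does not eliminate, huge-page-backed stacks, as noted in this exchange.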
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-21 18:11 ` Yang Shi @ 2023-12-22 1:13 ` Yin, Fengwei 2024-01-04 1:32 ` Yang Shi 0 siblings, 1 reply; 24+ messages in thread From: Yin, Fengwei @ 2023-12-22 1:13 UTC (permalink / raw) To: Yang Shi Cc: kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang, feng.tang On 12/22/2023 2:11 AM, Yang Shi wrote: > On Thu, Dec 21, 2023 at 5:40 AM Yin, Fengwei <fengwei.yin@intel.com> wrote: >> >> >> >> On 12/21/2023 8:58 AM, Yin Fengwei wrote: >>> But what I am not sure was whether it's worthy to do such kind of change >>> as the regression only is seen obviously in micro-benchmark. No evidence >>> showed the other regressionsin this report is related with madvise. At >>> least from the perf statstics. Need to check more on stream/ramspeed. >>> Thanks. >> >> With debugging patch (filter out the stack mapping from THP aligned), >> the result of stream can be restored to around 2%: >> >> commit: >> 30749e6fbb3d391a7939ac347e9612afe8c26e94 >> 1111d46b5cbad57486e7a3fab75888accac2f072 >> 89f60532d82b9ecd39303a74589f76e4758f176f -> 1111d46b5cbad with >> debugging patch >> >> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589 >> ---------------- --------------------------- --------------------------- >> 350993 -15.6% 296081 ± 2% -1.5% 345689 >> stream.add_bandwidth_MBps >> 349830 -16.1% 293492 ± 2% -2.3% 341860 ± >> 2% stream.add_bandwidth_MBps_harmonicMean >> 333973 -20.5% 265439 ± 3% -1.7% 328403 >> stream.copy_bandwidth_MBps >> 332930 -21.7% 260548 ± 3% -2.5% 324711 ± >> 2% stream.copy_bandwidth_MBps_harmonicMean >> 302788 -16.2% 253817 ± 2% -1.4% 298421 >> stream.scale_bandwidth_MBps >> 302157 -17.1% 250577 ± 2% -2.0% 296054 >> stream.scale_bandwidth_MBps_harmonicMean >> 339047 -12.1% 298061 -1.4% 334206 >> stream.triad_bandwidth_MBps >> 338186 -12.4% 296218 -2.0% 331469 >> stream.triad_bandwidth_MBps_harmonicMean >> >> >> The regression of ramspeed is still there. > > Thanks for the debugging patch and the test. If no one has objection > to honor MAP_STACK, I'm going to come up with a more formal patch. > Even though thp_get_unmapped_area() is not called for MAP_STACK, stack > area still may be allocated at 2M aligned address theoretically. And > it may be worse with multi-sized THP, for 1M. Right. Filtering out MAP_STACK can't make sure no THP for stack. Just reduce the possibility of using THP for stack. > > Do you have any instructions regarding how to run ramspeed? Anyway I > may not have time debug it until after holidays. 0Day leverages phoronix-test-suite to run ramspeed. So I don't have direct answer here. I suppose we could check the configuration of ramspeed in phoronix-test- suite to understand what's the build options and command options to run ramspeed: https://openbenchmarking.org/test/pts/ramspeed Regards Yin, Fengwei > >> >> >> Regards >> Yin, Fengwei ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2023-12-22 1:13 ` Yin, Fengwei @ 2024-01-04 1:32 ` Yang Shi 2024-01-04 8:18 ` Yin Fengwei 0 siblings, 1 reply; 24+ messages in thread From: Yang Shi @ 2024-01-04 1:32 UTC (permalink / raw) To: Yin, Fengwei Cc: kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang, feng.tang On Thu, Dec 21, 2023 at 5:13 PM Yin, Fengwei <fengwei.yin@intel.com> wrote: > > > > On 12/22/2023 2:11 AM, Yang Shi wrote: > > On Thu, Dec 21, 2023 at 5:40 AM Yin, Fengwei <fengwei.yin@intel.com> wrote: > >> > >> > >> > >> On 12/21/2023 8:58 AM, Yin Fengwei wrote: > >>> But what I am not sure was whether it's worthy to do such kind of change > >>> as the regression only is seen obviously in micro-benchmark. No evidence > >>> showed the other regressionsin this report is related with madvise. At > >>> least from the perf statstics. Need to check more on stream/ramspeed. > >>> Thanks. > >> > >> With debugging patch (filter out the stack mapping from THP aligned), > >> the result of stream can be restored to around 2%: > >> > >> commit: > >> 30749e6fbb3d391a7939ac347e9612afe8c26e94 > >> 1111d46b5cbad57486e7a3fab75888accac2f072 > >> 89f60532d82b9ecd39303a74589f76e4758f176f -> 1111d46b5cbad with > >> debugging patch > >> > >> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589 > >> ---------------- --------------------------- --------------------------- > >> 350993 -15.6% 296081 ± 2% -1.5% 345689 > >> stream.add_bandwidth_MBps > >> 349830 -16.1% 293492 ± 2% -2.3% 341860 ± > >> 2% stream.add_bandwidth_MBps_harmonicMean > >> 333973 -20.5% 265439 ± 3% -1.7% 328403 > >> stream.copy_bandwidth_MBps > >> 332930 -21.7% 260548 ± 3% -2.5% 324711 ± > >> 2% stream.copy_bandwidth_MBps_harmonicMean > >> 302788 -16.2% 253817 ± 2% -1.4% 298421 > >> stream.scale_bandwidth_MBps > >> 302157 -17.1% 250577 ± 2% -2.0% 296054 > >> stream.scale_bandwidth_MBps_harmonicMean > >> 339047 -12.1% 298061 -1.4% 334206 > >> stream.triad_bandwidth_MBps > >> 338186 -12.4% 296218 -2.0% 331469 > >> stream.triad_bandwidth_MBps_harmonicMean > >> > >> > >> The regression of ramspeed is still there. > > > > Thanks for the debugging patch and the test. If no one has objection > > to honor MAP_STACK, I'm going to come up with a more formal patch. > > Even though thp_get_unmapped_area() is not called for MAP_STACK, stack > > area still may be allocated at 2M aligned address theoretically. And > > it may be worse with multi-sized THP, for 1M. > Right. Filtering out MAP_STACK can't make sure no THP for stack. Just > reduce the possibility of using THP for stack. Can you please help test the below patch? diff --git a/include/linux/mman.h b/include/linux/mman.h index 40d94411d492..dc7048824be8 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags) return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) | _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) | _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) | + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) | arch_calc_vm_flag_bits(flags); } But I can't reproduce the pthread regression on my aarch64 VM. It might be due to the guard stack (the 64K guard stack is at 2M aligned, the 8M stack is right next to it which starts at 2M + 64K). But I can see the stack area is not THP eligible anymore with this patch. 
It might be due to the guard stack (the 64K guard stack is at 2M aligned,
the 8M stack is right next to it which starts at 2M + 64K). But I can
see the stack area is not THP eligible anymore with this patch. See:

fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
Size:               8192 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  12 kB
Pss:                  12 kB
Pss_Dirty:            12 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        12 kB
Referenced:           12 kB
Anonymous:            12 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac nh

The "nh" flag is set.

>
> >
> > Do you have any instructions regarding how to run ramspeed? Anyway I
> > may not have time debug it until after holidays.
> 0Day leverages phoronix-test-suite to run ramspeed. So I don't have
> direct answer here.
>
> I suppose we could check the configuration of ramspeed in phoronix-test-
> suite to understand what's the build options and command options to run
> ramspeed:
> https://openbenchmarking.org/test/pts/ramspeed

Downloaded the test suite. It looks like phoronix just runs test 3 (int)
and 6 (float). They basically do 4 sub-tests to benchmark memory
bandwidth:

* copy
* scale copy
* add copy
* triad copy

The source buffer is initialized (page fault is triggered), but the
destination area is not. So the page fault + page clear time is
accounted to the result. Clearing a huge page may take a little bit
more time. But I didn't see a noticeable regression on my aarch64 VM
either. Anyway, I suppose such a test should be run with THP off.

>
>
> Regards
> Yin, Fengwei
> >
> >
> >>
> >>
> >> Regards
> >> Yin, Fengwei

^ permalink raw reply	[flat|nested] 24+ messages in thread
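For context on what those four sub-tests do, they are the familiar STREAM-style loops; a condensed sketch of the inner kernels follows, with the destination array deliberately left untouched before the timed loops so that its first-touch faults (and, with THP, 2M clears) are counted, matching the behaviour described above. Array names, the size and the scalar are illustrative, not ramspeed's actual source.

/* STREAM-style bandwidth kernels. b[] and c[] are faulted in up front,
 * a[] (the destination) is not, so its first-touch cost lands in the
 * measured loops. */
#include <stdlib.h>

#define N (50 * 1000 * 1000)	/* illustrative; pick well beyond the LLC */

int main(void)
{
	double *a = malloc(N * sizeof(*a));	/* destination: untouched here */
	double *b = malloc(N * sizeof(*b));
	double *c = malloc(N * sizeof(*c));
	double scalar = 3.0;
	long i;

	if (!a || !b || !c)
		return 1;
	for (i = 0; i < N; i++) {		/* sources faulted in before timing */
		b[i] = 1.0;
		c[i] = 2.0;
	}
	for (i = 0; i < N; i++) a[i] = b[i];			/* copy  */
	for (i = 0; i < N; i++) a[i] = scalar * b[i];		/* scale */
	for (i = 0; i < N; i++) a[i] = b[i] + c[i];		/* add   */
	for (i = 0; i < N; i++) a[i] = b[i] + scalar * c[i];	/* triad */

	free(a); free(b); free(c);
	return 0;
}

With the loops ordered this way, every store to a[] in the first pass is a first touch, which is where the page-clearing cost discussed earlier in the thread enters the measurement.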
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2024-01-04 1:32 ` Yang Shi @ 2024-01-04 8:18 ` Yin Fengwei 2024-01-04 8:39 ` Oliver Sang 0 siblings, 1 reply; 24+ messages in thread From: Yin Fengwei @ 2024-01-04 8:18 UTC (permalink / raw) To: Yang Shi Cc: kernel test robot, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang, feng.tang On 2024/1/4 09:32, Yang Shi wrote: > On Thu, Dec 21, 2023 at 5:13 PM Yin, Fengwei <fengwei.yin@intel.com> wrote: >> >> >> >> On 12/22/2023 2:11 AM, Yang Shi wrote: >>> On Thu, Dec 21, 2023 at 5:40 AM Yin, Fengwei <fengwei.yin@intel.com> wrote: >>>> >>>> >>>> >>>> On 12/21/2023 8:58 AM, Yin Fengwei wrote: >>>>> But what I am not sure was whether it's worthy to do such kind of change >>>>> as the regression only is seen obviously in micro-benchmark. No evidence >>>>> showed the other regressionsin this report is related with madvise. At >>>>> least from the perf statstics. Need to check more on stream/ramspeed. >>>>> Thanks. >>>> >>>> With debugging patch (filter out the stack mapping from THP aligned), >>>> the result of stream can be restored to around 2%: >>>> >>>> commit: >>>> 30749e6fbb3d391a7939ac347e9612afe8c26e94 >>>> 1111d46b5cbad57486e7a3fab75888accac2f072 >>>> 89f60532d82b9ecd39303a74589f76e4758f176f -> 1111d46b5cbad with >>>> debugging patch >>>> >>>> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589 >>>> ---------------- --------------------------- --------------------------- >>>> 350993 -15.6% 296081 ± 2% -1.5% 345689 >>>> stream.add_bandwidth_MBps >>>> 349830 -16.1% 293492 ± 2% -2.3% 341860 ± >>>> 2% stream.add_bandwidth_MBps_harmonicMean >>>> 333973 -20.5% 265439 ± 3% -1.7% 328403 >>>> stream.copy_bandwidth_MBps >>>> 332930 -21.7% 260548 ± 3% -2.5% 324711 ± >>>> 2% stream.copy_bandwidth_MBps_harmonicMean >>>> 302788 -16.2% 253817 ± 2% -1.4% 298421 >>>> stream.scale_bandwidth_MBps >>>> 302157 -17.1% 250577 ± 2% -2.0% 296054 >>>> stream.scale_bandwidth_MBps_harmonicMean >>>> 339047 -12.1% 298061 -1.4% 334206 >>>> stream.triad_bandwidth_MBps >>>> 338186 -12.4% 296218 -2.0% 331469 >>>> stream.triad_bandwidth_MBps_harmonicMean >>>> >>>> >>>> The regression of ramspeed is still there. >>> >>> Thanks for the debugging patch and the test. If no one has objection >>> to honor MAP_STACK, I'm going to come up with a more formal patch. >>> Even though thp_get_unmapped_area() is not called for MAP_STACK, stack >>> area still may be allocated at 2M aligned address theoretically. And >>> it may be worse with multi-sized THP, for 1M. >> Right. Filtering out MAP_STACK can't make sure no THP for stack. Just >> reduce the possibility of using THP for stack. > > Can you please help test the below patch? I can't access the testing box now. Oliver will help to test your patch. Regards Yin, Fengwei > > diff --git a/include/linux/mman.h b/include/linux/mman.h > index 40d94411d492..dc7048824be8 100644 > --- a/include/linux/mman.h > +++ b/include/linux/mman.h > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags) > return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) | > _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) | > _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) | > + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) | > arch_calc_vm_flag_bits(flags); > } > > But I can't reproduce the pthread regression on my aarch64 VM. 
It > might be due to the guard stack (the 64K guard stack is at 2M aligned, > the 8M stack is right next to it which starts at 2M + 64K). But I can > see the stack area is not THP eligible anymore with this patch. See: > > fffd18e10000-fffd19610000 rw-p 00000000 00:00 0 > Size: 8192 kB > KernelPageSize: 4 kB > MMUPageSize: 4 kB > Rss: 12 kB > Pss: 12 kB > Pss_Dirty: 12 kB > Shared_Clean: 0 kB > Shared_Dirty: 0 kB > Private_Clean: 0 kB > Private_Dirty: 12 kB > Referenced: 12 kB > Anonymous: 12 kB > KSM: 0 kB > LazyFree: 0 kB > AnonHugePages: 0 kB > ShmemPmdMapped: 0 kB > FilePmdMapped: 0 kB > Shared_Hugetlb: 0 kB > Private_Hugetlb: 0 kB > Swap: 0 kB > SwapPss: 0 kB > Locked: 0 kB > THPeligible: 0 > VmFlags: rd wr mr mw me ac nh > > The "nh" flag is set. > >> >>> >>> Do you have any instructions regarding how to run ramspeed? Anyway I >>> may not have time debug it until after holidays. >> 0Day leverages phoronix-test-suite to run ramspeed. So I don't have >> direct answer here. >> >> I suppose we could check the configuration of ramspeed in phoronix-test- >> suite to understand what's the build options and command options to run >> ramspeed: >> https://openbenchmarking.org/test/pts/ramspeed > > Downloaded the test suite. It looks phronix just runs test 3 (int) and > 6 (float). They basically does 4 sub tests to benchmark memory > bandwidth: > > * copy > * scale copy > * add copy > * triad copy > > The source buffer is initialized (page fault is triggered), but the > destination area is not. So the page fault + page clear time is > accounted to the result. Clearing huge page may take a little bit more > time. But I didn't see noticeable regression on my aarch64 VM either. > Anyway I'm supposed such test should be run with THP off. > >> >> >> Regards >> Yin, Fengwei >> >>> >>>> >>>> >>>> Regards >>>> Yin, Fengwei ^ permalink raw reply [flat|nested] 24+ messages in thread
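For readers unfamiliar with the helper used in the one-liner above, the change works by translating the MAP_STACK bit in the mmap() flags into VM_NOHUGEPAGE on the resulting VMA. Below is a standalone illustration of that kind of bit translation; the constants and the function are stand-ins chosen for the demo, and the kernel's own _calc_vm_trans() in include/linux/mman.h remains the authoritative definition.

/* Illustrative stand-in for a MAP_* -> VM_* flag translation: if the MAP_
 * bit is set, emit the corresponding VM_ bit by scaling between the two
 * bit positions. Constant values are for the demo only. */
#include <stdio.h>

#define DEMO_MAP_STACK     0x20000UL	/* illustrative MAP_STACK value */
#define DEMO_VM_NOHUGEPAGE 0x40000000UL	/* illustrative VM_NOHUGEPAGE value */

static unsigned long calc_vm_trans(unsigned long flags,
				   unsigned long from, unsigned long to)
{
	if (from <= to)
		return (flags & from) * (to / from);
	return (flags & from) / (from / to);
}

int main(void)
{
	unsigned long mmap_flags = DEMO_MAP_STACK;
	unsigned long vm_flags = calc_vm_trans(mmap_flags, DEMO_MAP_STACK,
					       DEMO_VM_NOHUGEPAGE);

	printf("VM_NOHUGEPAGE set: %s\n",
	       (vm_flags & DEMO_VM_NOHUGEPAGE) ? "yes" : "no");
	return 0;
}

The translation is pure bit arithmetic, so the hint costs nothing on the mmap() path; the intended effect, as the smaps output above shows, is that such VMAs stop being THP-eligible.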
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression 2024-01-04 8:18 ` Yin Fengwei @ 2024-01-04 8:39 ` Oliver Sang 2024-01-05 9:29 ` Oliver Sang 0 siblings, 1 reply; 24+ messages in thread From: Oliver Sang @ 2024-01-04 8:39 UTC (permalink / raw) To: Yin Fengwei Cc: Yang Shi, Rik van Riel, oe-lkp, lkp, Linux Memory Management List, Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang, feng.tang, oliver.sang hi, Fengwei, hi, Yang Shi, On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote: > > On 2024/1/4 09:32, Yang Shi wrote: ... > > Can you please help test the below patch? > I can't access the testing box now. Oliver will help to test your patch. > since now the commit-id of 'mm: align larger anonymous mappings on THP boundaries' in linux-next/master is efa7df3e3bb5d I applied the patch like below: * d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi * efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries * 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi our auto-bisect captured new efa7df3e3b as fbc for quite a number of regression so far, I will test d8d7b1dae6f03 for all these tests. Thanks commit d8d7b1dae6f0311d528b289cda7b317520f9a984 Author: 0day robot <lkp@intel.com> Date: Thu Jan 4 12:51:10 2024 +0800 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi diff --git a/include/linux/mman.h b/include/linux/mman.h index 40d94411d4920..91197bd387730 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags) return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) | _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) | _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) | + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) | arch_calc_vm_flag_bits(flags); } > > Regards > Yin, Fengwei > > > > > diff --git a/include/linux/mman.h b/include/linux/mman.h > > index 40d94411d492..dc7048824be8 100644 > > --- a/include/linux/mman.h > > +++ b/include/linux/mman.h > > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags) > > return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) | > > _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) | > > _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) | > > + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) | > > arch_calc_vm_flag_bits(flags); > > } > > ^ permalink raw reply [flat|nested] 24+ messages in thread
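Independent of the benchmark runs, whether the applied fix takes effect on a given kernel can be spot-checked the same way as the smaps output earlier in the thread: map a MAP_STACK region and look for "nh" in its VmFlags. A throwaway sketch; the mapping size and the parsing are purely illustrative:

/* Map a MAP_STACK region, find it in /proc/self/smaps and print its
 * VmFlags line; "nh" there means VM_NOHUGEPAGE was applied. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	void *p = mmap(NULL, 8UL << 20, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	char line[256], start[32];
	FILE *f;
	int in_vma = 0;

	if (p == MAP_FAILED)
		return 1;
	snprintf(start, sizeof(start), "%lx-", (unsigned long)p);
	f = fopen("/proc/self/smaps", "r");
	while (f && fgets(line, sizeof(line), f)) {
		if (!strncmp(line, start, strlen(start)))
			in_vma = 1;		/* found our VMA's header line */
		else if (in_vma && !strncmp(line, "VmFlags:", 8)) {
			fputs(line, stdout);	/* expect "nh" with the fix */
			break;
		}
	}
	if (f)
		fclose(f);
	munmap(p, 8UL << 20);
	return 0;
}

On a kernel without the fix, "nh" should be absent from the printed VmFlags line.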
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2024-01-04  8:39               ` Oliver Sang
@ 2024-01-05  9:29                 ` Oliver Sang
  2024-01-05 14:52                   ` Yin, Fengwei
  2024-01-05 18:49                   ` Yang Shi
  0 siblings, 2 replies; 24+ messages in thread
From: Oliver Sang @ 2024-01-05  9:29 UTC (permalink / raw)
  To: Yang Shi
  Cc: Yin Fengwei, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	Christopher Lameter, ying.huang, feng.tang, oliver.sang

[-- Attachment #1: Type: text/plain, Size: 16841 bytes --]

hi, Yang Shi,

On Thu, Jan 04, 2024 at 04:39:50PM +0800, Oliver Sang wrote:
> hi, Fengwei, hi, Yang Shi,
>
> On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote:
> >
> > On 2024/1/4 09:32, Yang Shi wrote:
>
> ...
>
> > > Can you please help test the below patch?
> > I can't access the testing box now. Oliver will help to test your patch.
> >
>
> since now the commit-id of
> 'mm: align larger anonymous mappings on THP boundaries'
> in linux-next/master is efa7df3e3bb5d
> I applied the patch like below:
>
> * d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> * efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries
> * 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi
>
> our auto-bisect captured new efa7df3e3b as fbc for quite a number of regression
> so far, I will test d8d7b1dae6f03 for all these tests. Thanks
>

we got 12 regressions and 1 improvement results for efa7df3e3b so far.
(4 regressions are just similar to what we reported for 1111d46b5c).
by your patch, 6 of those regressions are fixed, others are not impacted.
below is a summary:

No.   testsuite        test                            status-on-efa7df3e3b   fix-by-d8d7b1dae6 ?
===   =========        ====                            ====================   ===================
(1)   stress-ng        numa                            regression             NO
(2)                    pthread                         regression             yes (on a Ice Lake server)
(3)                    pthread                         regression             yes (on a Cascade Lake desktop)
(4)   will-it-scale    malloc1                         regression             NO
(5)                    page_fault1                     improvement            no (so still improvement)
(6)   vm-scalability   anon-w-seq-mt                   regression             yes
(7)   stream           nr_threads=25%                  regression             yes
(8)                    nr_threads=50%                  regression             yes
(9)   phoronix         osbench.CreateThreads           regression             yes (on a Cascade Lake server)
(10)                   ramspeed.Add.Integer            regression             NO (and below 3, on a Coffee Lake desktop)
(11)                   ramspeed.Average.FloatingPoint  regression             NO
(12)                   ramspeed.Triad.Integer          regression             NO
(13)                   ramspeed.Average.Integer        regression             NO

below are details, for those regressions not fixed by d8d7b1dae6, attached full comparison.
(1) detail comparison is attached as 'stress-ng-regression' Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G ========================================================================================= class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime: cpu/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/numa/stress-ng/60s 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 251.12 -48.2% 130.00 -47.9% 130.75 stress-ng.numa.ops 4.10 -49.4% 2.08 -49.2% 2.09 stress-ng.numa.ops_per_sec (2) Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G ========================================================================================= class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime: os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/pthread/stress-ng/60s 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 3272223 -87.8% 400430 +0.5% 3287322 stress-ng.pthread.ops 54516 -87.8% 6664 +0.5% 54772 stress-ng.pthread.ops_per_sec (3) Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with memory: 128G ========================================================================================= class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime: os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 2250845 -85.2% 332370 ± 6% -0.8% 2232820 stress-ng.pthread.ops 37510 -85.2% 5538 ± 6% -0.8% 37209 stress-ng.pthread.ops_per_sec (4) full comparison attached as 'will-it-scale-regression' Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G ========================================================================================= compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase: gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/malloc1/will-it-scale 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 10994 -86.7% 1466 -86.7% 1460 will-it-scale.per_process_ops 1231431 -86.7% 164315 -86.7% 163624 will-it-scale.workload (5) Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G ========================================================================================= compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase: gcc-12/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/page_fault1/will-it-scale 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 18858970 +44.8% 27298921 +44.9% 27330479 will-it-scale.224.threads 56.06 +13.3% 63.53 +13.8% 63.81 
will-it-scale.224.threads_idle 84191 +44.8% 121869 +44.9% 122010 will-it-scale.per_thread_ops 18858970 +44.8% 27298921 +44.9% 27330479 will-it-scale.workload (6) Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G ========================================================================================= compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase: gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/8T/lkp-cpl-4sp2/anon-w-seq-mt/vm-scalability 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 345968 -6.5% 323566 +0.1% 346304 vm-scalability.median 1.91 ± 10% -0.5 1.38 ± 20% -0.2 1.75 ± 13% vm-scalability.median_stddev% 79708409 -7.4% 73839640 -0.1% 79613742 vm-scalability.throughput (7) Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G ========================================================================================= array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase: 50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 349414 -16.2% 292854 ± 2% -0.4% 348048 stream.add_bandwidth_MBps 347727 ± 2% -16.5% 290470 ± 2% -0.6% 345750 ± 2% stream.add_bandwidth_MBps_harmonicMean 332206 -21.6% 260428 ± 3% -0.4% 330838 stream.copy_bandwidth_MBps 330746 ± 2% -22.6% 255915 ± 3% -0.6% 328725 ± 2% stream.copy_bandwidth_MBps_harmonicMean 301178 -16.9% 250209 ± 2% -0.4% 299920 stream.scale_bandwidth_MBps 300262 -17.7% 247151 ± 2% -0.6% 298586 ± 2% stream.scale_bandwidth_MBps_harmonicMean 337408 -12.5% 295287 ± 2% -0.3% 336304 stream.triad_bandwidth_MBps 336153 -12.7% 293621 -0.5% 334624 ± 2% stream.triad_bandwidth_MBps_harmonicMean (8) Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G ========================================================================================= array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase: 50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/50%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 345632 -19.7% 277550 ± 3% +0.4% 347067 ± 2% stream.add_bandwidth_MBps 342263 ± 2% -19.7% 274704 ± 2% +0.4% 343609 ± 2% stream.add_bandwidth_MBps_harmonicMean 343820 -17.3% 284428 ± 3% +0.1% 344248 stream.copy_bandwidth_MBps 341759 ± 2% -17.8% 280934 ± 3% +0.1% 342025 ± 2% stream.copy_bandwidth_MBps_harmonicMean 343270 -17.8% 282330 ± 3% +0.3% 344276 ± 2% stream.scale_bandwidth_MBps 340812 ± 2% -18.3% 278284 ± 3% +0.3% 341672 ± 2% stream.scale_bandwidth_MBps_harmonicMean 364596 -19.7% 292831 ± 3% +0.4% 366145 ± 2% stream.triad_bandwidth_MBps 360643 ± 2% -19.9% 289034 ± 3% +0.4% 362004 ± 2% stream.triad_bandwidth_MBps_harmonicMean (9) Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with memory: 512G ========================================================================================= 
compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase: gcc-12/performance/x86_64-rhel-8.3/Create Threads/debian-x86_64-phoronix/lkp-csl-2sp7/osbench-1.0.2/phoronix-test-suite 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 26.82 +1348.4% 388.43 +4.0% 27.88 phoronix-test-suite.osbench.CreateThreads.us_per_event **** for below (10) - (13), full comparison is attached as phoronix-regressions (they all happen on a Coffee Lake desktop) (10) Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G ========================================================================================= compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase: gcc-12/performance/x86_64-rhel-8.3/Add/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 20115 -4.5% 19211 -4.5% 19217 phoronix-test-suite.ramspeed.Add.Integer.mb_s (11) Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G ========================================================================================= compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase: gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 19960 -2.9% 19378 -3.0% 19366 phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s (12) Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G ========================================================================================= compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase: gcc-12/performance/x86_64-rhel-8.3/Triad/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 19667 -6.4% 18399 -6.4% 18413 phoronix-test-suite.ramspeed.Triad.Integer.mb_s (13) Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G ========================================================================================= compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase: gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 19799 -3.5% 19106 -3.4% 19117 phoronix-test-suite.ramspeed.Average.Integer.mb_s > > > commit d8d7b1dae6f0311d528b289cda7b317520f9a984 > Author: 0day robot <lkp@intel.com> > Date: Thu Jan 4 12:51:10 2024 +0800 > > fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi > > diff --git a/include/linux/mman.h b/include/linux/mman.h > index 40d94411d4920..91197bd387730 100644 > --- a/include/linux/mman.h > +++ 
b/include/linux/mman.h > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags) > return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) | > _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) | > _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) | > + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) | > arch_calc_vm_flag_bits(flags); > } > > > > > > Regards > > Yin, Fengwei > > > > > > > > diff --git a/include/linux/mman.h b/include/linux/mman.h > > > index 40d94411d492..dc7048824be8 100644 > > > --- a/include/linux/mman.h > > > +++ b/include/linux/mman.h > > > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags) > > > return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) | > > > _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) | > > > _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) | > > > + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) | > > > arch_calc_vm_flag_bits(flags); > > > } > > > [-- Attachment #2: stress-ng-regression --] [-- Type: text/plain, Size: 15787 bytes --] (1) Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G ========================================================================================= class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime: cpu/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/numa/stress-ng/60s 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 55848 ± 28% +236.5% 187927 ± 3% +259.4% 200733 ± 2% meminfo.AnonHugePages 1.80 ± 5% -0.2 1.60 ± 5% -0.2 1.60 ± 7% mpstat.cpu.all.usr% 8077 ± 7% +11.8% 9030 ± 5% +4.6% 8451 ± 7% numa-vmstat.node0.nr_kernel_stack 120605 ± 3% -10.0% 108597 ± 3% -10.5% 107928 ± 3% vmstat.system.in 1868 ± 32% +75.1% 3271 ± 14% +87.1% 3495 ± 20% turbostat.C1 9123408 ± 5% -13.8% 7863298 ± 7% -14.0% 7846843 ± 6% turbostat.IRQ 59.62 ± 49% +125.4% 134.38 ± 88% +267.9% 219.38 ± 85% turbostat.POLL 24.33 ± 43% +69.1% 41.14 ± 35% +9.0% 26.51 ± 53% sched_debug.cfs_rq:/.removed.load_avg.avg 104.44 ± 21% +29.2% 134.94 ± 17% +3.2% 107.78 ± 26% sched_debug.cfs_rq:/.removed.load_avg.stddev 106.26 ± 16% -17.6% 87.53 ± 21% -24.6% 80.11 ± 21% sched_debug.cfs_rq:/.util_est_enqueued.stddev 35387 ± 59% +127.7% 80580 ± 53% +249.2% 123565 ± 57% sched_debug.cpu.avg_idle.min 1156 ± 7% -21.9% 903.06 ± 5% -23.2% 888.25 ± 15% sched_debug.cpu.nr_switches.min 20719 ±111% -51.1% 10123 ± 71% -56.6% 8996 ± 29% numa-meminfo.node0.Active 20639 ±111% -51.5% 10001 ± 72% -56.8% 8916 ± 29% numa-meminfo.node0.Active(anon) 31253 ± 70% +142.7% 75839 ± 20% +214.1% 98180 ± 22% numa-meminfo.node0.AnonHugePages 8076 ± 7% +11.8% 9029 ± 5% +4.7% 8451 ± 7% numa-meminfo.node0.KernelStack 24260 ± 62% +360.8% 111783 ± 17% +321.2% 102184 ± 21% numa-meminfo.node1.AnonHugePages 283702 ± 16% +40.9% 399633 ± 18% +35.9% 385485 ± 11% numa-meminfo.node1.AnonPages.max 251.12 -48.2% 130.00 -47.9% 130.75 stress-ng.numa.ops 4.10 -49.4% 2.08 -49.2% 2.09 stress-ng.numa.ops_per_sec 61658 -53.5% 28697 -53.3% 28768 stress-ng.time.minor_page_faults 3727 +2.8% 3832 +2.9% 3833 stress-ng.time.system_time 10.41 -48.6% 5.35 -48.7% 5.34 stress-ng.time.user_time 4313 ± 4% -47.0% 2285 ± 8% -48.3% 2230 ± 7% stress-ng.time.voluntary_context_switches 63.61 +2.5% 65.20 +2.7% 65.30 time.elapsed_time 63.61 +2.5% 65.20 +2.7% 65.30 time.elapsed_time.max 61658 -53.5% 28697 -53.3% 28768 time.minor_page_faults 3727 +2.8% 3832 +2.9% 3833 time.system_time 10.41 -48.6% 5.35 -48.7% 
5.34 time.user_time 4313 ± 4% -47.0% 2285 ± 8% -48.3% 2230 ± 7% time.voluntary_context_switches 120325 +6.1% 127672 ± 6% +0.9% 121431 proc-vmstat.nr_anon_pages 27.33 ± 29% +236.0% 91.83 ± 3% +258.6% 98.02 ± 2% proc-vmstat.nr_anon_transparent_hugepages 148677 +6.2% 157844 ± 4% +0.7% 149763 proc-vmstat.nr_inactive_anon 98.10 ± 25% -52.8% 46.30 ± 69% -55.3% 43.82 ± 64% proc-vmstat.nr_isolated_file 2809 +9.0% 3063 ± 28% -3.9% 2698 ± 2% proc-vmstat.nr_page_table_pages 148670 +6.2% 157837 ± 4% +0.7% 149765 proc-vmstat.nr_zone_inactive_anon 2580003 -5.8% 2431297 -5.8% 2431173 proc-vmstat.numa_hit 1488693 -5.8% 1402808 -5.8% 1401633 proc-vmstat.numa_local 1091291 -5.8% 1028489 -5.7% 1029540 proc-vmstat.numa_other 9.56e+08 +2.1% 9.757e+08 +2.1% 9.761e+08 proc-vmstat.pgalloc_normal 469554 -7.6% 433894 -7.3% 435076 proc-vmstat.pgfault 9.559e+08 +2.1% 9.756e+08 +2.1% 9.76e+08 proc-vmstat.pgfree 17127 ± 21% -55.4% 7647 ± 64% -55.0% 7700 ± 52% proc-vmstat.pgmigrate_fail 9.554e+08 +2.1% 9.751e+08 +2.1% 9.754e+08 proc-vmstat.pgmigrate_success 1865641 +2.1% 1904388 +2.1% 1905158 proc-vmstat.thp_migration_success 0.43 ± 8% -0.1 0.30 ± 10% -0.2 0.28 ± 12% perf-profile.children.cycles-pp.queue_pages_range 0.43 ± 8% -0.1 0.30 ± 10% -0.2 0.28 ± 12% perf-profile.children.cycles-pp.walk_page_range 0.32 ± 8% -0.1 0.21 ± 11% -0.1 0.19 ± 13% perf-profile.children.cycles-pp.__walk_page_range 0.30 ± 8% -0.1 0.19 ± 12% -0.1 0.17 ± 13% perf-profile.children.cycles-pp.walk_pud_range 0.31 ± 9% -0.1 0.20 ± 12% -0.1 0.19 ± 12% perf-profile.children.cycles-pp.walk_pgd_range 0.30 ± 8% -0.1 0.20 ± 11% -0.1 0.18 ± 13% perf-profile.children.cycles-pp.walk_p4d_range 0.29 ± 8% -0.1 0.18 ± 11% -0.1 0.17 ± 13% perf-profile.children.cycles-pp.walk_pmd_range 0.28 ± 8% -0.1 0.17 ± 11% -0.1 0.16 ± 13% perf-profile.children.cycles-pp.queue_folios_pte_range 0.13 ± 12% -0.1 0.07 ± 11% -0.1 0.06 ± 17% perf-profile.children.cycles-pp.vm_normal_folio 0.18 ± 4% -0.0 0.15 ± 3% -0.0 0.16 ± 3% perf-profile.children.cycles-pp.add_page_for_migration 0.12 ± 4% -0.0 0.12 ± 5% -0.0 0.11 ± 4% perf-profile.children.cycles-pp.__cond_resched 98.65 +0.2 98.82 +0.2 98.88 perf-profile.children.cycles-pp.migrate_pages_batch 98.66 +0.2 98.83 +0.2 98.89 perf-profile.children.cycles-pp.migrate_pages_sync 98.68 +0.2 98.85 +0.2 98.91 perf-profile.children.cycles-pp.migrate_pages 0.10 ± 11% -0.0 0.05 ± 12% -0.1 0.04 ± 79% perf-profile.self.cycles-pp.vm_normal_folio 0.13 ± 8% -0.0 0.08 ± 14% -0.0 0.08 ± 14% perf-profile.self.cycles-pp.queue_folios_pte_range 0.17 ± 89% -100.0% 0.00 -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages 0.45 ± 59% +124.4% 1.01 ± 81% +1094.5% 5.40 ±120% perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read 27.27 ± 95% -75.2% 6.77 ± 83% -48.4% 14.08 ± 77% perf-sched.sch_delay.max.ms.__cond_resched.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch 2.00 ± 88% -100.0% 0.00 -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages 4.30 ± 86% -50.9% 2.11 ± 67% -90.0% 0.43 ±261% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt 3.31 ± 53% -55.8% 1.46 ±218% -81.0% 0.63 ±182% perf-sched.sch_delay.max.ms.synchronize_rcu_expedited.lru_cache_disable.do_pages_move.kernel_move_pages 190.22 ± 41% +125.2% 428.42 ± 60% +72.7% 328.46 ± 21% 
perf-sched.wait_and_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork 294.56 ± 10% +44.0% 424.28 ± 16% +62.5% 478.70 ± 13% perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.__flush_work.isra.0 322.33 ± 5% +46.1% 470.78 ± 10% +40.8% 453.90 ± 10% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 117.25 ± 11% -13.3% 101.62 ± 34% -24.6% 88.38 ± 17% perf-sched.wait_and_delay.count.__cond_resched.down_read.add_page_for_migration.do_pages_move.kernel_move_pages 307.25 ± 7% -54.6% 139.62 ± 4% -55.2% 137.62 ± 5% perf-sched.wait_and_delay.count.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_pages_move.kernel_move_pages 406.25 ± 3% -57.7% 171.88 ± 10% -59.0% 166.75 ± 3% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.__flush_work.isra.0 142.50 ± 33% -76.8% 33.00 ±139% -65.8% 48.75 ± 83% perf-sched.wait_and_delay.count.synchronize_rcu_expedited.lru_cache_disable.do_pages_move.kernel_move_pages 1196 ± 3% -37.9% 743.38 ± 10% -38.5% 736.00 ± 9% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 1749 ± 19% +45.1% 2537 ± 6% +76.0% 3078 ± 18% perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.__flush_work.isra.0 2691 ± 15% +48.8% 4003 ± 6% +44.6% 3892 ± 11% perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 2.82 ± 14% -100.0% 0.00 -81.1% 0.53 ±264% perf-sched.wait_time.avg.ms.__cond_resched.down_read.migrate_to_node.do_migrate_pages.kernel_migrate_pages 199.40 ± 29% +114.8% 428.41 ± 60% +64.7% 328.44 ± 21% perf-sched.wait_time.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork 3.09 ± 16% -100.0% 0.00 -84.4% 0.48 ±264% perf-sched.wait_time.avg.ms.__cond_resched.queue_folios_pte_range.walk_pmd_range.isra.0 1.94 ± 50% -100.0% 0.00 -74.2% 0.50 ±264% perf-sched.wait_time.avg.ms.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages 294.30 ± 10% +44.1% 424.17 ± 16% +62.6% 478.57 ± 13% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.__flush_work.isra.0 0.98 ±107% -100.0% 0.00 -95.8% 0.04 ±264% perf-sched.wait_time.avg.ms.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages 321.84 ± 5% +46.1% 470.35 ± 10% +40.8% 453.02 ± 10% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 7.31 ± 53% -100.0% 0.00 -87.7% 0.90 ±264% perf-sched.wait_time.max.ms.__cond_resched.down_read.migrate_to_node.do_migrate_pages.kernel_migrate_pages 6.45 ± 16% -100.0% 0.00 -84.5% 1.00 ±264% perf-sched.wait_time.max.ms.__cond_resched.queue_folios_pte_range.walk_pmd_range.isra.0 6.17 ± 45% -100.0% 0.00 -91.9% 0.50 ±264% perf-sched.wait_time.max.ms.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages 11.63 ±118% -93.3% 0.78 ±178% -89.3% 1.24 ±245% perf-sched.wait_time.max.ms.exp_funnel_lock.synchronize_rcu_expedited.lru_cache_disable.do_pages_move 1749 ± 19% +45.1% 2537 ± 6% +76.0% 3078 ± 18% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.__flush_work.isra.0 2.49 ± 88% -100.0% 0.00 -98.4% 0.04 ±264% perf-sched.wait_time.max.ms.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages 2691 ± 15% +48.8% 4003 ± 6% +44.6% 3892 ± 11% perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 340.81 +38.9% 473.47 +38.4% 471.58 perf-stat.i.MPKI 1.131e+09 -25.0% 8.485e+08 -25.2% 8.465e+08 ± 2% 
perf-stat.i.branch-instructions 68.31 +1.1 69.37 +1.1 69.37 perf-stat.i.cache-miss-rate% 46.16 +38.1% 63.73 +37.5% 63.45 perf-stat.i.cpi 157.48 -7.7% 145.30 ± 2% -8.1% 144.76 ± 2% perf-stat.i.cpu-migrations 0.02 ± 2% +0.0 0.02 ± 16% +0.0 0.02 perf-stat.i.dTLB-load-miss-rate% 165432 ± 2% -2.9% 160583 ± 12% -8.3% 151664 perf-stat.i.dTLB-load-misses 1.133e+09 -21.9% 8.846e+08 -22.1% 8.823e+08 ± 2% perf-stat.i.dTLB-loads 0.02 -0.0 0.01 ± 3% -0.0 0.01 perf-stat.i.dTLB-store-miss-rate% 98452 -31.8% 67127 ± 2% -32.2% 66739 ± 2% perf-stat.i.dTLB-store-misses 5.668e+08 -13.7% 4.891e+08 -13.9% 4.879e+08 perf-stat.i.dTLB-stores 5.684e+09 -24.5% 4.292e+09 -24.7% 4.282e+09 ± 2% perf-stat.i.instructions 0.07 ± 2% -14.5% 0.06 ± 3% -14.6% 0.06 ± 5% perf-stat.i.ipc 88.20 -10.7% 78.73 -11.0% 78.53 perf-stat.i.metric.M/sec 1.242e+08 +0.9% 1.254e+08 +1.0% 1.255e+08 perf-stat.i.node-load-misses 76214273 +1.0% 76999051 +1.2% 77103845 perf-stat.i.node-loads 247.93 +32.1% 327.57 ± 2% +32.1% 327.56 ± 2% perf-stat.overall.MPKI 0.92 ± 4% +0.2 1.13 ± 5% +0.2 1.12 ± 5% perf-stat.overall.branch-miss-rate% 69.51 +0.9 70.45 +1.0 70.50 perf-stat.overall.cache-miss-rate% 33.77 +31.3% 44.35 ± 2% +31.3% 44.35 ± 2% perf-stat.overall.cpi 0.01 ± 2% +0.0 0.02 ± 13% +0.0 0.02 ± 2% perf-stat.overall.dTLB-load-miss-rate% 0.02 -0.0 0.01 ± 2% -0.0 0.01 perf-stat.overall.dTLB-store-miss-rate% 0.03 -23.9% 0.02 ± 2% -23.9% 0.02 perf-stat.overall.ipc 1.084e+09 -24.2% 8.217e+08 ± 2% -24.2% 8.216e+08 ± 2% perf-stat.ps.branch-instructions 154.44 -8.0% 142.02 ± 2% -8.6% 141.20 ± 2% perf-stat.ps.cpu-migrations 163178 ± 3% -3.1% 158185 ± 12% -8.0% 150107 ± 2% perf-stat.ps.dTLB-load-misses 1.089e+09 -21.1% 8.585e+08 -21.2% 8.581e+08 perf-stat.ps.dTLB-loads 96861 -31.9% 65975 ± 2% -32.1% 65796 ± 2% perf-stat.ps.dTLB-store-misses 5.503e+08 -13.1% 4.781e+08 -13.2% 4.776e+08 perf-stat.ps.dTLB-stores 5.447e+09 -23.7% 4.157e+09 -23.7% 4.157e+09 perf-stat.ps.instructions 1.223e+08 +1.0% 1.235e+08 +1.0% 1.235e+08 perf-stat.ps.node-load-misses 75118302 +1.1% 75929311 +1.1% 75927016 perf-stat.ps.node-loads 3.496e+11 -21.7% 2.737e+11 -21.7% 2.739e+11 ± 2% perf-stat.total.instructions [-- Attachment #3: will-it-scale-regression --] [-- Type: text/plain, Size: 57536 bytes --] (4) Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G ========================================================================================= compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase: gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/malloc1/will-it-scale 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 3161 +46.4% 4627 +47.5% 4662 vmstat.system.cs 0.58 ± 2% +0.7 1.27 +0.7 1.26 mpstat.cpu.all.irq% 0.55 ± 3% -0.5 0.09 ± 2% -0.5 0.09 ± 2% mpstat.cpu.all.soft% 1.00 ± 13% -0.7 0.29 -0.7 0.28 mpstat.cpu.all.usr% 1231431 -86.7% 164315 -86.7% 163624 will-it-scale.112.processes 10994 -86.7% 1466 -86.7% 1460 will-it-scale.per_process_ops 1231431 -86.7% 164315 -86.7% 163624 will-it-scale.workload 0.03 -66.7% 0.01 -66.7% 0.01 turbostat.IPC 81.38 -2.8% 79.12 -2.2% 79.62 turbostat.PkgTmp 764.02 +17.1% 894.78 +17.0% 893.81 turbostat.PkgWatt 19.80 +135.4% 46.59 +135.1% 46.53 turbostat.RAMWatt 771.38 ± 5% +249.5% 2696 ± 14% +231.9% 2560 ± 10% perf-c2c.DRAM.local 3050 ± 5% -69.8% 922.75 ± 6% -71.5% 869.88 ± 8% perf-c2c.DRAM.remote 11348 ± 4% -90.2% 
1107 ± 5% -90.6% 1065 ± 3% perf-c2c.HITM.local 357.50 ± 21% -44.0% 200.38 ± 7% -48.2% 185.25 ± 13% perf-c2c.HITM.remote 11706 ± 4% -88.8% 1307 ± 4% -89.3% 1250 ± 3% perf-c2c.HITM.total 1.717e+08 ± 9% -85.5% 24955542 -85.5% 24880885 numa-numastat.node0.local_node 1.718e+08 ± 9% -85.4% 25046901 -85.5% 24972867 numa-numastat.node0.numa_hit 1.945e+08 ± 7% -87.0% 25203631 -87.1% 25104844 numa-numastat.node1.local_node 1.946e+08 ± 7% -87.0% 25300536 -87.1% 25180465 numa-numastat.node1.numa_hit 2.001e+08 ± 2% -87.5% 25098699 -87.5% 25011079 numa-numastat.node2.local_node 2.002e+08 ± 2% -87.4% 25173132 -87.5% 25119438 numa-numastat.node2.numa_hit 1.956e+08 ± 6% -87.3% 24922332 -87.3% 24784408 numa-numastat.node3.local_node 1.957e+08 ± 6% -87.2% 25008002 -87.3% 24874399 numa-numastat.node3.numa_hit 766959 -45.9% 414816 -46.2% 412898 meminfo.Active 766881 -45.9% 414742 -46.2% 412824 meminfo.Active(anon) 391581 +12.1% 438946 +8.4% 424669 meminfo.AnonPages 421982 +20.7% 509155 +14.8% 484430 meminfo.Inactive 421800 +20.7% 508969 +14.8% 484244 meminfo.Inactive(anon) 68496 ± 7% +88.9% 129357 ± 2% +82.9% 125252 ± 2% meminfo.Mapped 569270 -24.0% 432709 -24.1% 431884 meminfo.SUnreclaim 797185 -40.2% 476420 -40.8% 471912 meminfo.Shmem 730111 -18.8% 593041 -18.9% 592400 meminfo.Slab 148082 ± 2% -20.3% 118055 ± 4% -21.7% 115994 ± 6% numa-meminfo.node0.SUnreclaim 197311 ± 16% -22.5% 152829 ± 19% -29.8% 138546 ± 9% numa-meminfo.node0.Slab 144635 ± 5% -25.8% 107254 ± 4% -25.3% 107973 ± 6% numa-meminfo.node1.SUnreclaim 137974 ± 2% -24.5% 104205 ± 6% -25.7% 102563 ± 4% numa-meminfo.node2.SUnreclaim 167889 ± 13% -26.1% 124127 ± 9% -15.0% 142771 ± 18% numa-meminfo.node2.Slab 607639 ± 20% -46.2% 326998 ± 15% -46.8% 323458 ± 13% numa-meminfo.node3.Active 607611 ± 20% -46.2% 326968 ± 15% -46.8% 323438 ± 13% numa-meminfo.node3.Active(anon) 679476 ± 21% -31.3% 466619 ± 19% -38.5% 418074 ± 16% numa-meminfo.node3.FilePages 20150 ± 22% +128.4% 46020 ± 11% +123.0% 44932 ± 8% numa-meminfo.node3.Mapped 138148 ± 2% -25.3% 103148 ± 4% -23.8% 105326 ± 7% numa-meminfo.node3.SUnreclaim 631930 ± 20% -40.9% 373456 ± 15% -41.5% 369883 ± 13% numa-meminfo.node3.Shmem 166777 ± 7% -19.6% 134013 ± 9% -20.7% 132332 ± 7% numa-meminfo.node3.Slab 37030 ± 2% -20.3% 29511 ± 4% -21.7% 28993 ± 6% numa-vmstat.node0.nr_slab_unreclaimable 1.718e+08 ± 9% -85.4% 25047066 -85.5% 24973455 numa-vmstat.node0.numa_hit 1.717e+08 ± 9% -85.5% 24955707 -85.5% 24881472 numa-vmstat.node0.numa_local 36158 ± 5% -25.8% 26811 ± 4% -25.4% 26990 ± 6% numa-vmstat.node1.nr_slab_unreclaimable 1.946e+08 ± 7% -87.0% 25300606 -87.1% 25181038 numa-vmstat.node1.numa_hit 1.945e+08 ± 7% -87.0% 25203699 -87.1% 25105417 numa-vmstat.node1.numa_local 34499 ± 2% -24.5% 26050 ± 6% -25.7% 25638 ± 4% numa-vmstat.node2.nr_slab_unreclaimable 2.002e+08 ± 2% -87.4% 25173363 -87.5% 25119830 numa-vmstat.node2.numa_hit 2.001e+08 ± 2% -87.5% 25098930 -87.5% 25011471 numa-vmstat.node2.numa_local 151851 ± 20% -46.2% 81720 ± 15% -46.8% 80848 ± 13% numa-vmstat.node3.nr_active_anon 169827 ± 21% -31.3% 116645 ± 19% -38.5% 104502 ± 16% numa-vmstat.node3.nr_file_pages 4991 ± 23% +131.5% 11555 ± 11% +125.4% 11249 ± 8% numa-vmstat.node3.nr_mapped 157941 ± 20% -40.9% 93355 ± 15% -41.5% 92454 ± 13% numa-vmstat.node3.nr_shmem 34570 ± 2% -25.4% 25780 ± 4% -23.8% 26327 ± 7% numa-vmstat.node3.nr_slab_unreclaimable 151851 ± 20% -46.2% 81720 ± 15% -46.8% 80848 ± 13% numa-vmstat.node3.nr_zone_active_anon 1.957e+08 ± 6% -87.2% 25008117 -87.3% 24874649 numa-vmstat.node3.numa_hit 1.956e+08 ± 6% -87.3% 24922447 -87.3% 
24784657 numa-vmstat.node3.numa_local 191746 -45.9% 103734 -46.2% 103228 proc-vmstat.nr_active_anon 97888 +12.1% 109757 +8.5% 106185 proc-vmstat.nr_anon_pages 947825 -8.5% 867659 -8.6% 866533 proc-vmstat.nr_file_pages 105444 +20.7% 127227 +14.9% 121113 proc-vmstat.nr_inactive_anon 17130 ± 7% +88.9% 32365 ± 2% +83.4% 31420 ± 2% proc-vmstat.nr_mapped 4007 +4.2% 4176 +4.1% 4170 proc-vmstat.nr_page_table_pages 199322 -40.2% 119155 -40.8% 118031 proc-vmstat.nr_shmem 142294 -24.0% 108161 -24.1% 107954 proc-vmstat.nr_slab_unreclaimable 191746 -45.9% 103734 -46.2% 103228 proc-vmstat.nr_zone_active_anon 105444 +20.7% 127223 +14.9% 121106 proc-vmstat.nr_zone_inactive_anon 40186 ± 13% +65.0% 66320 ± 5% +60.2% 64374 ± 13% proc-vmstat.numa_hint_faults 20248 ± 39% +108.3% 42185 ± 12% +102.6% 41033 ± 10% proc-vmstat.numa_hint_faults_local 7.623e+08 -86.8% 1.005e+08 -86.9% 1.002e+08 proc-vmstat.numa_hit 7.62e+08 -86.9% 1.002e+08 -86.9% 99786408 proc-vmstat.numa_local 181538 ± 6% +49.5% 271428 ± 3% +48.9% 270328 ± 6% proc-vmstat.numa_pte_updates 152652 ± 7% -28.6% 108996 -29.6% 107396 proc-vmstat.pgactivate 7.993e+08 +3068.4% 2.533e+10 +3055.6% 2.522e+10 proc-vmstat.pgalloc_normal 3.72e+08 -86.4% 50632612 -86.4% 50429200 proc-vmstat.pgfault 7.99e+08 +3069.7% 2.533e+10 +3056.9% 2.522e+10 proc-vmstat.pgfree 48.75 ± 2% +1e+08% 49362627 +1e+08% 49162408 proc-vmstat.thp_fault_alloc 21789703 ± 10% -20.1% 17410551 ± 7% -18.9% 17673460 ± 4% sched_debug.cfs_rq:/.avg_vruntime.max 427573 ± 99% +1126.7% 5245182 ± 17% +1104.4% 5149659 ± 13% sched_debug.cfs_rq:/.avg_vruntime.min 4757464 ± 10% -48.3% 2458136 ± 19% -46.6% 2539001 ± 11% sched_debug.cfs_rq:/.avg_vruntime.stddev 0.44 ± 2% -15.9% 0.37 ± 2% -16.6% 0.37 ± 3% sched_debug.cfs_rq:/.h_nr_running.stddev 299205 ± 38% +59.3% 476493 ± 27% +50.6% 450561 ± 42% sched_debug.cfs_rq:/.load.max 21789703 ± 10% -20.1% 17410551 ± 7% -18.9% 17673460 ± 4% sched_debug.cfs_rq:/.min_vruntime.max 427573 ± 99% +1126.7% 5245182 ± 17% +1104.4% 5149659 ± 13% sched_debug.cfs_rq:/.min_vruntime.min 4757464 ± 10% -48.3% 2458136 ± 19% -46.6% 2539001 ± 11% sched_debug.cfs_rq:/.min_vruntime.stddev 0.44 ± 2% -16.0% 0.37 ± 2% -17.2% 0.36 ± 2% sched_debug.cfs_rq:/.nr_running.stddev 446.75 ± 2% -18.4% 364.71 ± 2% -19.3% 360.46 ± 2% sched_debug.cfs_rq:/.runnable_avg.stddev 445.25 ± 2% -18.4% 363.46 ± 2% -19.3% 359.33 ± 2% sched_debug.cfs_rq:/.util_avg.stddev 946.71 ± 3% -14.7% 807.54 ± 4% -15.4% 800.58 ± 7% sched_debug.cfs_rq:/.util_est_enqueued.max 281.39 ± 7% -31.2% 193.63 ± 4% -32.0% 191.24 ± 7% sched_debug.cfs_rq:/.util_est_enqueued.stddev 1131635 ± 7% +73.7% 1965577 ± 6% +76.5% 1997455 ± 7% sched_debug.cpu.avg_idle.max 223539 ± 16% +165.4% 593172 ± 7% +146.0% 549906 ± 11% sched_debug.cpu.avg_idle.min 83325 ± 4% +64.3% 136927 ± 9% +69.7% 141399 ± 11% sched_debug.cpu.avg_idle.stddev 17.57 ± 6% +594.5% 122.01 ± 3% +588.0% 120.88 ± 3% sched_debug.cpu.clock.stddev 873.33 -11.1% 776.19 -11.8% 770.20 sched_debug.cpu.clock_task.stddev 2870 -18.1% 2351 -17.4% 2371 sched_debug.cpu.curr->pid.avg 3003 -12.5% 2627 -12.4% 2630 sched_debug.cpu.curr->pid.stddev 550902 ± 6% +74.4% 960871 ± 6% +79.8% 990291 ± 8% sched_debug.cpu.max_idle_balance_cost.max 4451 ± 59% +1043.9% 50917 ± 15% +1129.4% 54721 ± 15% sched_debug.cpu.max_idle_balance_cost.stddev 0.00 ± 17% +385.8% 0.00 ± 34% +315.7% 0.00 ± 3% sched_debug.cpu.next_balance.stddev 0.43 -17.5% 0.35 -16.8% 0.35 sched_debug.cpu.nr_running.avg 1.15 ± 8% +25.0% 1.44 ± 8% +30.4% 1.50 ± 13% sched_debug.cpu.nr_running.max 0.45 -14.4% 0.39 -14.2% 0.39 ± 2% 
sched_debug.cpu.nr_running.stddev 3280 ± 5% +32.5% 4345 +34.5% 4412 sched_debug.cpu.nr_switches.avg 846.82 ± 11% +109.9% 1777 ± 12% +112.4% 1799 ± 4% sched_debug.cpu.nr_switches.min 0.03 ±173% +887.2% 0.30 ± 73% +521.1% 0.19 ± 35% sched_debug.rt_rq:.rt_time.avg 6.79 ±173% +887.2% 67.01 ± 73% +521.1% 42.16 ± 35% sched_debug.rt_rq:.rt_time.max 0.45 ±173% +887.2% 4.47 ± 73% +521.1% 2.81 ± 35% sched_debug.rt_rq:.rt_time.stddev 4.65 +28.0% 5.96 +28.5% 5.98 perf-stat.i.MPKI 8.721e+09 -71.0% 2.532e+09 -71.1% 2.523e+09 perf-stat.i.branch-instructions 0.34 +0.1 0.48 +0.1 0.48 perf-stat.i.branch-miss-rate% 30145441 -58.6% 12471062 -58.6% 12487542 perf-stat.i.branch-misses 33.52 -15.3 18.20 -15.2 18.27 perf-stat.i.cache-miss-rate% 1.819e+08 -58.8% 74947458 -58.8% 74903072 perf-stat.i.cache-misses 5.429e+08 ± 2% -24.1% 4.123e+08 -24.4% 4.103e+08 perf-stat.i.cache-references 3041 +48.6% 4518 +49.7% 4552 perf-stat.i.context-switches 10.96 +212.9% 34.28 +214.1% 34.41 perf-stat.i.cpi 309.29 -11.2% 274.59 -11.3% 274.20 perf-stat.i.cpu-migrations 2354 +144.6% 5758 +144.7% 5761 perf-stat.i.cycles-between-cache-misses 0.13 -0.1 0.01 ± 3% -0.1 0.01 ± 3% perf-stat.i.dTLB-load-miss-rate% 12852209 ± 2% -98.0% 261197 ± 3% -97.9% 263864 ± 3% perf-stat.i.dTLB-load-misses 9.56e+09 -69.3% 2.932e+09 -69.4% 2.922e+09 perf-stat.i.dTLB-loads 0.12 -0.1 0.03 -0.1 0.03 perf-stat.i.dTLB-store-miss-rate% 5083186 -86.3% 693971 -86.4% 690328 perf-stat.i.dTLB-store-misses 4.209e+09 -44.9% 2.317e+09 -45.2% 2.308e+09 perf-stat.i.dTLB-stores 76.33 -39.7 36.61 -39.7 36.59 perf-stat.i.iTLB-load-miss-rate% 18717931 -80.1% 3715941 -80.2% 3698121 perf-stat.i.iTLB-load-misses 5758034 +7.7% 6202790 +7.4% 6183041 perf-stat.i.iTLB-loads 3.914e+10 -67.8% 1.261e+10 -67.9% 1.256e+10 perf-stat.i.instructions 2107 +73.9% 3663 +73.6% 3658 perf-stat.i.instructions-per-iTLB-miss 0.09 -67.9% 0.03 -68.1% 0.03 perf-stat.i.ipc 269.39 +10.6% 297.91 +10.7% 298.33 perf-stat.i.metric.K/sec 102.78 -64.5% 36.54 -64.6% 36.40 perf-stat.i.metric.M/sec 1234832 -86.4% 167556 -86.5% 166848 perf-stat.i.minor-faults 87.25 -41.9 45.32 -42.2 45.09 perf-stat.i.node-load-miss-rate% 25443233 -83.0% 4326696 ± 3% -83.4% 4227985 ± 2% perf-stat.i.node-load-misses 3723342 ± 3% +45.4% 5414430 +44.3% 5372545 perf-stat.i.node-loads 79.20 -74.4 4.78 -74.5 4.74 perf-stat.i.node-store-miss-rate% 14161911 ± 2% -83.1% 2394469 -83.2% 2382317 perf-stat.i.node-store-misses 3727955 ± 3% +1181.6% 47776544 +1188.5% 48035797 perf-stat.i.node-stores 1234832 -86.4% 167556 -86.5% 166849 perf-stat.i.page-faults 4.65 +28.0% 5.95 +28.4% 5.97 perf-stat.overall.MPKI 0.35 +0.1 0.49 +0.1 0.49 perf-stat.overall.branch-miss-rate% 33.51 -15.3 18.19 -15.3 18.26 perf-stat.overall.cache-miss-rate% 10.94 +212.3% 34.16 +213.4% 34.28 perf-stat.overall.cpi 2354 +143.9% 5741 +144.1% 5746 perf-stat.overall.cycles-between-cache-misses 0.13 -0.1 0.01 ± 3% -0.1 0.01 ± 5% perf-stat.overall.dTLB-load-miss-rate% 0.12 -0.1 0.03 -0.1 0.03 perf-stat.overall.dTLB-store-miss-rate% 76.49 -39.2 37.31 -39.2 37.29 perf-stat.overall.iTLB-load-miss-rate% 2090 +63.4% 3416 +63.5% 3417 perf-stat.overall.instructions-per-iTLB-miss 0.09 -68.0% 0.03 -68.1% 0.03 perf-stat.overall.ipc 87.22 -43.1 44.12 ± 2% -43.5 43.76 perf-stat.overall.node-load-miss-rate% 79.16 -74.4 4.77 -74.4 4.72 perf-stat.overall.node-store-miss-rate% 9549728 +140.9% 23005172 +141.1% 23022843 perf-stat.overall.path-length 8.691e+09 -71.0% 2.519e+09 -71.1% 2.51e+09 perf-stat.ps.branch-instructions 30118940 -59.1% 12319517 -59.1% 12327993 perf-stat.ps.branch-misses 
1.813e+08 -58.8% 74623919 -58.9% 74563289 perf-stat.ps.cache-misses 5.41e+08 ± 2% -24.2% 4.103e+08 -24.5% 4.085e+08 perf-stat.ps.cache-references 3031 +47.9% 4485 +49.1% 4519 perf-stat.ps.context-switches 307.72 -12.7% 268.59 -12.7% 268.66 perf-stat.ps.cpu-migrations 12806734 ± 2% -98.0% 260740 ± 4% -97.9% 267782 ± 5% perf-stat.ps.dTLB-load-misses 9.528e+09 -69.4% 2.917e+09 -69.5% 2.907e+09 perf-stat.ps.dTLB-loads 5063992 -86.4% 690720 -86.4% 687415 perf-stat.ps.dTLB-store-misses 4.195e+09 -45.0% 2.306e+09 -45.2% 2.297e+09 perf-stat.ps.dTLB-stores 18661026 -80.3% 3672024 -80.4% 3658006 perf-stat.ps.iTLB-load-misses 5735379 +7.6% 6169096 +7.3% 6151755 perf-stat.ps.iTLB-loads 3.901e+10 -67.8% 1.254e+10 -68.0% 1.25e+10 perf-stat.ps.instructions 1230175 -86.4% 166708 -86.5% 166045 perf-stat.ps.minor-faults 25346347 -83.0% 4299946 ± 2% -83.4% 4203636 ± 2% perf-stat.ps.node-load-misses 3713652 ± 3% +46.6% 5444481 +45.5% 5401831 perf-stat.ps.node-loads 14107969 ± 2% -83.1% 2381707 -83.2% 2368146 perf-stat.ps.node-store-misses 3716359 ± 3% +1179.6% 47556224 +1186.1% 47797289 perf-stat.ps.node-stores 1230175 -86.4% 166708 -86.5% 166046 perf-stat.ps.page-faults 1.176e+13 -67.9% 3.78e+12 -68.0% 3.767e+12 perf-stat.total.instructions 0.01 ± 42% +385.1% 0.03 ± 8% +566.0% 0.04 ± 42% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part 0.01 ± 17% +354.3% 0.05 ± 8% +402.1% 0.06 ± 8% perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown] 0.01 ± 19% +323.1% 0.06 ± 27% +347.1% 0.06 ± 17% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64 0.01 ± 14% +2.9e+05% 25.06 ±172% +1.6e+05% 13.94 ±263% perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep 0.00 ±129% +7133.3% 0.03 ± 7% +7200.0% 0.03 ± 4% perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64 0.01 ± 8% +396.8% 0.06 ± 2% +402.1% 0.06 ± 2% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64 0.01 ± 9% +256.9% 0.03 ± 10% +232.8% 0.02 ± 13% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll 0.01 ± 15% +324.0% 0.05 ± 17% +320.8% 0.05 ± 17% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select 0.01 ± 19% +338.6% 0.06 ± 7% +305.0% 0.05 ± 8% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait 0.01 ± 9% +298.4% 0.03 ± 2% +304.8% 0.03 perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone 0.01 ± 7% +265.8% 0.03 ± 5% +17282.9% 1.65 ±258% perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork 0.19 ± 11% -89.3% 0.02 ± 10% -89.4% 0.02 ± 10% perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 0.01 ± 28% +319.8% 0.05 ± 19% +303.0% 0.05 ± 18% perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read 0.01 ± 14% +338.9% 0.03 ± 9% +318.5% 0.03 ± 4% perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open 0.02 ± 20% +674.2% 0.12 ±137% +267.5% 0.06 ± 15% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity 0.01 ± 46% +256.9% 0.03 ± 11% +1095.8% 0.11 ±112% perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part 0.02 ± 28% +324.6% 0.07 ± 8% +353.2% 0.07 ± 9% 
perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown] 0.02 ± 21% +318.4% 0.07 ± 25% +389.6% 0.08 ± 26% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64 0.01 ± 26% +1.9e+06% 250.13 ±173% +9.7e+05% 125.09 ±264% perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep 0.02 ± 25% +585.6% 0.11 ± 63% +454.5% 0.09 ± 31% perf-sched.sch_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64 0.04 ± 39% +159.0% 0.11 ± 6% +190.0% 0.13 ± 10% perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64 0.01 ± 29% +312.9% 0.06 ± 19% +401.7% 0.07 ± 13% perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64 0.02 ± 25% +216.8% 0.06 ± 36% +166.4% 0.05 ± 7% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll 0.01 ± 21% +345.8% 0.07 ± 26% +298.3% 0.06 ± 18% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select 0.03 ± 35% +190.2% 0.07 ± 16% +187.8% 0.07 ± 11% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait 0.02 ± 19% +220.8% 0.07 ± 23% +2.9e+05% 63.06 ±263% perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork 4.60 ± 5% -10.7% 4.11 ± 8% -13.4% 3.99 perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 0.02 ± 32% +368.0% 0.07 ± 25% +346.9% 0.07 ± 20% perf-sched.sch_delay.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read 189.60 -32.9% 127.16 -33.0% 126.98 perf-sched.total_wait_and_delay.average.ms 11265 ± 3% +73.7% 19568 ± 3% +71.1% 19274 perf-sched.total_wait_and_delay.count.ms 189.18 -32.9% 126.97 -33.0% 126.81 perf-sched.total_wait_time.average.ms 0.50 ± 20% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault 0.50 ± 11% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop 0.43 ± 16% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.unmap_vmas.unmap_region.constprop.0 52.33 ± 31% +223.4% 169.23 ± 7% +226.5% 170.86 ± 2% perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64 0.51 ± 18% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault 28.05 ± 4% +27.8% 35.84 ± 4% +26.0% 35.34 ± 8% perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64 2.08 ± 3% +33.2% 2.76 +32.9% 2.76 ± 2% perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone 491.80 -53.6% 227.96 ± 3% -53.5% 228.58 ± 2% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 222.00 ± 9% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault 8.75 ± 33% -84.3% 1.38 ±140% -82.9% 1.50 ± 57% perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 1065 ± 3% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop 538.25 ± 9% -100.0% 0.00 -100.0% 0.00 
perf-sched.wait_and_delay.count.__cond_resched.unmap_vmas.unmap_region.constprop.0 307.75 ± 6% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault 2458 ± 3% -20.9% 1944 ± 4% -20.5% 1954 ± 7% perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64 2577 ± 5% +168.6% 6921 ± 4% +165.0% 6829 ± 2% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 7.07 ±172% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault 1730 ± 24% -77.9% 382.66 ±117% -50.1% 862.68 ± 89% perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 34.78 ± 43% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop 8.04 ±179% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.unmap_vmas.unmap_region.constprop.0 9.47 ±134% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault 3.96 ± 6% +60.6% 6.36 ± 5% +58.3% 6.27 ± 6% perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone 0.42 ± 27% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc 0.50 ± 20% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault 0.51 ± 17% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault 0.59 ± 17% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault 0.46 ± 31% -63.3% 0.17 ± 18% -67.7% 0.15 ± 15% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_alloc.mmap_region.do_mmap 0.50 ± 11% -67.8% 0.16 ± 8% -67.6% 0.16 ± 4% perf-sched.wait_time.avg.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop 0.43 ± 16% -63.5% 0.16 ± 10% -62.6% 0.16 ± 4% perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.constprop.0 0.50 ± 19% -67.0% 0.17 ± 5% -69.0% 0.16 ± 11% perf-sched.wait_time.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range 1.71 ± 5% +55.9% 2.66 ± 3% +47.3% 2.52 ± 6% perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64 52.33 ± 31% +223.4% 169.20 ± 7% +226.5% 170.83 ± 2% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64 0.51 ± 18% -67.7% 0.16 ± 5% -68.0% 0.16 ± 6% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault 0.53 ± 17% -65.4% 0.18 ± 56% -66.5% 0.18 ± 10% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt 27.63 ± 4% +29.7% 35.83 ± 4% +27.6% 35.27 ± 8% perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64 2.07 ± 3% +32.1% 2.73 +31.9% 2.73 ± 2% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone 491.61 -53.6% 227.94 ± 3% -53.5% 228.56 ± 2% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 1.72 ± 5% +58.1% 
2.73 ± 3% +50.4% 2.59 ± 7% perf-sched.wait_time.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read 1.42 ± 21% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc 7.07 ±172% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault 1.66 ± 27% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault 2.05 ± 57% -100.0% 0.00 -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault 1.69 ± 20% -84.6% 0.26 ± 25% -86.0% 0.24 ± 6% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.vm_area_alloc.mmap_region.do_mmap 1730 ± 24% -76.3% 409.21 ±104% -50.1% 862.65 ± 89% perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 34.78 ± 43% -98.9% 0.38 ± 12% -98.8% 0.41 ± 10% perf-sched.wait_time.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop 8.04 ±179% -96.0% 0.32 ± 18% -95.7% 0.35 ± 19% perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.unmap_region.constprop.0 4.68 ±155% -93.4% 0.31 ± 24% -93.9% 0.28 ± 21% perf-sched.wait_time.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range 3.42 ± 5% +55.9% 5.33 ± 3% +47.3% 5.03 ± 6% perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64 9.47 ±134% -96.3% 0.35 ± 17% -96.1% 0.37 ± 8% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault 1.87 ± 10% -60.9% 0.73 ±164% -85.3% 0.28 ± 24% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt 2.39 ±185% -97.8% 0.05 ±165% -98.0% 0.05 ±177% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi 3.95 ± 6% +59.9% 6.32 ± 5% +57.6% 6.23 ± 6% perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone 3.45 ± 5% +58.1% 5.45 ± 3% +50.4% 5.19 ± 7% perf-sched.wait_time.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read 56.55 ± 2% -55.1 1.45 ± 2% -55.1 1.44 ± 2% perf-profile.calltrace.cycles-pp.__munmap 56.06 ± 2% -55.1 0.96 ± 2% -55.1 0.96 ± 2% perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap 56.50 ± 2% -55.1 1.44 -55.1 1.44 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap 56.50 ± 2% -55.1 1.44 ± 2% -55.1 1.43 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 56.47 ± 2% -55.0 1.43 -55.0 1.42 ± 2% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 56.48 ± 2% -55.0 1.44 ± 2% -55.0 1.43 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap 56.45 ± 2% -55.0 1.42 -55.0 1.42 ± 2% perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe 56.40 ± 2% -55.0 1.40 ± 2% -55.0 1.39 ± 2% perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64 35.28 -34.6 0.66 -34.6 0.66 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 35.17 -34.6 0.57 -34.6 0.57 ± 2% 
perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap 35.11 -34.5 0.57 -34.5 0.56 perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap 18.40 ± 7% -18.4 0.00 -18.4 0.00 perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault 17.42 ± 7% -17.4 0.00 -17.4 0.00 perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap 17.42 ± 7% -17.4 0.00 -17.4 0.00 perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.do_vmi_align_munmap.do_vmi_munmap 17.41 ± 7% -17.4 0.00 -17.4 0.00 perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region.do_vmi_align_munmap 17.23 ± 6% -17.2 0.00 -17.2 0.00 perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region 16.09 ± 8% -16.1 0.00 -16.1 0.00 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region 16.02 ± 8% -16.0 0.00 -16.0 0.00 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region 15.95 ± 8% -16.0 0.00 -16.0 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu 15.89 ± 8% -15.9 0.00 -15.9 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush 15.86 ± 8% -15.9 0.00 -15.9 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain 15.82 ± 8% -15.8 0.00 -15.8 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu 9.32 ± 9% -9.3 0.00 -9.3 0.00 perf-profile.calltrace.cycles-pp.uncharge_folio.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu 8.52 ± 8% -8.5 0.00 -8.5 0.00 perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 7.90 ± 4% -7.9 0.00 -7.9 0.00 perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu 7.56 ± 6% -7.6 0.00 -7.6 0.00 perf-profile.calltrace.cycles-pp.__pte_alloc.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 7.55 ± 6% -7.6 0.00 -7.6 0.00 perf-profile.calltrace.cycles-pp.pte_alloc_one.__pte_alloc.do_anonymous_page.__handle_mm_fault.handle_mm_fault 6.51 ± 8% -6.5 0.00 -6.5 0.00 perf-profile.calltrace.cycles-pp.alloc_pages_mpol.pte_alloc_one.__pte_alloc.do_anonymous_page.__handle_mm_fault 6.51 ± 8% -6.5 0.00 -6.5 0.00 perf-profile.calltrace.cycles-pp.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc.do_anonymous_page 6.41 ± 8% -6.4 0.00 -6.4 0.00 perf-profile.calltrace.cycles-pp.__memcg_kmem_charge_page.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc 0.00 +0.5 0.54 ± 4% +0.6 0.55 ± 3% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page 0.00 +0.7 0.70 ± 3% +0.7 0.71 ± 2% 
perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault 0.00 +1.4 1.39 +1.4 1.38 ± 3% perf-profile.calltrace.cycles-pp.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault 19.16 ± 6% +57.0 76.21 +57.5 76.66 perf-profile.calltrace.cycles-pp.asm_exc_page_fault 19.09 ± 6% +57.1 76.16 +57.5 76.61 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 19.10 ± 6% +57.1 76.17 +57.5 76.61 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault 18.99 ± 6% +57.1 76.14 +57.6 76.58 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 18.43 ± 7% +57.7 76.11 +58.1 76.56 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 0.00 +73.0 73.00 +73.5 73.46 perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault 0.00 +75.1 75.15 +75.6 75.60 perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 0.00 +75.9 75.92 +76.4 76.37 perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault 58.03 ± 2% -56.0 2.05 -56.0 2.03 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 58.02 ± 2% -56.0 2.04 -56.0 2.02 perf-profile.children.cycles-pp.do_syscall_64 56.57 ± 2% -55.1 1.45 ± 2% -55.1 1.45 ± 2% perf-profile.children.cycles-pp.__munmap 56.06 ± 2% -55.1 0.97 -55.1 0.96 perf-profile.children.cycles-pp.unmap_region 56.51 ± 2% -55.1 1.43 -55.1 1.42 ± 2% perf-profile.children.cycles-pp.do_vmi_munmap 56.48 ± 2% -55.0 1.43 ± 2% -55.0 1.43 ± 2% perf-profile.children.cycles-pp.__vm_munmap 56.48 ± 2% -55.0 1.44 ± 2% -55.0 1.43 ± 2% perf-profile.children.cycles-pp.__x64_sys_munmap 56.40 ± 2% -55.0 1.40 -55.0 1.39 ± 2% perf-profile.children.cycles-pp.do_vmi_align_munmap 35.28 -34.6 0.66 -34.6 0.66 perf-profile.children.cycles-pp.tlb_finish_mmu 35.18 -34.6 0.58 -34.6 0.57 perf-profile.children.cycles-pp.tlb_batch_pages_flush 35.16 -34.6 0.57 -34.6 0.57 perf-profile.children.cycles-pp.release_pages 32.12 ± 8% -32.1 0.05 -32.1 0.04 ± 37% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave 31.85 ± 8% -31.8 0.06 -31.8 0.06 ± 5% perf-profile.children.cycles-pp._raw_spin_lock_irqsave 31.74 ± 8% -31.7 0.00 -31.7 0.00 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath 18.40 ± 7% -18.4 0.00 -18.4 0.00 perf-profile.children.cycles-pp.do_anonymous_page 17.43 ± 7% -17.4 0.00 -17.4 0.00 perf-profile.children.cycles-pp.lru_add_drain 17.43 ± 7% -17.4 0.00 -17.4 0.00 perf-profile.children.cycles-pp.lru_add_drain_cpu 17.43 ± 7% -17.3 0.10 ± 5% -17.3 0.10 ± 3% perf-profile.children.cycles-pp.folio_batch_move_lru 17.23 ± 6% -17.2 0.00 -17.2 0.00 perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list 9.32 ± 9% -9.3 0.00 -9.3 0.00 perf-profile.children.cycles-pp.uncharge_folio 8.57 ± 8% -8.4 0.16 ± 4% -8.4 0.15 ± 4% perf-profile.children.cycles-pp.__mem_cgroup_charge 7.90 ± 4% -7.8 0.14 ± 5% -7.8 0.14 ± 4% perf-profile.children.cycles-pp.uncharge_batch 7.57 ± 6% -7.6 0.00 -7.6 0.00 perf-profile.children.cycles-pp.__pte_alloc 7.55 ± 6% -7.4 0.16 ± 3% -7.4 0.16 ± 3% perf-profile.children.cycles-pp.pte_alloc_one 6.54 ± 2% -6.5 0.00 -6.5 0.00 perf-profile.children.cycles-pp.__mod_memcg_lruvec_state 6.59 ± 8% -6.4 0.22 ± 2% -6.4 
0.22 ± 3% perf-profile.children.cycles-pp.alloc_pages_mpol 6.58 ± 8% -6.4 0.21 ± 2% -6.4 0.22 ± 2% perf-profile.children.cycles-pp.__alloc_pages 6.41 ± 8% -6.3 0.07 ± 5% -6.3 0.07 ± 5% perf-profile.children.cycles-pp.__memcg_kmem_charge_page 4.48 ± 2% -4.3 0.18 ± 4% -4.3 0.18 ± 3% perf-profile.children.cycles-pp.__mod_lruvec_page_state 3.08 ± 4% -3.0 0.09 ± 7% -3.0 0.09 ± 6% perf-profile.children.cycles-pp.page_counter_uncharge 1.74 ± 8% -1.6 0.10 -1.6 0.10 ± 4% perf-profile.children.cycles-pp.kmem_cache_alloc 1.72 ± 2% -1.5 0.23 ± 2% -1.5 0.23 ± 4% perf-profile.children.cycles-pp.unmap_vmas 1.71 ± 2% -1.5 0.22 ± 3% -1.5 0.22 ± 4% perf-profile.children.cycles-pp.unmap_page_range 1.70 ± 2% -1.5 0.21 ± 3% -1.5 0.21 ± 4% perf-profile.children.cycles-pp.zap_pmd_range 1.36 ± 16% -1.3 0.09 ± 4% -1.3 0.09 ± 4% perf-profile.children.cycles-pp.native_irq_return_iret 1.18 ± 2% -1.1 0.08 ± 7% -1.1 0.08 ± 5% perf-profile.children.cycles-pp.page_remove_rmap 1.16 ± 2% -1.1 0.08 ± 4% -1.1 0.07 ± 6% perf-profile.children.cycles-pp.folio_add_new_anon_rmap 1.45 ± 6% -1.0 0.44 ± 2% -1.0 0.44 ± 2% perf-profile.children.cycles-pp.__mmap 1.05 -1.0 0.06 ± 7% -1.0 0.06 ± 7% perf-profile.children.cycles-pp.lru_add_fn 1.03 ± 7% -1.0 0.04 ± 37% -1.0 0.04 ± 37% perf-profile.children.cycles-pp.__anon_vma_prepare 1.38 ± 6% -1.0 0.42 ± 3% -1.0 0.42 ± 2% perf-profile.children.cycles-pp.vm_mmap_pgoff 1.33 ± 6% -0.9 0.40 ± 2% -0.9 0.40 ± 2% perf-profile.children.cycles-pp.do_mmap 0.93 ± 11% -0.9 0.03 ± 77% -0.9 0.02 ±100% perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook 1.17 ± 7% -0.8 0.34 ± 2% -0.8 0.34 ± 2% perf-profile.children.cycles-pp.mmap_region 0.87 ± 5% -0.8 0.06 ± 5% -0.8 0.06 ± 9% perf-profile.children.cycles-pp.kmem_cache_free 0.89 ± 5% -0.7 0.19 ± 4% -0.7 0.20 ± 2% perf-profile.children.cycles-pp.rcu_do_batch 0.89 ± 5% -0.7 0.20 ± 4% -0.7 0.20 ± 3% perf-profile.children.cycles-pp.rcu_core 0.90 ± 5% -0.7 0.21 ± 4% -0.7 0.21 ± 2% perf-profile.children.cycles-pp.__do_softirq 0.74 ± 6% -0.7 0.06 ± 5% -0.7 0.06 ± 8% perf-profile.children.cycles-pp.irq_exit_rcu 0.72 ± 10% -0.7 0.06 ± 5% -0.7 0.06 ± 7% perf-profile.children.cycles-pp.vm_area_alloc 1.01 ± 4% -0.4 0.61 ± 4% -0.4 0.61 ± 2% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 0.14 ± 5% -0.1 0.02 ±100% -0.1 0.02 ±100% perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown 0.16 ± 9% -0.1 0.07 ± 7% -0.1 0.07 perf-profile.children.cycles-pp.__slab_free 0.15 ± 3% -0.1 0.06 ± 5% -0.1 0.06 ± 5% perf-profile.children.cycles-pp.get_unmapped_area 0.08 ± 22% -0.0 0.05 ± 41% -0.0 0.04 ± 37% perf-profile.children.cycles-pp.generic_perform_write 0.08 ± 22% -0.0 0.05 ± 41% -0.0 0.04 ± 38% perf-profile.children.cycles-pp.shmem_file_write_iter 0.09 ± 22% -0.0 0.05 ± 43% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.record__pushfn 0.09 ± 22% -0.0 0.05 ± 43% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.writen 0.09 ± 22% -0.0 0.05 ± 43% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.__libc_write 0.11 ± 8% -0.0 0.07 ± 6% -0.0 0.08 ± 6% perf-profile.children.cycles-pp.rcu_cblist_dequeue 0.16 ± 7% -0.0 0.13 ± 4% -0.0 0.13 ± 3% perf-profile.children.cycles-pp.try_charge_memcg 0.09 ± 22% -0.0 0.07 ± 18% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.vfs_write 0.09 ± 22% -0.0 0.07 ± 18% -0.0 0.06 ± 11% perf-profile.children.cycles-pp.ksys_write 0.15 ± 4% -0.0 0.13 ± 3% -0.0 0.13 ± 2% perf-profile.children.cycles-pp.get_page_from_freelist 0.09 -0.0 0.08 ± 4% -0.0 0.08 perf-profile.children.cycles-pp.flush_tlb_mm_range 0.06 +0.0 0.09 ± 4% +0.0 0.08 ± 5% 
perf-profile.children.cycles-pp.rcu_all_qs 0.17 ± 6% +0.0 0.20 ± 4% +0.0 0.20 ± 3% perf-profile.children.cycles-pp.kthread 0.17 ± 6% +0.0 0.20 ± 4% +0.0 0.20 ± 3% perf-profile.children.cycles-pp.ret_from_fork_asm 0.17 ± 6% +0.0 0.20 ± 4% +0.0 0.20 ± 3% perf-profile.children.cycles-pp.ret_from_fork 0.12 ± 4% +0.0 0.16 ± 3% +0.0 0.16 ± 2% perf-profile.children.cycles-pp.mas_store_prealloc 0.08 ± 6% +0.0 0.12 ± 2% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.vma_alloc_folio 0.00 +0.0 0.04 ± 37% +0.1 0.05 perf-profile.children.cycles-pp.memcg_check_events 0.00 +0.0 0.04 ± 37% +0.1 0.05 perf-profile.children.cycles-pp.thp_get_unmapped_area 0.00 +0.1 0.05 +0.0 0.04 ± 57% perf-profile.children.cycles-pp.free_tail_page_prepare 0.00 +0.1 0.05 +0.1 0.05 perf-profile.children.cycles-pp.mas_destroy 0.00 +0.1 0.05 ± 9% +0.1 0.05 ± 9% perf-profile.children.cycles-pp.update_load_avg 0.00 +0.1 0.06 ± 7% +0.1 0.07 ± 7% perf-profile.children.cycles-pp.native_flush_tlb_one_user 0.00 +0.1 0.07 ± 7% +0.1 0.07 ± 6% perf-profile.children.cycles-pp.__page_cache_release 0.00 +0.1 0.07 ± 4% +0.1 0.07 ± 5% perf-profile.children.cycles-pp.mas_topiary_replace 0.08 ± 5% +0.1 0.16 ± 3% +0.1 0.15 ± 3% perf-profile.children.cycles-pp.mas_alloc_nodes 0.00 +0.1 0.08 ± 4% +0.1 0.08 ± 6% perf-profile.children.cycles-pp.prep_compound_page 0.08 ± 6% +0.1 0.17 ± 5% +0.1 0.18 ± 5% perf-profile.children.cycles-pp.task_tick_fair 0.00 +0.1 0.10 ± 5% +0.1 0.10 ± 4% perf-profile.children.cycles-pp.folio_add_lru_vma 0.00 +0.1 0.11 ± 4% +0.1 0.11 ± 5% perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk 0.00 +0.1 0.12 ± 2% +0.1 0.12 ± 3% perf-profile.children.cycles-pp.kmem_cache_alloc_bulk 0.00 +0.1 0.13 ± 3% +0.1 0.13 ± 2% perf-profile.children.cycles-pp.mas_split 0.00 +0.1 0.13 +0.1 0.13 ± 3% perf-profile.children.cycles-pp._raw_spin_lock 0.11 ± 4% +0.1 0.24 ± 3% +0.1 0.25 ± 4% perf-profile.children.cycles-pp.scheduler_tick 0.00 +0.1 0.14 ± 4% +0.1 0.14 ± 5% perf-profile.children.cycles-pp.__mem_cgroup_uncharge 0.00 +0.1 0.14 ± 3% +0.1 0.14 ± 3% perf-profile.children.cycles-pp.mas_wr_bnode 0.00 +0.1 0.14 ± 5% +0.1 0.14 ± 3% perf-profile.children.cycles-pp.destroy_large_folio 0.00 +0.1 0.15 ± 4% +0.1 0.15 ± 4% perf-profile.children.cycles-pp.mas_spanning_rebalance 0.00 +0.1 0.15 ± 2% +0.2 0.15 ± 4% perf-profile.children.cycles-pp.zap_huge_pmd 0.00 +0.2 0.17 ± 3% +0.2 0.17 ± 3% perf-profile.children.cycles-pp.do_huge_pmd_anonymous_page 0.19 ± 3% +0.2 0.38 +0.2 0.38 ± 2% perf-profile.children.cycles-pp.mas_store_gfp 0.00 +0.2 0.19 ± 3% +0.2 0.18 ± 4% perf-profile.children.cycles-pp.__mod_node_page_state 0.00 +0.2 0.20 ± 3% +0.2 0.20 ± 4% perf-profile.children.cycles-pp.__mod_lruvec_state 0.12 ± 3% +0.2 0.35 +0.2 0.36 ± 3% perf-profile.children.cycles-pp.update_process_times 0.12 ± 3% +0.2 0.36 ± 2% +0.2 0.36 ± 2% perf-profile.children.cycles-pp.tick_sched_handle 0.14 ± 3% +0.2 0.39 +0.3 0.40 ± 4% perf-profile.children.cycles-pp.tick_nohz_highres_handler 0.27 ± 2% +0.3 0.52 ± 3% +0.3 0.52 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt 0.27 ± 2% +0.3 0.52 ± 4% +0.3 0.53 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.21 ± 4% +0.3 0.48 ± 3% +0.3 0.48 ± 2% perf-profile.children.cycles-pp.__hrtimer_run_queues 0.00 +0.3 0.31 ± 2% +0.3 0.31 ± 3% perf-profile.children.cycles-pp.mas_wr_spanning_store 0.00 +0.4 0.38 +0.4 0.38 ± 2% perf-profile.children.cycles-pp.free_unref_page_prepare 0.00 +0.4 0.39 +0.4 0.40 perf-profile.children.cycles-pp.free_unref_page 0.13 ± 4% +1.3 1.42 +1.3 1.41 ± 3% 
perf-profile.children.cycles-pp.__cond_resched 19.19 ± 6% +57.0 76.23 +57.5 76.68 perf-profile.children.cycles-pp.asm_exc_page_fault 19.11 ± 6% +57.1 76.18 +57.5 76.63 perf-profile.children.cycles-pp.exc_page_fault 19.10 ± 6% +57.1 76.18 +57.5 76.62 perf-profile.children.cycles-pp.do_user_addr_fault 19.00 ± 6% +57.1 76.15 +57.6 76.59 perf-profile.children.cycles-pp.handle_mm_fault 18.44 ± 7% +57.7 76.12 +58.1 76.57 perf-profile.children.cycles-pp.__handle_mm_fault 0.06 ± 9% +73.3 73.38 +73.8 73.84 perf-profile.children.cycles-pp.clear_page_erms 0.00 +75.2 75.25 +75.7 75.70 perf-profile.children.cycles-pp.clear_huge_page 0.00 +75.9 75.92 +76.4 76.37 perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page 31.74 ± 8% -31.7 0.00 -31.7 0.00 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath 9.22 ± 9% -9.2 0.00 -9.2 0.00 perf-profile.self.cycles-pp.uncharge_folio 6.50 ± 2% -6.5 0.00 -6.5 0.00 perf-profile.self.cycles-pp.__mod_memcg_lruvec_state 5.56 ± 9% -5.6 0.00 -5.6 0.00 perf-profile.self.cycles-pp.__memcg_kmem_charge_page 1.94 ± 4% -1.9 0.08 ± 8% -1.9 0.08 ± 7% perf-profile.self.cycles-pp.page_counter_uncharge 1.36 ± 16% -1.3 0.09 ± 4% -1.3 0.09 ± 4% perf-profile.self.cycles-pp.native_irq_return_iret 0.16 ± 9% -0.1 0.07 ± 7% -0.1 0.07 perf-profile.self.cycles-pp.__slab_free 0.10 ± 8% -0.0 0.07 ± 6% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.rcu_cblist_dequeue 0.07 ± 7% +0.0 0.08 ± 5% +0.0 0.08 ± 7% perf-profile.self.cycles-pp.page_counter_try_charge 0.00 +0.1 0.06 ± 7% +0.1 0.07 ± 7% perf-profile.self.cycles-pp.native_flush_tlb_one_user 0.01 ±264% +0.1 0.07 ± 4% +0.1 0.07 perf-profile.self.cycles-pp.rcu_all_qs 0.00 +0.1 0.07 ± 4% +0.1 0.07 ± 4% perf-profile.self.cycles-pp.__do_huge_pmd_anonymous_page 0.00 +0.1 0.08 ± 6% +0.1 0.08 ± 6% perf-profile.self.cycles-pp.prep_compound_page 0.00 +0.1 0.08 ± 5% +0.1 0.08 ± 6% perf-profile.self.cycles-pp.__kmem_cache_alloc_bulk 0.00 +0.1 0.13 ± 2% +0.1 0.13 ± 2% perf-profile.self.cycles-pp._raw_spin_lock 0.00 +0.2 0.18 ± 3% +0.2 0.18 ± 4% perf-profile.self.cycles-pp.__mod_node_page_state 0.00 +0.3 0.30 ± 2% +0.3 0.30 perf-profile.self.cycles-pp.free_unref_page_prepare 0.00 +0.6 0.58 ± 3% +0.6 0.58 ± 5% perf-profile.self.cycles-pp.clear_huge_page 0.08 ± 4% +1.2 1.25 +1.2 1.24 ± 4% perf-profile.self.cycles-pp.__cond_resched 0.05 ± 9% +72.8 72.81 +73.2 73.26 perf-profile.self.cycles-pp.clear_page_erms [-- Attachment #4: phoronix-regressions --] [-- Type: text/plain, Size: 37812 bytes --] (10) Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G ========================================================================================= compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase: gcc-12/performance/x86_64-rhel-8.3/Add/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 6787 -2.9% 6592 -2.9% 6589 vmstat.system.cs 0.18 ± 23% -0.0 0.15 ± 44% -0.1 0.12 ± 23% perf-profile.children.cycles-pp.get_next_timer_interrupt 0.08 ± 49% +0.1 0.15 ± 16% +0.0 0.08 ± 61% perf-profile.self.cycles-pp.ct_kernel_enter 352936 +42.1% 501525 +6.9% 377117 meminfo.AnonHugePages 518885 +26.2% 654716 -2.1% 508198 meminfo.AnonPages 1334861 +11.4% 1486492 -0.9% 1322775 meminfo.Inactive(anon) 1.51 -0.1 1.45 -0.1 1.46 turbostat.C1E% 24.23 -1.2% 23.93 -0.7% 24.05 turbostat.CorWatt 2.64 -4.4% 2.52 -4.3% 
2.53 turbostat.Pkg%pc2 25.40 -1.3% 25.06 -0.9% 25.18 turbostat.PkgWatt 3.30 -2.8% 3.20 -2.9% 3.20 turbostat.RAMWatt 20115 -4.5% 19211 -4.5% 19217 phoronix-test-suite.ramspeed.Add.Integer.mb_s 284.00 +3.5% 293.95 +3.5% 293.96 phoronix-test-suite.time.elapsed_time 284.00 +3.5% 293.95 +3.5% 293.96 phoronix-test-suite.time.elapsed_time.max 120322 +1.6% 122291 -0.2% 120098 phoronix-test-suite.time.maximum_resident_set_size 281626 -54.7% 127627 -54.7% 127530 phoronix-test-suite.time.minor_page_faults 259.16 +4.2% 270.02 +4.1% 269.86 phoronix-test-suite.time.user_time 284.00 +3.5% 293.95 +3.5% 293.96 time.elapsed_time 284.00 +3.5% 293.95 +3.5% 293.96 time.elapsed_time.max 120322 +1.6% 122291 -0.2% 120098 time.maximum_resident_set_size 281626 -54.7% 127627 -54.7% 127530 time.minor_page_faults 1.72 -7.6% 1.59 -7.2% 1.60 time.system_time 259.16 +4.2% 270.02 +4.1% 269.86 time.user_time 129720 +26.2% 163681 -2.1% 127047 proc-vmstat.nr_anon_pages 172.33 +42.1% 244.89 +6.8% 184.14 proc-vmstat.nr_anon_transparent_hugepages 360027 -1.0% 356428 +0.1% 360507 proc-vmstat.nr_dirty_background_threshold 720935 -1.0% 713729 +0.1% 721897 proc-vmstat.nr_dirty_threshold 3328684 -1.1% 3292559 +0.1% 3333390 proc-vmstat.nr_free_pages 333715 +11.4% 371625 -0.9% 330692 proc-vmstat.nr_inactive_anon 1732 +5.1% 1820 +4.8% 1816 proc-vmstat.nr_page_table_pages 333715 +11.4% 371625 -0.9% 330692 proc-vmstat.nr_zone_inactive_anon 855883 -34.6% 560138 -34.9% 557459 proc-vmstat.numa_hit 855859 -34.6% 560157 -34.9% 557429 proc-vmstat.numa_local 5552895 +1.1% 5611662 +0.1% 5559236 proc-vmstat.pgalloc_normal 1080638 -26.7% 792254 -27.0% 788881 proc-vmstat.pgfault 109646 +3.0% 112918 +2.6% 112483 proc-vmstat.pgreuse 9026 +7.6% 9714 +6.6% 9619 proc-vmstat.thp_fault_alloc 1.165e+08 -3.6% 1.123e+08 -3.3% 1.126e+08 perf-stat.i.branch-instructions 3.38 +0.1 3.45 +0.1 3.49 perf-stat.i.branch-miss-rate% 4.13e+08 -2.7% 4.018e+08 -2.9% 4.011e+08 perf-stat.i.cache-misses 5.336e+08 -2.3% 5.212e+08 -2.4% 5.206e+08 perf-stat.i.cache-references 6824 -2.9% 6629 -2.9% 6624 perf-stat.i.context-switches 4.05 +3.8% 4.20 +3.7% 4.20 perf-stat.i.cpi 447744 ± 3% -17.3% 370369 ± 3% -15.0% 380580 perf-stat.i.dTLB-load-misses 1.119e+09 -3.3% 1.082e+09 -3.4% 1.081e+09 perf-stat.i.dTLB-loads 0.02 ± 10% -0.0 0.01 ± 14% -0.0 0.01 ± 3% perf-stat.i.dTLB-store-miss-rate% 84207 ± 7% -58.4% 35034 ± 13% -55.8% 37210 ± 2% perf-stat.i.dTLB-store-misses 7.312e+08 -3.3% 7.069e+08 -3.4% 7.065e+08 perf-stat.i.dTLB-stores 127863 -2.8% 124330 -3.6% 123263 perf-stat.i.iTLB-load-misses 145042 -2.5% 141459 -3.0% 140719 perf-stat.i.iTLB-loads 2.393e+09 -3.3% 2.313e+09 -3.4% 2.313e+09 perf-stat.i.instructions 0.28 -3.9% 0.27 -3.7% 0.27 perf-stat.i.ipc 220.56 -3.0% 213.92 -3.1% 213.80 perf-stat.i.metric.M/sec 3580 -31.0% 2470 -30.9% 2476 perf-stat.i.minor-faults 49017829 +2.1% 50065997 +2.1% 50037948 perf-stat.i.node-loads 98043570 -2.7% 95377592 -2.9% 95180579 perf-stat.i.node-stores 3585 -31.0% 2474 -30.8% 2480 perf-stat.i.page-faults 3.64 +3.8% 3.78 +3.8% 3.78 perf-stat.overall.cpi 21.10 +3.2% 21.77 +3.3% 21.79 perf-stat.overall.cycles-between-cache-misses 0.04 ± 3% -0.0 0.03 ± 3% -0.0 0.04 perf-stat.overall.dTLB-load-miss-rate% 0.01 ± 7% -0.0 0.00 ± 13% -0.0 0.01 ± 2% perf-stat.overall.dTLB-store-miss-rate% 0.27 -3.7% 0.26 -3.7% 0.26 perf-stat.overall.ipc 1.16e+08 -3.6% 1.119e+08 -3.3% 1.121e+08 perf-stat.ps.branch-instructions 4.117e+08 -2.7% 4.006e+08 -2.9% 3.999e+08 perf-stat.ps.cache-misses 5.319e+08 -2.3% 5.195e+08 -2.4% 5.19e+08 perf-stat.ps.cache-references 6798 -2.8% 
6605 -2.9% 6600 perf-stat.ps.context-switches 446139 ± 3% -17.3% 369055 ± 3% -15.0% 379224 perf-stat.ps.dTLB-load-misses 1.115e+09 -3.3% 1.078e+09 -3.4% 1.078e+09 perf-stat.ps.dTLB-loads 83922 ± 7% -58.4% 34908 ± 13% -55.8% 37075 ± 2% perf-stat.ps.dTLB-store-misses 7.288e+08 -3.3% 7.047e+08 -3.4% 7.042e+08 perf-stat.ps.dTLB-stores 127384 -2.7% 123884 -3.6% 122817 perf-stat.ps.iTLB-load-misses 144399 -2.4% 140903 -2.9% 140152 perf-stat.ps.iTLB-loads 2.385e+09 -3.3% 2.306e+09 -3.4% 2.305e+09 perf-stat.ps.instructions 3566 -31.0% 2460 -30.9% 2465 perf-stat.ps.minor-faults 48864755 +2.1% 49912372 +2.1% 49884745 perf-stat.ps.node-loads 97730481 -2.7% 95083043 -2.9% 94887981 perf-stat.ps.node-stores 3571 -31.0% 2465 -30.8% 2470 perf-stat.ps.page-faults (11) Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G ========================================================================================= compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase: gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 6853 -2.6% 6678 -2.7% 6668 vmstat.system.cs 353760 +40.0% 495232 +6.4% 376514 meminfo.AnonHugePages 519691 +25.5% 652412 -2.1% 508766 meminfo.AnonPages 1335612 +11.1% 1484265 -0.9% 1323541 meminfo.Inactive(anon) 1.52 -0.0 1.48 -0.0 1.48 turbostat.C1E% 2.65 -3.0% 2.57 -2.8% 2.58 turbostat.Pkg%pc2 3.32 -2.6% 3.23 -2.6% 3.23 turbostat.RAMWatt 19960 -2.9% 19378 -3.0% 19366 phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s 281.37 +3.0% 289.87 +3.1% 290.12 phoronix-test-suite.time.elapsed_time 281.37 +3.0% 289.87 +3.1% 290.12 phoronix-test-suite.time.elapsed_time.max 120220 +1.6% 122163 -0.1% 120158 phoronix-test-suite.time.maximum_resident_set_size 281853 -54.7% 127777 -54.7% 127780 phoronix-test-suite.time.minor_page_faults 257.32 +3.4% 265.97 +3.4% 265.99 phoronix-test-suite.time.user_time 281.37 +3.0% 289.87 +3.1% 290.12 time.elapsed_time 281.37 +3.0% 289.87 +3.1% 290.12 time.elapsed_time.max 120220 +1.6% 122163 -0.1% 120158 time.maximum_resident_set_size 281853 -54.7% 127777 -54.7% 127780 time.minor_page_faults 1.74 -8.5% 1.59 -9.1% 1.58 time.system_time 257.32 +3.4% 265.97 +3.4% 265.99 time.user_time 0.80 ± 23% -0.4 0.41 ± 78% -0.3 0.54 ± 40% perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault 0.79 ± 21% -0.4 0.40 ± 77% -0.3 0.54 ± 39% perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 0.77 ± 20% -0.4 0.40 ± 77% -0.3 0.52 ± 39% perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault 1.39 ± 15% -0.3 1.04 ± 22% -0.2 1.20 ± 14% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault 1.39 ± 15% -0.3 1.04 ± 21% -0.2 1.20 ± 14% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 0.80 ± 23% -0.3 0.55 ± 29% -0.2 0.60 ± 16% perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page 0.79 ± 21% -0.3 0.54 ± 28% -0.2 0.60 ± 16% perf-profile.children.cycles-pp.clear_huge_page 0.79 ± 20% -0.2 0.58 ± 31% -0.2 0.58 ± 17% perf-profile.children.cycles-pp.clear_page_erms 0.78 ± 20% -0.2 0.58 ± 31% -0.2 0.58 ± 17% 
perf-profile.self.cycles-pp.clear_page_erms 129919 +25.5% 163102 -2.1% 127191 proc-vmstat.nr_anon_pages 172.73 +40.0% 241.81 +6.4% 183.84 proc-vmstat.nr_anon_transparent_hugepages 3328013 -1.1% 3291433 +0.1% 3332863 proc-vmstat.nr_free_pages 333903 +11.1% 371065 -0.9% 330885 proc-vmstat.nr_inactive_anon 1740 +4.5% 1819 +4.4% 1817 proc-vmstat.nr_page_table_pages 333903 +11.1% 371065 -0.9% 330885 proc-vmstat.nr_zone_inactive_anon 853676 -34.9% 556019 -34.7% 557219 proc-vmstat.numa_hit 853653 -34.9% 555977 -34.7% 557192 proc-vmstat.numa_local 5551461 +1.0% 5607022 +0.1% 5559594 proc-vmstat.pgalloc_normal 1075659 -27.0% 785124 -26.9% 786363 proc-vmstat.pgfault 108727 +2.6% 111582 +2.6% 111546 proc-vmstat.pgreuse 9027 +7.6% 9714 +6.6% 9619 proc-vmstat.thp_fault_alloc 1.184e+08 -3.3% 1.145e+08 -3.2% 1.146e+08 perf-stat.i.branch-instructions 5500836 -2.4% 5367239 -2.4% 5368946 perf-stat.i.branch-misses 4.139e+08 -2.5% 4.036e+08 -2.6% 4.034e+08 perf-stat.i.cache-misses 5.246e+08 -2.5% 5.114e+08 -2.5% 5.117e+08 perf-stat.i.cache-references 6889 -2.6% 6710 -2.6% 6710 perf-stat.i.context-switches 4.31 +2.6% 4.42 +2.7% 4.43 perf-stat.i.cpi 0.10 ± 2% -0.0 0.09 ± 2% -0.0 0.08 ± 3% perf-stat.i.dTLB-load-miss-rate% 454444 -16.1% 381426 -18.4% 370782 ± 3% perf-stat.i.dTLB-load-misses 8.087e+08 -3.0% 7.841e+08 -3.1% 7.839e+08 perf-stat.i.dTLB-loads 0.02 -0.0 0.01 ± 2% -0.0 0.01 ± 14% perf-stat.i.dTLB-store-miss-rate% 86294 -57.1% 36992 ± 2% -59.7% 34809 ± 13% perf-stat.i.dTLB-store-misses 5.311e+08 -3.0% 5.151e+08 -3.1% 5.149e+08 perf-stat.i.dTLB-stores 129929 -4.0% 124682 -3.3% 125639 perf-stat.i.iTLB-load-misses 146749 -3.3% 141975 -3.7% 141337 perf-stat.i.iTLB-loads 2.249e+09 -3.1% 2.18e+09 -3.1% 2.179e+09 perf-stat.i.instructions 0.26 -3.0% 0.25 -2.9% 0.25 perf-stat.i.ipc 179.65 -2.7% 174.83 -2.7% 174.79 perf-stat.i.metric.M/sec 3614 -31.4% 2478 -31.1% 2490 perf-stat.i.minor-faults 65665882 -0.5% 65367211 -0.8% 65111743 perf-stat.i.node-loads 3618 -31.4% 2483 -31.1% 2494 perf-stat.i.page-faults 3.88 +3.3% 4.01 +3.3% 4.01 perf-stat.overall.cpi 21.10 +2.7% 21.67 +2.7% 21.67 perf-stat.overall.cycles-between-cache-misses 0.06 -0.0 0.05 -0.0 0.05 ± 3% perf-stat.overall.dTLB-load-miss-rate% 0.02 -0.0 0.01 ± 2% -0.0 0.01 ± 13% perf-stat.overall.dTLB-store-miss-rate% 0.26 -3.2% 0.25 -3.2% 0.25 perf-stat.overall.ipc 1.179e+08 -3.3% 1.14e+08 -3.2% 1.141e+08 perf-stat.ps.branch-instructions 5473781 -2.4% 5340720 -2.4% 5344770 perf-stat.ps.branch-misses 4.126e+08 -2.5% 4.023e+08 -2.5% 4.021e+08 perf-stat.ps.cache-misses 5.229e+08 -2.5% 5.098e+08 -2.5% 5.1e+08 perf-stat.ps.cache-references 6864 -2.6% 6687 -2.6% 6687 perf-stat.ps.context-switches 452799 -16.1% 380049 -18.4% 369456 ± 3% perf-stat.ps.dTLB-load-misses 8.06e+08 -3.0% 7.815e+08 -3.1% 7.814e+08 perf-stat.ps.dTLB-loads 85997 -57.1% 36856 ± 2% -59.7% 34683 ± 13% perf-stat.ps.dTLB-store-misses 5.294e+08 -3.0% 5.135e+08 -3.0% 5.133e+08 perf-stat.ps.dTLB-stores 129440 -4.0% 124225 -3.3% 125181 perf-stat.ps.iTLB-load-misses 146145 -3.2% 141400 -3.7% 140780 perf-stat.ps.iTLB-loads 2.241e+09 -3.1% 2.172e+09 -3.1% 2.172e+09 perf-stat.ps.instructions 3599 -31.4% 2468 -31.1% 2479 perf-stat.ps.minor-faults 65457458 -0.5% 65162312 -0.8% 64909293 perf-stat.ps.node-loads 3604 -31.4% 2472 -31.1% 2484 perf-stat.ps.page-faults (12) Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G ========================================================================================= compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase: 
gcc-12/performance/x86_64-rhel-8.3/Triad/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 607.38 ± 15% -24.4% 459.12 ± 24% -6.0% 570.75 ± 5% perf-c2c.DRAM.local 6801 -3.4% 6570 -3.1% 6587 vmstat.system.cs 15155 -0.9% 15024 -0.7% 15046 vmstat.system.in 353771 +43.0% 505977 ± 3% +7.1% 378972 meminfo.AnonHugePages 518698 +26.5% 656280 -1.7% 509920 meminfo.AnonPages 1334737 +11.5% 1487919 -0.8% 1324549 meminfo.Inactive(anon) 1.50 -0.1 1.45 -0.1 1.45 turbostat.C1E% 2.64 -4.0% 2.54 -2.8% 2.57 turbostat.Pkg%pc2 25.32 -1.1% 25.06 -0.6% 25.17 turbostat.PkgWatt 3.30 -3.0% 3.20 -2.8% 3.20 turbostat.RAMWatt 1.25 ± 8% -0.3 0.96 ± 16% -0.1 1.15 ± 22% perf-profile.children.cycles-pp.do_user_addr_fault 1.25 ± 8% -0.3 0.96 ± 16% -0.1 1.15 ± 22% perf-profile.children.cycles-pp.exc_page_fault 1.15 ± 9% -0.3 0.88 ± 16% -0.1 1.02 ± 22% perf-profile.children.cycles-pp.__handle_mm_fault 1.18 ± 9% -0.3 0.91 ± 15% -0.1 1.06 ± 21% perf-profile.children.cycles-pp.handle_mm_fault 0.23 ± 19% +0.1 0.32 ± 18% +0.1 0.33 ± 20% perf-profile.children.cycles-pp.exit_mmap 0.23 ± 19% +0.1 0.32 ± 18% +0.1 0.33 ± 20% perf-profile.children.cycles-pp.__mmput 19667 -6.4% 18399 -6.4% 18413 phoronix-test-suite.ramspeed.Triad.Integer.mb_s 284.07 +3.7% 294.53 +3.4% 293.86 phoronix-test-suite.time.elapsed_time 284.07 +3.7% 294.53 +3.4% 293.86 phoronix-test-suite.time.elapsed_time.max 120102 +1.8% 122256 +0.1% 120265 phoronix-test-suite.time.maximum_resident_set_size 281737 -54.7% 127624 -54.7% 127574 phoronix-test-suite.time.minor_page_faults 259.49 +4.1% 270.20 +4.1% 270.14 phoronix-test-suite.time.user_time 284.07 +3.7% 294.53 +3.4% 293.86 time.elapsed_time 284.07 +3.7% 294.53 +3.4% 293.86 time.elapsed_time.max 120102 +1.8% 122256 +0.1% 120265 time.maximum_resident_set_size 281737 -54.7% 127624 -54.7% 127574 time.minor_page_faults 1.72 -8.1% 1.58 -8.4% 1.58 time.system_time 259.49 +4.1% 270.20 +4.1% 270.14 time.user_time 129673 +26.5% 164074 -1.7% 127482 proc-vmstat.nr_anon_pages 172.74 +43.0% 247.07 ± 3% +7.1% 185.05 proc-vmstat.nr_anon_transparent_hugepages 360059 -1.0% 356437 +0.1% 360424 proc-vmstat.nr_dirty_background_threshold 720999 -1.0% 713747 +0.1% 721730 proc-vmstat.nr_dirty_threshold 3328170 -1.1% 3291542 +0.1% 3330837 proc-vmstat.nr_free_pages 333684 +11.5% 371981 -0.8% 331138 proc-vmstat.nr_inactive_anon 1735 +5.0% 1822 +4.9% 1819 proc-vmstat.nr_page_table_pages 333684 +11.5% 371981 -0.8% 331138 proc-vmstat.nr_zone_inactive_anon 857533 -34.7% 559940 -34.6% 560503 proc-vmstat.numa_hit 857463 -34.7% 560233 -34.6% 560504 proc-vmstat.numa_local 1082386 -26.7% 793742 -26.9% 791272 proc-vmstat.pgfault 109917 +2.8% 113044 +2.4% 112517 proc-vmstat.pgreuse 9028 +7.5% 9707 +6.5% 9619 proc-vmstat.thp_fault_alloc 1.168e+08 -6.9% 1.087e+08 ± 9% -3.5% 1.127e+08 perf-stat.i.branch-instructions 3.39 +0.1 3.47 +0.1 3.47 perf-stat.i.branch-miss-rate% 5431805 -8.1% 4990354 ± 15% -2.7% 5285279 perf-stat.i.branch-misses 4.13e+08 -3.1% 4.004e+08 -2.8% 4.015e+08 perf-stat.i.cache-misses 5.338e+08 -2.6% 5.196e+08 -2.4% 5.211e+08 perf-stat.i.cache-references 6835 -3.4% 6604 -3.1% 6623 perf-stat.i.context-switches 4.05 +3.8% 4.21 +3.6% 4.20 perf-stat.i.cpi 60.96 ± 7% +0.4% 61.20 ± 12% -7.7% 56.27 ± 3% perf-stat.i.cycles-between-cache-misses 0.08 ± 3% -0.0 0.08 ± 6% -0.0 0.08 ± 4% perf-stat.i.dTLB-load-miss-rate% 455317 -16.9% 
378574 -16.7% 379148 perf-stat.i.dTLB-load-misses 1.118e+09 -3.8% 1.076e+09 -3.3% 1.082e+09 perf-stat.i.dTLB-loads 0.02 -0.0 0.01 ± 6% -0.0 0.01 ± 2% perf-stat.i.dTLB-store-miss-rate% 86796 -57.3% 37100 ± 2% -57.3% 37097 ± 2% perf-stat.i.dTLB-store-misses 7.31e+08 -3.7% 7.04e+08 -3.3% 7.068e+08 perf-stat.i.dTLB-stores 128995 -3.1% 125030 ± 2% -4.4% 123280 perf-stat.i.iTLB-load-misses 145739 -4.0% 139945 -3.7% 140348 perf-stat.i.iTLB-loads 2.395e+09 -4.3% 2.291e+09 ± 2% -3.4% 2.314e+09 perf-stat.i.instructions 0.28 -4.2% 0.27 -3.9% 0.27 perf-stat.i.ipc 30.30 ± 6% -11.5% 26.81 ± 6% -21.3% 23.84 ± 12% perf-stat.i.metric.K/sec 220.55 -3.5% 212.73 -3.0% 213.94 perf-stat.i.metric.M/sec 3598 -31.3% 2473 -31.5% 2466 perf-stat.i.minor-faults 49026239 +1.9% 49938429 +2.0% 50024868 perf-stat.i.node-loads 98013334 -3.0% 95053521 -2.8% 95291354 perf-stat.i.node-stores 3602 -31.2% 2477 -31.4% 2470 perf-stat.i.page-faults 3.64 +4.6% 3.81 +3.9% 3.78 perf-stat.overall.cpi 21.09 +3.2% 21.76 +3.3% 21.78 perf-stat.overall.cycles-between-cache-misses 0.04 -0.0 0.04 -0.0 0.04 perf-stat.overall.dTLB-load-miss-rate% 0.01 -0.0 0.01 ± 2% -0.0 0.01 ± 2% perf-stat.overall.dTLB-store-miss-rate% 0.27 -4.3% 0.26 -3.7% 0.26 perf-stat.overall.ipc 1.163e+08 -6.9% 1.083e+08 ± 9% -3.5% 1.122e+08 perf-stat.ps.branch-instructions 5405065 -8.1% 4967211 ± 15% -2.7% 5259197 perf-stat.ps.branch-misses 4.117e+08 -3.0% 3.992e+08 -2.8% 4.003e+08 perf-stat.ps.cache-misses 5.321e+08 -2.6% 5.18e+08 -2.4% 5.195e+08 perf-stat.ps.cache-references 6810 -3.4% 6579 -3.1% 6599 perf-stat.ps.context-switches 453677 -16.9% 377215 -16.7% 377792 perf-stat.ps.dTLB-load-misses 1.115e+09 -3.8% 1.072e+09 -3.3% 1.078e+09 perf-stat.ps.dTLB-loads 86500 -57.3% 36965 ± 2% -57.3% 36962 ± 2% perf-stat.ps.dTLB-store-misses 7.286e+08 -3.7% 7.019e+08 -3.3% 7.045e+08 perf-stat.ps.dTLB-stores 128515 -3.1% 124573 ± 2% -4.4% 122831 perf-stat.ps.iTLB-load-misses 145145 -4.0% 139336 -3.7% 139772 perf-stat.ps.iTLB-loads 2.386e+09 -4.3% 2.283e+09 ± 2% -3.4% 2.306e+09 perf-stat.ps.instructions 3583 -31.3% 2462 -31.5% 2455 perf-stat.ps.minor-faults 48873391 +1.9% 49781212 +2.0% 49874192 perf-stat.ps.node-loads 97704914 -3.0% 94765417 -2.8% 94999974 perf-stat.ps.node-stores 3588 -31.2% 2467 -31.4% 2460 perf-stat.ps.page-faults (13) Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G ========================================================================================= compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase: gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 6786 -2.9% 6587 -2.9% 6586 vmstat.system.cs 355264 ± 2% +41.1% 501244 +6.5% 378393 meminfo.AnonHugePages 520377 +25.7% 654330 -2.1% 509644 meminfo.AnonPages 1336461 +11.2% 1486141 -0.9% 1324302 meminfo.Inactive(anon) 1.50 -0.0 1.46 -0.1 1.45 turbostat.C1E% 24.20 -1.2% 23.90 -0.9% 23.98 turbostat.CorWatt 2.62 -2.4% 2.56 -3.7% 2.53 turbostat.Pkg%pc2 25.37 -1.3% 25.03 -1.0% 25.12 turbostat.PkgWatt 3.30 -3.1% 3.20 -3.0% 3.20 turbostat.RAMWatt 19799 -3.5% 19106 -3.4% 19117 phoronix-test-suite.ramspeed.Average.Integer.mb_s 283.91 +3.7% 294.40 +3.6% 294.12 phoronix-test-suite.time.elapsed_time 283.91 +3.7% 294.40 +3.6% 294.12 phoronix-test-suite.time.elapsed_time.max 120150 +1.7% 122196 +0.2% 120373 
phoronix-test-suite.time.maximum_resident_set_size 281692 -54.7% 127689 -54.7% 127587 phoronix-test-suite.time.minor_page_faults 259.47 +4.1% 270.04 +4.0% 269.86 phoronix-test-suite.time.user_time 283.91 +3.7% 294.40 +3.6% 294.12 time.elapsed_time 283.91 +3.7% 294.40 +3.6% 294.12 time.elapsed_time.max 120150 +1.7% 122196 +0.2% 120373 time.maximum_resident_set_size 281692 -54.7% 127689 -54.7% 127587 time.minor_page_faults 1.72 -7.9% 1.58 -8.4% 1.58 time.system_time 259.47 +4.1% 270.04 +4.0% 269.86 time.user_time 130092 +25.7% 163578 -2.1% 127411 proc-vmstat.nr_anon_pages 173.47 ± 2% +41.1% 244.74 +6.5% 184.76 proc-vmstat.nr_anon_transparent_hugepages 3328419 -1.1% 3292662 +0.1% 3332791 proc-vmstat.nr_free_pages 334114 +11.2% 371530 -0.9% 331076 proc-vmstat.nr_inactive_anon 1732 +4.7% 1814 +5.2% 1823 proc-vmstat.nr_page_table_pages 334114 +11.2% 371530 -0.9% 331076 proc-vmstat.nr_zone_inactive_anon 853734 -34.6% 558669 -34.2% 562087 proc-vmstat.numa_hit 853524 -34.6% 558628 -34.1% 562074 proc-vmstat.numa_local 5551673 +1.0% 5609595 +0.2% 5564708 proc-vmstat.pgalloc_normal 1077693 -26.6% 791019 -26.3% 794706 proc-vmstat.pgfault 109591 +3.1% 112941 +2.9% 112795 proc-vmstat.pgreuse 9027 +7.6% 9714 +6.6% 9619 proc-vmstat.thp_fault_alloc 1.58 ± 16% -0.5 1.08 ± 8% -0.4 1.16 ± 24% perf-profile.calltrace.cycles-pp.asm_exc_page_fault 1.42 ± 14% -0.4 0.97 ± 9% -0.4 1.05 ± 24% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 1.42 ± 14% -0.4 0.98 ± 8% -0.4 1.05 ± 24% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault 1.32 ± 14% -0.4 0.91 ± 12% -0.3 0.98 ± 26% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 1.30 ± 13% -0.4 0.88 ± 13% -0.4 0.94 ± 26% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 1.64 ± 16% -0.5 1.12 ± 9% -0.4 1.24 ± 22% perf-profile.children.cycles-pp.asm_exc_page_fault 1.48 ± 15% -0.5 1.01 ± 10% -0.4 1.12 ± 21% perf-profile.children.cycles-pp.do_user_addr_fault 1.49 ± 14% -0.5 1.02 ± 9% -0.4 1.12 ± 21% perf-profile.children.cycles-pp.exc_page_fault 1.37 ± 14% -0.4 0.94 ± 12% -0.3 1.05 ± 22% perf-profile.children.cycles-pp.handle_mm_fault 1.34 ± 13% -0.4 0.91 ± 13% -0.3 1.00 ± 23% perf-profile.children.cycles-pp.__handle_mm_fault 0.78 ± 20% -0.3 0.50 ± 20% -0.2 0.54 ± 33% perf-profile.children.cycles-pp.clear_page_erms 0.76 ± 20% -0.3 0.50 ± 22% -0.2 0.53 ± 34% perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page 0.75 ± 20% -0.2 0.50 ± 23% -0.2 0.53 ± 33% perf-profile.children.cycles-pp.clear_huge_page 0.25 ± 28% +0.0 0.28 ± 77% -0.1 0.11 ± 52% perf-profile.children.cycles-pp.ret_from_fork_asm 0.24 ± 28% +0.0 0.28 ± 77% -0.1 0.11 ± 52% perf-profile.children.cycles-pp.ret_from_fork 0.23 ± 31% +0.0 0.28 ± 78% -0.1 0.09 ± 59% perf-profile.children.cycles-pp.kthread 0.77 ± 20% -0.3 0.50 ± 18% -0.2 0.54 ± 33% perf-profile.self.cycles-pp.clear_page_erms 1.166e+08 -3.3% 1.127e+08 -3.0% 1.131e+08 perf-stat.i.branch-instructions 3.39 +0.1 3.49 +0.1 3.46 perf-stat.i.branch-miss-rate% 5415570 -2.0% 5304890 -2.0% 5306531 perf-stat.i.branch-misses 4.133e+08 -3.1% 4.005e+08 -2.9% 4.014e+08 perf-stat.i.cache-misses 5.335e+08 -2.5% 5.203e+08 -2.4% 5.209e+08 perf-stat.i.cache-references 6825 -3.1% 6616 -3.1% 6614 perf-stat.i.context-switches 4.06 +3.5% 4.20 +3.3% 4.19 perf-stat.i.cpi 0.08 ± 3% -0.0 0.08 ± 2% -0.0 0.08 ± 2% perf-stat.i.dTLB-load-miss-rate% 451852 -17.2% 374167 ± 4% -16.1% 378935 
perf-stat.i.dTLB-load-misses 1.12e+09 -3.7% 1.079e+09 -3.5% 1.081e+09 perf-stat.i.dTLB-loads 0.02 -0.0 0.01 ± 13% -0.0 0.01 perf-stat.i.dTLB-store-miss-rate% 86119 -59.0% 35274 ± 13% -57.5% 36598 perf-stat.i.dTLB-store-misses 7.319e+08 -3.7% 7.049e+08 -3.5% 7.066e+08 perf-stat.i.dTLB-stores 128297 -2.6% 124925 -3.6% 123631 perf-stat.i.iTLB-load-misses 2.395e+09 -3.6% 2.309e+09 -3.4% 2.315e+09 perf-stat.i.instructions 0.28 -3.4% 0.27 -3.4% 0.27 perf-stat.i.ipc 220.76 -3.3% 213.44 -3.1% 213.87 perf-stat.i.metric.M/sec 3575 -30.9% 2470 -30.4% 2487 perf-stat.i.minor-faults 49267237 +1.1% 49805411 +1.4% 49954320 perf-stat.i.node-loads 98097080 -3.1% 95014639 -2.8% 95307489 perf-stat.i.node-stores 3579 -30.9% 2475 -30.4% 2492 perf-stat.i.page-faults 4.64 +0.1 4.71 +0.0 4.69 perf-stat.overall.branch-miss-rate% 3.64 +3.8% 3.78 +3.7% 3.78 perf-stat.overall.cpi 21.10 +3.3% 21.80 +3.2% 21.78 perf-stat.overall.cycles-between-cache-misses 0.04 -0.0 0.03 ± 4% -0.0 0.04 perf-stat.overall.dTLB-load-miss-rate% 0.01 -0.0 0.01 ± 13% -0.0 0.01 perf-stat.overall.dTLB-store-miss-rate% 0.27 -3.7% 0.26 -3.6% 0.26 perf-stat.overall.ipc 1.161e+08 -3.3% 1.122e+08 -3.0% 1.126e+08 perf-stat.ps.branch-instructions 5390667 -2.1% 5280037 -2.0% 5282651 perf-stat.ps.branch-misses 4.12e+08 -3.1% 3.993e+08 -2.9% 4.001e+08 perf-stat.ps.cache-misses 5.318e+08 -2.5% 5.187e+08 -2.3% 5.193e+08 perf-stat.ps.cache-references 6801 -3.1% 6593 -3.0% 6595 perf-stat.ps.context-switches 450236 -17.2% 372836 ± 4% -16.1% 377601 perf-stat.ps.dTLB-load-misses 1.117e+09 -3.7% 1.075e+09 -3.5% 1.078e+09 perf-stat.ps.dTLB-loads 85824 -59.0% 35147 ± 13% -57.5% 36467 perf-stat.ps.dTLB-store-misses 7.295e+08 -3.7% 7.027e+08 -3.4% 7.044e+08 perf-stat.ps.dTLB-stores 127825 -2.6% 124475 -3.6% 123194 perf-stat.ps.iTLB-load-misses 2.387e+09 -3.6% 2.302e+09 -3.3% 2.307e+09 perf-stat.ps.instructions 3561 -30.9% 2460 -30.4% 2478 perf-stat.ps.minor-faults 49109319 +1.1% 49654078 +1.4% 49800339 perf-stat.ps.node-loads 97782680 -3.1% 94720369 -2.8% 95009401 perf-stat.ps.node-stores 3566 -30.9% 2465 -30.4% 2482 perf-stat.ps.page-faults ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2024-01-05  9:29         ` Oliver Sang
@ 2024-01-05 14:52           ` Yin, Fengwei
  2024-01-05 18:49           ` Yang Shi
  1 sibling, 0 replies; 24+ messages in thread
From: Yin, Fengwei @ 2024-01-05 14:52 UTC (permalink / raw)
  To: Oliver Sang, Yang Shi
  Cc: Rik van Riel, oe-lkp, lkp, Linux Memory Management List,
      Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang,
      feng.tang

On 1/5/2024 5:29 PM, Oliver Sang wrote:
> hi, Yang Shi,
>
> On Thu, Jan 04, 2024 at 04:39:50PM +0800, Oliver Sang wrote:
>> hi, Fengwei, hi, Yang Shi,
>>
>> On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote:
>>>
>>> On 2024/1/4 09:32, Yang Shi wrote:
>>
>> ...
>>
>>>> Can you please help test the below patch?
>>> I can't access the testing box now. Oliver will help to test your patch.
>>>
>>
>> since now the commit-id of
>> 'mm: align larger anonymous mappings on THP boundaries'
>> in linux-next/master is efa7df3e3bb5d
>> I applied the patch like below:
>>
>> * d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
>> * efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries
>> * 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi
>>
>> our auto-bisect captured new efa7df3e3b as fbc for quite a number of regression
>> so far, I will test d8d7b1dae6f03 for all these tests. Thanks
>>
>
> we got 12 regressions and 1 improvement results for efa7df3e3b so far.
> (4 regressions are just similar to what we reported for 1111d46b5c).
> by your patch, 6 of those regressions are fixed, others are not impacted.
>
> below is a summary:
>
> No.  testsuite       test                            status-on-efa7df3e3b   fix-by-d8d7b1dae6 ?
> ===  =========       ====                            ====================   ===================
> (1)  stress-ng       numa                            regression             NO
> (2)                  pthread                         regression             yes (on a Ice Lake server)
> (3)                  pthread                         regression             yes (on a Cascade Lake desktop)
> (4)  will-it-scale   malloc1                         regression             NO
> (5)                  page_fault1                     improvement            no (so still improvement)
> (6)  vm-scalability  anon-w-seq-mt                   regression             yes
> (7)  stream          nr_threads=25%                  regression             yes
> (8)                  nr_threads=50%                  regression             yes
> (9)  phoronix        osbench.CreateThreads           regression             yes (on a Cascade Lake server)
> (10)                 ramspeed.Add.Integer            regression             NO (and below 3, on a Coffee Lake desktop)
> (11)                 ramspeed.Average.FloatingPoint  regression             NO
> (12)                 ramspeed.Triad.Integer          regression             NO
> (13)                 ramspeed.Average.Integer        regression             NO

Hints on ramspeed just for your reference: I did standalone ramspeed (not
phoronix) testing on an IceLake 48C/96T + 192GB memory box and didn't see
the regressions on that testing box (the box was retired at the end of
last year and can't be accessed anymore).


Regards
Yin, Fengwei

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2024-01-05  9:29       ` Oliver Sang
  2024-01-05 14:52         ` Yin, Fengwei
@ 2024-01-05 18:49         ` Yang Shi
  1 sibling, 0 replies; 24+ messages in thread
From: Yang Shi @ 2024-01-05 18:49 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Yin Fengwei, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	Christopher Lameter, ying.huang, feng.tang

On Fri, Jan 5, 2024 at 1:29 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Yang Shi,
>
> On Thu, Jan 04, 2024 at 04:39:50PM +0800, Oliver Sang wrote:
> > hi, Fengwei, hi, Yang Shi,
> >
> > On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote:
> > >
> > > On 2024/1/4 09:32, Yang Shi wrote:
> >
> > ...
> >
> > > > Can you please help test the below patch?
> > > I can't access the testing box now. Oliver will help to test your patch.
> > >
> >
> > since now the commit-id of
> > 'mm: align larger anonymous mappings on THP boundaries'
> > in linux-next/master is efa7df3e3bb5d
> > I applied the patch like below:
> >
> > * d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> > * efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries
> > * 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi
> >
> > our auto-bisect captured new efa7df3e3b as fbc for quite a number of regression
> > so far, I will test d8d7b1dae6f03 for all these tests. Thanks
> >
>

Hi Oliver,

Thanks for running the tests. Please see the inline comments.

> we got 12 regressions and 1 improvement results for efa7df3e3b so far.
> (4 regressions are just similar to what we reported for 1111d46b5c).
> by your patch, 6 of those regressions are fixed, others are not impacted.
>
> below is a summary:
>
> No.   testsuite       test            status-on-efa7df3e3b    fix-by-d8d7b1dae6 ?
> ===   =========       ====            ====================    ===================
> (1)   stress-ng       numa            regression              NO
> (2)                   pthread         regression              yes (on a Ice Lake server)
> (3)                   pthread         regression              yes (on a Cascade Lake desktop)
> (4)   will-it-scale   malloc1         regression              NO

I think this one was reported earlier, when Rik submitted the patch in
the first place. IIRC, Huang Ying did some analysis on it and thought
it could be ignored.

> (5)                   page_fault1     improvement             no (so still improvement)
> (6)   vm-scalability  anon-w-seq-mt   regression              yes
> (7)   stream          nr_threads=25%  regression              yes
> (8)                   nr_threads=50%  regression              yes
> (9)   phoronix        osbench.CreateThreads            regression   yes (on a Cascade Lake server)
> (10)                  ramspeed.Add.Integer             regression   NO (and below 3, on a Coffee Lake desktop)
> (11)                  ramspeed.Average.FloatingPoint   regression   NO
> (12)                  ramspeed.Triad.Integer           regression   NO
> (13)                  ramspeed.Average.Integer         regression   NO

Not fixing the ramspeed regression is expected, but it seems neither
Fengwei nor I can reproduce that regression when running ramspeed alone.

>
>
> below are details, for those regressions not fixed by d8d7b1dae6, attached
> full comparison.
>
>
> (1) detail comparison is attached as 'stress-ng-regression'
>
> Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
> =========================================================================================
> class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   cpu/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/numa/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     251.12           -48.2%     130.00           -47.9%     130.75        stress-ng.numa.ops
>       4.10           -49.4%       2.08           -49.2%       2.09        stress-ng.numa.ops_per_sec

This is a new one. I did some analysis; it seems it is not related to the
THP patch, since I can reproduce it on a kernel (on an aarch64 VM) without
the THP patch if I set THP to "always". The profiling showed the regression
was caused by the move_pages() syscall.

The test actually calls a bunch of NUMA syscalls, for example,
set_mempolicy(), mbind(), move_pages(), migrate_pages(), etc, with
different parameters. When calling move_pages() it tries to move pages (at
base page granularity) to different nodes in a circular list. On my 2-node
NUMA VM, it actually moves:

    0th page to node #1
    1st page to node #0
    2nd page to node #1
    3rd page to node #0
    ....
    1023rd page to node #0

But with THP, it actually bounces the THP between the two nodes 512 times.
The pgmigrate_success counter in /proc/vmstat also reflects this: for the
base page case the delta is 1928431, but for the THP case the delta is
218466402.

The kernel already does a node check to skip the move if the page is
already on the target node, but the test case does the bounce on purpose
since it assumes base pages. So I think this case should be run with THP
disabled.
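For readers following along, the access pattern described above boils down
to something like the sketch below. This is illustrative only, not the
stress-ng source; it assumes a 2-node machine, 4 KiB base pages, libnuma
installed for the move_pages() wrapper, and a hypothetical count of 1024
pages matching the example.

/*
 * Illustrative reproduction of the pattern described above (not the
 * stress-ng source): fault in a range, then ask move_pages() to place
 * the 4KiB pages on alternating NUMA nodes.  When the range is backed
 * by 2MB THPs, each huge page covers 512 of these entries and ends up
 * being migrated back and forth instead of staying put.
 * Build with: gcc -O2 bounce.c -lnuma   (file name is hypothetical)
 */
#include <numaif.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	const size_t page_sz = sysconf(_SC_PAGESIZE);
	const unsigned long npages = 1024;	/* matches the 0..1023 example above */
	char *buf = mmap(NULL, npages * page_sz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 1, npages * page_sz);	/* fault the region in, possibly as THP */

	void **pages = calloc(npages, sizeof(*pages));
	int *nodes = calloc(npages, sizeof(*nodes));
	int *status = calloc(npages, sizeof(*status));
	for (unsigned long i = 0; i < npages; i++) {
		pages[i] = buf + i * page_sz;
		nodes[i] = (i % 2) ? 0 : 1;	/* 0th -> node 1, 1st -> node 0, ... */
	}
	/* base pages: at most one migration each; a THP-backed range keeps
	 * bouncing the same huge page, as described above */
	move_pages(0, npages, pages, nodes, status, MPOL_MF_MOVE);
	return 0;
}

A test written with this base-page assumption can either be run with THP
disabled, as suggested above, or opt the buffer out itself (for example
via madvise(MADV_NOHUGEPAGE) on the range before faulting it in).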
>
>
> (2)
> Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/pthread/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>    3272223           -87.8%     400430            +0.5%    3287322        stress-ng.pthread.ops
>      54516           -87.8%       6664            +0.5%      54772        stress-ng.pthread.ops_per_sec
>
>
> (3)
> Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with memory: 128G
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>    2250845           -85.2%     332370 ±  6%      -0.8%    2232820        stress-ng.pthread.ops
>      37510           -85.2%       5538 ±  6%      -0.8%      37209        stress-ng.pthread.ops_per_sec
>
>
> (4) full comparison attached as 'will-it-scale-regression'
>
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/malloc1/will-it-scale
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      10994           -86.7%       1466           -86.7%       1460        will-it-scale.per_process_ops
>    1231431           -86.7%     164315           -86.7%     163624        will-it-scale.workload
>
>
> (5)
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/page_fault1/will-it-scale
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>   18858970           +44.8%   27298921           +44.9%   27330479        will-it-scale.224.threads
>      56.06           +13.3%      63.53           +13.8%      63.81        will-it-scale.224.threads_idle
>      84191           +44.8%     121869           +44.9%     122010        will-it-scale.per_thread_ops
>   18858970           +44.8%   27298921           +44.9%   27330479        will-it-scale.workload
>
>
> (6)
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/8T/lkp-cpl-4sp2/anon-w-seq-mt/vm-scalability
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     345968            -6.5%     323566            +0.1%     346304        vm-scalability.median
>       1.91 ± 10%      -0.5        1.38 ± 20%      -0.2        1.75 ± 13%  vm-scalability.median_stddev%
>   79708409            -7.4%   73839640            -0.1%   79613742        vm-scalability.throughput
>
>
> (7)
> Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G
> =========================================================================================
> array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
>   50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     349414           -16.2%     292854 ±  2%      -0.4%     348048        stream.add_bandwidth_MBps
>     347727 ±  2%     -16.5%     290470 ±  2%      -0.6%     345750 ±  2%  stream.add_bandwidth_MBps_harmonicMean
>     332206           -21.6%     260428 ±  3%      -0.4%     330838        stream.copy_bandwidth_MBps
>     330746 ±  2%     -22.6%     255915 ±  3%      -0.6%     328725 ±  2%  stream.copy_bandwidth_MBps_harmonicMean
>     301178           -16.9%     250209 ±  2%      -0.4%     299920        stream.scale_bandwidth_MBps
>     300262           -17.7%     247151 ±  2%      -0.6%     298586 ±  2%  stream.scale_bandwidth_MBps_harmonicMean
>     337408           -12.5%     295287 ±  2%      -0.3%     336304        stream.triad_bandwidth_MBps
>     336153           -12.7%     293621            -0.5%     334624 ±  2%  stream.triad_bandwidth_MBps_harmonicMean
>
>
> (8)
> Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G
> =========================================================================================
> array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
>   50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/50%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     345632           -19.7%     277550 ±  3%      +0.4%     347067 ±  2%  stream.add_bandwidth_MBps
>     342263 ±  2%     -19.7%     274704 ±  2%      +0.4%     343609 ±  2%  stream.add_bandwidth_MBps_harmonicMean
>     343820           -17.3%     284428 ±  3%      +0.1%     344248        stream.copy_bandwidth_MBps
>     341759 ±  2%     -17.8%     280934 ±  3%      +0.1%     342025 ±  2%  stream.copy_bandwidth_MBps_harmonicMean
>     343270           -17.8%     282330 ±  3%      +0.3%     344276 ±  2%  stream.scale_bandwidth_MBps
>     340812 ±  2%     -18.3%     278284 ±  3%      +0.3%     341672 ±  2%  stream.scale_bandwidth_MBps_harmonicMean
>     364596           -19.7%     292831 ±  3%      +0.4%     366145 ±  2%  stream.triad_bandwidth_MBps
>     360643 ±  2%     -19.9%     289034 ±  3%      +0.4%     362004 ±  2%  stream.triad_bandwidth_MBps_harmonicMean
>
>
> (9)
> Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with memory: 512G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Create Threads/debian-x86_64-phoronix/lkp-csl-2sp7/osbench-1.0.2/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      26.82         +1348.4%     388.43            +4.0%      27.88        phoronix-test-suite.osbench.CreateThreads.us_per_event
>
>
> **** for below (10) - (13), full comparison is attached as phoronix-regressions
> (they all happen on a Coffee Lake desktop)
>
> (10)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Add/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      20115            -4.5%      19211            -4.5%      19217        phoronix-test-suite.ramspeed.Add.Integer.mb_s
>
>
> (11)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      19960            -2.9%      19378            -3.0%      19366        phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s
>
>
> (12)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Triad/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      19667            -6.4%      18399            -6.4%      18413        phoronix-test-suite.ramspeed.Triad.Integer.mb_s
>
>
> (13)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      19799            -3.5%      19106            -3.4%      19117        phoronix-test-suite.ramspeed.Average.Integer.mb_s
>
>
>
> > commit d8d7b1dae6f0311d528b289cda7b317520f9a984
> > Author: 0day robot <lkp@intel.com>
> > Date:   Thu Jan 4 12:51:10 2024 +0800
> >
> >     fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> >
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 40d94411d4920..91197bd387730 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> >  	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> >  	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> >  	       _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> > +	       _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
> >  	       arch_calc_vm_flag_bits(flags);
> >  }
> >
> >
> > > Regards
> > > Yin, Fengwei
> > >
> >
> > > > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > > index 40d94411d492..dc7048824be8 100644
> > > > --- a/include/linux/mman.h
> > > > +++ b/include/linux/mman.h
> > > > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > > >  	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> > > >  	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> > > >  	       _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> > > > +	       _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
> > > >  	       arch_calc_vm_flag_bits(flags);
> > > >  }
> > > >

^ permalink raw reply	[flat|nested] 24+ messages in thread
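As context for the fix quoted above: MAP_STACK is the mmap flag that glibc
typically passes when allocating pthread stacks, so translating it to
VM_NOHUGEPAGE in calc_vm_flag_bits() keeps the THP alignment heuristic away
from thread stacks. One way to observe the effect from userspace is the
sketch below; it is illustrative only (the file name is hypothetical), and
it simply relies on the "nh" bit that smaps reports for VM_NOHUGEPAGE
mappings.

/*
 * Illustrative check (mapstack.c, hypothetical): create a MAP_STACK
 * mapping and keep it alive; on a kernel carrying the
 * MAP_STACK -> VM_NOHUGEPAGE translation above, the mapping's VmFlags
 * line in /proc/<pid>/smaps should contain "nh".
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	/* an 8MB anonymous mapping flagged as a stack, similar to what
	 * glibc requests for pthread stacks */
	void *stack = mmap(NULL, 8UL << 20, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED)
		return 1;

	printf("MAP_STACK mapping at %p; check VmFlags in /proc/%d/smaps\n",
	       stack, getpid());
	pause();	/* keep the mapping alive while inspecting smaps */
	return 0;
}

Without the fix, the same mapping would be eligible for THP alignment and
khugepaged, which is what the stress-ng pthread and osbench CreateThreads
numbers in this thread appear to have been paying for.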
end of thread, other threads:[~2024-01-05 18:50 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-19 15:41 [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression kernel test robot
2023-12-20  5:27 ` Yang Shi
2023-12-20  8:29 ` Yin Fengwei
2023-12-20 15:42 ` Christoph Lameter (Ampere)
2023-12-20 20:14 ` Yang Shi
2023-12-20 20:09 ` Yang Shi
2023-12-21  0:26 ` Yang Shi
2023-12-21  0:58 ` Yin Fengwei
2023-12-21  1:02 ` Yin Fengwei
2023-12-21  4:49 ` Matthew Wilcox
2023-12-21  4:58 ` Yin Fengwei
2023-12-21 18:07 ` Yang Shi
2023-12-21 18:14 ` Matthew Wilcox
2023-12-22  1:06 ` Yin, Fengwei
2023-12-22  2:23 ` Huang, Ying
2023-12-21 13:39 ` Yin, Fengwei
2023-12-21 18:11 ` Yang Shi
2023-12-22  1:13 ` Yin, Fengwei
2024-01-04  1:32 ` Yang Shi
2024-01-04  8:18 ` Yin Fengwei
2024-01-04  8:39 ` Oliver Sang
2024-01-05  9:29 ` Oliver Sang
2024-01-05 14:52 ` Yin, Fengwei
2024-01-05 18:49 ` Yang Shi