Subject: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
From: kernel test robot
Date: 2023-12-19 15:41 UTC
  To: Rik van Riel
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yang Shi, Matthew Wilcox, Christopher Lameter, ying.huang,
	feng.tang, fengwei.yin, oliver.sang



Hello,

For this commit, we reported
"[mm] 96db82a66d: will-it-scale.per_process_ops -95.3% regression"
in August 2022, when it was in linux-next/master:
https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/

Later, we reported
"[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression"
in October 2022, when it was in linus/master:
https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/

The commit was finally reverted by
commit 0ba09b1733878afe838fe35c310715fda3d46428
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Dec 4 12:51:59 2022 -0800

Now we have noticed that it has entered linux-next/master again.

We are not sure whether there is agreement that the benefit of this commit
outweighs the performance drop it causes in some micro-benchmarks.

We also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/
that
"This patch was applied to v6.1, but was reverted due to a regression
report.  However it turned out the regression was not due to this patch.
I ping'ed Andrew to reapply this patch, Andrew may forget it.  This
patch helps promote THP, so I rebased it onto the latest mm-unstable."

However, in our latest tests, we unfortunately still observed the regression
below with this commit. Just FYI.



kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:


commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
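
As its title says, the commit places larger anonymous mappings on a PMD
boundary so that they are eligible for THP. As a quick user-space check
(a minimal sketch, not part of the original report, assuming x86-64 with
a 2MB PMD size), one can verify whether a large anonymous mmap() comes
back PMD-aligned on a kernel with this commit:

/* thp-align-check.c: does a large anonymous mmap return a PMD-aligned
 * address? The 2MB PMD size is an assumption (x86-64, 4K base pages). */
#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>

#define PMD_SIZE (2UL << 20)	/* assumed: 2MB */

int main(void)
{
	size_t len = 4 * PMD_SIZE;	/* a "larger" mapping, eligible for alignment */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	printf("addr %p is%s PMD-aligned\n", p,
	       ((uintptr_t)p % PMD_SIZE) ? " not" : "");
	munmap(p, len);
	return 0;
}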

testcase: stress-ng
test machine: 36 threads, 1 socket, Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
parameters:

	nr_threads: 1
	disk: 1HDD
	testtime: 60s
	fs: ext4
	class: os
	test: pthread
	cpufreq_governor: performance
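
These parameters correspond roughly to an invocation like
"stress-ng --pthread 1 --timeout 60s" (the exact 0day command line may
differ). The pthread stressor repeatedly creates and joins threads, so
thread stacks, which are large anonymous mappings, are set up and torn
down at a high rate; a minimal analogue of that loop (illustration only,
not the actual stressor source):

/* pthread-churn.c: create and join threads in a loop, as the stress-ng
 * pthread stressor does. Each thread gets a multi-megabyte anonymous
 * stack mapping, the case affected by THP alignment.
 * Build: cc -O2 -pthread pthread-churn.c */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
	(void)arg;
	return NULL;
}

int main(void)
{
	for (int i = 0; i < 100000; i++) {	/* arbitrary iteration count */
		pthread_t t;
		int err = pthread_create(&t, NULL, worker, NULL);

		if (err) {
			fprintf(stderr, "pthread_create: error %d\n", err);
			return 1;
		}
		pthread_join(t, NULL);
	}
	return 0;
}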


In addition, the commit also has a significant impact on the following tests:

+------------------+-----------------------------------------------------------------------------------------------+
| testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression                                         |
| test machine     | 224 threads, 2 sockets, Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory |
| test parameters  | array_size=50000000                                                                           |
|                  | cpufreq_governor=performance                                                                  |
|                  | iterations=10x                                                                                |
|                  | loop=100                                                                                      |
|                  | nr_threads=25%                                                                                |
|                  | omp=true                                                                                      |
+------------------+-----------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression       |
| test machine     | 12 threads, 1 socket, Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory   |
| test parameters  | cpufreq_governor=performance                                                                  |
|                  | option_a=Average                                                                              |
|                  | option_b=Integer                                                                              |
|                  | test=ramspeed-1.4.3                                                                           |
+------------------+-----------------------------------------------------------------------------------------------+
| testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression |
| test machine     | 12 threads, 1 socket, Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory   |
| test parameters  | cpufreq_governor=performance                                                                  |
|                  | option_a=Average                                                                              |
|                  | option_b=Floating Point                                                                       |
|                  | test=ramspeed-1.4.3                                                                           |
+------------------+-----------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s

commit: 
  30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
  1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")

30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  13405796           -65.5%    4620124        cpuidle..usage
      8.00            +8.2%       8.66 ±  2%  iostat.cpu.system
      1.61           -60.6%       0.63        iostat.cpu.user
    597.50 ± 14%     -64.3%     213.50 ± 14%  perf-c2c.DRAM.local
      1882 ± 14%     -74.7%     476.83 ±  7%  perf-c2c.HITM.local
   3768436           -12.9%    3283395        vmstat.memory.cache
    355105           -75.7%      86344 ±  3%  vmstat.system.cs
    385435           -20.7%     305714 ±  3%  vmstat.system.in
      1.13            -0.2        0.88        mpstat.cpu.all.irq%
      0.29            -0.2        0.10 ±  2%  mpstat.cpu.all.soft%
      6.76 ±  2%      +1.1        7.88 ±  2%  mpstat.cpu.all.sys%
      1.62            -1.0        0.62 ±  2%  mpstat.cpu.all.usr%
   2234397           -84.3%     350161 ±  5%  stress-ng.pthread.ops
     37237           -84.3%       5834 ±  5%  stress-ng.pthread.ops_per_sec
    294706 ±  2%     -68.0%      94191 ±  6%  stress-ng.time.involuntary_context_switches
     41442 ±  2%   +5023.4%    2123284        stress-ng.time.maximum_resident_set_size
   4466457           -83.9%     717053 ±  5%  stress-ng.time.minor_page_faults
    243.33           +13.5%     276.17 ±  3%  stress-ng.time.percent_of_cpu_this_job_got
    131.64           +27.7%     168.11 ±  3%  stress-ng.time.system_time
     19.73           -82.1%       3.53 ±  4%  stress-ng.time.user_time
   7715609           -80.2%    1530125 ±  4%  stress-ng.time.voluntary_context_switches
    494566           -59.5%     200338 ±  3%  meminfo.Active
    478287           -61.5%     184050 ±  3%  meminfo.Active(anon)
     58549 ± 17%   +1532.8%     956006 ± 14%  meminfo.AnonHugePages
    424631          +194.9%    1252445 ± 10%  meminfo.AnonPages
   3677263           -13.0%    3197755        meminfo.Cached
   5829485 ±  4%     -19.0%    4724784 ± 10%  meminfo.Committed_AS
    692486          +108.6%    1444669 ±  8%  meminfo.Inactive
    662179          +113.6%    1414338 ±  9%  meminfo.Inactive(anon)
    182416           -50.2%      90759        meminfo.Mapped
   4614466           +10.0%    5076604 ±  2%  meminfo.Memused
      6985           +47.6%      10307 ±  4%  meminfo.PageTables
    718445           -66.7%     238913 ±  3%  meminfo.Shmem
     35906           -20.7%      28471 ±  3%  meminfo.VmallocUsed
   4838522           +25.6%    6075302        meminfo.max_used_kB
    488.83           -20.9%     386.67 ±  2%  turbostat.Avg_MHz
     12.95            -2.7       10.26 ±  2%  turbostat.Busy%
   7156734           -87.2%     919149 ±  4%  turbostat.C1
     10.59            -8.9        1.65 ±  5%  turbostat.C1%
   3702647           -55.1%    1663518 ±  2%  turbostat.C1E
     32.99           -20.6       12.36 ±  3%  turbostat.C1E%
   1161078           +64.5%    1909611        turbostat.C6
     44.25           +31.8       76.10        turbostat.C6%
      0.18           -33.3%       0.12        turbostat.IPC
  74338573 ±  2%     -33.9%   49159610 ±  4%  turbostat.IRQ
   1381661           -91.0%     124075 ±  6%  turbostat.POLL
      0.26            -0.2        0.04 ± 12%  turbostat.POLL%
     96.15            -5.4%      90.95        turbostat.PkgWatt
     12.12           +19.3%      14.46        turbostat.RAMWatt
    119573           -61.5%      46012 ±  3%  proc-vmstat.nr_active_anon
    106168          +195.8%     314047 ± 10%  proc-vmstat.nr_anon_pages
     28.60 ± 17%   +1538.5%     468.68 ± 14%  proc-vmstat.nr_anon_transparent_hugepages
    923365           -13.0%     803489        proc-vmstat.nr_file_pages
    165571          +113.5%     353493 ±  9%  proc-vmstat.nr_inactive_anon
     45605           -50.2%      22690        proc-vmstat.nr_mapped
      1752           +47.1%       2578 ±  4%  proc-vmstat.nr_page_table_pages
    179613           -66.7%      59728 ±  3%  proc-vmstat.nr_shmem
     21490            -2.4%      20981        proc-vmstat.nr_slab_reclaimable
     28260            -7.3%      26208        proc-vmstat.nr_slab_unreclaimable
    119573           -61.5%      46012 ±  3%  proc-vmstat.nr_zone_active_anon
    165570          +113.5%     353492 ±  9%  proc-vmstat.nr_zone_inactive_anon
  17343640           -76.3%    4116748 ±  4%  proc-vmstat.numa_hit
  17364975           -76.3%    4118098 ±  4%  proc-vmstat.numa_local
    249252           -66.2%      84187 ±  2%  proc-vmstat.pgactivate
  27528916          +567.1%  1.836e+08 ±  5%  proc-vmstat.pgalloc_normal
   4912427           -79.2%    1019949 ±  3%  proc-vmstat.pgfault
  27227124          +574.1%  1.835e+08 ±  5%  proc-vmstat.pgfree
      8728         +3896.4%     348802 ±  5%  proc-vmstat.thp_deferred_split_page
      8730         +3895.3%     348814 ±  5%  proc-vmstat.thp_fault_alloc
      8728         +3896.4%     348802 ±  5%  proc-vmstat.thp_split_pmd
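
The roughly 39x jump in thp_fault_alloc and thp_deferred_split_page above
is consistent with the now-aligned thread-stack mappings being faulted in
as huge pages and split again on teardown, which would also match the
+5023.4% stress-ng.time.maximum_resident_set_size. These counters can be
watched while the stressor runs with "grep thp_ /proc/vmstat", or with a
trivial reader like this sketch:

/* thp-counters.c: print the THP-related counters from /proc/vmstat,
 * equivalent to `grep thp_ /proc/vmstat`. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/vmstat", "r");
	char line[256];

	if (!f) {
		perror("/proc/vmstat");
		return 1;
	}
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "thp_", 4))
			fputs(line, stdout);
	fclose(f);
	return 0;
}
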
    316745           -21.5%     248756 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.avg
    112735 ±  4%     -34.3%      74061 ±  6%  sched_debug.cfs_rq:/.avg_vruntime.min
      0.49 ±  6%     -17.2%       0.41 ±  8%  sched_debug.cfs_rq:/.h_nr_running.stddev
     12143 ±120%     -99.9%      15.70 ±116%  sched_debug.cfs_rq:/.left_vruntime.avg
    414017 ±126%     -99.9%     428.50 ±102%  sched_debug.cfs_rq:/.left_vruntime.max
     68492 ±125%     -99.9%      78.15 ±106%  sched_debug.cfs_rq:/.left_vruntime.stddev
     41917 ± 24%     -48.3%      21690 ± 57%  sched_debug.cfs_rq:/.load.avg
    176151 ± 30%     -56.9%      75963 ± 57%  sched_debug.cfs_rq:/.load.stddev
      6489 ± 17%     -29.0%       4608 ± 12%  sched_debug.cfs_rq:/.load_avg.max
      4.42 ± 45%     -81.1%       0.83 ± 74%  sched_debug.cfs_rq:/.load_avg.min
      1112 ± 17%     -31.0%     767.62 ± 11%  sched_debug.cfs_rq:/.load_avg.stddev
    316745           -21.5%     248756 ±  4%  sched_debug.cfs_rq:/.min_vruntime.avg
    112735 ±  4%     -34.3%      74061 ±  6%  sched_debug.cfs_rq:/.min_vruntime.min
      0.49 ±  6%     -17.2%       0.41 ±  8%  sched_debug.cfs_rq:/.nr_running.stddev
     12144 ±120%     -99.9%      15.70 ±116%  sched_debug.cfs_rq:/.right_vruntime.avg
    414017 ±126%     -99.9%     428.50 ±102%  sched_debug.cfs_rq:/.right_vruntime.max
     68492 ±125%     -99.9%      78.15 ±106%  sched_debug.cfs_rq:/.right_vruntime.stddev
     14.25 ± 44%     -76.6%       3.33 ± 58%  sched_debug.cfs_rq:/.runnable_avg.min
     11.58 ± 49%     -77.7%       2.58 ± 58%  sched_debug.cfs_rq:/.util_avg.min
    423972 ± 23%     +59.3%     675379 ±  3%  sched_debug.cpu.avg_idle.avg
      5720 ± 43%    +439.5%      30864        sched_debug.cpu.avg_idle.min
     99.79 ±  2%     -23.7%      76.11 ±  2%  sched_debug.cpu.clock_task.stddev
    162475 ± 49%     -95.8%       6813 ± 26%  sched_debug.cpu.curr->pid.avg
   1061268           -84.0%     170212 ±  4%  sched_debug.cpu.curr->pid.max
    365404 ± 20%     -91.3%      31839 ± 10%  sched_debug.cpu.curr->pid.stddev
      0.51 ±  3%     -20.1%       0.41 ±  9%  sched_debug.cpu.nr_running.stddev
    311923           -74.2%      80615 ±  2%  sched_debug.cpu.nr_switches.avg
    565973 ±  4%     -77.8%     125597 ± 10%  sched_debug.cpu.nr_switches.max
    192666 ±  4%     -70.6%      56695 ±  6%  sched_debug.cpu.nr_switches.min
     67485 ±  8%     -79.9%      13558 ± 10%  sched_debug.cpu.nr_switches.stddev
      2.62          +102.1%       5.30        perf-stat.i.MPKI
  2.09e+09           -47.6%  1.095e+09 ±  4%  perf-stat.i.branch-instructions
      1.56            -0.5        1.01        perf-stat.i.branch-miss-rate%
  31951200           -60.9%   12481432 ±  2%  perf-stat.i.branch-misses
     19.38           +23.7       43.08        perf-stat.i.cache-miss-rate%
  26413597            -5.7%   24899132 ±  4%  perf-stat.i.cache-misses
 1.363e+08           -58.3%   56906133 ±  4%  perf-stat.i.cache-references
    370628           -75.8%      89743 ±  3%  perf-stat.i.context-switches
      1.77           +65.1%       2.92 ±  2%  perf-stat.i.cpi
 1.748e+10           -21.8%  1.367e+10 ±  2%  perf-stat.i.cpu-cycles
     61611           -79.1%      12901 ±  6%  perf-stat.i.cpu-migrations
    716.97 ±  2%     -17.2%     593.35 ±  2%  perf-stat.i.cycles-between-cache-misses
      0.12 ±  4%      -0.1        0.05        perf-stat.i.dTLB-load-miss-rate%
   3066100 ±  3%     -81.3%     573066 ±  5%  perf-stat.i.dTLB-load-misses
 2.652e+09           -50.1%  1.324e+09 ±  4%  perf-stat.i.dTLB-loads
      0.08 ±  2%      -0.0        0.03        perf-stat.i.dTLB-store-miss-rate%
   1168195 ±  2%     -82.9%     199438 ±  5%  perf-stat.i.dTLB-store-misses
 1.478e+09           -56.8%  6.384e+08 ±  3%  perf-stat.i.dTLB-stores
   8080423           -73.2%    2169371 ±  3%  perf-stat.i.iTLB-load-misses
   5601321           -74.3%    1440571 ±  2%  perf-stat.i.iTLB-loads
 1.028e+10           -49.7%  5.173e+09 ±  4%  perf-stat.i.instructions
      1450           +73.1%       2511 ±  2%  perf-stat.i.instructions-per-iTLB-miss
      0.61           -35.9%       0.39        perf-stat.i.ipc
      0.48           -21.4%       0.38 ±  2%  perf-stat.i.metric.GHz
    616.28           -17.6%     507.69 ±  4%  perf-stat.i.metric.K/sec
    175.16           -50.8%      86.18 ±  4%  perf-stat.i.metric.M/sec
     76728           -80.8%      14724 ±  4%  perf-stat.i.minor-faults
   5600408           -61.4%    2160997 ±  5%  perf-stat.i.node-loads
   8873996           +52.1%   13499744 ±  5%  perf-stat.i.node-stores
    112409           -81.9%      20305 ±  4%  perf-stat.i.page-faults
      2.55           +89.6%       4.83        perf-stat.overall.MPKI
      1.51            -0.4        1.13        perf-stat.overall.branch-miss-rate%
     19.26           +24.5       43.71        perf-stat.overall.cache-miss-rate%
      1.70           +56.4%       2.65        perf-stat.overall.cpi
    665.84           -17.5%     549.51 ±  2%  perf-stat.overall.cycles-between-cache-misses
      0.12 ±  4%      -0.1        0.04        perf-stat.overall.dTLB-load-miss-rate%
      0.08 ±  2%      -0.0        0.03        perf-stat.overall.dTLB-store-miss-rate%
     59.16            +0.9       60.04        perf-stat.overall.iTLB-load-miss-rate%
      1278           +86.1%       2379 ±  2%  perf-stat.overall.instructions-per-iTLB-miss
      0.59           -36.1%       0.38        perf-stat.overall.ipc
 2.078e+09           -48.3%  1.074e+09 ±  4%  perf-stat.ps.branch-instructions
  31292687           -61.2%   12133349 ±  2%  perf-stat.ps.branch-misses
  26057291            -5.9%   24512034 ±  4%  perf-stat.ps.cache-misses
 1.353e+08           -58.6%   56072195 ±  4%  perf-stat.ps.cache-references
    365254           -75.8%      88464 ±  3%  perf-stat.ps.context-switches
 1.735e+10           -22.4%  1.346e+10 ±  2%  perf-stat.ps.cpu-cycles
     60838           -79.1%      12727 ±  6%  perf-stat.ps.cpu-migrations
   3056601 ±  4%     -81.5%     565354 ±  4%  perf-stat.ps.dTLB-load-misses
 2.636e+09           -50.7%    1.3e+09 ±  4%  perf-stat.ps.dTLB-loads
   1155253 ±  2%     -83.0%     196581 ±  5%  perf-stat.ps.dTLB-store-misses
 1.473e+09           -57.4%  6.268e+08 ±  3%  perf-stat.ps.dTLB-stores
   7997726           -73.3%    2131477 ±  3%  perf-stat.ps.iTLB-load-misses
   5521346           -74.3%    1418623 ±  2%  perf-stat.ps.iTLB-loads
 1.023e+10           -50.4%  5.073e+09 ±  4%  perf-stat.ps.instructions
     75671           -80.9%      14479 ±  4%  perf-stat.ps.minor-faults
   5549722           -61.4%    2141750 ±  4%  perf-stat.ps.node-loads
   8769156           +51.6%   13296579 ±  5%  perf-stat.ps.node-stores
    110795           -82.0%      19977 ±  4%  perf-stat.ps.page-faults
 6.482e+11           -50.7%  3.197e+11 ±  4%  perf-stat.total.instructions
      0.00 ± 37%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
      0.01 ± 18%   +8373.1%       0.73 ± 49%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
      0.01 ± 16%   +4600.0%       0.38 ± 24%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
      0.01 ±204%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      0.01 ±  8%   +3678.9%       0.36 ± 79%  perf-sched.sch_delay.avg.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
      0.01 ± 14%     -38.5%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
      0.01 ±  5%   +2946.2%       0.26 ± 43%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
      0.00 ± 14%    +125.0%       0.01 ± 12%  perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ±170%     -83.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00 ± 69%   +6578.6%       0.31 ±  4%  perf-sched.sch_delay.avg.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
      0.00          +100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      0.02 ± 86%   +4234.4%       0.65 ±  4%  perf-sched.sch_delay.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
      0.01 ±  6%   +6054.3%       0.47        perf-sched.sch_delay.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
      0.00 ± 14%    +195.2%       0.01 ± 89%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.00 ±102%    +340.0%       0.01 ± 85%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      0.00          +100.0%       0.00        perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.00 ± 11%     +66.7%       0.01 ± 21%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.01 ± 89%   +1096.1%       0.15 ± 30%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
      0.00          +141.7%       0.01 ± 61%  perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.00 ±223%   +9975.0%       0.07 ±203%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
      0.00 ± 10%    +789.3%       0.04 ± 69%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.00 ± 31%   +6691.3%       0.26 ±  5%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
      0.00 ± 28%  +14612.5%       0.59 ±  4%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.exit_mm
      0.00 ± 24%   +4904.2%       0.20 ±  4%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
      0.00 ± 28%    +450.0%       0.01 ± 74%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
      0.00 ± 17%    +984.6%       0.02 ± 79%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.00 ± 20%    +231.8%       0.01 ± 89%  perf-sched.sch_delay.avg.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.submit_bio_wait
      0.00          +350.0%       0.01 ± 16%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.02 ± 16%    +320.2%       0.07 ±  2%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ±  2%    +282.1%       0.09 ±  5%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.00 ± 14%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
      0.05 ± 35%   +3784.5%       1.92 ± 16%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
      0.29 ±128%    +563.3%       1.92 ±  7%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
      0.14 ±217%     -99.7%       0.00 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      0.03 ± 49%     -74.0%       0.01 ± 51%  perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      0.01 ± 54%     -57.4%       0.00 ± 75%  perf-sched.sch_delay.max.ms.__cond_resched.dput.__ns_get_path.ns_get_path.proc_ns_get_link
      0.12 ± 21%    +873.0%       1.19 ± 60%  perf-sched.sch_delay.max.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
      2.27 ±220%     -99.7%       0.01 ± 19%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
      0.02 ± 36%     -54.4%       0.01 ± 55%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
      0.04 ± 36%     -77.1%       0.01 ± 31%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
      0.12 ± 32%   +1235.8%       1.58 ± 31%  perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
      2.25 ±218%     -99.3%       0.02 ± 52%  perf-sched.sch_delay.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.01 ± 85%  +19836.4%       2.56 ±  7%  perf-sched.sch_delay.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
      0.03 ± 70%     -93.6%       0.00 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
      0.10 ± 16%   +2984.2%       3.21 ±  6%  perf-sched.sch_delay.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
      0.01 ± 20%    +883.9%       0.05 ±177%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ± 15%    +694.7%       0.08 ±123%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.00 ±223%   +6966.7%       0.07 ±199%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
      0.01 ± 38%   +8384.6%       0.55 ± 72%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.01 ± 13%  +12995.7%       1.51 ±103%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    117.80 ± 56%     -96.4%       4.26 ± 36%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ± 68%    +331.9%       0.03        perf-sched.total_sch_delay.average.ms
      4.14          +242.6%      14.20 ±  4%  perf-sched.total_wait_and_delay.average.ms
    700841           -69.6%     212977 ±  3%  perf-sched.total_wait_and_delay.count.ms
      4.14          +242.4%      14.16 ±  4%  perf-sched.total_wait_time.average.ms
     11.68 ±  8%    +213.3%      36.59 ± 28%  perf-sched.wait_and_delay.avg.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
     10.00 ±  2%    +226.1%      32.62 ± 20%  perf-sched.wait_and_delay.avg.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
     10.55 ±  3%    +259.8%      37.96 ±  7%  perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
      9.80 ± 12%    +196.5%      29.07 ± 32%  perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
      9.80 ±  4%    +234.9%      32.83 ± 14%  perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
     10.32 ±  2%    +223.8%      33.42 ±  6%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
      8.15 ± 14%    +271.3%      30.25 ± 35%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
      9.60 ±  4%    +240.8%      32.73 ± 16%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
     10.37 ±  4%    +232.0%      34.41 ± 10%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
      7.32 ± 46%    +269.7%      27.07 ± 49%  perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
      9.88          +236.2%      33.23 ±  4%  perf-sched.wait_and_delay.avg.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
      4.44 ±  4%    +379.0%      21.27 ± 18%  perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     10.05 ±  2%    +235.6%      33.73 ± 11%  perf-sched.wait_and_delay.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.03          +462.6%       0.15 ±  6%  perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.78 ±  4%    +482.1%      39.46 ±  3%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
      3.17          +683.3%      24.85 ±  8%  perf-sched.wait_and_delay.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
     36.64 ± 13%    +244.7%     126.32 ±  6%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      9.81          +302.4%      39.47 ±  4%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
      1.05           +48.2%       1.56        perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
      0.93           +14.2%       1.06 ±  2%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
      9.93          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
     12.02 ±  3%    +139.8%      28.83 ±  6%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      6.09 ±  2%    +403.0%      30.64 ±  5%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     23.17 ± 19%     -83.5%       3.83 ±143%  perf-sched.wait_and_delay.count.__cond_resched.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio
     79.83 ±  9%     -55.1%      35.83 ± 16%  perf-sched.wait_and_delay.count.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
     14.83 ± 14%     -59.6%       6.00 ± 56%  perf-sched.wait_and_delay.count.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      8.50 ± 17%     -80.4%       1.67 ± 89%  perf-sched.wait_and_delay.count.__cond_resched.dput.__ns_get_path.ns_get_path.proc_ns_get_link
    114.00 ± 14%     -62.4%      42.83 ± 11%  perf-sched.wait_and_delay.count.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
     94.67 ±  7%     -48.1%      49.17 ± 13%  perf-sched.wait_and_delay.count.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
     59.83 ± 13%     -76.0%      14.33 ± 48%  perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
    103.00 ± 12%     -48.1%      53.50 ± 20%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
     19.33 ± 16%     -56.0%       8.50 ± 29%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
     68.17 ± 11%     -39.1%      41.50 ± 19%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
     36.67 ± 22%     -79.1%       7.67 ± 46%  perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
    465.50 ±  9%     -47.4%     244.83 ± 11%  perf-sched.wait_and_delay.count.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
     14492 ±  3%     -96.3%     533.67 ± 10%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    128.67 ±  7%     -53.5%      59.83 ± 10%  perf-sched.wait_and_delay.count.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.67 ± 34%     -80.4%       1.50 ±107%  perf-sched.wait_and_delay.count.__cond_resched.vunmap_p4d_range.__vunmap_range_noflush.remove_vm_area.vfree
    147533           -81.0%      28023 ±  5%  perf-sched.wait_and_delay.count.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4394 ±  4%     -78.5%     942.83 ±  7%  perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
    228791           -79.3%      47383 ±  4%  perf-sched.wait_and_delay.count.futex_wait_queue.__futex_wait.futex_wait.do_futex
    368.50 ±  2%     -67.1%     121.33 ±  3%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
    147506           -81.0%      28010 ±  5%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
      5387 ±  6%     -16.7%       4488 ±  5%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
      8303 ±  2%     -56.9%       3579 ±  5%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
     14.67 ±  7%    -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
    370.50 ±141%    +221.9%       1192 ±  5%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     24395 ±  2%     -51.2%      11914 ±  6%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     31053 ±  2%     -80.5%       6047 ±  5%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     16.41 ±  2%    +342.7%      72.65 ± 29%  perf-sched.wait_and_delay.max.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
     16.49 ±  3%    +463.3%      92.90 ± 27%  perf-sched.wait_and_delay.max.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
     17.32 ±  5%    +520.9%     107.52 ± 14%  perf-sched.wait_and_delay.max.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
     15.38 ±  6%    +325.2%      65.41 ± 22%  perf-sched.wait_and_delay.max.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
     16.73 ±  4%    +456.2%      93.04 ± 11%  perf-sched.wait_and_delay.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
     17.14 ±  3%    +510.6%     104.68 ± 14%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
     15.70 ±  4%    +379.4%      75.25 ± 28%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
     15.70 ±  3%    +422.1%      81.97 ± 19%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
     16.38          +528.4%     102.91 ± 21%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
     45.20 ± 48%    +166.0%     120.23 ± 27%  perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
     17.25          +495.5%     102.71 ±  2%  perf-sched.wait_and_delay.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
    402.57 ± 15%     -52.8%     189.90 ± 14%  perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     16.96 ±  4%    +521.3%     105.40 ± 15%  perf-sched.wait_and_delay.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
     28.45          +517.3%     175.65 ± 14%  perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
     22.49          +628.5%     163.83 ± 16%  perf-sched.wait_and_delay.max.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
     26.53 ± 30%    +326.9%     113.25 ± 16%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
     15.54          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
      1.67 ±141%    +284.6%       6.44 ±  4%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.07 ± 34%     -93.6%       0.00 ±105%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
     10.21 ± 15%    +295.8%      40.43 ± 50%  perf-sched.wait_time.avg.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.89 ± 40%     -99.8%       0.01 ±113%  perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
     11.67 ±  8%    +213.5%      36.58 ± 28%  perf-sched.wait_time.avg.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
      9.98 ±  2%    +226.8%      32.61 ± 20%  perf-sched.wait_time.avg.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
      1.03           +71.2%       1.77 ± 20%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
      0.06 ± 79%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.__split_vma.vma_modify.mprotect_fixup
      0.05 ± 22%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_expand.mmap_region.do_mmap
      0.08 ± 82%     -98.2%       0.00 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     10.72 ± 10%    +166.9%      28.61 ± 29%  perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
     10.53 ±  3%    +260.5%      37.95 ±  7%  perf-sched.wait_time.avg.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
      9.80 ± 12%    +196.6%      29.06 ± 32%  perf-sched.wait_time.avg.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
      9.80 ±  4%    +235.1%      32.82 ± 14%  perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
      9.50 ± 12%    +281.9%      36.27 ± 70%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     10.31 ±  2%    +223.9%      33.40 ±  6%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
      8.04 ± 15%    +276.1%      30.25 ± 35%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
      9.60 ±  4%    +240.9%      32.72 ± 16%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
      0.06 ± 66%     -98.3%       0.00 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.__split_vma
     10.36 ±  4%    +232.1%      34.41 ± 10%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
      0.08 ± 50%     -95.7%       0.00 ±100%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.vma_modify
      0.01 ± 49%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range
      0.03 ± 73%     -87.4%       0.00 ±145%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.dup_task_struct.copy_process.kernel_clone
      8.01 ± 25%    +238.0%      27.07 ± 49%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
      9.86          +237.0%      33.23 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
      4.44 ±  4%    +379.2%      21.26 ± 18%  perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     10.03          +236.3%      33.73 ± 11%  perf-sched.wait_time.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.97 ±  8%     -87.8%       0.12 ±221%  perf-sched.wait_time.avg.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
      0.02 ± 13%   +1846.8%       0.45 ± 11%  perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      1.01           +64.7%       1.66        perf-sched.wait_time.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
      0.75 ±  4%    +852.1%       7.10 ±  5%  perf-sched.wait_time.avg.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.03          +462.6%       0.15 ±  6%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.24 ±  4%     +25.3%       0.30 ±  8%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      1.98 ± 15%    +595.7%      13.80 ± 90%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
      2.78 ± 14%    +444.7%      15.12 ± 16%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function
      6.77 ±  4%    +483.0%      39.44 ±  3%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
      3.17          +684.7%      24.85 ±  8%  perf-sched.wait_time.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
     36.64 ± 13%    +244.7%     126.32 ±  6%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      9.79          +303.0%      39.45 ±  4%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
      1.05           +23.8%       1.30        perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
      0.86          +101.2%       1.73 ±  3%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.exit_mm
      0.11 ± 21%    +438.9%       0.61 ± 15%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
      0.32 ±  4%     +28.5%       0.41 ± 13%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     12.00 ±  3%    +139.6%      28.76 ±  6%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      6.07 ±  2%    +403.5%      30.56 ±  5%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.38 ± 41%     -98.8%       0.00 ±105%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
      0.36 ± 34%     -84.3%       0.06 ±200%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.vma_alloc_folio.do_anonymous_page
      0.36 ± 51%     -92.9%       0.03 ±114%  perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
     15.98 ±  5%    +361.7%      73.80 ± 23%  perf-sched.wait_time.max.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.51 ± 14%     -92.8%       0.04 ±196%  perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.__vmalloc_area_node.__vmalloc_node_range
      8.56 ± 11%     -99.9%       0.01 ±126%  perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
      0.43 ± 32%     -68.2%       0.14 ±119%  perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_node_trace.__get_vm_area_node.__vmalloc_node_range
      0.46 ± 20%     -89.3%       0.05 ±184%  perf-sched.wait_time.max.ms.__cond_resched.__vmalloc_area_node.__vmalloc_node_range.alloc_thread_stack_node.dup_task_struct
     16.40 ±  2%    +342.9%      72.65 ± 29%  perf-sched.wait_time.max.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
      0.31 ± 63%     -76.2%       0.07 ±169%  perf-sched.wait_time.max.ms.__cond_resched.cgroup_css_set_fork.cgroup_can_fork.copy_process.kernel_clone
      0.14 ± 93%    +258.7%       0.49 ± 14%  perf-sched.wait_time.max.ms.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
     16.49 ±  3%    +463.5%      92.89 ± 27%  perf-sched.wait_time.max.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
      1.09          +171.0%       2.96 ± 10%  perf-sched.wait_time.max.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
      1.16 ±  7%    +155.1%       2.97 ±  4%  perf-sched.wait_time.max.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
      0.19 ± 78%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.__split_vma.vma_modify.mprotect_fixup
      0.33 ± 35%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_expand.mmap_region.do_mmap
      0.20 ±101%     -99.3%       0.00 ±223%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     17.31 ±  5%    +521.0%     107.51 ± 14%  perf-sched.wait_time.max.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
     15.38 ±  6%    +325.3%      65.40 ± 22%  perf-sched.wait_time.max.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
     16.72 ±  4%    +456.6%      93.04 ± 11%  perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
      1.16 ±  2%     +88.7%       2.20 ± 33%  perf-sched.wait_time.max.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
     53.96 ± 32%    +444.0%     293.53 ±109%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     17.13 ±  2%    +511.2%     104.68 ± 14%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
     15.69 ±  4%    +379.5%      75.25 ± 28%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
     15.70 ±  3%    +422.2%      81.97 ± 19%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
      0.27 ± 80%     -99.6%       0.00 ±223%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.__split_vma
     16.37          +528.6%     102.90 ± 21%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
      0.44 ± 33%     -99.1%       0.00 ±104%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.vma_modify
      0.02 ± 83%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range
      0.08 ± 83%     -95.4%       0.00 ±147%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.dup_task_struct.copy_process.kernel_clone
      1.16 ±  2%    +134.7%       2.72 ± 19%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
     49.88 ± 25%    +141.0%     120.23 ± 27%  perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
     17.24          +495.7%     102.70 ±  2%  perf-sched.wait_time.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
    402.56 ± 15%     -52.8%     189.89 ± 14%  perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     16.96 ±  4%    +521.4%     105.39 ± 15%  perf-sched.wait_time.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.06          +241.7%       3.61 ±  4%  perf-sched.wait_time.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
      1.07           -88.9%       0.12 ±221%  perf-sched.wait_time.max.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
      0.28 ± 27%    +499.0%       1.67 ± 18%  perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      1.21 ±  2%    +207.2%       3.71 ±  3%  perf-sched.wait_time.max.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
     13.43 ± 26%     +38.8%      18.64        perf-sched.wait_time.max.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     28.45          +517.3%     175.65 ± 14%  perf-sched.wait_time.max.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.79 ± 10%     +62.2%       1.28 ± 25%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
     13.22 ±  2%    +317.2%      55.16 ± 35%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function
    834.29 ± 28%     -48.5%     429.53 ± 94%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
     22.48          +628.6%     163.83 ± 16%  perf-sched.wait_time.max.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
     22.74 ± 18%    +398.0%     113.25 ± 16%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
      7.72 ±  7%     +80.6%      13.95 ±  2%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
      0.74 ±  4%     +77.2%       1.31 ± 32%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      5.01           +14.1%       5.72 ±  2%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     44.98           -19.7       25.32 ±  2%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
     43.21           -19.6       23.65 ±  3%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
     43.21           -19.6       23.65 ±  3%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     43.18           -19.5       23.63 ±  3%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     40.30           -17.5       22.75 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     41.10           -17.4       23.66 ±  2%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     39.55           -17.3       22.24 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
     24.76 ±  2%      -8.5       16.23 ±  3%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      8.68 ±  4%      -6.5        2.22 ±  6%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      7.23 ±  4%      -5.8        1.46 ±  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      7.23 ±  4%      -5.8        1.46 ±  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.11 ±  4%      -5.7        1.39 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.09 ±  4%      -5.7        1.39 ±  7%  perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.59 ±  3%      -5.1        1.47 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
      6.59 ±  3%      -5.1        1.47 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
      6.59 ±  3%      -5.1        1.47 ±  7%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
      5.76 ±  2%      -5.0        0.80 ±  9%  perf-profile.calltrace.cycles-pp.start_thread
      7.43 ±  2%      -4.9        2.52 ±  7%  perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      5.51 ±  3%      -4.8        0.70 ±  7%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.start_thread
      5.50 ±  3%      -4.8        0.70 ±  7%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
      5.48 ±  3%      -4.8        0.69 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
      5.42 ±  3%      -4.7        0.69 ±  7%  perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
      5.90 ±  5%      -3.9        2.01 ±  4%  perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
      4.18 ±  5%      -3.8        0.37 ± 71%  perf-profile.calltrace.cycles-pp.exit_notify.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.76 ±  5%      -3.8        1.98 ±  4%  perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
      5.04 ±  7%      -3.7        1.32 ±  9%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__clone
      5.03 ±  7%      -3.7        1.32 ±  9%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
      5.02 ±  7%      -3.7        1.32 ±  9%  perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
      5.02 ±  7%      -3.7        1.32 ±  9%  perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
      5.62 ±  5%      -3.7        1.96 ±  3%  perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
      4.03 ±  4%      -3.1        0.92 ±  7%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      6.03 ±  5%      -3.1        2.94 ±  3%  perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
      3.43 ±  5%      -2.8        0.67 ± 13%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3.43 ±  5%      -2.8        0.67 ± 13%  perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
      3.41 ±  5%      -2.7        0.66 ± 13%  perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread
      3.40 ±  5%      -2.7        0.66 ± 13%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn
      3.67 ±  7%      -2.7        0.94 ± 10%  perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.92 ±  7%      -2.4        0.50 ± 46%  perf-profile.calltrace.cycles-pp.stress_pthread
      2.54 ±  6%      -2.2        0.38 ± 70%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.46 ±  6%      -1.8        0.63 ± 10%  perf-profile.calltrace.cycles-pp.dup_task_struct.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
      3.00 ±  6%      -1.6        1.43 ±  7%  perf-profile.calltrace.cycles-pp.__munmap
      2.96 ±  6%      -1.5        1.42 ±  7%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
      2.96 ±  6%      -1.5        1.42 ±  7%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      2.95 ±  6%      -1.5        1.41 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      2.95 ±  6%      -1.5        1.41 ±  7%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      2.02 ±  4%      -1.5        0.52 ± 46%  perf-profile.calltrace.cycles-pp.__lll_lock_wait
      1.78 ±  3%      -1.5        0.30 ±100%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__lll_lock_wait
      1.77 ±  3%      -1.5        0.30 ±100%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__lll_lock_wait
      1.54 ±  6%      -1.3        0.26 ±100%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      2.54 ±  6%      -1.2        1.38 ±  6%  perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.51 ±  6%      -1.1        1.37 ±  7%  perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      1.13            -0.7        0.40 ± 70%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.15 ±  5%      -0.7        0.46 ± 45%  perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
      1.58 ±  5%      -0.6        0.94 ±  7%  perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
      0.99 ±  5%      -0.5        0.51 ± 45%  perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
      1.01 ±  5%      -0.5        0.54 ± 45%  perf-profile.calltrace.cycles-pp.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
      0.82 ±  4%      -0.2        0.59 ±  5%  perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
      0.00            +0.5        0.54 ±  5%  perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
      0.00            +0.6        0.60 ±  5%  perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
      0.00            +0.6        0.61 ±  6%  perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
      0.00            +0.6        0.62 ±  6%  perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      0.53 ±  5%      +0.6        1.17 ± 13%  perf-profile.calltrace.cycles-pp.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
      1.94 ±  2%      +0.7        2.64 ±  9%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      0.00            +0.7        0.73 ±  5%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range
      0.00            +0.8        0.75 ± 20%  perf-profile.calltrace.cycles-pp.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
      2.02 ±  2%      +0.8        2.85 ±  9%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      0.74 ±  5%      +0.8        1.57 ± 11%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      0.00            +0.9        0.90 ±  4%  perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
      0.00            +0.9        0.92 ± 13%  perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues
      0.86 ±  4%      +1.0        1.82 ± 10%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
      0.86 ±  4%      +1.0        1.83 ± 10%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
      0.00            +1.0        0.98 ±  7%  perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked
      0.09 ±223%      +1.0        1.07 ± 11%  perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt
      0.00            +1.0        0.99 ±  6%  perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd
      0.00            +1.0        1.00 ±  7%  perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range
      0.09 ±223%      +1.0        1.10 ± 12%  perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
      0.00            +1.0        1.01 ±  6%  perf-profile.calltrace.cycles-pp.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range
      0.00            +1.1        1.10 ±  5%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath
      0.00            +1.1        1.12 ±  5%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock
      0.00            +1.2        1.23 ±  4%  perf-profile.calltrace.cycles-pp.page_add_anon_rmap.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range
      0.00            +1.3        1.32 ±  4%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd
      0.00            +1.4        1.38 ±  5%  perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range
      0.00            +2.4        2.44 ± 10%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd.zap_pmd_range
      0.00            +3.1        3.10 ±  5%  perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single
      0.00            +3.5        3.52 ±  5%  perf-profile.calltrace.cycles-pp.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single
      0.88 ±  4%      +3.8        4.69 ±  4%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
      6.30 ±  6%     +13.5       19.85 ±  7%  perf-profile.calltrace.cycles-pp.__clone
      0.00           +16.7       16.69 ±  7%  perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
      1.19 ± 29%     +17.1       18.32 ±  7%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00           +17.6       17.56 ±  7%  perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.63 ±  7%     +17.7       18.35 ±  7%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.__clone
      0.59 ±  5%     +17.8       18.34 ±  7%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.__clone
      0.59 ±  5%     +17.8       18.34 ±  7%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__clone
      0.00           +17.9       17.90 ±  7%  perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      0.36 ± 71%     +18.0       18.33 ±  7%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__clone
      0.00           +32.0       32.03 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd.zap_pmd_range.unmap_page_range
      0.00           +32.6       32.62 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single
      0.00           +36.2       36.19 ±  2%  perf-profile.calltrace.cycles-pp.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
      7.97 ±  4%     +36.6       44.52 ±  2%  perf-profile.calltrace.cycles-pp.__madvise
      7.91 ±  4%     +36.6       44.46 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
      7.90 ±  4%     +36.6       44.46 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      7.87 ±  4%     +36.6       44.44 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      7.86 ±  4%     +36.6       44.44 ±  2%  perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      7.32 ±  4%     +36.8       44.07 ±  2%  perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.25 ±  4%     +36.8       44.06 ±  2%  perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
      1.04 ±  4%     +40.0       41.08 ±  2%  perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
      1.00 ±  3%     +40.1       41.06 ±  2%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
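
The calltrace deltas above concentrate in two paths: page faults now go
through __do_huge_pmd_anonymous_page/clear_huge_page (~17% of cycles),
and the madvise teardown path now spends ~36% of cycles in
__split_huge_pmd, mostly contending for the page-table spinlock
(native_queued_spin_lock_slowpath). A minimal sketch of that access
pattern, assuming a now-THP-aligned mapping is faulted in whole and then
zapped in sub-PMD chunks (the 8 MiB mapping and 64 KiB slice are
illustrative, not stress-ng internals):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 8UL << 20;	/* >= 2 MiB, so eligible for THP alignment */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	memset(p, 1, len);	/* fault in; the huge-page path hits clear_huge_page() */

	/*
	 * Zap a 64 KiB slice inside a 2 MiB huge page: the kernel must
	 * first split the PMD mapping (__split_huge_pmd), which is where
	 * the spinlock contention in the profile above shows up.
	 */
	if (madvise(p + (1UL << 20), 64UL << 10, MADV_DONTNEED))
		perror("madvise");

	munmap(p, len);
	return 0;
}

Faulting the region populates PMD-mapped huge pages; zapping anything
smaller than a full, aligned 2 MiB then forces the split seen above.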
     44.98           -19.7       25.32 ±  2%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     44.98           -19.7       25.32 ±  2%  perf-profile.children.cycles-pp.cpu_startup_entry
     44.96           -19.6       25.31 ±  2%  perf-profile.children.cycles-pp.do_idle
     43.21           -19.6       23.65 ±  3%  perf-profile.children.cycles-pp.start_secondary
     41.98           -17.6       24.40 ±  2%  perf-profile.children.cycles-pp.cpuidle_idle_call
     41.21           -17.3       23.86 ±  2%  perf-profile.children.cycles-pp.cpuidle_enter
     41.20           -17.3       23.86 ±  2%  perf-profile.children.cycles-pp.cpuidle_enter_state
     12.69 ±  3%     -10.6        2.12 ±  6%  perf-profile.children.cycles-pp.do_exit
     12.60 ±  3%     -10.5        2.08 ±  7%  perf-profile.children.cycles-pp.__x64_sys_exit
     24.76 ±  2%      -8.5       16.31 ±  2%  perf-profile.children.cycles-pp.intel_idle
     12.34 ±  2%      -8.4        3.90 ±  5%  perf-profile.children.cycles-pp.intel_idle_irq
      6.96 ±  4%      -5.4        1.58 ±  7%  perf-profile.children.cycles-pp.ret_from_fork_asm
      6.69 ±  4%      -5.2        1.51 ±  7%  perf-profile.children.cycles-pp.ret_from_fork
      6.59 ±  3%      -5.1        1.47 ±  7%  perf-profile.children.cycles-pp.kthread
      5.78 ±  2%      -5.0        0.80 ±  8%  perf-profile.children.cycles-pp.start_thread
      4.68 ±  4%      -4.5        0.22 ± 10%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      5.03 ±  7%      -3.7        1.32 ±  9%  perf-profile.children.cycles-pp.__do_sys_clone
      5.02 ±  7%      -3.7        1.32 ±  9%  perf-profile.children.cycles-pp.kernel_clone
      4.20 ±  5%      -3.7        0.53 ±  9%  perf-profile.children.cycles-pp.exit_notify
      4.67 ±  5%      -3.6        1.10 ±  9%  perf-profile.children.cycles-pp.rcu_core
      4.60 ±  4%      -3.5        1.06 ± 10%  perf-profile.children.cycles-pp.rcu_do_batch
      4.89 ±  5%      -3.4        1.44 ± 11%  perf-profile.children.cycles-pp.__do_softirq
      5.64 ±  3%      -3.2        2.39 ±  6%  perf-profile.children.cycles-pp.__schedule
      6.27 ±  5%      -3.2        3.03 ±  4%  perf-profile.children.cycles-pp.flush_tlb_mm_range
      4.03 ±  4%      -3.1        0.92 ±  7%  perf-profile.children.cycles-pp.smpboot_thread_fn
      6.68 ±  4%      -3.1        3.61 ±  3%  perf-profile.children.cycles-pp.tlb_finish_mmu
      6.04 ±  5%      -3.1        2.99 ±  4%  perf-profile.children.cycles-pp.on_each_cpu_cond_mask
      6.04 ±  5%      -3.0        2.99 ±  4%  perf-profile.children.cycles-pp.smp_call_function_many_cond
      3.77 ±  2%      -3.0        0.73 ± 16%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      7.78            -3.0        4.77 ±  5%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      3.43 ±  5%      -2.8        0.67 ± 13%  perf-profile.children.cycles-pp.run_ksoftirqd
      3.67 ±  7%      -2.7        0.94 ± 10%  perf-profile.children.cycles-pp.copy_process
      2.80 ±  6%      -2.5        0.34 ± 15%  perf-profile.children.cycles-pp.queued_write_lock_slowpath
      3.41 ±  2%      -2.5        0.96 ± 16%  perf-profile.children.cycles-pp.do_futex
      3.06 ±  5%      -2.4        0.68 ± 16%  perf-profile.children.cycles-pp.free_unref_page_commit
      3.02 ±  5%      -2.4        0.67 ± 16%  perf-profile.children.cycles-pp.free_pcppages_bulk
      2.92 ±  7%      -2.3        0.58 ± 14%  perf-profile.children.cycles-pp.stress_pthread
      3.22 ±  3%      -2.3        0.90 ± 18%  perf-profile.children.cycles-pp.__x64_sys_futex
      2.52 ±  5%      -2.2        0.35 ±  7%  perf-profile.children.cycles-pp.release_task
      2.54 ±  6%      -2.0        0.53 ± 10%  perf-profile.children.cycles-pp.worker_thread
      3.12 ±  5%      -1.9        1.17 ± 11%  perf-profile.children.cycles-pp.free_unref_page
      2.31 ±  6%      -1.9        0.45 ± 11%  perf-profile.children.cycles-pp.process_one_work
      2.47 ±  6%      -1.8        0.63 ± 10%  perf-profile.children.cycles-pp.dup_task_struct
      2.19 ±  5%      -1.8        0.41 ± 12%  perf-profile.children.cycles-pp.delayed_vfree_work
      2.14 ±  5%      -1.7        0.40 ± 11%  perf-profile.children.cycles-pp.vfree
      3.19 ±  2%      -1.6        1.58 ±  8%  perf-profile.children.cycles-pp.schedule
      2.06 ±  3%      -1.6        0.46 ±  7%  perf-profile.children.cycles-pp.__sigtimedwait
      3.02 ±  6%      -1.6        1.44 ±  7%  perf-profile.children.cycles-pp.__munmap
      1.94 ±  4%      -1.6        0.39 ± 14%  perf-profile.children.cycles-pp.__unfreeze_partials
      2.95 ±  6%      -1.5        1.41 ±  7%  perf-profile.children.cycles-pp.__x64_sys_munmap
      2.95 ±  6%      -1.5        1.41 ±  7%  perf-profile.children.cycles-pp.__vm_munmap
      2.14 ±  3%      -1.5        0.60 ± 21%  perf-profile.children.cycles-pp.futex_wait
      2.08 ±  4%      -1.5        0.60 ± 19%  perf-profile.children.cycles-pp.__lll_lock_wait
      2.04 ±  3%      -1.5        0.56 ± 20%  perf-profile.children.cycles-pp.__futex_wait
      1.77 ±  5%      -1.5        0.32 ± 10%  perf-profile.children.cycles-pp.remove_vm_area
      1.86 ±  5%      -1.4        0.46 ± 10%  perf-profile.children.cycles-pp.open64
      1.74 ±  4%      -1.4        0.37 ±  7%  perf-profile.children.cycles-pp.__x64_sys_rt_sigtimedwait
      1.71 ±  4%      -1.4        0.36 ±  8%  perf-profile.children.cycles-pp.do_sigtimedwait
      1.79 ±  5%      -1.3        0.46 ±  9%  perf-profile.children.cycles-pp.__x64_sys_openat
      1.78 ±  5%      -1.3        0.46 ±  8%  perf-profile.children.cycles-pp.do_sys_openat2
      1.61 ±  4%      -1.3        0.32 ± 12%  perf-profile.children.cycles-pp.poll_idle
      1.65 ±  9%      -1.3        0.37 ± 14%  perf-profile.children.cycles-pp.pthread_create@@GLIBC_2.2.5
      1.56 ±  8%      -1.2        0.35 ±  7%  perf-profile.children.cycles-pp.alloc_thread_stack_node
      2.32 ±  3%      -1.2        1.13 ±  8%  perf-profile.children.cycles-pp.pick_next_task_fair
      2.59 ±  6%      -1.2        1.40 ±  7%  perf-profile.children.cycles-pp.do_vmi_munmap
      1.55 ±  4%      -1.2        0.40 ± 19%  perf-profile.children.cycles-pp.futex_wait_queue
      1.37 ±  5%      -1.1        0.22 ± 12%  perf-profile.children.cycles-pp.find_unlink_vmap_area
      2.52 ±  6%      -1.1        1.38 ±  6%  perf-profile.children.cycles-pp.do_vmi_align_munmap
      1.53 ±  5%      -1.1        0.39 ±  8%  perf-profile.children.cycles-pp.do_filp_open
      1.52 ±  5%      -1.1        0.39 ±  7%  perf-profile.children.cycles-pp.path_openat
      1.25 ±  3%      -1.1        0.14 ± 12%  perf-profile.children.cycles-pp.sigpending
      1.58 ±  5%      -1.1        0.50 ±  6%  perf-profile.children.cycles-pp.schedule_idle
      1.29 ±  5%      -1.1        0.21 ± 21%  perf-profile.children.cycles-pp.__mprotect
      1.40 ±  8%      -1.1        0.32 ±  4%  perf-profile.children.cycles-pp.__vmalloc_node_range
      2.06 ±  3%      -1.0        1.02 ±  9%  perf-profile.children.cycles-pp.newidle_balance
      1.04 ±  3%      -1.0        0.08 ± 23%  perf-profile.children.cycles-pp.__x64_sys_rt_sigpending
      1.14 ±  6%      -1.0        0.18 ± 18%  perf-profile.children.cycles-pp.__x64_sys_mprotect
      1.13 ±  6%      -1.0        0.18 ± 17%  perf-profile.children.cycles-pp.do_mprotect_pkey
      1.30 ±  7%      -0.9        0.36 ± 10%  perf-profile.children.cycles-pp.wake_up_new_task
      1.14 ±  9%      -0.9        0.22 ± 16%  perf-profile.children.cycles-pp.do_anonymous_page
      0.95 ±  3%      -0.9        0.04 ± 71%  perf-profile.children.cycles-pp.do_sigpending
      1.24 ±  3%      -0.9        0.34 ±  9%  perf-profile.children.cycles-pp.futex_wake
      1.02 ±  6%      -0.9        0.14 ± 15%  perf-profile.children.cycles-pp.mprotect_fixup
      1.91 ±  2%      -0.9        1.06 ±  9%  perf-profile.children.cycles-pp.load_balance
      1.38 ±  5%      -0.8        0.53 ±  6%  perf-profile.children.cycles-pp.select_task_rq_fair
      1.14 ±  4%      -0.8        0.31 ± 12%  perf-profile.children.cycles-pp.__pthread_mutex_unlock_usercnt
      2.68 ±  3%      -0.8        1.91 ±  6%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      1.00 ±  4%      -0.7        0.26 ± 10%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
      1.44 ±  3%      -0.7        0.73 ± 10%  perf-profile.children.cycles-pp.find_busiest_group
      0.81 ±  6%      -0.7        0.10 ± 18%  perf-profile.children.cycles-pp.vma_modify
      1.29 ±  3%      -0.7        0.60 ±  8%  perf-profile.children.cycles-pp.exit_mm
      1.40 ±  3%      -0.7        0.71 ± 10%  perf-profile.children.cycles-pp.update_sd_lb_stats
      0.78 ±  7%      -0.7        0.10 ± 19%  perf-profile.children.cycles-pp.__split_vma
      0.90 ±  8%      -0.7        0.22 ± 10%  perf-profile.children.cycles-pp.__vmalloc_area_node
      0.75 ±  4%      -0.7        0.10 ±  5%  perf-profile.children.cycles-pp.__exit_signal
      1.49 ±  2%      -0.7        0.84 ±  7%  perf-profile.children.cycles-pp.try_to_wake_up
      0.89 ±  7%      -0.6        0.24 ± 10%  perf-profile.children.cycles-pp.find_idlest_cpu
      1.59 ±  5%      -0.6        0.95 ±  7%  perf-profile.children.cycles-pp.unmap_region
      0.86 ±  3%      -0.6        0.22 ± 26%  perf-profile.children.cycles-pp.pthread_cond_timedwait@@GLIBC_2.3.2
      1.59 ±  3%      -0.6        0.95 ±  9%  perf-profile.children.cycles-pp.irq_exit_rcu
      1.24 ±  3%      -0.6        0.61 ± 10%  perf-profile.children.cycles-pp.update_sg_lb_stats
      0.94 ±  5%      -0.6        0.32 ± 11%  perf-profile.children.cycles-pp.do_task_dead
      0.87 ±  3%      -0.6        0.25 ± 19%  perf-profile.children.cycles-pp.perf_iterate_sb
      0.82 ±  4%      -0.6        0.22 ± 10%  perf-profile.children.cycles-pp.sched_ttwu_pending
      1.14 ±  3%      -0.6        0.54 ± 10%  perf-profile.children.cycles-pp.activate_task
      0.84            -0.6        0.25 ± 10%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.81 ±  6%      -0.6        0.22 ± 11%  perf-profile.children.cycles-pp.find_idlest_group
      0.75 ±  5%      -0.6        0.18 ± 14%  perf-profile.children.cycles-pp.step_into
      0.74 ±  8%      -0.6        0.18 ± 14%  perf-profile.children.cycles-pp.__alloc_pages_bulk
      0.74 ±  6%      -0.5        0.19 ± 11%  perf-profile.children.cycles-pp.update_sg_wakeup_stats
      0.72 ±  5%      -0.5        0.18 ± 15%  perf-profile.children.cycles-pp.pick_link
      1.06 ±  2%      -0.5        0.52 ±  9%  perf-profile.children.cycles-pp.enqueue_task_fair
      0.77 ±  6%      -0.5        0.23 ± 12%  perf-profile.children.cycles-pp.unmap_vmas
      0.76 ±  2%      -0.5        0.22 ±  8%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.94 ±  2%      -0.5        0.42 ± 10%  perf-profile.children.cycles-pp.dequeue_task_fair
      0.65 ±  5%      -0.5        0.15 ± 18%  perf-profile.children.cycles-pp.open_last_lookups
      1.37 ±  3%      -0.5        0.87 ±  4%  perf-profile.children.cycles-pp.llist_add_batch
      0.70 ±  4%      -0.5        0.22 ± 19%  perf-profile.children.cycles-pp.memcpy_orig
      0.91 ±  4%      -0.5        0.44 ±  7%  perf-profile.children.cycles-pp.update_load_avg
      0.67            -0.5        0.20 ±  8%  perf-profile.children.cycles-pp.switch_fpu_return
      0.88 ±  3%      -0.5        0.42 ±  8%  perf-profile.children.cycles-pp.enqueue_entity
      0.91 ±  4%      -0.5        0.45 ± 12%  perf-profile.children.cycles-pp.ttwu_do_activate
      0.77 ±  4%      -0.5        0.32 ± 10%  perf-profile.children.cycles-pp.schedule_hrtimeout_range_clock
      0.63 ±  5%      -0.4        0.20 ± 21%  perf-profile.children.cycles-pp.arch_dup_task_struct
      0.74 ±  3%      -0.4        0.32 ± 15%  perf-profile.children.cycles-pp.dequeue_entity
      0.62 ±  5%      -0.4        0.21 ±  5%  perf-profile.children.cycles-pp.finish_task_switch
      0.56            -0.4        0.16 ±  7%  perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
      0.53 ±  4%      -0.4        0.13 ±  9%  perf-profile.children.cycles-pp.syscall
      0.50 ±  9%      -0.4        0.11 ± 18%  perf-profile.children.cycles-pp.__get_vm_area_node
      0.51 ±  3%      -0.4        0.12 ± 12%  perf-profile.children.cycles-pp.__slab_free
      0.52 ±  2%      -0.4        0.14 ± 10%  perf-profile.children.cycles-pp.kmem_cache_free
      0.75 ±  3%      -0.4        0.37 ±  9%  perf-profile.children.cycles-pp.exit_mm_release
      0.50 ±  6%      -0.4        0.12 ± 21%  perf-profile.children.cycles-pp.do_send_specific
      0.74 ±  3%      -0.4        0.37 ±  8%  perf-profile.children.cycles-pp.futex_exit_release
      0.45 ± 10%      -0.4        0.09 ± 17%  perf-profile.children.cycles-pp.alloc_vmap_area
      0.47 ±  3%      -0.4        0.11 ± 20%  perf-profile.children.cycles-pp.tgkill
      0.68 ± 11%      -0.4        0.32 ± 12%  perf-profile.children.cycles-pp.__mmap
      0.48 ±  3%      -0.4        0.13 ±  6%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.76 ±  5%      -0.3        0.41 ± 10%  perf-profile.children.cycles-pp.wake_up_q
      0.42 ±  7%      -0.3        0.08 ± 22%  perf-profile.children.cycles-pp.__close
      0.49 ±  7%      -0.3        0.14 ± 25%  perf-profile.children.cycles-pp.kmem_cache_alloc
      0.49 ±  9%      -0.3        0.15 ± 14%  perf-profile.children.cycles-pp.mas_store_gfp
      0.46 ±  4%      -0.3        0.12 ± 23%  perf-profile.children.cycles-pp.perf_event_task_output
      0.44 ± 10%      -0.3        0.10 ± 28%  perf-profile.children.cycles-pp.pthread_sigqueue
      0.46 ±  4%      -0.3        0.12 ± 15%  perf-profile.children.cycles-pp.link_path_walk
      0.42 ±  8%      -0.3        0.10 ± 20%  perf-profile.children.cycles-pp.proc_ns_get_link
      0.63 ± 10%      -0.3        0.32 ± 12%  perf-profile.children.cycles-pp.vm_mmap_pgoff
      0.45 ±  4%      -0.3        0.14 ± 13%  perf-profile.children.cycles-pp.sched_move_task
      0.36 ±  8%      -0.3        0.06 ± 49%  perf-profile.children.cycles-pp.__x64_sys_close
      0.46 ±  8%      -0.3        0.17 ± 14%  perf-profile.children.cycles-pp.prctl
      0.65 ±  3%      -0.3        0.35 ±  7%  perf-profile.children.cycles-pp.futex_cleanup
      0.42 ±  7%      -0.3        0.12 ± 15%  perf-profile.children.cycles-pp.mas_store_prealloc
      0.49 ±  5%      -0.3        0.20 ± 13%  perf-profile.children.cycles-pp.__rmqueue_pcplist
      0.37 ±  7%      -0.3        0.08 ± 16%  perf-profile.children.cycles-pp.do_tkill
      0.36 ± 10%      -0.3        0.08 ± 20%  perf-profile.children.cycles-pp.ns_get_path
      0.37 ±  4%      -0.3        0.09 ± 18%  perf-profile.children.cycles-pp.setns
      0.67 ±  3%      -0.3        0.41 ±  8%  perf-profile.children.cycles-pp.hrtimer_wakeup
      0.35 ±  5%      -0.3        0.10 ± 16%  perf-profile.children.cycles-pp.__task_pid_nr_ns
      0.41 ±  5%      -0.3        0.16 ± 12%  perf-profile.children.cycles-pp.mas_wr_bnode
      0.35 ±  4%      -0.3        0.10 ± 20%  perf-profile.children.cycles-pp.rcu_cblist_dequeue
      0.37 ±  5%      -0.2        0.12 ± 17%  perf-profile.children.cycles-pp.exit_task_stack_account
      0.56 ±  4%      -0.2        0.31 ± 12%  perf-profile.children.cycles-pp.select_task_rq
      0.29 ±  6%      -0.2        0.05 ± 46%  perf-profile.children.cycles-pp.mas_wr_store_entry
      0.34 ±  4%      -0.2        0.10 ± 27%  perf-profile.children.cycles-pp.perf_event_task
      0.39 ±  9%      -0.2        0.15 ± 12%  perf-profile.children.cycles-pp.__switch_to_asm
      0.35 ±  5%      -0.2        0.11 ± 11%  perf-profile.children.cycles-pp.account_kernel_stack
      0.30 ±  7%      -0.2        0.06 ± 48%  perf-profile.children.cycles-pp.__ns_get_path
      0.31 ±  9%      -0.2        0.07 ± 17%  perf-profile.children.cycles-pp.free_vmap_area_noflush
      0.31 ±  5%      -0.2        0.08 ± 19%  perf-profile.children.cycles-pp.__do_sys_setns
      0.33 ±  7%      -0.2        0.10 ±  7%  perf-profile.children.cycles-pp.__free_one_page
      0.31 ± 11%      -0.2        0.08 ± 13%  perf-profile.children.cycles-pp.__pte_alloc
      0.36 ±  6%      -0.2        0.13 ± 12%  perf-profile.children.cycles-pp.switch_mm_irqs_off
      0.27 ± 12%      -0.2        0.05 ± 71%  perf-profile.children.cycles-pp.__fput
      0.53 ±  9%      -0.2        0.31 ± 12%  perf-profile.children.cycles-pp.do_mmap
      0.27 ± 12%      -0.2        0.05 ± 77%  perf-profile.children.cycles-pp.__x64_sys_rt_tgsigqueueinfo
      0.28 ±  5%      -0.2        0.06 ± 50%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.34 ± 10%      -0.2        0.12 ± 29%  perf-profile.children.cycles-pp.futex_wait_setup
      0.27 ±  6%      -0.2        0.06 ± 45%  perf-profile.children.cycles-pp.__x64_sys_tgkill
      0.31 ±  7%      -0.2        0.11 ± 18%  perf-profile.children.cycles-pp.__switch_to
      0.26 ±  8%      -0.2        0.06 ± 21%  perf-profile.children.cycles-pp.__call_rcu_common
      0.33 ±  9%      -0.2        0.13 ± 18%  perf-profile.children.cycles-pp.__do_sys_prctl
      0.28 ±  5%      -0.2        0.08 ± 17%  perf-profile.children.cycles-pp.mm_release
      0.52 ±  2%      -0.2        0.32 ±  9%  perf-profile.children.cycles-pp.__get_user_8
      0.24 ± 10%      -0.2        0.04 ± 72%  perf-profile.children.cycles-pp.dput
      0.25 ± 14%      -0.2        0.05 ± 46%  perf-profile.children.cycles-pp.perf_event_mmap
      0.24 ±  7%      -0.2        0.06 ± 50%  perf-profile.children.cycles-pp.mas_walk
      0.28 ±  6%      -0.2        0.10 ± 24%  perf-profile.children.cycles-pp.rmqueue_bulk
      0.23 ± 15%      -0.2        0.05 ± 46%  perf-profile.children.cycles-pp.perf_event_mmap_event
      0.25 ± 15%      -0.2        0.08 ± 45%  perf-profile.children.cycles-pp.___slab_alloc
      0.20 ± 14%      -0.2        0.03 ±100%  perf-profile.children.cycles-pp.lookup_fast
      0.20 ± 10%      -0.2        0.04 ± 75%  perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
      0.28 ±  7%      -0.2        0.12 ± 24%  perf-profile.children.cycles-pp.prepare_task_switch
      0.22 ± 11%      -0.2        0.05 ±  8%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
      0.63 ±  5%      -0.2        0.47 ± 12%  perf-profile.children.cycles-pp.llist_reverse_order
      0.25 ± 11%      -0.2        0.09 ± 34%  perf-profile.children.cycles-pp.futex_q_lock
      0.21 ±  6%      -0.2        0.06 ± 47%  perf-profile.children.cycles-pp.kmem_cache_alloc_node
      0.18 ± 11%      -0.2        0.03 ±100%  perf-profile.children.cycles-pp.alloc_empty_file
      0.19 ±  5%      -0.2        0.04 ± 71%  perf-profile.children.cycles-pp.__put_task_struct
      0.19 ± 15%      -0.2        0.03 ± 70%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.24 ±  6%      -0.2        0.09 ± 20%  perf-profile.children.cycles-pp.___perf_sw_event
      0.18 ±  7%      -0.2        0.03 ±100%  perf-profile.children.cycles-pp.perf_event_fork
      0.19 ± 11%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.select_idle_core
      0.30 ± 11%      -0.1        0.15 ±  7%  perf-profile.children.cycles-pp.pte_alloc_one
      0.25 ±  6%      -0.1        0.11 ± 10%  perf-profile.children.cycles-pp.set_next_entity
      0.20 ± 10%      -0.1        0.06 ± 49%  perf-profile.children.cycles-pp.__perf_event_header__init_id
      0.18 ± 15%      -0.1        0.03 ±101%  perf-profile.children.cycles-pp.__radix_tree_lookup
      0.22 ± 11%      -0.1        0.08 ± 21%  perf-profile.children.cycles-pp.mas_spanning_rebalance
      0.20 ±  9%      -0.1        0.06 ±  9%  perf-profile.children.cycles-pp.stress_pthread_func
      0.18 ± 12%      -0.1        0.04 ± 73%  perf-profile.children.cycles-pp.__getpid
      0.16 ± 13%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.walk_component
      0.28 ±  5%      -0.1        0.15 ± 13%  perf-profile.children.cycles-pp.update_curr
      0.25 ±  5%      -0.1        0.11 ± 22%  perf-profile.children.cycles-pp.balance_fair
      0.16 ±  9%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.futex_wake_mark
      0.16 ± 12%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.get_futex_key
      0.17 ±  6%      -0.1        0.05 ± 47%  perf-profile.children.cycles-pp.memcg_account_kmem
      0.25 ± 11%      -0.1        0.12 ± 11%  perf-profile.children.cycles-pp._find_next_bit
      0.15 ± 13%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.do_open
      0.20 ±  8%      -0.1        0.08 ± 16%  perf-profile.children.cycles-pp.mas_rebalance
      0.17 ± 13%      -0.1        0.05 ± 45%  perf-profile.children.cycles-pp.__memcg_kmem_charge_page
      0.33 ±  6%      -0.1        0.21 ± 10%  perf-profile.children.cycles-pp.select_idle_sibling
      0.14 ± 11%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.get_user_pages_fast
      0.18 ±  7%      -0.1        0.07 ± 14%  perf-profile.children.cycles-pp.mas_alloc_nodes
      0.14 ± 11%      -0.1        0.03 ±101%  perf-profile.children.cycles-pp.set_task_cpu
      0.14 ± 12%      -0.1        0.03 ±101%  perf-profile.children.cycles-pp.vm_unmapped_area
      0.38 ±  6%      -0.1        0.27 ±  7%  perf-profile.children.cycles-pp.native_sched_clock
      0.16 ± 10%      -0.1        0.05 ± 47%  perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
      0.36 ±  9%      -0.1        0.25 ± 12%  perf-profile.children.cycles-pp.mmap_region
      0.23 ±  7%      -0.1        0.12 ±  9%  perf-profile.children.cycles-pp.available_idle_cpu
      0.13 ± 11%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.internal_get_user_pages_fast
      0.16 ± 10%      -0.1        0.06 ± 18%  perf-profile.children.cycles-pp.get_unmapped_area
      0.50 ±  7%      -0.1        0.40 ±  6%  perf-profile.children.cycles-pp.menu_select
      0.24 ±  9%      -0.1        0.14 ± 13%  perf-profile.children.cycles-pp.rmqueue
      0.17 ± 14%      -0.1        0.07 ± 26%  perf-profile.children.cycles-pp.perf_event_comm
      0.17 ± 15%      -0.1        0.07 ± 23%  perf-profile.children.cycles-pp.perf_event_comm_event
      0.17 ± 11%      -0.1        0.07 ± 14%  perf-profile.children.cycles-pp.pick_next_entity
      0.13 ± 14%      -0.1        0.03 ±102%  perf-profile.children.cycles-pp.perf_output_begin
      0.23 ±  6%      -0.1        0.13 ± 21%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.14 ± 18%      -0.1        0.04 ± 72%  perf-profile.children.cycles-pp.perf_event_comm_output
      0.21 ±  9%      -0.1        0.12 ±  9%  perf-profile.children.cycles-pp.update_rq_clock
      0.16 ±  8%      -0.1        0.06 ± 19%  perf-profile.children.cycles-pp.mas_split
      0.13 ± 14%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
      0.13 ±  6%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.13 ±  7%      -0.1        0.04 ± 72%  perf-profile.children.cycles-pp.mas_topiary_replace
      0.14 ±  8%      -0.1        0.06 ±  9%  perf-profile.children.cycles-pp.mas_preallocate
      0.16 ± 11%      -0.1        0.07 ± 18%  perf-profile.children.cycles-pp.__pick_eevdf
      0.11 ± 14%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.mas_empty_area_rev
      0.25 ±  7%      -0.1        0.17 ± 10%  perf-profile.children.cycles-pp.select_idle_cpu
      0.14 ± 12%      -0.1        0.06 ± 14%  perf-profile.children.cycles-pp.cpu_stopper_thread
      0.14 ± 10%      -0.1        0.06 ± 13%  perf-profile.children.cycles-pp.active_load_balance_cpu_stop
      0.14 ± 14%      -0.1        0.06 ± 11%  perf-profile.children.cycles-pp.os_xsave
      0.18 ±  6%      -0.1        0.11 ± 14%  perf-profile.children.cycles-pp.idle_cpu
      0.17 ±  4%      -0.1        0.10 ± 15%  perf-profile.children.cycles-pp.hrtimer_start_range_ns
      0.11 ± 14%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.__pthread_mutex_lock
      0.32 ±  5%      -0.1        0.25 ±  5%  perf-profile.children.cycles-pp.sched_clock
      0.11 ±  6%      -0.1        0.03 ± 70%  perf-profile.children.cycles-pp.wakeup_preempt
      0.23 ±  7%      -0.1        0.16 ± 13%  perf-profile.children.cycles-pp.update_rq_clock_task
      0.13 ±  8%      -0.1        0.06 ± 16%  perf-profile.children.cycles-pp.local_clock_noinstr
      0.11 ± 10%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.kmem_cache_alloc_bulk
      0.34 ±  4%      -0.1        0.27 ±  6%  perf-profile.children.cycles-pp.sched_clock_cpu
      0.11 ±  9%      -0.1        0.04 ± 76%  perf-profile.children.cycles-pp.avg_vruntime
      0.15 ±  8%      -0.1        0.08 ± 14%  perf-profile.children.cycles-pp.update_cfs_group
      0.10 ±  8%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
      0.13 ±  8%      -0.1        0.06 ± 11%  perf-profile.children.cycles-pp.sched_use_asym_prio
      0.09 ± 12%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.getname_flags
      0.18 ±  9%      -0.1        0.12 ± 12%  perf-profile.children.cycles-pp.__update_load_avg_se
      0.11 ±  8%      -0.1        0.05 ± 46%  perf-profile.children.cycles-pp.place_entity
      0.08 ± 12%      -0.0        0.02 ± 99%  perf-profile.children.cycles-pp.folio_add_lru_vma
      0.10 ±  7%      -0.0        0.05 ± 46%  perf-profile.children.cycles-pp._find_next_and_bit
      0.10 ±  6%      -0.0        0.06 ± 24%  perf-profile.children.cycles-pp.reweight_entity
      0.03 ± 70%      +0.0        0.08 ± 14%  perf-profile.children.cycles-pp.perf_rotate_context
      0.19 ± 10%      +0.1        0.25 ±  7%  perf-profile.children.cycles-pp.irqtime_account_irq
      0.08 ± 11%      +0.1        0.14 ± 21%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
      0.00            +0.1        0.06 ± 14%  perf-profile.children.cycles-pp.rcu_pending
      0.10 ± 17%      +0.1        0.16 ± 13%  perf-profile.children.cycles-pp.rebalance_domains
      0.14 ± 16%      +0.1        0.21 ± 12%  perf-profile.children.cycles-pp.downgrade_write
      0.14 ± 14%      +0.1        0.21 ± 10%  perf-profile.children.cycles-pp.down_read_killable
      0.00            +0.1        0.07 ± 11%  perf-profile.children.cycles-pp.free_tail_page_prepare
      0.02 ±141%      +0.1        0.09 ± 20%  perf-profile.children.cycles-pp.rcu_sched_clock_irq
      0.01 ±223%      +0.1        0.08 ± 25%  perf-profile.children.cycles-pp.arch_scale_freq_tick
      0.55 ±  9%      +0.1        0.62 ±  9%  perf-profile.children.cycles-pp.__alloc_pages
      0.34 ±  5%      +0.1        0.41 ±  9%  perf-profile.children.cycles-pp.clock_nanosleep
      0.00            +0.1        0.08 ± 23%  perf-profile.children.cycles-pp.tick_nohz_next_event
      0.70 ±  2%      +0.1        0.78 ±  5%  perf-profile.children.cycles-pp.flush_tlb_func
      0.14 ± 10%      +0.1        0.23 ± 13%  perf-profile.children.cycles-pp.__intel_pmu_enable_all
      0.07 ± 19%      +0.1        0.17 ± 17%  perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.04 ± 71%      +0.1        0.14 ± 11%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
      0.25 ±  9%      +0.1        0.38 ± 11%  perf-profile.children.cycles-pp.down_read
      0.43 ±  9%      +0.1        0.56 ± 10%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.00            +0.1        0.15 ±  6%  perf-profile.children.cycles-pp.vm_normal_page
      0.31 ±  7%      +0.2        0.46 ±  9%  perf-profile.children.cycles-pp.native_flush_tlb_local
      0.00            +0.2        0.16 ±  8%  perf-profile.children.cycles-pp.__tlb_remove_page_size
      0.28 ± 11%      +0.2        0.46 ± 13%  perf-profile.children.cycles-pp.vma_alloc_folio
      0.00            +0.2        0.24 ±  5%  perf-profile.children.cycles-pp._compound_head
      0.07 ± 16%      +0.2        0.31 ±  6%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.38 ±  5%      +0.2        0.62 ±  7%  perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
      0.22 ± 12%      +0.2        0.47 ± 10%  perf-profile.children.cycles-pp.schedule_preempt_disabled
      0.38 ±  5%      +0.3        0.64 ±  7%  perf-profile.children.cycles-pp.perf_event_task_tick
      0.00            +0.3        0.27 ±  5%  perf-profile.children.cycles-pp.free_swap_cache
      0.30 ± 10%      +0.3        0.58 ± 10%  perf-profile.children.cycles-pp.rwsem_down_read_slowpath
      0.00            +0.3        0.30 ±  4%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
      0.09 ± 10%      +0.3        0.42 ±  7%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.00            +0.3        0.34 ±  9%  perf-profile.children.cycles-pp.deferred_split_folio
      0.00            +0.4        0.36 ± 13%  perf-profile.children.cycles-pp.prep_compound_page
      0.09 ± 10%      +0.4        0.50 ±  9%  perf-profile.children.cycles-pp.free_unref_page_prepare
      0.00            +0.4        0.42 ± 11%  perf-profile.children.cycles-pp.do_huge_pmd_anonymous_page
      1.67 ±  3%      +0.4        2.12 ±  8%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.63 ±  3%      +0.5        1.11 ± 12%  perf-profile.children.cycles-pp.scheduler_tick
      1.93 ±  3%      +0.5        2.46 ±  8%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      1.92 ±  3%      +0.5        2.45 ±  8%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.73 ±  3%      +0.6        1.31 ± 11%  perf-profile.children.cycles-pp.update_process_times
      0.74 ±  3%      +0.6        1.34 ± 11%  perf-profile.children.cycles-pp.tick_sched_handle
      0.20 ±  8%      +0.6        0.83 ± 18%  perf-profile.children.cycles-pp.__cond_resched
      0.78 ±  4%      +0.6        1.43 ± 12%  perf-profile.children.cycles-pp.tick_nohz_highres_handler
      0.12 ±  7%      +0.7        0.81 ±  5%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      0.28 ±  7%      +0.9        1.23 ±  4%  perf-profile.children.cycles-pp.release_pages
      0.00            +1.0        1.01 ±  6%  perf-profile.children.cycles-pp.pmdp_invalidate
      0.35 ±  6%      +1.2        1.56 ±  5%  perf-profile.children.cycles-pp.__mod_lruvec_page_state
      0.30 ±  8%      +1.2        1.53 ±  4%  perf-profile.children.cycles-pp.tlb_batch_pages_flush
      0.00            +1.3        1.26 ±  4%  perf-profile.children.cycles-pp.page_add_anon_rmap
      0.09 ± 11%      +3.1        3.20 ±  5%  perf-profile.children.cycles-pp.page_remove_rmap
      1.60 ±  2%      +3.4        5.04 ±  4%  perf-profile.children.cycles-pp.zap_pte_range
      0.03 ±100%      +3.5        3.55 ±  5%  perf-profile.children.cycles-pp.__split_huge_pmd_locked
     41.36           +11.6       52.92 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     41.22           +11.7       52.88 ±  2%  perf-profile.children.cycles-pp.do_syscall_64
      6.42 ±  6%     +13.5       19.88 ±  7%  perf-profile.children.cycles-pp.__clone
      0.82 ±  6%     +16.2       16.98 ±  7%  perf-profile.children.cycles-pp.clear_page_erms
      2.62 ±  5%     +16.4       19.04 ±  7%  perf-profile.children.cycles-pp.asm_exc_page_fault
      2.18 ±  5%     +16.8       18.94 ±  7%  perf-profile.children.cycles-pp.exc_page_fault
      2.06 ±  6%     +16.8       18.90 ±  7%  perf-profile.children.cycles-pp.do_user_addr_fault
      1.60 ±  8%     +17.0       18.60 ±  7%  perf-profile.children.cycles-pp.handle_mm_fault
      1.52 ±  7%     +17.1       18.58 ±  7%  perf-profile.children.cycles-pp.__handle_mm_fault
      0.30 ±  7%     +17.4       17.72 ±  7%  perf-profile.children.cycles-pp.clear_huge_page
      0.31 ±  8%     +17.6       17.90 ±  7%  perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
     11.66 ±  3%     +22.2       33.89 ±  2%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      3.29 ±  3%     +30.2       33.46        perf-profile.children.cycles-pp._raw_spin_lock
      0.04 ± 71%     +36.2       36.21 ±  2%  perf-profile.children.cycles-pp.__split_huge_pmd
      8.00 ±  4%     +36.5       44.54 ±  2%  perf-profile.children.cycles-pp.__madvise
      7.87 ±  4%     +36.6       44.44 ±  2%  perf-profile.children.cycles-pp.__x64_sys_madvise
      7.86 ±  4%     +36.6       44.44 ±  2%  perf-profile.children.cycles-pp.do_madvise
      7.32 ±  4%     +36.8       44.07 ±  2%  perf-profile.children.cycles-pp.madvise_vma_behavior
      7.26 ±  4%     +36.8       44.06 ±  2%  perf-profile.children.cycles-pp.zap_page_range_single
      1.78           +39.5       41.30 ±  2%  perf-profile.children.cycles-pp.unmap_page_range
      1.72           +39.6       41.28 ±  2%  perf-profile.children.cycles-pp.zap_pmd_range
     24.76 ±  2%      -8.5       16.31 ±  2%  perf-profile.self.cycles-pp.intel_idle
     11.46 ±  2%      -7.8        3.65 ±  5%  perf-profile.self.cycles-pp.intel_idle_irq
      3.16 ±  7%      -2.1        1.04 ±  6%  perf-profile.self.cycles-pp.smp_call_function_many_cond
      1.49 ±  4%      -1.2        0.30 ± 12%  perf-profile.self.cycles-pp.poll_idle
      1.15 ±  3%      -0.6        0.50 ±  9%  perf-profile.self.cycles-pp._raw_spin_lock
      0.60 ±  6%      -0.6        0.03 ±100%  perf-profile.self.cycles-pp.queued_write_lock_slowpath
      0.69 ±  4%      -0.5        0.22 ± 20%  perf-profile.self.cycles-pp.memcpy_orig
      0.66 ±  7%      -0.5        0.18 ± 11%  perf-profile.self.cycles-pp.update_sg_wakeup_stats
      0.59 ±  4%      -0.5        0.13 ±  8%  perf-profile.self.cycles-pp._raw_spin_lock_irq
      0.86 ±  3%      -0.4        0.43 ± 12%  perf-profile.self.cycles-pp.update_sg_lb_stats
      0.56            -0.4        0.16 ±  7%  perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
      0.48 ±  3%      -0.4        0.12 ± 10%  perf-profile.self.cycles-pp.__slab_free
      1.18 ±  2%      -0.4        0.82 ±  3%  perf-profile.self.cycles-pp.llist_add_batch
      0.54 ±  5%      -0.3        0.19 ±  6%  perf-profile.self.cycles-pp.__schedule
      0.47 ±  7%      -0.3        0.18 ± 13%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.34 ±  5%      -0.2        0.09 ± 18%  perf-profile.self.cycles-pp.kmem_cache_free
      0.43 ±  4%      -0.2        0.18 ± 11%  perf-profile.self.cycles-pp.update_load_avg
      0.35 ±  4%      -0.2        0.10 ± 23%  perf-profile.self.cycles-pp.rcu_cblist_dequeue
      0.38 ±  9%      -0.2        0.15 ± 10%  perf-profile.self.cycles-pp.__switch_to_asm
      0.33 ±  5%      -0.2        0.10 ± 16%  perf-profile.self.cycles-pp.__task_pid_nr_ns
      0.36 ±  6%      -0.2        0.13 ± 14%  perf-profile.self.cycles-pp.switch_mm_irqs_off
      0.31 ±  6%      -0.2        0.09 ±  6%  perf-profile.self.cycles-pp.__free_one_page
      0.28 ±  5%      -0.2        0.06 ± 50%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.27 ± 13%      -0.2        0.06 ± 23%  perf-profile.self.cycles-pp.pthread_create@@GLIBC_2.2.5
      0.30 ±  7%      -0.2        0.10 ± 19%  perf-profile.self.cycles-pp.__switch_to
      0.27 ±  4%      -0.2        0.10 ± 17%  perf-profile.self.cycles-pp.finish_task_switch
      0.23 ±  7%      -0.2        0.06 ± 50%  perf-profile.self.cycles-pp.mas_walk
      0.22 ±  9%      -0.2        0.05 ± 48%  perf-profile.self.cycles-pp.__clone
      0.63 ±  5%      -0.2        0.46 ± 12%  perf-profile.self.cycles-pp.llist_reverse_order
      0.20 ±  4%      -0.2        0.04 ± 72%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.24 ± 10%      -0.1        0.09 ± 19%  perf-profile.self.cycles-pp.rmqueue_bulk
      0.18 ± 13%      -0.1        0.03 ±101%  perf-profile.self.cycles-pp.__radix_tree_lookup
      0.18 ± 11%      -0.1        0.04 ± 71%  perf-profile.self.cycles-pp.stress_pthread_func
      0.36 ±  8%      -0.1        0.22 ± 11%  perf-profile.self.cycles-pp.menu_select
      0.22 ±  4%      -0.1        0.08 ± 19%  perf-profile.self.cycles-pp.___perf_sw_event
      0.20 ± 13%      -0.1        0.07 ± 20%  perf-profile.self.cycles-pp.start_thread
      0.16 ± 13%      -0.1        0.03 ±101%  perf-profile.self.cycles-pp.alloc_vmap_area
      0.17 ± 10%      -0.1        0.04 ± 73%  perf-profile.self.cycles-pp.kmem_cache_alloc
      0.14 ±  9%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.futex_wake
      0.17 ±  4%      -0.1        0.06 ± 11%  perf-profile.self.cycles-pp.dequeue_task_fair
      0.23 ±  6%      -0.1        0.12 ± 11%  perf-profile.self.cycles-pp.available_idle_cpu
      0.22 ± 13%      -0.1        0.11 ± 12%  perf-profile.self.cycles-pp._find_next_bit
      0.21 ±  7%      -0.1        0.10 ±  6%  perf-profile.self.cycles-pp.__rmqueue_pcplist
      0.37 ±  7%      -0.1        0.26 ±  8%  perf-profile.self.cycles-pp.native_sched_clock
      0.22 ±  7%      -0.1        0.12 ± 21%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
      0.19 ±  7%      -0.1        0.10 ± 11%  perf-profile.self.cycles-pp.enqueue_entity
      0.15 ±  5%      -0.1        0.06 ± 45%  perf-profile.self.cycles-pp.enqueue_task_fair
      0.15 ± 11%      -0.1        0.06 ± 17%  perf-profile.self.cycles-pp.__pick_eevdf
      0.13 ± 13%      -0.1        0.05 ± 72%  perf-profile.self.cycles-pp.prepare_task_switch
      0.17 ± 10%      -0.1        0.08 ±  8%  perf-profile.self.cycles-pp.update_rq_clock_task
      0.54 ±  4%      -0.1        0.46 ±  6%  perf-profile.self.cycles-pp.__flush_smp_call_function_queue
      0.14 ± 14%      -0.1        0.06 ± 11%  perf-profile.self.cycles-pp.os_xsave
      0.11 ± 10%      -0.1        0.03 ± 70%  perf-profile.self.cycles-pp.try_to_wake_up
      0.10 ±  8%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.futex_wait
      0.14 ±  9%      -0.1        0.07 ± 10%  perf-profile.self.cycles-pp.update_curr
      0.18 ±  9%      -0.1        0.11 ± 14%  perf-profile.self.cycles-pp.idle_cpu
      0.11 ± 11%      -0.1        0.04 ± 76%  perf-profile.self.cycles-pp.avg_vruntime
      0.15 ± 10%      -0.1        0.08 ± 14%  perf-profile.self.cycles-pp.update_cfs_group
      0.09 ±  9%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.reweight_entity
      0.12 ± 13%      -0.1        0.06 ±  8%  perf-profile.self.cycles-pp.do_idle
      0.18 ± 10%      -0.1        0.12 ± 13%  perf-profile.self.cycles-pp.__update_load_avg_se
      0.09 ± 17%      -0.1        0.04 ± 71%  perf-profile.self.cycles-pp.cpuidle_idle_call
      0.10 ± 11%      -0.0        0.06 ± 45%  perf-profile.self.cycles-pp.update_rq_clock
      0.12 ± 15%      -0.0        0.07 ± 16%  perf-profile.self.cycles-pp.update_sd_lb_stats
      0.09 ±  5%      -0.0        0.05 ± 46%  perf-profile.self.cycles-pp._find_next_and_bit
      0.01 ±223%      +0.1        0.08 ± 25%  perf-profile.self.cycles-pp.arch_scale_freq_tick
      0.78 ±  4%      +0.1        0.87 ±  4%  perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
      0.14 ± 10%      +0.1        0.23 ± 13%  perf-profile.self.cycles-pp.__intel_pmu_enable_all
      0.06 ± 46%      +0.1        0.15 ± 19%  perf-profile.self.cycles-pp.cgroup_rstat_updated
      0.19 ±  3%      +0.1        0.29 ±  4%  perf-profile.self.cycles-pp.cpuidle_enter_state
      0.00            +0.1        0.10 ± 11%  perf-profile.self.cycles-pp.__mod_lruvec_state
      0.00            +0.1        0.11 ± 18%  perf-profile.self.cycles-pp.__tlb_remove_page_size
      0.00            +0.1        0.12 ±  9%  perf-profile.self.cycles-pp.vm_normal_page
      0.23 ±  7%      +0.1        0.36 ±  8%  perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
      0.20 ±  8%      +0.2        0.35 ±  7%  perf-profile.self.cycles-pp.__mod_lruvec_page_state
      1.12 ±  2%      +0.2        1.28 ±  4%  perf-profile.self.cycles-pp.zap_pte_range
      0.31 ±  8%      +0.2        0.46 ±  9%  perf-profile.self.cycles-pp.native_flush_tlb_local
      0.00            +0.2        0.16 ±  5%  perf-profile.self.cycles-pp._compound_head
      0.06 ± 17%      +0.2        0.26 ±  4%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.00            +0.2        0.24 ±  6%  perf-profile.self.cycles-pp.free_swap_cache
      0.00            +0.3        0.27 ± 15%  perf-profile.self.cycles-pp.clear_huge_page
      0.00            +0.3        0.27 ± 11%  perf-profile.self.cycles-pp.deferred_split_folio
      0.00            +0.4        0.36 ± 13%  perf-profile.self.cycles-pp.prep_compound_page
      0.05 ± 47%      +0.4        0.43 ±  9%  perf-profile.self.cycles-pp.free_unref_page_prepare
      0.08 ±  7%      +0.5        0.57 ± 23%  perf-profile.self.cycles-pp.__cond_resched
      0.08 ± 12%      +0.5        0.58 ±  5%  perf-profile.self.cycles-pp.release_pages
      0.10 ± 10%      +0.5        0.63 ±  6%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      0.00            +1.1        1.11 ±  7%  perf-profile.self.cycles-pp.__split_huge_pmd_locked
      0.00            +1.2        1.18 ±  4%  perf-profile.self.cycles-pp.page_add_anon_rmap
      0.03 ±101%      +1.3        1.35 ±  7%  perf-profile.self.cycles-pp.page_remove_rmap
      0.82 ±  5%     +16.1       16.88 ±  7%  perf-profile.self.cycles-pp.clear_page_erms
     11.65 ±  3%     +20.2       31.88 ±  2%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
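
For reference, the rule the commit subject ("align larger anonymous
mappings on THP boundaries") implies can be sketched as below; this is
a simplification based on the subject line, not the exact kernel code:

#define PMD_SIZE (2UL << 20)	/* 2 MiB PMD on x86-64 */

/*
 * Round the start of anonymous mappings of at least PMD_SIZE up to a
 * PMD boundary so they are THP-eligible from the first fault; smaller
 * mappings keep their old placement.
 */
static unsigned long thp_align(unsigned long addr, unsigned long len)
{
	if (len < PMD_SIZE)
		return addr;
	return (addr + PMD_SIZE - 1) & ~(PMD_SIZE - 1);
}

Only workloads creating anonymous mappings of 2 MiB or more see the new
behavior; glibc pthread stacks (typically 8 MiB) qualify, which is why
the pthread create/exit cycle profiled above is affected.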


***************************************************************************************************
lkp-spr-2sp4: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
=========================================================================================
array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
  50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream

commit: 
  30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
  1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")

30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     10.50 ± 14%     +55.6%      16.33 ± 16%  perf-c2c.DRAM.local
      6724           -11.4%       5954 ±  2%  vmstat.system.cs
 2.746e+09           +16.7%  3.205e+09 ±  2%  cpuidle..time
   2771516           +16.0%    3213723 ±  2%  cpuidle..usage
      0.06 ±  4%      -0.0        0.05 ±  5%  mpstat.cpu.all.soft%
      0.47 ±  2%      -0.1        0.39 ±  2%  mpstat.cpu.all.sys%
      0.01 ± 85%   +1700.0%       0.20 ±188%  perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
     15.11 ± 13%     -28.8%      10.76 ± 34%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     15.09 ± 13%     -30.3%      10.51 ± 38%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
   1023952           +13.4%    1161219        meminfo.AnonHugePages
   1319741           +10.8%    1461995        meminfo.AnonPages
   1331039           +11.2%    1480149        meminfo.Inactive
   1330865           +11.2%    1479975        meminfo.Inactive(anon)
   1266202           +16.0%    1469399 ±  2%  turbostat.C1E
   1509871           +16.6%    1760853 ±  2%  turbostat.C6
   3521203           +17.4%    4134075 ±  3%  turbostat.IRQ
    580.32            -3.8%     558.30        turbostat.PkgWatt
     77.42           -14.0%      66.60 ±  2%  turbostat.RAMWatt
    330416           +10.8%     366020        proc-vmstat.nr_anon_pages
    500.90           +13.4%     567.99        proc-vmstat.nr_anon_transparent_hugepages
    333197           +11.2%     370536        proc-vmstat.nr_inactive_anon
    333197           +11.2%     370536        proc-vmstat.nr_zone_inactive_anon
    129879 ± 11%     -46.7%      69207 ± 12%  proc-vmstat.numa_pages_migrated
   3879028            +5.9%    4109180        proc-vmstat.pgalloc_normal
   3403414            +6.6%    3628929        proc-vmstat.pgfree
    129879 ± 11%     -46.7%      69207 ± 12%  proc-vmstat.pgmigrate_success
      5763            +9.8%       6327        proc-vmstat.thp_fault_alloc
    350993           -15.6%     296081 ±  2%  stream.add_bandwidth_MBps
    349830           -16.1%     293492 ±  2%  stream.add_bandwidth_MBps_harmonicMean
    333973           -20.5%     265439 ±  3%  stream.copy_bandwidth_MBps
    332930           -21.7%     260548 ±  3%  stream.copy_bandwidth_MBps_harmonicMean
    302788           -16.2%     253817 ±  2%  stream.scale_bandwidth_MBps
    302157           -17.1%     250577 ±  2%  stream.scale_bandwidth_MBps_harmonicMean
   1177276            +9.3%    1286614        stream.time.maximum_resident_set_size
      5038            +1.1%       5095        stream.time.percent_of_cpu_this_job_got
    694.19 ±  2%     +19.5%     829.85 ±  2%  stream.time.user_time
    339047           -12.1%     298061        stream.triad_bandwidth_MBps
    338186           -12.4%     296218        stream.triad_bandwidth_MBps_harmonicMean
      8.42 ±100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi
      8.42 ±100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
      8.42 ±100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
      8.42 ±100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
      8.42 ±100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
      8.42 ±100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode
      0.84 ±103%      +1.7        2.57 ± 59%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      0.84 ±103%      +1.7        2.57 ± 59%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      0.31 ±223%      +2.0        2.33 ± 44%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
      0.31 ±223%      +2.0        2.33 ± 44%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
      3.07 ± 56%      +2.8        5.88 ± 28%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
      8.42 ±100%      -8.4        0.00        perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
      8.42 ±100%      -8.1        0.36 ±223%  perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
     12.32 ± 25%      -6.6        5.69 ± 69%  perf-profile.children.cycles-pp.vsnprintf
     12.76 ± 27%      -6.6        6.19 ± 67%  perf-profile.children.cycles-pp.seq_printf
      3.07 ± 56%      +2.8        5.88 ± 28%  perf-profile.children.cycles-pp.__x64_sys_exit_group
     40.11           -11.0%      35.71 ±  2%  perf-stat.i.MPKI
 1.563e+10           -12.3%  1.371e+10 ±  2%  perf-stat.i.branch-instructions
 3.721e+09 ±  2%     -23.2%  2.858e+09 ±  4%  perf-stat.i.cache-misses
 4.471e+09 ±  3%     -22.7%  3.458e+09 ±  4%  perf-stat.i.cache-references
      5970 ±  5%     -15.9%       5021 ±  4%  perf-stat.i.context-switches
      1.66 ±  2%     +15.8%       1.92 ±  2%  perf-stat.i.cpi
     41.83 ±  4%     +30.6%      54.63 ±  4%  perf-stat.i.cycles-between-cache-misses
 2.282e+10 ±  2%     -14.5%  1.952e+10 ±  2%  perf-stat.i.dTLB-loads
    572602 ±  3%      -9.2%     519922 ±  5%  perf-stat.i.dTLB-store-misses
 1.483e+10 ±  2%     -15.7%   1.25e+10 ±  2%  perf-stat.i.dTLB-stores
 9.179e+10           -13.7%  7.924e+10 ±  2%  perf-stat.i.instructions
      0.61           -13.4%       0.52 ±  2%  perf-stat.i.ipc
    373.79 ±  4%     -37.8%     232.60 ±  9%  perf-stat.i.metric.K/sec
    251.45           -13.4%     217.72 ±  2%  perf-stat.i.metric.M/sec
     21446 ±  3%     -24.1%      16278 ±  8%  perf-stat.i.minor-faults
     15.07 ±  5%      -6.0        9.10 ± 10%  perf-stat.i.node-load-miss-rate%
  68275790 ±  5%     -44.9%   37626128 ± 12%  perf-stat.i.node-load-misses
     21448 ±  3%     -24.1%      16281 ±  8%  perf-stat.i.page-faults
     40.71           -11.3%      36.10 ±  2%  perf-stat.overall.MPKI
      1.67           +15.3%       1.93 ±  2%  perf-stat.overall.cpi
     41.07 ±  3%     +30.1%      53.42 ±  4%  perf-stat.overall.cycles-between-cache-misses
      0.00 ±  2%      +0.0        0.00 ±  2%  perf-stat.overall.dTLB-store-miss-rate%
      0.60           -13.2%       0.52 ±  2%  perf-stat.overall.ipc
     15.19 ±  5%      -6.2        9.03 ± 11%  perf-stat.overall.node-load-miss-rate%
   1.4e+10            -9.3%  1.269e+10        perf-stat.ps.branch-instructions
 3.352e+09 ±  3%     -20.9%  2.652e+09 ±  4%  perf-stat.ps.cache-misses
 4.026e+09 ±  3%     -20.3%  3.208e+09 ±  4%  perf-stat.ps.cache-references
      4888 ±  4%     -10.8%       4362 ±  3%  perf-stat.ps.context-switches
    206092            +2.1%     210375        perf-stat.ps.cpu-clock
 1.375e+11            +2.8%  1.414e+11        perf-stat.ps.cpu-cycles
    258.23 ±  5%      +8.8%     280.85 ±  4%  perf-stat.ps.cpu-migrations
 2.048e+10           -11.7%  1.809e+10 ±  2%  perf-stat.ps.dTLB-loads
 1.333e+10 ±  2%     -13.0%   1.16e+10 ±  2%  perf-stat.ps.dTLB-stores
 8.231e+10           -10.8%  7.342e+10        perf-stat.ps.instructions
     15755 ±  3%     -16.3%      13187 ±  6%  perf-stat.ps.minor-faults
  61706790 ±  6%     -43.8%   34699716 ± 11%  perf-stat.ps.node-load-misses
     15757 ±  3%     -16.3%      13189 ±  6%  perf-stat.ps.page-faults
    206092            +2.1%     210375        perf-stat.ps.task-clock
 1.217e+12            +4.1%  1.267e+12 ±  2%  perf-stat.total.instructions



***************************************************************************************************
lkp-cfl-d1: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite

commit: 
  30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
  1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")

30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    232.12 ±  7%     -12.0%     204.18 ±  8%  sched_debug.cfs_rq:/.load_avg.stddev
      6797            -3.3%       6576        vmstat.system.cs
     15161            -0.9%      15029        vmstat.system.in
    349927           +44.3%     504820        meminfo.AnonHugePages
    507807           +27.1%     645169        meminfo.AnonPages
   1499332           +10.2%    1652612        meminfo.Inactive(anon)
      8.67 ± 62%    +184.6%      24.67 ± 25%  turbostat.C10
      1.50            -0.1        1.45        turbostat.C1E%
      3.30            -3.2%       3.20        turbostat.RAMWatt
      1.40 ± 14%      -0.3        1.09 ± 13%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault
      1.44 ± 12%      -0.3        1.12 ± 13%  perf-profile.children.cycles-pp.asm_exc_page_fault
      0.03 ±141%      +0.1        0.10 ± 30%  perf-profile.children.cycles-pp.next_uptodate_folio
      0.02 ±141%      +0.1        0.10 ± 22%  perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup
      0.02 ±143%      +0.1        0.10 ± 25%  perf-profile.self.cycles-pp.next_uptodate_folio
      0.01 ±223%      +0.1        0.09 ± 19%  perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup
     19806            -3.5%      19109        phoronix-test-suite.ramspeed.Average.Integer.mb_s
    283.70            +3.8%     294.50        phoronix-test-suite.time.elapsed_time
    283.70            +3.8%     294.50        phoronix-test-suite.time.elapsed_time.max
    120454            +1.6%     122334        phoronix-test-suite.time.maximum_resident_set_size
    281337           -54.8%     127194        phoronix-test-suite.time.minor_page_faults
    259.13            +4.1%     269.81        phoronix-test-suite.time.user_time
    126951           +27.0%     161291        proc-vmstat.nr_anon_pages
    170.86           +44.3%     246.49        proc-vmstat.nr_anon_transparent_hugepages
    355917            -1.0%     352250        proc-vmstat.nr_dirty_background_threshold
    712705            -1.0%     705362        proc-vmstat.nr_dirty_threshold
   3265201            -1.1%    3228465        proc-vmstat.nr_free_pages
    374833           +10.2%     413153        proc-vmstat.nr_inactive_anon
      1767            +4.8%       1853        proc-vmstat.nr_page_table_pages
    374833           +10.2%     413153        proc-vmstat.nr_zone_inactive_anon
    854665           -34.3%     561406        proc-vmstat.numa_hit
    854632           -34.3%     561397        proc-vmstat.numa_local
   5548755            +1.1%    5610598        proc-vmstat.pgalloc_normal
   1083315           -26.2%     799129        proc-vmstat.pgfault
    113425            +3.7%     117656        proc-vmstat.pgreuse
      9025            +7.6%       9714        proc-vmstat.thp_fault_alloc
      3.38            +0.1        3.45        perf-stat.i.branch-miss-rate%
 4.135e+08            -3.2%  4.003e+08        perf-stat.i.cache-misses
 5.341e+08            -2.7%  5.197e+08        perf-stat.i.cache-references
      6832            -3.4%       6600        perf-stat.i.context-switches
      4.06            +3.1%       4.19        perf-stat.i.cpi
    438639 ±  5%     -18.7%     356730 ±  6%  perf-stat.i.dTLB-load-misses
 1.119e+09            -3.8%  1.077e+09        perf-stat.i.dTLB-loads
      0.02 ± 15%      -0.0        0.01 ± 26%  perf-stat.i.dTLB-store-miss-rate%
     80407 ± 10%     -63.5%      29387 ± 23%  perf-stat.i.dTLB-store-misses
 7.319e+08            -3.8%  7.043e+08        perf-stat.i.dTLB-stores
     57.72            +0.8       58.52        perf-stat.i.iTLB-load-miss-rate%
    129846            -3.8%     124973        perf-stat.i.iTLB-load-misses
    144448            -5.3%     136837        perf-stat.i.iTLB-loads
 2.389e+09            -3.5%  2.305e+09        perf-stat.i.instructions
      0.28            -2.9%       0.27        perf-stat.i.ipc
    220.59            -3.4%     213.11        perf-stat.i.metric.M/sec
      3610           -31.2%       2483        perf-stat.i.minor-faults
  49238342            +1.1%   49776834        perf-stat.i.node-loads
  98106028            -3.1%   95018390        perf-stat.i.node-stores
      3615           -31.2%       2487        perf-stat.i.page-faults
      3.65            +3.7%       3.78        perf-stat.overall.cpi
     21.08            +3.3%      21.79        perf-stat.overall.cycles-between-cache-misses
      0.04 ±  5%      -0.0        0.03 ±  6%  perf-stat.overall.dTLB-load-miss-rate%
      0.01 ± 10%      -0.0        0.00 ± 23%  perf-stat.overall.dTLB-store-miss-rate%
      0.27            -3.6%       0.26        perf-stat.overall.ipc
 4.122e+08            -3.2%   3.99e+08        perf-stat.ps.cache-misses
 5.324e+08            -2.7%  5.181e+08        perf-stat.ps.cache-references
      6809            -3.4%       6580        perf-stat.ps.context-switches
    437062 ±  5%     -18.7%     355481 ±  6%  perf-stat.ps.dTLB-load-misses
 1.115e+09            -3.8%  1.073e+09        perf-stat.ps.dTLB-loads
     80134 ± 10%     -63.5%      29283 ± 23%  perf-stat.ps.dTLB-store-misses
 7.295e+08            -3.8%  7.021e+08        perf-stat.ps.dTLB-stores
    129362            -3.7%     124535        perf-stat.ps.iTLB-load-misses
    143865            -5.2%     136338        perf-stat.ps.iTLB-loads
 2.381e+09            -3.5%  2.297e+09        perf-stat.ps.instructions
      3596           -31.2%       2473        perf-stat.ps.minor-faults
  49081949            +1.1%   49621463        perf-stat.ps.node-loads
  97795918            -3.1%   94724831        perf-stat.ps.node-stores
      3600           -31.2%       2477        perf-stat.ps.page-faults



***************************************************************************************************
lkp-cfl-d1: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite

commit: 
  30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
  1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")

30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    167.28 ±  5%     -13.1%     145.32 ±  6%  sched_debug.cfs_rq:/.util_est_enqueued.avg
      6845            -2.5%       6674        vmstat.system.cs
    351910 ±  2%     +40.2%     493341        meminfo.AnonHugePages
    505908           +27.2%     643328        meminfo.AnonPages
   1497656           +10.2%    1650453        meminfo.Inactive(anon)
     18957 ± 13%     +26.3%      23947 ± 17%  turbostat.C1
      1.52            -0.0        1.48        turbostat.C1E%
      3.32            -2.9%       3.23        turbostat.RAMWatt
     19978            -3.0%      19379        phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s
    280.71            +3.3%     289.93        phoronix-test-suite.time.elapsed_time
    280.71            +3.3%     289.93        phoronix-test-suite.time.elapsed_time.max
    120465            +1.5%     122257        phoronix-test-suite.time.maximum_resident_set_size
    281047           -54.7%     127190        phoronix-test-suite.time.minor_page_faults
    257.03            +3.5%     265.95        phoronix-test-suite.time.user_time
    126473           +27.2%     160831        proc-vmstat.nr_anon_pages
    171.83 ±  2%     +40.2%     240.89        proc-vmstat.nr_anon_transparent_hugepages
    355973            -1.0%     352304        proc-vmstat.nr_dirty_background_threshold
    712818            -1.0%     705471        proc-vmstat.nr_dirty_threshold
   3265800            -1.1%    3228879        proc-vmstat.nr_free_pages
    374410           +10.2%     412613        proc-vmstat.nr_inactive_anon
      1770            +4.4%       1848        proc-vmstat.nr_page_table_pages
    374410           +10.2%     412613        proc-vmstat.nr_zone_inactive_anon
    852082           -34.9%     555093        proc-vmstat.numa_hit
    852125           -34.9%     555018        proc-vmstat.numa_local
   1078293           -26.6%     791038        proc-vmstat.pgfault
    112693            +2.9%     116004        proc-vmstat.pgreuse
      9025            +7.6%       9713        proc-vmstat.thp_fault_alloc
      3.63 ±  6%      +0.6        4.25 ±  9%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      0.25 ± 55%      -0.2        0.08 ± 68%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.25 ± 55%      -0.2        0.08 ± 68%  perf-profile.children.cycles-pp.ret_from_fork
      0.23 ± 56%      -0.2        0.07 ± 69%  perf-profile.children.cycles-pp.kthread
      0.14 ± 36%      -0.1        0.05 ±120%  perf-profile.children.cycles-pp.do_anonymous_page
      0.14 ± 35%      -0.1        0.05 ± 76%  perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
      0.04 ± 72%      +0.0        0.08 ± 19%  perf-profile.children.cycles-pp.try_to_wake_up
      0.04 ±118%      +0.1        0.10 ± 36%  perf-profile.children.cycles-pp.update_rq_clock
      0.07 ± 79%      +0.1        0.17 ± 21%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      7.99 ± 11%      +1.0        9.02 ±  5%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.23 ± 28%      -0.1        0.14 ± 49%  perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
      0.14 ± 35%      -0.1        0.05 ± 76%  perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
      0.06 ± 79%      +0.1        0.16 ± 21%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.21 ± 34%      +0.2        0.36 ± 18%  perf-profile.self.cycles-pp.ktime_get
 1.187e+08            -4.6%  1.133e+08        perf-stat.i.branch-instructions
      3.36            +0.1        3.42        perf-stat.i.branch-miss-rate%
   5492420            -3.9%    5275592        perf-stat.i.branch-misses
 4.148e+08            -2.8%  4.034e+08        perf-stat.i.cache-misses
 5.251e+08            -2.6%  5.114e+08        perf-stat.i.cache-references
      6880            -2.5%       6711        perf-stat.i.context-switches
      4.30            +2.9%       4.43        perf-stat.i.cpi
      0.10 ±  7%      -0.0        0.09 ±  2%  perf-stat.i.dTLB-load-miss-rate%
    472268 ±  6%     -19.9%     378489        perf-stat.i.dTLB-load-misses
 8.107e+08            -3.4%  7.831e+08        perf-stat.i.dTLB-loads
      0.02 ± 16%      -0.0        0.01 ±  2%  perf-stat.i.dTLB-store-miss-rate%
     90535 ± 11%     -59.8%      36371 ±  2%  perf-stat.i.dTLB-store-misses
 5.323e+08            -3.3%  5.145e+08        perf-stat.i.dTLB-stores
    129981            -3.0%     126061        perf-stat.i.iTLB-load-misses
    143662            -3.1%     139223        perf-stat.i.iTLB-loads
 2.253e+09            -3.6%  2.172e+09        perf-stat.i.instructions
      0.26            -3.2%       0.25        perf-stat.i.ipc
      4.71 ±  2%      -6.4%       4.41 ±  2%  perf-stat.i.major-faults
    180.03            -3.0%     174.57        perf-stat.i.metric.M/sec
      3627           -30.8%       2510 ±  2%  perf-stat.i.minor-faults
      3632           -30.8%       2514 ±  2%  perf-stat.i.page-faults
      3.88            +3.6%       4.02        perf-stat.overall.cpi
     21.08            +2.7%      21.65        perf-stat.overall.cycles-between-cache-misses
      0.06 ±  6%      -0.0        0.05        perf-stat.overall.dTLB-load-miss-rate%
      0.02 ± 11%      -0.0        0.01 ±  2%  perf-stat.overall.dTLB-store-miss-rate%
      0.26            -3.5%       0.25        perf-stat.overall.ipc
 1.182e+08            -4.6%  1.128e+08        perf-stat.ps.branch-instructions
   5468166            -4.0%    5251939        perf-stat.ps.branch-misses
 4.135e+08            -2.7%  4.021e+08        perf-stat.ps.cache-misses
 5.234e+08            -2.6%  5.098e+08        perf-stat.ps.cache-references
      6859            -2.5%       6685        perf-stat.ps.context-switches
    470567 ±  6%     -19.9%     377127        perf-stat.ps.dTLB-load-misses
 8.079e+08            -3.4%  7.805e+08        perf-stat.ps.dTLB-loads
     90221 ± 11%     -59.8%      36239 ±  2%  perf-stat.ps.dTLB-store-misses
 5.305e+08            -3.3%  5.128e+08        perf-stat.ps.dTLB-stores
    129499            -3.0%     125601        perf-stat.ps.iTLB-load-misses
    143121            -3.1%     138638        perf-stat.ps.iTLB-loads
 2.246e+09            -3.6%  2.165e+09        perf-stat.ps.instructions
      4.69 ±  2%      -6.3%       4.39 ±  2%  perf-stat.ps.major-faults
      3613           -30.8%       2500 ±  2%  perf-stat.ps.minor-faults
      3617           -30.8%       2504 ±  2%  perf-stat.ps.page-faults





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-19 15:41 [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression kernel test robot
@ 2023-12-20  5:27 ` Yang Shi
  2023-12-20  8:29   ` Yin Fengwei
  0 siblings, 1 reply; 24+ messages in thread
From: Yang Shi @ 2023-12-20  5:27 UTC (permalink / raw)
  To: kernel test robot
  Cc: Rik van Riel, oe-lkp, lkp, Linux Memory Management List,
	Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang,
	feng.tang, fengwei.yin

On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote:
>
>
>
> Hello,
>
> for this commit, we reported
> "[mm]  96db82a66d:  will-it-scale.per_process_ops -95.3% regression"
> in Aug, 2022 when it's in linux-next/master
> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/
>
> later, we reported
> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression"
> in Oct, 2022 when it's in linus/master
> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/
>
> and the commit was reverted finally by
> commit 0ba09b1733878afe838fe35c310715fda3d46428
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Sun Dec 4 12:51:59 2022 -0800
>
> now we noticed it goes into linux-next/master again.
>
> we are not sure if there is an agreement that the benefit of this commit
> has already outweighed the performance drop in some micro benchmarks.
>
> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/
> that
> "This patch was applied to v6.1, but was reverted due to a regression
> report.  However it turned out the regression was not due to this patch.
> I ping'ed Andrew to reapply this patch, Andrew may forget it.  This
> patch helps promote THP, so I rebased it onto the latest mm-unstable."

IIRC, Huang Ying's analysis showed the regression in the will-it-scale
micro benchmark was acceptable; the commit was actually reverted due to
a kernel build regression with LLVM reported by Nathan Chancellor. That
regression was then resolved by commit
81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out
if page in deferred queue already"). And this patch did improve the
kernel build with GCC by ~3%, if I remember correctly.
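
(For context, the idea behind that fix, modeled here as a userspace
sketch of mine rather than the actual kernel code: bail out before
taking the queue lock when the page is already on the deferred-split
list, so hot paths stop hammering the lock for no work.)

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct item {
	struct item *next;
	bool queued;	/* stand-in for the kernel's list_empty() check */
};

static struct item *queue_head;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

static void deferred_queue_add(struct item *it)
{
	if (it->queued)		/* already queued: bail out, no lock taken */
		return;

	pthread_mutex_lock(&queue_lock);
	if (!it->queued) {	/* re-check under the lock */
		it->next = queue_head;
		queue_head = it;
		it->queued = true;
	}
	pthread_mutex_unlock(&queue_lock);
}

int main(void)
{
	struct item a = { 0 };

	deferred_queue_add(&a);
	deferred_queue_add(&a);	/* second call returns without locking */
	printf("queued: %d\n", a.queued);
	return 0;
}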

>
> however, unfortunately, in our latest tests, we still observed below regression
> upon this commit. just FYI.
>
>
>
> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:

Interesting, wasn't the same regression seen last time? And I'm a
little bit confused about how pthread got regressed. I didn't see the
pthread benchmark do any intensive memory alloc/free operations. Do
the pthread APIs do any intensive memory operations? I saw that the
benchmark does allocate memory for the thread stacks, but that should
be just 8K per thread, so it should not trigger what this patch does.
With 1024 threads, the thread stacks may get merged into one single
VMA (8M total), but that can happen even when the patch is not
applied.
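
(A minimal userspace sketch, mine and not from the report, makes that
boundary visible: the patch only rounds up requests of at least PMD
size, so an 8K mapping keeps its old placement while an 8M one should
come back 2M-aligned on a patched kernel.)

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	size_t sizes[] = { 8UL << 10, 8UL << 20 };	/* 8K stack vs 8M merged VMA */

	for (int i = 0; i < 2; i++) {
		void *p = mmap(NULL, sizes[i], PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		printf("%8zu bytes -> %p (PMD-aligned: %s)\n", sizes[i], p,
		       ((uintptr_t)p & ((2UL << 20) - 1)) ? "no" : "yes");
		munmap(p, sizes[i]);
	}
	return 0;
}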

>
>
> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> testcase: stress-ng
> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
> parameters:
>
>         nr_threads: 1
>         disk: 1HDD
>         testtime: 60s
>         fs: ext4
>         class: os
>         test: pthread
>         cpufreq_governor: performance
>
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+-----------------------------------------------------------------------------------------------+
> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression                                         |
> | test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory   |
> | test parameters  | array_size=50000000                                                                           |
> |                  | cpufreq_governor=performance                                                                  |
> |                  | iterations=10x                                                                                |
> |                  | loop=100                                                                                      |
> |                  | nr_threads=25%                                                                                |
> |                  | omp=true                                                                                      |
> +------------------+-----------------------------------------------------------------------------------------------+
> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression       |
> | test machine     | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory    |
> | test parameters  | cpufreq_governor=performance                                                                  |
> |                  | option_a=Average                                                                              |
> |                  | option_b=Integer                                                                              |
> |                  | test=ramspeed-1.4.3                                                                           |
> +------------------+-----------------------------------------------------------------------------------------------+
> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression |
> | test machine     | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory    |
> | test parameters  | cpufreq_governor=performance                                                                  |
> |                  | option_a=Average                                                                              |
> |                  | option_b=Floating Point                                                                       |
> |                  | test=ramspeed-1.4.3                                                                           |
> +------------------+-----------------------------------------------------------------------------------------------+
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com
>
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
>
> commit:
>   30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
>   1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>   13405796           -65.5%    4620124        cpuidle..usage
>       8.00            +8.2%       8.66 ±  2%  iostat.cpu.system
>       1.61           -60.6%       0.63        iostat.cpu.user
>     597.50 ± 14%     -64.3%     213.50 ± 14%  perf-c2c.DRAM.local
>       1882 ± 14%     -74.7%     476.83 ±  7%  perf-c2c.HITM.local
>    3768436           -12.9%    3283395        vmstat.memory.cache
>     355105           -75.7%      86344 ±  3%  vmstat.system.cs
>     385435           -20.7%     305714 ±  3%  vmstat.system.in
>       1.13            -0.2        0.88        mpstat.cpu.all.irq%
>       0.29            -0.2        0.10 ±  2%  mpstat.cpu.all.soft%
>       6.76 ±  2%      +1.1        7.88 ±  2%  mpstat.cpu.all.sys%
>       1.62            -1.0        0.62 ±  2%  mpstat.cpu.all.usr%
>    2234397           -84.3%     350161 ±  5%  stress-ng.pthread.ops
>      37237           -84.3%       5834 ±  5%  stress-ng.pthread.ops_per_sec
>     294706 ±  2%     -68.0%      94191 ±  6%  stress-ng.time.involuntary_context_switches
>      41442 ±  2%   +5023.4%    2123284        stress-ng.time.maximum_resident_set_size
>    4466457           -83.9%     717053 ±  5%  stress-ng.time.minor_page_faults

The larger RSS and fewer page faults are expected: once a region is
THP-backed, a single fault can be filled by a 2M huge page (512 base
pages), so minor faults drop sharply while the resident set grows.
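
(To make that arithmetic concrete, a hypothetical demo of mine, not
part of the report: populating an 8M anonymous region takes about 2048
minor faults with 4K pages, but only about 4 when the range is
PMD-aligned and THP-backed, which is the mechanism behind the
minor_page_faults and maximum_resident_set_size deltas above.)

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

static long minflt(void)
{
	struct rusage ru;

	getrusage(RUSAGE_SELF, &ru);
	return ru.ru_minflt;
}

int main(void)
{
	size_t len = 8UL << 20;		/* 8M anonymous region */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	long before;

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	before = minflt();
	memset(p, 1, len);		/* fault the whole range in */
	printf("minor faults to populate 8M: %ld\n", minflt() - before);
	munmap(p, len);
	return 0;
}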

>     243.33           +13.5%     276.17 ±  3%  stress-ng.time.percent_of_cpu_this_job_got
>     131.64           +27.7%     168.11 ±  3%  stress-ng.time.system_time
>      19.73           -82.1%       3.53 ±  4%  stress-ng.time.user_time

Much less user time. And it seems to match the drop of the pthread metric.

>    7715609           -80.2%    1530125 ±  4%  stress-ng.time.voluntary_context_switches
>     494566           -59.5%     200338 ±  3%  meminfo.Active
>     478287           -61.5%     184050 ±  3%  meminfo.Active(anon)
>      58549 ± 17%   +1532.8%     956006 ± 14%  meminfo.AnonHugePages
>     424631          +194.9%    1252445 ± 10%  meminfo.AnonPages
>    3677263           -13.0%    3197755        meminfo.Cached
>    5829485 ±  4%     -19.0%    4724784 ± 10%  meminfo.Committed_AS
>     692486          +108.6%    1444669 ±  8%  meminfo.Inactive
>     662179          +113.6%    1414338 ±  9%  meminfo.Inactive(anon)
>     182416           -50.2%      90759        meminfo.Mapped
>    4614466           +10.0%    5076604 ±  2%  meminfo.Memused
>       6985           +47.6%      10307 ±  4%  meminfo.PageTables
>     718445           -66.7%     238913 ±  3%  meminfo.Shmem
>      35906           -20.7%      28471 ±  3%  meminfo.VmallocUsed
>    4838522           +25.6%    6075302        meminfo.max_used_kB
>     488.83           -20.9%     386.67 ±  2%  turbostat.Avg_MHz
>      12.95            -2.7       10.26 ±  2%  turbostat.Busy%
>    7156734           -87.2%     919149 ±  4%  turbostat.C1
>      10.59            -8.9        1.65 ±  5%  turbostat.C1%
>    3702647           -55.1%    1663518 ±  2%  turbostat.C1E
>      32.99           -20.6       12.36 ±  3%  turbostat.C1E%
>    1161078           +64.5%    1909611        turbostat.C6
>      44.25           +31.8       76.10        turbostat.C6%
>       0.18           -33.3%       0.12        turbostat.IPC
>   74338573 ±  2%     -33.9%   49159610 ±  4%  turbostat.IRQ
>    1381661           -91.0%     124075 ±  6%  turbostat.POLL
>       0.26            -0.2        0.04 ± 12%  turbostat.POLL%
>      96.15            -5.4%      90.95        turbostat.PkgWatt
>      12.12           +19.3%      14.46        turbostat.RAMWatt
>     119573           -61.5%      46012 ±  3%  proc-vmstat.nr_active_anon
>     106168          +195.8%     314047 ± 10%  proc-vmstat.nr_anon_pages
>      28.60 ± 17%   +1538.5%     468.68 ± 14%  proc-vmstat.nr_anon_transparent_hugepages
>     923365           -13.0%     803489        proc-vmstat.nr_file_pages
>     165571          +113.5%     353493 ±  9%  proc-vmstat.nr_inactive_anon
>      45605           -50.2%      22690        proc-vmstat.nr_mapped
>       1752           +47.1%       2578 ±  4%  proc-vmstat.nr_page_table_pages
>     179613           -66.7%      59728 ±  3%  proc-vmstat.nr_shmem
>      21490            -2.4%      20981        proc-vmstat.nr_slab_reclaimable
>      28260            -7.3%      26208        proc-vmstat.nr_slab_unreclaimable
>     119573           -61.5%      46012 ±  3%  proc-vmstat.nr_zone_active_anon
>     165570          +113.5%     353492 ±  9%  proc-vmstat.nr_zone_inactive_anon
>   17343640           -76.3%    4116748 ±  4%  proc-vmstat.numa_hit
>   17364975           -76.3%    4118098 ±  4%  proc-vmstat.numa_local
>     249252           -66.2%      84187 ±  2%  proc-vmstat.pgactivate
>   27528916          +567.1%  1.836e+08 ±  5%  proc-vmstat.pgalloc_normal
>    4912427           -79.2%    1019949 ±  3%  proc-vmstat.pgfault
>   27227124          +574.1%  1.835e+08 ±  5%  proc-vmstat.pgfree
>       8728         +3896.4%     348802 ±  5%  proc-vmstat.thp_deferred_split_page
>       8730         +3895.3%     348814 ±  5%  proc-vmstat.thp_fault_alloc
>       8728         +3896.4%     348802 ±  5%  proc-vmstat.thp_split_pmd
>     316745           -21.5%     248756 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.avg
>     112735 ±  4%     -34.3%      74061 ±  6%  sched_debug.cfs_rq:/.avg_vruntime.min
>       0.49 ±  6%     -17.2%       0.41 ±  8%  sched_debug.cfs_rq:/.h_nr_running.stddev
>      12143 ±120%     -99.9%      15.70 ±116%  sched_debug.cfs_rq:/.left_vruntime.avg
>     414017 ±126%     -99.9%     428.50 ±102%  sched_debug.cfs_rq:/.left_vruntime.max
>      68492 ±125%     -99.9%      78.15 ±106%  sched_debug.cfs_rq:/.left_vruntime.stddev
>      41917 ± 24%     -48.3%      21690 ± 57%  sched_debug.cfs_rq:/.load.avg
>     176151 ± 30%     -56.9%      75963 ± 57%  sched_debug.cfs_rq:/.load.stddev
>       6489 ± 17%     -29.0%       4608 ± 12%  sched_debug.cfs_rq:/.load_avg.max
>       4.42 ± 45%     -81.1%       0.83 ± 74%  sched_debug.cfs_rq:/.load_avg.min
>       1112 ± 17%     -31.0%     767.62 ± 11%  sched_debug.cfs_rq:/.load_avg.stddev
>     316745           -21.5%     248756 ±  4%  sched_debug.cfs_rq:/.min_vruntime.avg
>     112735 ±  4%     -34.3%      74061 ±  6%  sched_debug.cfs_rq:/.min_vruntime.min
>       0.49 ±  6%     -17.2%       0.41 ±  8%  sched_debug.cfs_rq:/.nr_running.stddev
>      12144 ±120%     -99.9%      15.70 ±116%  sched_debug.cfs_rq:/.right_vruntime.avg
>     414017 ±126%     -99.9%     428.50 ±102%  sched_debug.cfs_rq:/.right_vruntime.max
>      68492 ±125%     -99.9%      78.15 ±106%  sched_debug.cfs_rq:/.right_vruntime.stddev
>      14.25 ± 44%     -76.6%       3.33 ± 58%  sched_debug.cfs_rq:/.runnable_avg.min
>      11.58 ± 49%     -77.7%       2.58 ± 58%  sched_debug.cfs_rq:/.util_avg.min
>     423972 ± 23%     +59.3%     675379 ±  3%  sched_debug.cpu.avg_idle.avg
>       5720 ± 43%    +439.5%      30864        sched_debug.cpu.avg_idle.min
>      99.79 ±  2%     -23.7%      76.11 ±  2%  sched_debug.cpu.clock_task.stddev
>     162475 ± 49%     -95.8%       6813 ± 26%  sched_debug.cpu.curr->pid.avg
>    1061268           -84.0%     170212 ±  4%  sched_debug.cpu.curr->pid.max
>     365404 ± 20%     -91.3%      31839 ± 10%  sched_debug.cpu.curr->pid.stddev
>       0.51 ±  3%     -20.1%       0.41 ±  9%  sched_debug.cpu.nr_running.stddev
>     311923           -74.2%      80615 ±  2%  sched_debug.cpu.nr_switches.avg
>     565973 ±  4%     -77.8%     125597 ± 10%  sched_debug.cpu.nr_switches.max
>     192666 ±  4%     -70.6%      56695 ±  6%  sched_debug.cpu.nr_switches.min
>      67485 ±  8%     -79.9%      13558 ± 10%  sched_debug.cpu.nr_switches.stddev
>       2.62          +102.1%       5.30        perf-stat.i.MPKI
>   2.09e+09           -47.6%  1.095e+09 ±  4%  perf-stat.i.branch-instructions
>       1.56            -0.5        1.01        perf-stat.i.branch-miss-rate%
>   31951200           -60.9%   12481432 ±  2%  perf-stat.i.branch-misses
>      19.38           +23.7       43.08        perf-stat.i.cache-miss-rate%
>   26413597            -5.7%   24899132 ±  4%  perf-stat.i.cache-misses
>  1.363e+08           -58.3%   56906133 ±  4%  perf-stat.i.cache-references
>     370628           -75.8%      89743 ±  3%  perf-stat.i.context-switches
>       1.77           +65.1%       2.92 ±  2%  perf-stat.i.cpi
>  1.748e+10           -21.8%  1.367e+10 ±  2%  perf-stat.i.cpu-cycles
>      61611           -79.1%      12901 ±  6%  perf-stat.i.cpu-migrations
>     716.97 ±  2%     -17.2%     593.35 ±  2%  perf-stat.i.cycles-between-cache-misses
>       0.12 ±  4%      -0.1        0.05        perf-stat.i.dTLB-load-miss-rate%
>    3066100 ±  3%     -81.3%     573066 ±  5%  perf-stat.i.dTLB-load-misses
>  2.652e+09           -50.1%  1.324e+09 ±  4%  perf-stat.i.dTLB-loads
>       0.08 ±  2%      -0.0        0.03        perf-stat.i.dTLB-store-miss-rate%
>    1168195 ±  2%     -82.9%     199438 ±  5%  perf-stat.i.dTLB-store-misses
>  1.478e+09           -56.8%  6.384e+08 ±  3%  perf-stat.i.dTLB-stores
>    8080423           -73.2%    2169371 ±  3%  perf-stat.i.iTLB-load-misses
>    5601321           -74.3%    1440571 ±  2%  perf-stat.i.iTLB-loads
>  1.028e+10           -49.7%  5.173e+09 ±  4%  perf-stat.i.instructions
>       1450           +73.1%       2511 ±  2%  perf-stat.i.instructions-per-iTLB-miss
>       0.61           -35.9%       0.39        perf-stat.i.ipc
>       0.48           -21.4%       0.38 ±  2%  perf-stat.i.metric.GHz
>     616.28           -17.6%     507.69 ±  4%  perf-stat.i.metric.K/sec
>     175.16           -50.8%      86.18 ±  4%  perf-stat.i.metric.M/sec
>      76728           -80.8%      14724 ±  4%  perf-stat.i.minor-faults
>    5600408           -61.4%    2160997 ±  5%  perf-stat.i.node-loads
>    8873996           +52.1%   13499744 ±  5%  perf-stat.i.node-stores
>     112409           -81.9%      20305 ±  4%  perf-stat.i.page-faults
>       2.55           +89.6%       4.83        perf-stat.overall.MPKI

Many more TLB misses.

>       1.51            -0.4        1.13        perf-stat.overall.branch-miss-rate%
>      19.26           +24.5       43.71        perf-stat.overall.cache-miss-rate%
>       1.70           +56.4%       2.65        perf-stat.overall.cpi
>     665.84           -17.5%     549.51 ±  2%  perf-stat.overall.cycles-between-cache-misses
>       0.12 ±  4%      -0.1        0.04        perf-stat.overall.dTLB-load-miss-rate%
>       0.08 ±  2%      -0.0        0.03        perf-stat.overall.dTLB-store-miss-rate%
>      59.16            +0.9       60.04        perf-stat.overall.iTLB-load-miss-rate%
>       1278           +86.1%       2379 ±  2%  perf-stat.overall.instructions-per-iTLB-miss
>       0.59           -36.1%       0.38        perf-stat.overall.ipc

Worse IPC and CPI.

>  2.078e+09           -48.3%  1.074e+09 ±  4%  perf-stat.ps.branch-instructions
>   31292687           -61.2%   12133349 ±  2%  perf-stat.ps.branch-misses
>   26057291            -5.9%   24512034 ±  4%  perf-stat.ps.cache-misses
>  1.353e+08           -58.6%   56072195 ±  4%  perf-stat.ps.cache-references
>     365254           -75.8%      88464 ±  3%  perf-stat.ps.context-switches
>  1.735e+10           -22.4%  1.346e+10 ±  2%  perf-stat.ps.cpu-cycles
>      60838           -79.1%      12727 ±  6%  perf-stat.ps.cpu-migrations
>    3056601 ±  4%     -81.5%     565354 ±  4%  perf-stat.ps.dTLB-load-misses
>  2.636e+09           -50.7%    1.3e+09 ±  4%  perf-stat.ps.dTLB-loads
>    1155253 ±  2%     -83.0%     196581 ±  5%  perf-stat.ps.dTLB-store-misses
>  1.473e+09           -57.4%  6.268e+08 ±  3%  perf-stat.ps.dTLB-stores
>    7997726           -73.3%    2131477 ±  3%  perf-stat.ps.iTLB-load-misses
>    5521346           -74.3%    1418623 ±  2%  perf-stat.ps.iTLB-loads
>  1.023e+10           -50.4%  5.073e+09 ±  4%  perf-stat.ps.instructions
>      75671           -80.9%      14479 ±  4%  perf-stat.ps.minor-faults
>    5549722           -61.4%    2141750 ±  4%  perf-stat.ps.node-loads
>    8769156           +51.6%   13296579 ±  5%  perf-stat.ps.node-stores
>     110795           -82.0%      19977 ±  4%  perf-stat.ps.page-faults
>  6.482e+11           -50.7%  3.197e+11 ±  4%  perf-stat.total.instructions
>       0.00 ± 37%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
>       0.01 ± 18%   +8373.1%       0.73 ± 49%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
>       0.01 ± 16%   +4600.0%       0.38 ± 24%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit

More time is spent in madvise and munmap, but I'm not sure whether
this is caused by tearing down the address space when the test exits.
If so, it should not count toward the regression.

>       0.01 ±204%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>       0.01 ±  8%   +3678.9%       0.36 ± 79%  perf-sched.sch_delay.avg.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
>       0.01 ± 14%     -38.5%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
>       0.01 ±  5%   +2946.2%       0.26 ± 43%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
>       0.00 ± 14%    +125.0%       0.01 ± 12%  perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       0.02 ±170%     -83.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00 ± 69%   +6578.6%       0.31 ±  4%  perf-sched.sch_delay.avg.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
>       0.00          +100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
>       0.02 ± 86%   +4234.4%       0.65 ±  4%  perf-sched.sch_delay.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
>       0.01 ±  6%   +6054.3%       0.47        perf-sched.sch_delay.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
>       0.00 ± 14%    +195.2%       0.01 ± 89%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       0.00 ±102%    +340.0%       0.01 ± 85%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
>       0.00          +100.0%       0.00        perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
>       0.00 ± 11%     +66.7%       0.01 ± 21%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
>       0.01 ± 89%   +1096.1%       0.15 ± 30%  perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
>       0.00          +141.7%       0.01 ± 61%  perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
>       0.00 ±223%   +9975.0%       0.07 ±203%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
>       0.00 ± 10%    +789.3%       0.04 ± 69%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
>       0.00 ± 31%   +6691.3%       0.26 ±  5%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
>       0.00 ± 28%  +14612.5%       0.59 ±  4%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.exit_mm
>       0.00 ± 24%   +4904.2%       0.20 ±  4%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
>       0.00 ± 28%    +450.0%       0.01 ± 74%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
>       0.00 ± 17%    +984.6%       0.02 ± 79%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>       0.00 ± 20%    +231.8%       0.01 ± 89%  perf-sched.sch_delay.avg.ms.schedule_timeout.io_schedule_timeout.__wait_for_common.submit_bio_wait
>       0.00          +350.0%       0.01 ± 16%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>       0.02 ± 16%    +320.2%       0.07 ±  2%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       0.02 ±  2%    +282.1%       0.09 ±  5%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       0.00 ± 14%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
>       0.05 ± 35%   +3784.5%       1.92 ± 16%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
>       0.29 ±128%    +563.3%       1.92 ±  7%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
>       0.14 ±217%     -99.7%       0.00 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
>       0.03 ± 49%     -74.0%       0.01 ± 51%  perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>       0.01 ± 54%     -57.4%       0.00 ± 75%  perf-sched.sch_delay.max.ms.__cond_resched.dput.__ns_get_path.ns_get_path.proc_ns_get_link
>       0.12 ± 21%    +873.0%       1.19 ± 60%  perf-sched.sch_delay.max.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
>       2.27 ±220%     -99.7%       0.01 ± 19%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
>       0.02 ± 36%     -54.4%       0.01 ± 55%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
>       0.04 ± 36%     -77.1%       0.01 ± 31%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
>       0.12 ± 32%   +1235.8%       1.58 ± 31%  perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
>       2.25 ±218%     -99.3%       0.02 ± 52%  perf-sched.sch_delay.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.01 ± 85%  +19836.4%       2.56 ±  7%  perf-sched.sch_delay.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
>       0.03 ± 70%     -93.6%       0.00 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
>       0.10 ± 16%   +2984.2%       3.21 ±  6%  perf-sched.sch_delay.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
>       0.01 ± 20%    +883.9%       0.05 ±177%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       0.01 ± 15%    +694.7%       0.08 ±123%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
>       0.00 ±223%   +6966.7%       0.07 ±199%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
>       0.01 ± 38%   +8384.6%       0.55 ± 72%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>       0.01 ± 13%  +12995.7%       1.51 ±103%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>     117.80 ± 56%     -96.4%       4.26 ± 36%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       0.01 ± 68%    +331.9%       0.03        perf-sched.total_sch_delay.average.ms
>       4.14          +242.6%      14.20 ±  4%  perf-sched.total_wait_and_delay.average.ms
>     700841           -69.6%     212977 ±  3%  perf-sched.total_wait_and_delay.count.ms
>       4.14          +242.4%      14.16 ±  4%  perf-sched.total_wait_time.average.ms
>      11.68 ±  8%    +213.3%      36.59 ± 28%  perf-sched.wait_and_delay.avg.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
>      10.00 ±  2%    +226.1%      32.62 ± 20%  perf-sched.wait_and_delay.avg.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
>      10.55 ±  3%    +259.8%      37.96 ±  7%  perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
>       9.80 ± 12%    +196.5%      29.07 ± 32%  perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
>       9.80 ±  4%    +234.9%      32.83 ± 14%  perf-sched.wait_and_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
>      10.32 ±  2%    +223.8%      33.42 ±  6%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
>       8.15 ± 14%    +271.3%      30.25 ± 35%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
>       9.60 ±  4%    +240.8%      32.73 ± 16%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
>      10.37 ±  4%    +232.0%      34.41 ± 10%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
>       7.32 ± 46%    +269.7%      27.07 ± 49%  perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
>       9.88          +236.2%      33.23 ±  4%  perf-sched.wait_and_delay.avg.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
>       4.44 ±  4%    +379.0%      21.27 ± 18%  perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      10.05 ±  2%    +235.6%      33.73 ± 11%  perf-sched.wait_and_delay.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.03          +462.6%       0.15 ±  6%  perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.78 ±  4%    +482.1%      39.46 ±  3%  perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>       3.17          +683.3%      24.85 ±  8%  perf-sched.wait_and_delay.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
>      36.64 ± 13%    +244.7%     126.32 ±  6%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
>       9.81          +302.4%      39.47 ±  4%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
>       1.05           +48.2%       1.56        perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
>       0.93           +14.2%       1.06 ±  2%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
>       9.93          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
>      12.02 ±  3%    +139.8%      28.83 ±  6%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       6.09 ±  2%    +403.0%      30.64 ±  5%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      23.17 ± 19%     -83.5%       3.83 ±143%  perf-sched.wait_and_delay.count.__cond_resched.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio
>      79.83 ±  9%     -55.1%      35.83 ± 16%  perf-sched.wait_and_delay.count.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
>      14.83 ± 14%     -59.6%       6.00 ± 56%  perf-sched.wait_and_delay.count.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>       8.50 ± 17%     -80.4%       1.67 ± 89%  perf-sched.wait_and_delay.count.__cond_resched.dput.__ns_get_path.ns_get_path.proc_ns_get_link
>     114.00 ± 14%     -62.4%      42.83 ± 11%  perf-sched.wait_and_delay.count.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
>      94.67 ±  7%     -48.1%      49.17 ± 13%  perf-sched.wait_and_delay.count.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
>      59.83 ± 13%     -76.0%      14.33 ± 48%  perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>     103.00 ± 12%     -48.1%      53.50 ± 20%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
>      19.33 ± 16%     -56.0%       8.50 ± 29%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
>      68.17 ± 11%     -39.1%      41.50 ± 19%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
>      36.67 ± 22%     -79.1%       7.67 ± 46%  perf-sched.wait_and_delay.count.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
>     465.50 ±  9%     -47.4%     244.83 ± 11%  perf-sched.wait_and_delay.count.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
>      14492 ±  3%     -96.3%     533.67 ± 10%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>     128.67 ±  7%     -53.5%      59.83 ± 10%  perf-sched.wait_and_delay.count.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       7.67 ± 34%     -80.4%       1.50 ±107%  perf-sched.wait_and_delay.count.__cond_resched.vunmap_p4d_range.__vunmap_range_noflush.remove_vm_area.vfree
>     147533           -81.0%      28023 ±  5%  perf-sched.wait_and_delay.count.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       4394 ±  4%     -78.5%     942.83 ±  7%  perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>     228791           -79.3%      47383 ±  4%  perf-sched.wait_and_delay.count.futex_wait_queue.__futex_wait.futex_wait.do_futex
>     368.50 ±  2%     -67.1%     121.33 ±  3%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
>     147506           -81.0%      28010 ±  5%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
>       5387 ±  6%     -16.7%       4488 ±  5%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
>       8303 ±  2%     -56.9%       3579 ±  5%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read_killable.lock_mm_and_find_vma
>      14.67 ±  7%    -100.0%       0.00        perf-sched.wait_and_delay.count.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
>     370.50 ±141%    +221.9%       1192 ±  5%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>      24395 ±  2%     -51.2%      11914 ±  6%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      31053 ±  2%     -80.5%       6047 ±  5%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      16.41 ±  2%    +342.7%      72.65 ± 29%  perf-sched.wait_and_delay.max.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
>      16.49 ±  3%    +463.3%      92.90 ± 27%  perf-sched.wait_and_delay.max.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
>      17.32 ±  5%    +520.9%     107.52 ± 14%  perf-sched.wait_and_delay.max.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
>      15.38 ±  6%    +325.2%      65.41 ± 22%  perf-sched.wait_and_delay.max.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
>      16.73 ±  4%    +456.2%      93.04 ± 11%  perf-sched.wait_and_delay.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
>      17.14 ±  3%    +510.6%     104.68 ± 14%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
>      15.70 ±  4%    +379.4%      75.25 ± 28%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
>      15.70 ±  3%    +422.1%      81.97 ± 19%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
>      16.38          +528.4%     102.91 ± 21%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
>      45.20 ± 48%    +166.0%     120.23 ± 27%  perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
>      17.25          +495.5%     102.71 ±  2%  perf-sched.wait_and_delay.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
>     402.57 ± 15%     -52.8%     189.90 ± 14%  perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      16.96 ±  4%    +521.3%     105.40 ± 15%  perf-sched.wait_and_delay.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      28.45          +517.3%     175.65 ± 14%  perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      22.49          +628.5%     163.83 ± 16%  perf-sched.wait_and_delay.max.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
>      26.53 ± 30%    +326.9%     113.25 ± 16%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
>      15.54          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.ext4_lazyinit_thread.part.0.kthread
>       1.67 ±141%    +284.6%       6.44 ±  4%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>       0.07 ± 34%     -93.6%       0.00 ±105%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
>      10.21 ± 15%    +295.8%      40.43 ± 50%  perf-sched.wait_time.avg.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       3.89 ± 40%     -99.8%       0.01 ±113%  perf-sched.wait_time.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
>      11.67 ±  8%    +213.5%      36.58 ± 28%  perf-sched.wait_time.avg.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
>       9.98 ±  2%    +226.8%      32.61 ± 20%  perf-sched.wait_time.avg.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
>       1.03           +71.2%       1.77 ± 20%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
>       0.06 ± 79%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.__split_vma.vma_modify.mprotect_fixup
>       0.05 ± 22%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_expand.mmap_region.do_mmap
>       0.08 ± 82%     -98.2%       0.00 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      10.72 ± 10%    +166.9%      28.61 ± 29%  perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
>      10.53 ±  3%    +260.5%      37.95 ±  7%  perf-sched.wait_time.avg.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
>       9.80 ± 12%    +196.6%      29.06 ± 32%  perf-sched.wait_time.avg.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
>       9.80 ±  4%    +235.1%      32.82 ± 14%  perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
>       9.50 ± 12%    +281.9%      36.27 ± 70%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>      10.31 ±  2%    +223.9%      33.40 ±  6%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
>       8.04 ± 15%    +276.1%      30.25 ± 35%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
>       9.60 ±  4%    +240.9%      32.72 ± 16%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
>       0.06 ± 66%     -98.3%       0.00 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.__split_vma
>      10.36 ±  4%    +232.1%      34.41 ± 10%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
>       0.08 ± 50%     -95.7%       0.00 ±100%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.vma_modify
>       0.01 ± 49%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range
>       0.03 ± 73%     -87.4%       0.00 ±145%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node.dup_task_struct.copy_process.kernel_clone
>       8.01 ± 25%    +238.0%      27.07 ± 49%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
>       9.86          +237.0%      33.23 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
>       4.44 ±  4%    +379.2%      21.26 ± 18%  perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      10.03          +236.3%      33.73 ± 11%  perf-sched.wait_time.avg.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.97 ±  8%     -87.8%       0.12 ±221%  perf-sched.wait_time.avg.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
>       0.02 ± 13%   +1846.8%       0.45 ± 11%  perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
>       1.01           +64.7%       1.66        perf-sched.wait_time.avg.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
>       0.75 ±  4%    +852.1%       7.10 ±  5%  perf-sched.wait_time.avg.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       0.03          +462.6%       0.15 ±  6%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.24 ±  4%     +25.3%       0.30 ±  8%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
>       1.98 ± 15%    +595.7%      13.80 ± 90%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
>       2.78 ± 14%    +444.7%      15.12 ± 16%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function
>       6.77 ±  4%    +483.0%      39.44 ±  3%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>       3.17          +684.7%      24.85 ±  8%  perf-sched.wait_time.avg.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
>      36.64 ± 13%    +244.7%     126.32 ±  6%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
>       9.79          +303.0%      39.45 ą  4%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
>       1.05           +23.8%       1.30        perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.do_madvise
>       0.86          +101.2%       1.73 ą  3%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.exit_mm
>       0.11 ą 21%    +438.9%       0.61 ą 15%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
>       0.32 ą  4%     +28.5%       0.41 ą 13%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>      12.00 ą  3%    +139.6%      28.76 ą  6%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       6.07 ą  2%    +403.5%      30.56 ą  5%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       0.38 ą 41%     -98.8%       0.00 ą105%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
>       0.36 ą 34%     -84.3%       0.06 ą200%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.vma_alloc_folio.do_anonymous_page
>       0.36 ą 51%     -92.9%       0.03 ą114%  perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
>      15.98 ą  5%    +361.7%      73.80 ą 23%  perf-sched.wait_time.max.ms.__cond_resched.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.51 ą 14%     -92.8%       0.04 ą196%  perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.__vmalloc_area_node.__vmalloc_node_range
>       8.56 ą 11%     -99.9%       0.01 ą126%  perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
>       0.43 ą 32%     -68.2%       0.14 ą119%  perf-sched.wait_time.max.ms.__cond_resched.__kmem_cache_alloc_node.kmalloc_node_trace.__get_vm_area_node.__vmalloc_node_range
>       0.46 ą 20%     -89.3%       0.05 ą184%  perf-sched.wait_time.max.ms.__cond_resched.__vmalloc_area_node.__vmalloc_node_range.alloc_thread_stack_node.dup_task_struct
>      16.40 ą  2%    +342.9%      72.65 ą 29%  perf-sched.wait_time.max.ms.__cond_resched.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file
>       0.31 ą 63%     -76.2%       0.07 ą169%  perf-sched.wait_time.max.ms.__cond_resched.cgroup_css_set_fork.cgroup_can_fork.copy_process.kernel_clone
>       0.14 ą 93%    +258.7%       0.49 ą 14%  perf-sched.wait_time.max.ms.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
>      16.49 ą  3%    +463.5%      92.89 ą 27%  perf-sched.wait_time.max.ms.__cond_resched.dentry_kill.dput.__fput.__x64_sys_close
>       1.09          +171.0%       2.96 ą 10%  perf-sched.wait_time.max.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
>       1.16 ą  7%    +155.1%       2.97 ą  4%  perf-sched.wait_time.max.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
>       0.19 ą 78%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.__split_vma.vma_modify.mprotect_fixup
>       0.33 ą 35%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_expand.mmap_region.do_mmap
>       0.20 ą101%     -99.3%       0.00 ą223%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      17.31 ą  5%    +521.0%     107.51 ą 14%  perf-sched.wait_time.max.ms.__cond_resched.dput.nd_jump_link.proc_ns_get_link.pick_link
>      15.38 ą  6%    +325.3%      65.40 ą 22%  perf-sched.wait_time.max.ms.__cond_resched.dput.pick_link.step_into.open_last_lookups
>      16.72 ą  4%    +456.6%      93.04 ą 11%  perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
>       1.16 ą  2%     +88.7%       2.20 ą 33%  perf-sched.wait_time.max.ms.__cond_resched.exit_signals.do_exit.__x64_sys_exit.do_syscall_64
>      53.96 ą 32%    +444.0%     293.53 ą109%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
>      17.13 ą  2%    +511.2%     104.68 ą 14%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.alloc_empty_file.path_openat.do_filp_open
>      15.69 ą  4%    +379.5%      75.25 ą 28%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.create_new_namespaces.__do_sys_setns.do_syscall_64
>      15.70 ą  3%    +422.2%      81.97 ą 19%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.getname_flags.part.0
>       0.27 ą 80%     -99.6%       0.00 ą223%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.mas_alloc_nodes.mas_preallocate.__split_vma
>      16.37          +528.6%     102.90 ą 21%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.security_file_alloc.init_file.alloc_empty_file
>       0.44 ą 33%     -99.1%       0.00 ą104%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.vm_area_dup.__split_vma.vma_modify
>       0.02 ą 83%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range
>       0.08 ą 83%     -95.4%       0.00 ą147%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node.dup_task_struct.copy_process.kernel_clone
>       1.16 ą  2%    +134.7%       2.72 ą 19%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
>      49.88 ą 25%    +141.0%     120.23 ą 27%  perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
>      17.24          +495.7%     102.70 ą  2%  perf-sched.wait_time.max.ms.__cond_resched.slab_pre_alloc_hook.constprop.0.kmem_cache_alloc_lru
>     402.56 ą 15%     -52.8%     189.89 ą 14%  perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>      16.96 ą  4%    +521.4%     105.39 ą 15%  perf-sched.wait_time.max.ms.__cond_resched.switch_task_namespaces.__do_sys_setns.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.06          +241.7%       3.61 ą  4%  perf-sched.wait_time.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
>       1.07           -88.9%       0.12 ą221%  perf-sched.wait_time.max.ms.__cond_resched.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
>       0.28 ą 27%    +499.0%       1.67 ą 18%  perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
>       1.21 ą  2%    +207.2%       3.71 ą  3%  perf-sched.wait_time.max.ms.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
>      13.43 ą 26%     +38.8%      18.64        perf-sched.wait_time.max.ms.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>      28.45          +517.3%     175.65 ą 14%  perf-sched.wait_time.max.ms.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.79 ą 10%     +62.2%       1.28 ą 25%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
>      13.22 ą  2%    +317.2%      55.16 ą 35%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_call_function
>     834.29 ą 28%     -48.5%     429.53 ą 94%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
>      22.48          +628.6%     163.83 ą 16%  perf-sched.wait_time.max.ms.futex_wait_queue.__futex_wait.futex_wait.do_futex
>      22.74 ą 18%    +398.0%     113.25 ą 16%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_sigtimedwait.__x64_sys_rt_sigtimedwait.do_syscall_64
>       7.72 ą  7%     +80.6%      13.95 ą  2%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
>       0.74 ą  4%     +77.2%       1.31 ą 32%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
>       5.01           +14.1%       5.72 ą  2%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
>      44.98           -19.7       25.32 ±  2%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
>      43.21           -19.6       23.65 ±  3%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
>      43.21           -19.6       23.65 ±  3%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>      43.18           -19.5       23.63 ±  3%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>      40.30           -17.5       22.75 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>      41.10           -17.4       23.66 ±  2%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
>      39.55           -17.3       22.24 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
>      24.76 ±  2%      -8.5       16.23 ±  3%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>       8.68 ±  4%      -6.5        2.22 ±  6%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
>       7.23 ±  4%      -5.8        1.46 ±  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>       7.23 ±  4%      -5.8        1.46 ±  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       7.11 ±  4%      -5.7        1.39 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       7.09 ±  4%      -5.7        1.39 ±  7%  perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.59 ±  3%      -5.1        1.47 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
>       6.59 ±  3%      -5.1        1.47 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
>       6.59 ±  3%      -5.1        1.47 ±  7%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
>       5.76 ±  2%      -5.0        0.80 ±  9%  perf-profile.calltrace.cycles-pp.start_thread
>       7.43 ±  2%      -4.9        2.52 ±  7%  perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>       5.51 ±  3%      -4.8        0.70 ±  7%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.start_thread
>       5.50 ±  3%      -4.8        0.70 ±  7%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
>       5.48 ±  3%      -4.8        0.69 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
>       5.42 ±  3%      -4.7        0.69 ±  7%  perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
>       5.90 ±  5%      -3.9        2.01 ±  4%  perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
>       4.18 ±  5%      -3.8        0.37 ± 71%  perf-profile.calltrace.cycles-pp.exit_notify.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       5.76 ±  5%      -3.8        1.98 ±  4%  perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
>       5.04 ±  7%      -3.7        1.32 ±  9%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__clone
>       5.03 ±  7%      -3.7        1.32 ±  9%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
>       5.02 ±  7%      -3.7        1.32 ±  9%  perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
>       5.02 ±  7%      -3.7        1.32 ±  9%  perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
>       5.62 ±  5%      -3.7        1.96 ±  3%  perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
>       4.03 ±  4%      -3.1        0.92 ±  7%  perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       6.03 ±  5%      -3.1        2.94 ±  3%  perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
>       3.43 ±  5%      -2.8        0.67 ± 13%  perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
>       3.43 ±  5%      -2.8        0.67 ± 13%  perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
>       3.41 ±  5%      -2.7        0.66 ± 13%  perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread
>       3.40 ±  5%      -2.7        0.66 ± 13%  perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn
>       3.67 ±  7%      -2.7        0.94 ± 10%  perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       2.92 ±  7%      -2.4        0.50 ± 46%  perf-profile.calltrace.cycles-pp.stress_pthread
>       2.54 ±  6%      -2.2        0.38 ± 70%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>       2.46 ±  6%      -1.8        0.63 ± 10%  perf-profile.calltrace.cycles-pp.dup_task_struct.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
>       3.00 ±  6%      -1.6        1.43 ±  7%  perf-profile.calltrace.cycles-pp.__munmap
>       2.96 ±  6%      -1.5        1.42 ±  7%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
>       2.96 ±  6%      -1.5        1.42 ±  7%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>       2.95 ±  6%      -1.5        1.41 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>       2.95 ±  6%      -1.5        1.41 ±  7%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>       2.02 ±  4%      -1.5        0.52 ± 46%  perf-profile.calltrace.cycles-pp.__lll_lock_wait
>       1.78 ±  3%      -1.5        0.30 ±100%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__lll_lock_wait
>       1.77 ±  3%      -1.5        0.30 ±100%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__lll_lock_wait
>       1.54 ±  6%      -1.3        0.26 ±100%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>       2.54 ±  6%      -1.2        1.38 ±  6%  perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       2.51 ±  6%      -1.1        1.37 ±  7%  perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
>       1.13            -0.7        0.40 ± 70%  perf-profile.calltrace.cycles-pp.exit_mm.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.15 ±  5%      -0.7        0.46 ± 45%  perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
>       1.58 ±  5%      -0.6        0.94 ±  7%  perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
>       0.99 ±  5%      -0.5        0.51 ± 45%  perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
>       1.01 ±  5%      -0.5        0.54 ± 45%  perf-profile.calltrace.cycles-pp.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
>       0.82 ±  4%      -0.2        0.59 ±  5%  perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
>       0.00            +0.5        0.54 ±  5%  perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
>       0.00            +0.6        0.60 ±  5%  perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
>       0.00            +0.6        0.61 ±  6%  perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
>       0.00            +0.6        0.62 ±  6%  perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
>       0.53 ±  5%      +0.6        1.17 ± 13%  perf-profile.calltrace.cycles-pp.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
>       1.94 ±  2%      +0.7        2.64 ±  9%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
>       0.00            +0.7        0.73 ±  5%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range
>       0.00            +0.8        0.75 ± 20%  perf-profile.calltrace.cycles-pp.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
>       2.02 ±  2%      +0.8        2.85 ±  9%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>       0.74 ±  5%      +0.8        1.57 ± 11%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
>       0.00            +0.9        0.90 ±  4%  perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
>       0.00            +0.9        0.92 ± 13%  perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues
>       0.86 ±  4%      +1.0        1.82 ± 10%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
>       0.86 ±  4%      +1.0        1.83 ± 10%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
>       0.00            +1.0        0.98 ±  7%  perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked
>       0.09 ±223%      +1.0        1.07 ± 11%  perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt
>       0.00            +1.0        0.99 ±  6%  perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd
>       0.00            +1.0        1.00 ±  7%  perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range
>       0.09 ±223%      +1.0        1.10 ± 12%  perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
>       0.00            +1.0        1.01 ±  6%  perf-profile.calltrace.cycles-pp.pmdp_invalidate.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range
>       0.00            +1.1        1.10 ±  5%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath
>       0.00            +1.1        1.12 ±  5%  perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock
>       0.00            +1.2        1.23 ±  4%  perf-profile.calltrace.cycles-pp.page_add_anon_rmap.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range
>       0.00            +1.3        1.32 ±  4%  perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd
>       0.00            +1.4        1.38 ±  5%  perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range
>       0.00            +2.4        2.44 ± 10%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd.zap_pmd_range
>       0.00            +3.1        3.10 ±  5%  perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single
>       0.00            +3.5        3.52 ±  5%  perf-profile.calltrace.cycles-pp.__split_huge_pmd_locked.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single
>       0.88 ±  4%      +3.8        4.69 ±  4%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
>       6.30 ±  6%     +13.5       19.85 ±  7%  perf-profile.calltrace.cycles-pp.__clone
>       0.00           +16.7       16.69 ±  7%  perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
>       1.19 ± 29%     +17.1       18.32 ±  7%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
>       0.00           +17.6       17.56 ±  7%  perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
>       0.63 ±  7%     +17.7       18.35 ±  7%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.__clone
>       0.59 ±  5%     +17.8       18.34 ±  7%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.__clone
>       0.59 ±  5%     +17.8       18.34 ±  7%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__clone
>       0.00           +17.9       17.90 ±  7%  perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>       0.36 ± 71%     +18.0       18.33 ±  7%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.__clone
>       0.00           +32.0       32.03 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__split_huge_pmd.zap_pmd_range.unmap_page_range
>       0.00           +32.6       32.62 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single
>       0.00           +36.2       36.19 ±  2%  perf-profile.calltrace.cycles-pp.__split_huge_pmd.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
>       7.97 ±  4%     +36.6       44.52 ±  2%  perf-profile.calltrace.cycles-pp.__madvise
>       7.91 ±  4%     +36.6       44.46 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
>       7.90 ±  4%     +36.6       44.46 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>       7.87 ±  4%     +36.6       44.44 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>       7.86 ±  4%     +36.6       44.44 ±  2%  perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
>       7.32 ±  4%     +36.8       44.07 ±  2%  perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       7.25 ±  4%     +36.8       44.06 ±  2%  perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
>       1.04 ±  4%     +40.0       41.08 ±  2%  perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
>       1.00 ±  3%     +40.1       41.06 ±  2%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
>      44.98           -19.7       25.32 ±  2%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
>      44.98           -19.7       25.32 ±  2%  perf-profile.children.cycles-pp.cpu_startup_entry
>      44.96           -19.6       25.31 ±  2%  perf-profile.children.cycles-pp.do_idle
>      43.21           -19.6       23.65 ±  3%  perf-profile.children.cycles-pp.start_secondary
>      41.98           -17.6       24.40 ±  2%  perf-profile.children.cycles-pp.cpuidle_idle_call
>      41.21           -17.3       23.86 ±  2%  perf-profile.children.cycles-pp.cpuidle_enter
>      41.20           -17.3       23.86 ±  2%  perf-profile.children.cycles-pp.cpuidle_enter_state
>      12.69 ±  3%     -10.6        2.12 ±  6%  perf-profile.children.cycles-pp.do_exit
>      12.60 ±  3%     -10.5        2.08 ±  7%  perf-profile.children.cycles-pp.__x64_sys_exit
>      24.76 ±  2%      -8.5       16.31 ±  2%  perf-profile.children.cycles-pp.intel_idle
>      12.34 ±  2%      -8.4        3.90 ±  5%  perf-profile.children.cycles-pp.intel_idle_irq
>       6.96 ±  4%      -5.4        1.58 ±  7%  perf-profile.children.cycles-pp.ret_from_fork_asm
>       6.69 ±  4%      -5.2        1.51 ±  7%  perf-profile.children.cycles-pp.ret_from_fork
>       6.59 ±  3%      -5.1        1.47 ±  7%  perf-profile.children.cycles-pp.kthread
>       5.78 ±  2%      -5.0        0.80 ±  8%  perf-profile.children.cycles-pp.start_thread
>       4.68 ±  4%      -4.5        0.22 ± 10%  perf-profile.children.cycles-pp._raw_spin_lock_irq
>       5.03 ±  7%      -3.7        1.32 ±  9%  perf-profile.children.cycles-pp.__do_sys_clone
>       5.02 ±  7%      -3.7        1.32 ±  9%  perf-profile.children.cycles-pp.kernel_clone
>       4.20 ±  5%      -3.7        0.53 ±  9%  perf-profile.children.cycles-pp.exit_notify
>       4.67 ±  5%      -3.6        1.10 ±  9%  perf-profile.children.cycles-pp.rcu_core
>       4.60 ±  4%      -3.5        1.06 ± 10%  perf-profile.children.cycles-pp.rcu_do_batch
>       4.89 ±  5%      -3.4        1.44 ± 11%  perf-profile.children.cycles-pp.__do_softirq
>       5.64 ±  3%      -3.2        2.39 ±  6%  perf-profile.children.cycles-pp.__schedule
>       6.27 ±  5%      -3.2        3.03 ±  4%  perf-profile.children.cycles-pp.flush_tlb_mm_range
>       4.03 ±  4%      -3.1        0.92 ±  7%  perf-profile.children.cycles-pp.smpboot_thread_fn
>       6.68 ±  4%      -3.1        3.61 ±  3%  perf-profile.children.cycles-pp.tlb_finish_mmu
>       6.04 ±  5%      -3.1        2.99 ±  4%  perf-profile.children.cycles-pp.on_each_cpu_cond_mask
>       6.04 ±  5%      -3.0        2.99 ±  4%  perf-profile.children.cycles-pp.smp_call_function_many_cond
>       3.77 ±  2%      -3.0        0.73 ± 16%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>       7.78            -3.0        4.77 ±  5%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>       3.43 ±  5%      -2.8        0.67 ± 13%  perf-profile.children.cycles-pp.run_ksoftirqd
>       3.67 ±  7%      -2.7        0.94 ± 10%  perf-profile.children.cycles-pp.copy_process
>       2.80 ±  6%      -2.5        0.34 ± 15%  perf-profile.children.cycles-pp.queued_write_lock_slowpath
>       3.41 ±  2%      -2.5        0.96 ± 16%  perf-profile.children.cycles-pp.do_futex
>       3.06 ±  5%      -2.4        0.68 ± 16%  perf-profile.children.cycles-pp.free_unref_page_commit
>       3.02 ±  5%      -2.4        0.67 ± 16%  perf-profile.children.cycles-pp.free_pcppages_bulk
>       2.92 ±  7%      -2.3        0.58 ± 14%  perf-profile.children.cycles-pp.stress_pthread
>       3.22 ±  3%      -2.3        0.90 ± 18%  perf-profile.children.cycles-pp.__x64_sys_futex
>       2.52 ±  5%      -2.2        0.35 ±  7%  perf-profile.children.cycles-pp.release_task
>       2.54 ±  6%      -2.0        0.53 ± 10%  perf-profile.children.cycles-pp.worker_thread
>       3.12 ±  5%      -1.9        1.17 ± 11%  perf-profile.children.cycles-pp.free_unref_page
>       2.31 ±  6%      -1.9        0.45 ± 11%  perf-profile.children.cycles-pp.process_one_work
>       2.47 ±  6%      -1.8        0.63 ± 10%  perf-profile.children.cycles-pp.dup_task_struct
>       2.19 ±  5%      -1.8        0.41 ± 12%  perf-profile.children.cycles-pp.delayed_vfree_work
>       2.14 ±  5%      -1.7        0.40 ± 11%  perf-profile.children.cycles-pp.vfree
>       3.19 ±  2%      -1.6        1.58 ±  8%  perf-profile.children.cycles-pp.schedule
>       2.06 ±  3%      -1.6        0.46 ±  7%  perf-profile.children.cycles-pp.__sigtimedwait
>       3.02 ±  6%      -1.6        1.44 ±  7%  perf-profile.children.cycles-pp.__munmap
>       1.94 ±  4%      -1.6        0.39 ± 14%  perf-profile.children.cycles-pp.__unfreeze_partials
>       2.95 ±  6%      -1.5        1.41 ±  7%  perf-profile.children.cycles-pp.__x64_sys_munmap
>       2.95 ±  6%      -1.5        1.41 ±  7%  perf-profile.children.cycles-pp.__vm_munmap
>       2.14 ±  3%      -1.5        0.60 ± 21%  perf-profile.children.cycles-pp.futex_wait
>       2.08 ±  4%      -1.5        0.60 ± 19%  perf-profile.children.cycles-pp.__lll_lock_wait
>       2.04 ±  3%      -1.5        0.56 ± 20%  perf-profile.children.cycles-pp.__futex_wait
>       1.77 ±  5%      -1.5        0.32 ± 10%  perf-profile.children.cycles-pp.remove_vm_area
>       1.86 ±  5%      -1.4        0.46 ± 10%  perf-profile.children.cycles-pp.open64
>       1.74 ±  4%      -1.4        0.37 ±  7%  perf-profile.children.cycles-pp.__x64_sys_rt_sigtimedwait
>       1.71 ±  4%      -1.4        0.36 ±  8%  perf-profile.children.cycles-pp.do_sigtimedwait
>       1.79 ±  5%      -1.3        0.46 ±  9%  perf-profile.children.cycles-pp.__x64_sys_openat
>       1.78 ±  5%      -1.3        0.46 ±  8%  perf-profile.children.cycles-pp.do_sys_openat2
>       1.61 ±  4%      -1.3        0.32 ± 12%  perf-profile.children.cycles-pp.poll_idle
>       1.65 ±  9%      -1.3        0.37 ± 14%  perf-profile.children.cycles-pp.pthread_create@@GLIBC_2.2.5
>       1.56 ±  8%      -1.2        0.35 ±  7%  perf-profile.children.cycles-pp.alloc_thread_stack_node
>       2.32 ±  3%      -1.2        1.13 ±  8%  perf-profile.children.cycles-pp.pick_next_task_fair
>       2.59 ±  6%      -1.2        1.40 ±  7%  perf-profile.children.cycles-pp.do_vmi_munmap
>       1.55 ±  4%      -1.2        0.40 ± 19%  perf-profile.children.cycles-pp.futex_wait_queue
>       1.37 ±  5%      -1.1        0.22 ± 12%  perf-profile.children.cycles-pp.find_unlink_vmap_area
>       2.52 ±  6%      -1.1        1.38 ±  6%  perf-profile.children.cycles-pp.do_vmi_align_munmap
>       1.53 ±  5%      -1.1        0.39 ±  8%  perf-profile.children.cycles-pp.do_filp_open
>       1.52 ±  5%      -1.1        0.39 ±  7%  perf-profile.children.cycles-pp.path_openat
>       1.25 ±  3%      -1.1        0.14 ± 12%  perf-profile.children.cycles-pp.sigpending
>       1.58 ±  5%      -1.1        0.50 ±  6%  perf-profile.children.cycles-pp.schedule_idle
>       1.29 ±  5%      -1.1        0.21 ± 21%  perf-profile.children.cycles-pp.__mprotect
>       1.40 ±  8%      -1.1        0.32 ±  4%  perf-profile.children.cycles-pp.__vmalloc_node_range
>       2.06 ±  3%      -1.0        1.02 ±  9%  perf-profile.children.cycles-pp.newidle_balance
>       1.04 ±  3%      -1.0        0.08 ± 23%  perf-profile.children.cycles-pp.__x64_sys_rt_sigpending
>       1.14 ±  6%      -1.0        0.18 ± 18%  perf-profile.children.cycles-pp.__x64_sys_mprotect
>       1.13 ±  6%      -1.0        0.18 ± 17%  perf-profile.children.cycles-pp.do_mprotect_pkey
>       1.30 ±  7%      -0.9        0.36 ± 10%  perf-profile.children.cycles-pp.wake_up_new_task
>       1.14 ±  9%      -0.9        0.22 ± 16%  perf-profile.children.cycles-pp.do_anonymous_page
>       0.95 ±  3%      -0.9        0.04 ± 71%  perf-profile.children.cycles-pp.do_sigpending
>       1.24 ±  3%      -0.9        0.34 ±  9%  perf-profile.children.cycles-pp.futex_wake
>       1.02 ±  6%      -0.9        0.14 ± 15%  perf-profile.children.cycles-pp.mprotect_fixup
>       1.91 ±  2%      -0.9        1.06 ±  9%  perf-profile.children.cycles-pp.load_balance
>       1.38 ±  5%      -0.8        0.53 ±  6%  perf-profile.children.cycles-pp.select_task_rq_fair
>       1.14 ±  4%      -0.8        0.31 ± 12%  perf-profile.children.cycles-pp.__pthread_mutex_unlock_usercnt
>       2.68 ±  3%      -0.8        1.91 ±  6%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
>       1.00 ±  4%      -0.7        0.26 ± 10%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
>       1.44 ±  3%      -0.7        0.73 ± 10%  perf-profile.children.cycles-pp.find_busiest_group
>       0.81 ±  6%      -0.7        0.10 ± 18%  perf-profile.children.cycles-pp.vma_modify
>       1.29 ±  3%      -0.7        0.60 ±  8%  perf-profile.children.cycles-pp.exit_mm
>       1.40 ±  3%      -0.7        0.71 ± 10%  perf-profile.children.cycles-pp.update_sd_lb_stats
>       0.78 ±  7%      -0.7        0.10 ± 19%  perf-profile.children.cycles-pp.__split_vma
>       0.90 ±  8%      -0.7        0.22 ± 10%  perf-profile.children.cycles-pp.__vmalloc_area_node
>       0.75 ±  4%      -0.7        0.10 ±  5%  perf-profile.children.cycles-pp.__exit_signal
>       1.49 ±  2%      -0.7        0.84 ±  7%  perf-profile.children.cycles-pp.try_to_wake_up
>       0.89 ±  7%      -0.6        0.24 ± 10%  perf-profile.children.cycles-pp.find_idlest_cpu
>       1.59 ±  5%      -0.6        0.95 ±  7%  perf-profile.children.cycles-pp.unmap_region
>       0.86 ±  3%      -0.6        0.22 ± 26%  perf-profile.children.cycles-pp.pthread_cond_timedwait@@GLIBC_2.3.2
>       1.59 ±  3%      -0.6        0.95 ±  9%  perf-profile.children.cycles-pp.irq_exit_rcu
>       1.24 ±  3%      -0.6        0.61 ± 10%  perf-profile.children.cycles-pp.update_sg_lb_stats
>       0.94 ±  5%      -0.6        0.32 ± 11%  perf-profile.children.cycles-pp.do_task_dead
>       0.87 ±  3%      -0.6        0.25 ± 19%  perf-profile.children.cycles-pp.perf_iterate_sb
>       0.82 ±  4%      -0.6        0.22 ± 10%  perf-profile.children.cycles-pp.sched_ttwu_pending
>       1.14 ±  3%      -0.6        0.54 ± 10%  perf-profile.children.cycles-pp.activate_task
>       0.84            -0.6        0.25 ± 10%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>       0.81 ±  6%      -0.6        0.22 ± 11%  perf-profile.children.cycles-pp.find_idlest_group
>       0.75 ±  5%      -0.6        0.18 ± 14%  perf-profile.children.cycles-pp.step_into
>       0.74 ±  8%      -0.6        0.18 ± 14%  perf-profile.children.cycles-pp.__alloc_pages_bulk
>       0.74 ±  6%      -0.5        0.19 ± 11%  perf-profile.children.cycles-pp.update_sg_wakeup_stats
>       0.72 ±  5%      -0.5        0.18 ± 15%  perf-profile.children.cycles-pp.pick_link
>       1.06 ±  2%      -0.5        0.52 ±  9%  perf-profile.children.cycles-pp.enqueue_task_fair
>       0.77 ±  6%      -0.5        0.23 ± 12%  perf-profile.children.cycles-pp.unmap_vmas
>       0.76 ±  2%      -0.5        0.22 ±  8%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
>       0.94 ±  2%      -0.5        0.42 ± 10%  perf-profile.children.cycles-pp.dequeue_task_fair
>       0.65 ±  5%      -0.5        0.15 ± 18%  perf-profile.children.cycles-pp.open_last_lookups
>       1.37 ±  3%      -0.5        0.87 ±  4%  perf-profile.children.cycles-pp.llist_add_batch
>       0.70 ±  4%      -0.5        0.22 ± 19%  perf-profile.children.cycles-pp.memcpy_orig
>       0.91 ±  4%      -0.5        0.44 ±  7%  perf-profile.children.cycles-pp.update_load_avg
>       0.67            -0.5        0.20 ±  8%  perf-profile.children.cycles-pp.switch_fpu_return
>       0.88 ±  3%      -0.5        0.42 ±  8%  perf-profile.children.cycles-pp.enqueue_entity
>       0.91 ±  4%      -0.5        0.45 ± 12%  perf-profile.children.cycles-pp.ttwu_do_activate
>       0.77 ±  4%      -0.5        0.32 ± 10%  perf-profile.children.cycles-pp.schedule_hrtimeout_range_clock
>       0.63 ±  5%      -0.4        0.20 ± 21%  perf-profile.children.cycles-pp.arch_dup_task_struct
>       0.74 ±  3%      -0.4        0.32 ± 15%  perf-profile.children.cycles-pp.dequeue_entity
>       0.62 ±  5%      -0.4        0.21 ±  5%  perf-profile.children.cycles-pp.finish_task_switch
>       0.56            -0.4        0.16 ±  7%  perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
>       0.53 ±  4%      -0.4        0.13 ±  9%  perf-profile.children.cycles-pp.syscall
>       0.50 ±  9%      -0.4        0.11 ± 18%  perf-profile.children.cycles-pp.__get_vm_area_node
>       0.51 ±  3%      -0.4        0.12 ± 12%  perf-profile.children.cycles-pp.__slab_free
>       0.52 ±  2%      -0.4        0.14 ± 10%  perf-profile.children.cycles-pp.kmem_cache_free
>       0.75 ±  3%      -0.4        0.37 ±  9%  perf-profile.children.cycles-pp.exit_mm_release
>       0.50 ±  6%      -0.4        0.12 ± 21%  perf-profile.children.cycles-pp.do_send_specific
>       0.74 ±  3%      -0.4        0.37 ±  8%  perf-profile.children.cycles-pp.futex_exit_release
>       0.45 ± 10%      -0.4        0.09 ± 17%  perf-profile.children.cycles-pp.alloc_vmap_area
>       0.47 ±  3%      -0.4        0.11 ± 20%  perf-profile.children.cycles-pp.tgkill
>       0.68 ± 11%      -0.4        0.32 ± 12%  perf-profile.children.cycles-pp.__mmap
>       0.48 ±  3%      -0.4        0.13 ±  6%  perf-profile.children.cycles-pp.entry_SYSCALL_64
>       0.76 ±  5%      -0.3        0.41 ± 10%  perf-profile.children.cycles-pp.wake_up_q
>       0.42 ±  7%      -0.3        0.08 ± 22%  perf-profile.children.cycles-pp.__close
>       0.49 ±  7%      -0.3        0.14 ± 25%  perf-profile.children.cycles-pp.kmem_cache_alloc
>       0.49 ±  9%      -0.3        0.15 ± 14%  perf-profile.children.cycles-pp.mas_store_gfp
>       0.46 ±  4%      -0.3        0.12 ± 23%  perf-profile.children.cycles-pp.perf_event_task_output
>       0.44 ± 10%      -0.3        0.10 ± 28%  perf-profile.children.cycles-pp.pthread_sigqueue
>       0.46 ±  4%      -0.3        0.12 ± 15%  perf-profile.children.cycles-pp.link_path_walk
>       0.42 ±  8%      -0.3        0.10 ± 20%  perf-profile.children.cycles-pp.proc_ns_get_link
>       0.63 ± 10%      -0.3        0.32 ± 12%  perf-profile.children.cycles-pp.vm_mmap_pgoff
>       0.45 ±  4%      -0.3        0.14 ± 13%  perf-profile.children.cycles-pp.sched_move_task
>       0.36 ±  8%      -0.3        0.06 ± 49%  perf-profile.children.cycles-pp.__x64_sys_close
>       0.46 ±  8%      -0.3        0.17 ± 14%  perf-profile.children.cycles-pp.prctl
>       0.65 ±  3%      -0.3        0.35 ±  7%  perf-profile.children.cycles-pp.futex_cleanup
>       0.42 ±  7%      -0.3        0.12 ± 15%  perf-profile.children.cycles-pp.mas_store_prealloc
>       0.49 ±  5%      -0.3        0.20 ± 13%  perf-profile.children.cycles-pp.__rmqueue_pcplist
>       0.37 ±  7%      -0.3        0.08 ± 16%  perf-profile.children.cycles-pp.do_tkill
>       0.36 ± 10%      -0.3        0.08 ± 20%  perf-profile.children.cycles-pp.ns_get_path
>       0.37 ±  4%      -0.3        0.09 ± 18%  perf-profile.children.cycles-pp.setns
>       0.67 ±  3%      -0.3        0.41 ±  8%  perf-profile.children.cycles-pp.hrtimer_wakeup
>       0.35 ±  5%      -0.3        0.10 ± 16%  perf-profile.children.cycles-pp.__task_pid_nr_ns
>       0.41 ±  5%      -0.3        0.16 ± 12%  perf-profile.children.cycles-pp.mas_wr_bnode
>       0.35 ±  4%      -0.3        0.10 ± 20%  perf-profile.children.cycles-pp.rcu_cblist_dequeue
>       0.37 ±  5%      -0.2        0.12 ± 17%  perf-profile.children.cycles-pp.exit_task_stack_account
>       0.56 ±  4%      -0.2        0.31 ± 12%  perf-profile.children.cycles-pp.select_task_rq
>       0.29 ±  6%      -0.2        0.05 ± 46%  perf-profile.children.cycles-pp.mas_wr_store_entry
>       0.34 ±  4%      -0.2        0.10 ± 27%  perf-profile.children.cycles-pp.perf_event_task
>       0.39 ±  9%      -0.2        0.15 ± 12%  perf-profile.children.cycles-pp.__switch_to_asm
>       0.35 ±  5%      -0.2        0.11 ± 11%  perf-profile.children.cycles-pp.account_kernel_stack
>       0.30 ±  7%      -0.2        0.06 ± 48%  perf-profile.children.cycles-pp.__ns_get_path
>       0.31 ±  9%      -0.2        0.07 ± 17%  perf-profile.children.cycles-pp.free_vmap_area_noflush
>       0.31 ±  5%      -0.2        0.08 ± 19%  perf-profile.children.cycles-pp.__do_sys_setns
>       0.33 ±  7%      -0.2        0.10 ±  7%  perf-profile.children.cycles-pp.__free_one_page
>       0.31 ± 11%      -0.2        0.08 ± 13%  perf-profile.children.cycles-pp.__pte_alloc
>       0.36 ±  6%      -0.2        0.13 ± 12%  perf-profile.children.cycles-pp.switch_mm_irqs_off
>       0.27 ± 12%      -0.2        0.05 ± 71%  perf-profile.children.cycles-pp.__fput
>       0.53 ±  9%      -0.2        0.31 ± 12%  perf-profile.children.cycles-pp.do_mmap
>       0.27 ± 12%      -0.2        0.05 ± 77%  perf-profile.children.cycles-pp.__x64_sys_rt_tgsigqueueinfo
>       0.28 ±  5%      -0.2        0.06 ± 50%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.34 ± 10%      -0.2        0.12 ± 29%  perf-profile.children.cycles-pp.futex_wait_setup
>       0.27 ±  6%      -0.2        0.06 ± 45%  perf-profile.children.cycles-pp.__x64_sys_tgkill
>       0.31 ±  7%      -0.2        0.11 ± 18%  perf-profile.children.cycles-pp.__switch_to
>       0.26 ±  8%      -0.2        0.06 ± 21%  perf-profile.children.cycles-pp.__call_rcu_common
>       0.33 ±  9%      -0.2        0.13 ± 18%  perf-profile.children.cycles-pp.__do_sys_prctl
>       0.28 ±  5%      -0.2        0.08 ± 17%  perf-profile.children.cycles-pp.mm_release
>       0.52 ±  2%      -0.2        0.32 ±  9%  perf-profile.children.cycles-pp.__get_user_8
>       0.24 ± 10%      -0.2        0.04 ± 72%  perf-profile.children.cycles-pp.dput
>       0.25 ± 14%      -0.2        0.05 ± 46%  perf-profile.children.cycles-pp.perf_event_mmap
>       0.24 ±  7%      -0.2        0.06 ± 50%  perf-profile.children.cycles-pp.mas_walk
>       0.28 ±  6%      -0.2        0.10 ± 24%  perf-profile.children.cycles-pp.rmqueue_bulk
>       0.23 ± 15%      -0.2        0.05 ± 46%  perf-profile.children.cycles-pp.perf_event_mmap_event
>       0.25 ± 15%      -0.2        0.08 ± 45%  perf-profile.children.cycles-pp.___slab_alloc
>       0.20 ± 14%      -0.2        0.03 ±100%  perf-profile.children.cycles-pp.lookup_fast
>       0.20 ± 10%      -0.2        0.04 ± 75%  perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
>       0.28 ±  7%      -0.2        0.12 ± 24%  perf-profile.children.cycles-pp.prepare_task_switch
>       0.22 ± 11%      -0.2        0.05 ±  8%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
>       0.63 ±  5%      -0.2        0.47 ± 12%  perf-profile.children.cycles-pp.llist_reverse_order
>       0.25 ± 11%      -0.2        0.09 ± 34%  perf-profile.children.cycles-pp.futex_q_lock
>       0.21 ±  6%      -0.2        0.06 ± 47%  perf-profile.children.cycles-pp.kmem_cache_alloc_node
>       0.18 ± 11%      -0.2        0.03 ±100%  perf-profile.children.cycles-pp.alloc_empty_file
>       0.19 ±  5%      -0.2        0.04 ± 71%  perf-profile.children.cycles-pp.__put_task_struct
>       0.19 ± 15%      -0.2        0.03 ± 70%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
>       0.24 ±  6%      -0.2        0.09 ± 20%  perf-profile.children.cycles-pp.___perf_sw_event
>       0.18 ±  7%      -0.2        0.03 ±100%  perf-profile.children.cycles-pp.perf_event_fork
>       0.19 ± 11%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.select_idle_core
>       0.30 ± 11%      -0.1        0.15 ±  7%  perf-profile.children.cycles-pp.pte_alloc_one
>       0.25 ±  6%      -0.1        0.11 ± 10%  perf-profile.children.cycles-pp.set_next_entity
>       0.20 ± 10%      -0.1        0.06 ± 49%  perf-profile.children.cycles-pp.__perf_event_header__init_id
>       0.18 ± 15%      -0.1        0.03 ±101%  perf-profile.children.cycles-pp.__radix_tree_lookup
>       0.22 ± 11%      -0.1        0.08 ± 21%  perf-profile.children.cycles-pp.mas_spanning_rebalance
>       0.20 ±  9%      -0.1        0.06 ±  9%  perf-profile.children.cycles-pp.stress_pthread_func
>       0.18 ± 12%      -0.1        0.04 ± 73%  perf-profile.children.cycles-pp.__getpid
>       0.16 ± 13%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.walk_component
>       0.28 ±  5%      -0.1        0.15 ± 13%  perf-profile.children.cycles-pp.update_curr
>       0.25 ±  5%      -0.1        0.11 ± 22%  perf-profile.children.cycles-pp.balance_fair
>       0.16 ±  9%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.futex_wake_mark
>       0.16 ± 12%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.get_futex_key
>       0.17 ±  6%      -0.1        0.05 ± 47%  perf-profile.children.cycles-pp.memcg_account_kmem
>       0.25 ± 11%      -0.1        0.12 ± 11%  perf-profile.children.cycles-pp._find_next_bit
>       0.15 ± 13%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.do_open
>       0.20 ±  8%      -0.1        0.08 ± 16%  perf-profile.children.cycles-pp.mas_rebalance
>       0.17 ± 13%      -0.1        0.05 ± 45%  perf-profile.children.cycles-pp.__memcg_kmem_charge_page
>       0.33 ±  6%      -0.1        0.21 ± 10%  perf-profile.children.cycles-pp.select_idle_sibling
>       0.14 ± 11%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.get_user_pages_fast
>       0.18 ±  7%      -0.1        0.07 ± 14%  perf-profile.children.cycles-pp.mas_alloc_nodes
>       0.14 ± 11%      -0.1        0.03 ±101%  perf-profile.children.cycles-pp.set_task_cpu
>       0.14 ± 12%      -0.1        0.03 ±101%  perf-profile.children.cycles-pp.vm_unmapped_area
>       0.38 ±  6%      -0.1        0.27 ±  7%  perf-profile.children.cycles-pp.native_sched_clock
>       0.16 ± 10%      -0.1        0.05 ± 47%  perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
>       0.36 ±  9%      -0.1        0.25 ± 12%  perf-profile.children.cycles-pp.mmap_region
>       0.23 ±  7%      -0.1        0.12 ±  9%  perf-profile.children.cycles-pp.available_idle_cpu
>       0.13 ± 11%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.internal_get_user_pages_fast
>       0.16 ± 10%      -0.1        0.06 ± 18%  perf-profile.children.cycles-pp.get_unmapped_area
>       0.50 ±  7%      -0.1        0.40 ±  6%  perf-profile.children.cycles-pp.menu_select
>       0.24 ±  9%      -0.1        0.14 ± 13%  perf-profile.children.cycles-pp.rmqueue
>       0.17 ± 14%      -0.1        0.07 ± 26%  perf-profile.children.cycles-pp.perf_event_comm
>       0.17 ± 15%      -0.1        0.07 ± 23%  perf-profile.children.cycles-pp.perf_event_comm_event
>       0.17 ± 11%      -0.1        0.07 ± 14%  perf-profile.children.cycles-pp.pick_next_entity
>       0.13 ± 14%      -0.1        0.03 ±102%  perf-profile.children.cycles-pp.perf_output_begin
>       0.23 ±  6%      -0.1        0.13 ± 21%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
>       0.14 ± 18%      -0.1        0.04 ± 72%  perf-profile.children.cycles-pp.perf_event_comm_output
>       0.21 ±  9%      -0.1        0.12 ±  9%  perf-profile.children.cycles-pp.update_rq_clock
>       0.16 ±  8%      -0.1        0.06 ± 19%  perf-profile.children.cycles-pp.mas_split
>       0.13 ± 14%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
>       0.13 ±  6%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.syscall_return_via_sysret
>       0.13 ±  7%      -0.1        0.04 ± 72%  perf-profile.children.cycles-pp.mas_topiary_replace
>       0.14 ±  8%      -0.1        0.06 ±  9%  perf-profile.children.cycles-pp.mas_preallocate
>       0.16 ± 11%      -0.1        0.07 ± 18%  perf-profile.children.cycles-pp.__pick_eevdf
>       0.11 ± 14%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.mas_empty_area_rev
>       0.25 ±  7%      -0.1        0.17 ± 10%  perf-profile.children.cycles-pp.select_idle_cpu
>       0.14 ± 12%      -0.1        0.06 ± 14%  perf-profile.children.cycles-pp.cpu_stopper_thread
>       0.14 ± 10%      -0.1        0.06 ± 13%  perf-profile.children.cycles-pp.active_load_balance_cpu_stop
>       0.14 ± 14%      -0.1        0.06 ± 11%  perf-profile.children.cycles-pp.os_xsave
>       0.18 ±  6%      -0.1        0.11 ± 14%  perf-profile.children.cycles-pp.idle_cpu
>       0.17 ±  4%      -0.1        0.10 ± 15%  perf-profile.children.cycles-pp.hrtimer_start_range_ns
>       0.11 ± 14%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.__pthread_mutex_lock
>       0.32 ±  5%      -0.1        0.25 ±  5%  perf-profile.children.cycles-pp.sched_clock
>       0.11 ±  6%      -0.1        0.03 ± 70%  perf-profile.children.cycles-pp.wakeup_preempt
>       0.23 ±  7%      -0.1        0.16 ± 13%  perf-profile.children.cycles-pp.update_rq_clock_task
>       0.13 ±  8%      -0.1        0.06 ± 16%  perf-profile.children.cycles-pp.local_clock_noinstr
>       0.11 ± 10%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.kmem_cache_alloc_bulk
>       0.34 ±  4%      -0.1        0.27 ±  6%  perf-profile.children.cycles-pp.sched_clock_cpu
>       0.11 ±  9%      -0.1        0.04 ± 76%  perf-profile.children.cycles-pp.avg_vruntime
>       0.15 ±  8%      -0.1        0.08 ± 14%  perf-profile.children.cycles-pp.update_cfs_group
>       0.10 ±  8%      -0.1        0.04 ± 71%  perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
>       0.13 ±  8%      -0.1        0.06 ± 11%  perf-profile.children.cycles-pp.sched_use_asym_prio
>       0.09 ± 12%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.getname_flags
>       0.18 ±  9%      -0.1        0.12 ± 12%  perf-profile.children.cycles-pp.__update_load_avg_se
>       0.11 ±  8%      -0.1        0.05 ± 46%  perf-profile.children.cycles-pp.place_entity
>       0.08 ± 12%      -0.0        0.02 ± 99%  perf-profile.children.cycles-pp.folio_add_lru_vma
>       0.10 ±  7%      -0.0        0.05 ± 46%  perf-profile.children.cycles-pp._find_next_and_bit
>       0.10 ±  6%      -0.0        0.06 ± 24%  perf-profile.children.cycles-pp.reweight_entity
>       0.03 ± 70%      +0.0        0.08 ± 14%  perf-profile.children.cycles-pp.perf_rotate_context
>       0.19 ± 10%      +0.1        0.25 ±  7%  perf-profile.children.cycles-pp.irqtime_account_irq
>       0.08 ± 11%      +0.1        0.14 ± 21%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
>       0.00            +0.1        0.06 ± 14%  perf-profile.children.cycles-pp.rcu_pending
>       0.10 ± 17%      +0.1        0.16 ± 13%  perf-profile.children.cycles-pp.rebalance_domains
>       0.14 ± 16%      +0.1        0.21 ± 12%  perf-profile.children.cycles-pp.downgrade_write
>       0.14 ± 14%      +0.1        0.21 ± 10%  perf-profile.children.cycles-pp.down_read_killable
>       0.00            +0.1        0.07 ± 11%  perf-profile.children.cycles-pp.free_tail_page_prepare
>       0.02 ±141%      +0.1        0.09 ± 20%  perf-profile.children.cycles-pp.rcu_sched_clock_irq
>       0.01 ±223%      +0.1        0.08 ± 25%  perf-profile.children.cycles-pp.arch_scale_freq_tick
>       0.55 ±  9%      +0.1        0.62 ±  9%  perf-profile.children.cycles-pp.__alloc_pages
>       0.34 ±  5%      +0.1        0.41 ±  9%  perf-profile.children.cycles-pp.clock_nanosleep
>       0.00            +0.1        0.08 ± 23%  perf-profile.children.cycles-pp.tick_nohz_next_event
>       0.70 ±  2%      +0.1        0.78 ±  5%  perf-profile.children.cycles-pp.flush_tlb_func
>       0.14 ± 10%      +0.1        0.23 ± 13%  perf-profile.children.cycles-pp.__intel_pmu_enable_all
>       0.07 ± 19%      +0.1        0.17 ± 17%  perf-profile.children.cycles-pp.cgroup_rstat_updated
>       0.04 ± 71%      +0.1        0.14 ± 11%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
>       0.25 ±  9%      +0.1        0.38 ± 11%  perf-profile.children.cycles-pp.down_read
>       0.43 ±  9%      +0.1        0.56 ± 10%  perf-profile.children.cycles-pp.get_page_from_freelist
>       0.00            +0.1        0.15 ±  6%  perf-profile.children.cycles-pp.vm_normal_page
>       0.31 ±  7%      +0.2        0.46 ±  9%  perf-profile.children.cycles-pp.native_flush_tlb_local
>       0.00            +0.2        0.16 ±  8%  perf-profile.children.cycles-pp.__tlb_remove_page_size
>       0.28 ± 11%      +0.2        0.46 ± 13%  perf-profile.children.cycles-pp.vma_alloc_folio
>       0.00            +0.2        0.24 ±  5%  perf-profile.children.cycles-pp._compound_head
>       0.07 ± 16%      +0.2        0.31 ±  6%  perf-profile.children.cycles-pp.__mod_node_page_state
>       0.38 ±  5%      +0.2        0.62 ±  7%  perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
>       0.22 ± 12%      +0.2        0.47 ± 10%  perf-profile.children.cycles-pp.schedule_preempt_disabled
>       0.38 ±  5%      +0.3        0.64 ±  7%  perf-profile.children.cycles-pp.perf_event_task_tick
>       0.00            +0.3        0.27 ±  5%  perf-profile.children.cycles-pp.free_swap_cache
>       0.30 ± 10%      +0.3        0.58 ± 10%  perf-profile.children.cycles-pp.rwsem_down_read_slowpath
>       0.00            +0.3        0.30 ±  4%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
>       0.09 ± 10%      +0.3        0.42 ±  7%  perf-profile.children.cycles-pp.__mod_lruvec_state
>       0.00            +0.3        0.34 ±  9%  perf-profile.children.cycles-pp.deferred_split_folio
>       0.00            +0.4        0.36 ± 13%  perf-profile.children.cycles-pp.prep_compound_page
>       0.09 ± 10%      +0.4        0.50 ±  9%  perf-profile.children.cycles-pp.free_unref_page_prepare
>       0.00            +0.4        0.42 ± 11%  perf-profile.children.cycles-pp.do_huge_pmd_anonymous_page
>       1.67 ±  3%      +0.4        2.12 ±  8%  perf-profile.children.cycles-pp.__hrtimer_run_queues
>       0.63 ±  3%      +0.5        1.11 ± 12%  perf-profile.children.cycles-pp.scheduler_tick
>       1.93 ±  3%      +0.5        2.46 ±  8%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
>       1.92 ±  3%      +0.5        2.45 ±  8%  perf-profile.children.cycles-pp.hrtimer_interrupt
>       0.73 ±  3%      +0.6        1.31 ± 11%  perf-profile.children.cycles-pp.update_process_times
>       0.74 ±  3%      +0.6        1.34 ± 11%  perf-profile.children.cycles-pp.tick_sched_handle
>       0.20 ±  8%      +0.6        0.83 ± 18%  perf-profile.children.cycles-pp.__cond_resched
>       0.78 ±  4%      +0.6        1.43 ± 12%  perf-profile.children.cycles-pp.tick_nohz_highres_handler
>       0.12 ±  7%      +0.7        0.81 ±  5%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
>       0.28 ±  7%      +0.9        1.23 ±  4%  perf-profile.children.cycles-pp.release_pages
>       0.00            +1.0        1.01 ±  6%  perf-profile.children.cycles-pp.pmdp_invalidate
>       0.35 ±  6%      +1.2        1.56 ±  5%  perf-profile.children.cycles-pp.__mod_lruvec_page_state
>       0.30 ±  8%      +1.2        1.53 ±  4%  perf-profile.children.cycles-pp.tlb_batch_pages_flush
>       0.00            +1.3        1.26 ±  4%  perf-profile.children.cycles-pp.page_add_anon_rmap
>       0.09 ± 11%      +3.1        3.20 ±  5%  perf-profile.children.cycles-pp.page_remove_rmap
>       1.60 ±  2%      +3.4        5.04 ±  4%  perf-profile.children.cycles-pp.zap_pte_range
>       0.03 ±100%      +3.5        3.55 ±  5%  perf-profile.children.cycles-pp.__split_huge_pmd_locked
>      41.36           +11.6       52.92 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      41.22           +11.7       52.88 ±  2%  perf-profile.children.cycles-pp.do_syscall_64
>       6.42 ą  6%     +13.5       19.88 ą  7%  perf-profile.children.cycles-pp.__clone
>       0.82 ą  6%     +16.2       16.98 ą  7%  perf-profile.children.cycles-pp.clear_page_erms
>       2.62 ą  5%     +16.4       19.04 ą  7%  perf-profile.children.cycles-pp.asm_exc_page_fault
>       2.18 ą  5%     +16.8       18.94 ą  7%  perf-profile.children.cycles-pp.exc_page_fault
>       2.06 ą  6%     +16.8       18.90 ą  7%  perf-profile.children.cycles-pp.do_user_addr_fault
>       1.60 ą  8%     +17.0       18.60 ą  7%  perf-profile.children.cycles-pp.handle_mm_fault
>       1.52 ą  7%     +17.1       18.58 ą  7%  perf-profile.children.cycles-pp.__handle_mm_fault
>       0.30 ą  7%     +17.4       17.72 ą  7%  perf-profile.children.cycles-pp.clear_huge_page
>       0.31 ą  8%     +17.6       17.90 ą  7%  perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
>      11.66 ą  3%     +22.2       33.89 ą  2%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>       3.29 ą  3%     +30.2       33.46        perf-profile.children.cycles-pp._raw_spin_lock
>       0.04 ą 71%     +36.2       36.21 ą  2%  perf-profile.children.cycles-pp.__split_huge_pmd
>       8.00 ą  4%     +36.5       44.54 ą  2%  perf-profile.children.cycles-pp.__madvise
>       7.87 ą  4%     +36.6       44.44 ą  2%  perf-profile.children.cycles-pp.__x64_sys_madvise
>       7.86 ą  4%     +36.6       44.44 ą  2%  perf-profile.children.cycles-pp.do_madvise
>       7.32 ą  4%     +36.8       44.07 ą  2%  perf-profile.children.cycles-pp.madvise_vma_behavior
>       7.26 ą  4%     +36.8       44.06 ą  2%  perf-profile.children.cycles-pp.zap_page_range_single
>       1.78           +39.5       41.30 ą  2%  perf-profile.children.cycles-pp.unmap_page_range
>       1.72           +39.6       41.28 ą  2%  perf-profile.children.cycles-pp.zap_pmd_range
>      24.76 ą  2%      -8.5       16.31 ą  2%  perf-profile.self.cycles-pp.intel_idle
>      11.46 ą  2%      -7.8        3.65 ą  5%  perf-profile.self.cycles-pp.intel_idle_irq
>       3.16 ą  7%      -2.1        1.04 ą  6%  perf-profile.self.cycles-pp.smp_call_function_many_cond
>       1.49 ą  4%      -1.2        0.30 ą 12%  perf-profile.self.cycles-pp.poll_idle
>       1.15 ą  3%      -0.6        0.50 ą  9%  perf-profile.self.cycles-pp._raw_spin_lock
>       0.60 ą  6%      -0.6        0.03 ą100%  perf-profile.self.cycles-pp.queued_write_lock_slowpath
>       0.69 ą  4%      -0.5        0.22 ą 20%  perf-profile.self.cycles-pp.memcpy_orig
>       0.66 ą  7%      -0.5        0.18 ą 11%  perf-profile.self.cycles-pp.update_sg_wakeup_stats
>       0.59 ą  4%      -0.5        0.13 ą  8%  perf-profile.self.cycles-pp._raw_spin_lock_irq
>       0.86 ą  3%      -0.4        0.43 ą 12%  perf-profile.self.cycles-pp.update_sg_lb_stats
>       0.56            -0.4        0.16 ą  7%  perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
>       0.48 ą  3%      -0.4        0.12 ą 10%  perf-profile.self.cycles-pp.__slab_free
>       1.18 ą  2%      -0.4        0.82 ą  3%  perf-profile.self.cycles-pp.llist_add_batch
>       0.54 ą  5%      -0.3        0.19 ą  6%  perf-profile.self.cycles-pp.__schedule
>       0.47 ą  7%      -0.3        0.18 ą 13%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>       0.34 ą  5%      -0.2        0.09 ą 18%  perf-profile.self.cycles-pp.kmem_cache_free
>       0.43 ą  4%      -0.2        0.18 ą 11%  perf-profile.self.cycles-pp.update_load_avg
>       0.35 ą  4%      -0.2        0.10 ą 23%  perf-profile.self.cycles-pp.rcu_cblist_dequeue
>       0.38 ą  9%      -0.2        0.15 ą 10%  perf-profile.self.cycles-pp.__switch_to_asm
>       0.33 ą  5%      -0.2        0.10 ą 16%  perf-profile.self.cycles-pp.__task_pid_nr_ns
>       0.36 ą  6%      -0.2        0.13 ą 14%  perf-profile.self.cycles-pp.switch_mm_irqs_off
>       0.31 ą  6%      -0.2        0.09 ą  6%  perf-profile.self.cycles-pp.__free_one_page
>       0.28 ą  5%      -0.2        0.06 ą 50%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.27 ą 13%      -0.2        0.06 ą 23%  perf-profile.self.cycles-pp.pthread_create@@GLIBC_2.2.5
>       0.30 ą  7%      -0.2        0.10 ą 19%  perf-profile.self.cycles-pp.__switch_to
>       0.27 ą  4%      -0.2        0.10 ą 17%  perf-profile.self.cycles-pp.finish_task_switch
>       0.23 ą  7%      -0.2        0.06 ą 50%  perf-profile.self.cycles-pp.mas_walk
>       0.22 ą  9%      -0.2        0.05 ą 48%  perf-profile.self.cycles-pp.__clone
>       0.63 ą  5%      -0.2        0.46 ą 12%  perf-profile.self.cycles-pp.llist_reverse_order
>       0.20 ą  4%      -0.2        0.04 ą 72%  perf-profile.self.cycles-pp.entry_SYSCALL_64
>       0.24 ą 10%      -0.1        0.09 ą 19%  perf-profile.self.cycles-pp.rmqueue_bulk
>       0.18 ą 13%      -0.1        0.03 ą101%  perf-profile.self.cycles-pp.__radix_tree_lookup
>       0.18 ą 11%      -0.1        0.04 ą 71%  perf-profile.self.cycles-pp.stress_pthread_func
>       0.36 ą  8%      -0.1        0.22 ą 11%  perf-profile.self.cycles-pp.menu_select
>       0.22 ą  4%      -0.1        0.08 ą 19%  perf-profile.self.cycles-pp.___perf_sw_event
>       0.20 ą 13%      -0.1        0.07 ą 20%  perf-profile.self.cycles-pp.start_thread
>       0.16 ą 13%      -0.1        0.03 ą101%  perf-profile.self.cycles-pp.alloc_vmap_area
>       0.17 ą 10%      -0.1        0.04 ą 73%  perf-profile.self.cycles-pp.kmem_cache_alloc
>       0.14 ą  9%      -0.1        0.03 ą100%  perf-profile.self.cycles-pp.futex_wake
>       0.17 ą  4%      -0.1        0.06 ą 11%  perf-profile.self.cycles-pp.dequeue_task_fair
>       0.23 ą  6%      -0.1        0.12 ą 11%  perf-profile.self.cycles-pp.available_idle_cpu
>       0.22 ą 13%      -0.1        0.11 ą 12%  perf-profile.self.cycles-pp._find_next_bit
>       0.21 ą  7%      -0.1        0.10 ą  6%  perf-profile.self.cycles-pp.__rmqueue_pcplist
>       0.37 ą  7%      -0.1        0.26 ą  8%  perf-profile.self.cycles-pp.native_sched_clock
>       0.22 ą  7%      -0.1        0.12 ą 21%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
>       0.19 ą  7%      -0.1        0.10 ą 11%  perf-profile.self.cycles-pp.enqueue_entity
>       0.15 ą  5%      -0.1        0.06 ą 45%  perf-profile.self.cycles-pp.enqueue_task_fair
>       0.15 ą 11%      -0.1        0.06 ą 17%  perf-profile.self.cycles-pp.__pick_eevdf
>       0.13 ą 13%      -0.1        0.05 ą 72%  perf-profile.self.cycles-pp.prepare_task_switch
>       0.17 ą 10%      -0.1        0.08 ą  8%  perf-profile.self.cycles-pp.update_rq_clock_task
>       0.54 ą  4%      -0.1        0.46 ą  6%  perf-profile.self.cycles-pp.__flush_smp_call_function_queue
>       0.14 ą 14%      -0.1        0.06 ą 11%  perf-profile.self.cycles-pp.os_xsave
>       0.11 ą 10%      -0.1        0.03 ą 70%  perf-profile.self.cycles-pp.try_to_wake_up
>       0.10 ą  8%      -0.1        0.03 ą100%  perf-profile.self.cycles-pp.futex_wait
>       0.14 ą  9%      -0.1        0.07 ą 10%  perf-profile.self.cycles-pp.update_curr
>       0.18 ą  9%      -0.1        0.11 ą 14%  perf-profile.self.cycles-pp.idle_cpu
>       0.11 ą 11%      -0.1        0.04 ą 76%  perf-profile.self.cycles-pp.avg_vruntime
>       0.15 ą 10%      -0.1        0.08 ą 14%  perf-profile.self.cycles-pp.update_cfs_group
>       0.09 ą  9%      -0.1        0.03 ą100%  perf-profile.self.cycles-pp.reweight_entity
>       0.12 ą 13%      -0.1        0.06 ą  8%  perf-profile.self.cycles-pp.do_idle
>       0.18 ą 10%      -0.1        0.12 ą 13%  perf-profile.self.cycles-pp.__update_load_avg_se
>       0.09 ą 17%      -0.1        0.04 ą 71%  perf-profile.self.cycles-pp.cpuidle_idle_call
>       0.10 ą 11%      -0.0        0.06 ą 45%  perf-profile.self.cycles-pp.update_rq_clock
>       0.12 ą 15%      -0.0        0.07 ą 16%  perf-profile.self.cycles-pp.update_sd_lb_stats
>       0.09 ą  5%      -0.0        0.05 ą 46%  perf-profile.self.cycles-pp._find_next_and_bit
>       0.01 ą223%      +0.1        0.08 ą 25%  perf-profile.self.cycles-pp.arch_scale_freq_tick
>       0.78 ą  4%      +0.1        0.87 ą  4%  perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
>       0.14 ą 10%      +0.1        0.23 ą 13%  perf-profile.self.cycles-pp.__intel_pmu_enable_all
>       0.06 ą 46%      +0.1        0.15 ą 19%  perf-profile.self.cycles-pp.cgroup_rstat_updated
>       0.19 ą  3%      +0.1        0.29 ą  4%  perf-profile.self.cycles-pp.cpuidle_enter_state
>       0.00            +0.1        0.10 ą 11%  perf-profile.self.cycles-pp.__mod_lruvec_state
>       0.00            +0.1        0.11 ą 18%  perf-profile.self.cycles-pp.__tlb_remove_page_size
>       0.00            +0.1        0.12 ą  9%  perf-profile.self.cycles-pp.vm_normal_page
>       0.23 ą  7%      +0.1        0.36 ą  8%  perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
>       0.20 ą  8%      +0.2        0.35 ą  7%  perf-profile.self.cycles-pp.__mod_lruvec_page_state
>       1.12 ą  2%      +0.2        1.28 ą  4%  perf-profile.self.cycles-pp.zap_pte_range
>       0.31 ą  8%      +0.2        0.46 ą  9%  perf-profile.self.cycles-pp.native_flush_tlb_local
>       0.00            +0.2        0.16 ą  5%  perf-profile.self.cycles-pp._compound_head
>       0.06 ą 17%      +0.2        0.26 ą  4%  perf-profile.self.cycles-pp.__mod_node_page_state
>       0.00            +0.2        0.24 ą  6%  perf-profile.self.cycles-pp.free_swap_cache
>       0.00            +0.3        0.27 ą 15%  perf-profile.self.cycles-pp.clear_huge_page
>       0.00            +0.3        0.27 ą 11%  perf-profile.self.cycles-pp.deferred_split_folio
>       0.00            +0.4        0.36 ą 13%  perf-profile.self.cycles-pp.prep_compound_page
>       0.05 ą 47%      +0.4        0.43 ą  9%  perf-profile.self.cycles-pp.free_unref_page_prepare
>       0.08 ą  7%      +0.5        0.57 ą 23%  perf-profile.self.cycles-pp.__cond_resched
>       0.08 ą 12%      +0.5        0.58 ą  5%  perf-profile.self.cycles-pp.release_pages
>       0.10 ą 10%      +0.5        0.63 ą  6%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
>       0.00            +1.1        1.11 ą  7%  perf-profile.self.cycles-pp.__split_huge_pmd_locked
>       0.00            +1.2        1.18 ą  4%  perf-profile.self.cycles-pp.page_add_anon_rmap
>       0.03 ą101%      +1.3        1.35 ą  7%  perf-profile.self.cycles-pp.page_remove_rmap
>       0.82 ą  5%     +16.1       16.88 ą  7%  perf-profile.self.cycles-pp.clear_page_erms
>      11.65 ą  3%     +20.2       31.88 ą  2%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>
>
> ***************************************************************************************************
> lkp-spr-2sp4: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
> =========================================================================================
> array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
>   50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
>
> commit:
>   30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
>   1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>      10.50 ą 14%     +55.6%      16.33 ą 16%  perf-c2c.DRAM.local
>       6724           -11.4%       5954 ą  2%  vmstat.system.cs
>  2.746e+09           +16.7%  3.205e+09 ą  2%  cpuidle..time
>    2771516           +16.0%    3213723 ą  2%  cpuidle..usage
>       0.06 ą  4%      -0.0        0.05 ą  5%  mpstat.cpu.all.soft%
>       0.47 ą  2%      -0.1        0.39 ą  2%  mpstat.cpu.all.sys%
>       0.01 ą 85%   +1700.0%       0.20 ą188%  perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
>      15.11 ą 13%     -28.8%      10.76 ą 34%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>      15.09 ą 13%     -30.3%      10.51 ą 38%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
>    1023952           +13.4%    1161219        meminfo.AnonHugePages
>    1319741           +10.8%    1461995        meminfo.AnonPages
>    1331039           +11.2%    1480149        meminfo.Inactive
>    1330865           +11.2%    1479975        meminfo.Inactive(anon)
>    1266202           +16.0%    1469399 ą  2%  turbostat.C1E
>    1509871           +16.6%    1760853 ą  2%  turbostat.C6
>    3521203           +17.4%    4134075 ą  3%  turbostat.IRQ
>     580.32            -3.8%     558.30        turbostat.PkgWatt
>      77.42           -14.0%      66.60 ą  2%  turbostat.RAMWatt
>     330416           +10.8%     366020        proc-vmstat.nr_anon_pages
>     500.90           +13.4%     567.99        proc-vmstat.nr_anon_transparent_hugepages
>     333197           +11.2%     370536        proc-vmstat.nr_inactive_anon
>     333197           +11.2%     370536        proc-vmstat.nr_zone_inactive_anon
>     129879 ą 11%     -46.7%      69207 ą 12%  proc-vmstat.numa_pages_migrated
>    3879028            +5.9%    4109180        proc-vmstat.pgalloc_normal
>    3403414            +6.6%    3628929        proc-vmstat.pgfree
>     129879 ą 11%     -46.7%      69207 ą 12%  proc-vmstat.pgmigrate_success
>       5763            +9.8%       6327        proc-vmstat.thp_fault_alloc
>     350993           -15.6%     296081 ą  2%  stream.add_bandwidth_MBps
>     349830           -16.1%     293492 ą  2%  stream.add_bandwidth_MBps_harmonicMean
>     333973           -20.5%     265439 ą  3%  stream.copy_bandwidth_MBps
>     332930           -21.7%     260548 ą  3%  stream.copy_bandwidth_MBps_harmonicMean
>     302788           -16.2%     253817 ą  2%  stream.scale_bandwidth_MBps
>     302157           -17.1%     250577 ą  2%  stream.scale_bandwidth_MBps_harmonicMean
>    1177276            +9.3%    1286614        stream.time.maximum_resident_set_size
>       5038            +1.1%       5095        stream.time.percent_of_cpu_this_job_got
>     694.19 ą  2%     +19.5%     829.85 ą  2%  stream.time.user_time
>     339047           -12.1%     298061        stream.triad_bandwidth_MBps
>     338186           -12.4%     296218        stream.triad_bandwidth_MBps_harmonicMean
>       8.42 ą100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.asm_sysvec_reschedule_ipi
>       8.42 ą100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
>       8.42 ą100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
>       8.42 ą100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
>       8.42 ą100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
>       8.42 ą100%      -8.4        0.00        perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode
>       0.84 ą103%      +1.7        2.57 ą 59%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>       0.84 ą103%      +1.7        2.57 ą 59%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
>       0.31 ą223%      +2.0        2.33 ą 44%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
>       0.31 ą223%      +2.0        2.33 ą 44%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
>       3.07 ą 56%      +2.8        5.88 ą 28%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       8.42 ą100%      -8.4        0.00        perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
>       8.42 ą100%      -8.1        0.36 ą223%  perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
>      12.32 ą 25%      -6.6        5.69 ą 69%  perf-profile.children.cycles-pp.vsnprintf
>      12.76 ą 27%      -6.6        6.19 ą 67%  perf-profile.children.cycles-pp.seq_printf
>       3.07 ą 56%      +2.8        5.88 ą 28%  perf-profile.children.cycles-pp.__x64_sys_exit_group
>      40.11           -11.0%      35.71 ą  2%  perf-stat.i.MPKI
>  1.563e+10           -12.3%  1.371e+10 ą  2%  perf-stat.i.branch-instructions
>  3.721e+09 ą  2%     -23.2%  2.858e+09 ą  4%  perf-stat.i.cache-misses
>  4.471e+09 ą  3%     -22.7%  3.458e+09 ą  4%  perf-stat.i.cache-references
>       5970 ą  5%     -15.9%       5021 ą  4%  perf-stat.i.context-switches
>       1.66 ą  2%     +15.8%       1.92 ą  2%  perf-stat.i.cpi
>      41.83 ą  4%     +30.6%      54.63 ą  4%  perf-stat.i.cycles-between-cache-misses
>  2.282e+10 ą  2%     -14.5%  1.952e+10 ą  2%  perf-stat.i.dTLB-loads
>     572602 ą  3%      -9.2%     519922 ą  5%  perf-stat.i.dTLB-store-misses
>  1.483e+10 ą  2%     -15.7%   1.25e+10 ą  2%  perf-stat.i.dTLB-stores
>  9.179e+10           -13.7%  7.924e+10 ą  2%  perf-stat.i.instructions
>       0.61           -13.4%       0.52 ą  2%  perf-stat.i.ipc
>     373.79 ą  4%     -37.8%     232.60 ą  9%  perf-stat.i.metric.K/sec
>     251.45           -13.4%     217.72 ą  2%  perf-stat.i.metric.M/sec
>      21446 ą  3%     -24.1%      16278 ą  8%  perf-stat.i.minor-faults
>      15.07 ą  5%      -6.0        9.10 ą 10%  perf-stat.i.node-load-miss-rate%
>   68275790 ą  5%     -44.9%   37626128 ą 12%  perf-stat.i.node-load-misses
>      21448 ą  3%     -24.1%      16281 ą  8%  perf-stat.i.page-faults
>      40.71           -11.3%      36.10 ą  2%  perf-stat.overall.MPKI
>       1.67           +15.3%       1.93 ą  2%  perf-stat.overall.cpi
>      41.07 ą  3%     +30.1%      53.42 ą  4%  perf-stat.overall.cycles-between-cache-misses
>       0.00 ą  2%      +0.0        0.00 ą  2%  perf-stat.overall.dTLB-store-miss-rate%
>       0.60           -13.2%       0.52 ą  2%  perf-stat.overall.ipc
>      15.19 ą  5%      -6.2        9.03 ą 11%  perf-stat.overall.node-load-miss-rate%
>    1.4e+10            -9.3%  1.269e+10        perf-stat.ps.branch-instructions
>  3.352e+09 ą  3%     -20.9%  2.652e+09 ą  4%  perf-stat.ps.cache-misses
>  4.026e+09 ą  3%     -20.3%  3.208e+09 ą  4%  perf-stat.ps.cache-references
>       4888 ą  4%     -10.8%       4362 ą  3%  perf-stat.ps.context-switches
>     206092            +2.1%     210375        perf-stat.ps.cpu-clock
>  1.375e+11            +2.8%  1.414e+11        perf-stat.ps.cpu-cycles
>     258.23 ą  5%      +8.8%     280.85 ą  4%  perf-stat.ps.cpu-migrations
>  2.048e+10           -11.7%  1.809e+10 ą  2%  perf-stat.ps.dTLB-loads
>  1.333e+10 ą  2%     -13.0%   1.16e+10 ą  2%  perf-stat.ps.dTLB-stores
>  8.231e+10           -10.8%  7.342e+10        perf-stat.ps.instructions
>      15755 ą  3%     -16.3%      13187 ą  6%  perf-stat.ps.minor-faults
>   61706790 ą  6%     -43.8%   34699716 ą 11%  perf-stat.ps.node-load-misses
>      15757 ą  3%     -16.3%      13189 ą  6%  perf-stat.ps.page-faults
>     206092            +2.1%     210375        perf-stat.ps.task-clock
>  1.217e+12            +4.1%  1.267e+12 ą  2%  perf-stat.total.instructions
>
>
>
> ***************************************************************************************************
> lkp-cfl-d1: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> commit:
>   30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
>   1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>     232.12 ą  7%     -12.0%     204.18 ą  8%  sched_debug.cfs_rq:/.load_avg.stddev
>       6797            -3.3%       6576        vmstat.system.cs
>      15161            -0.9%      15029        vmstat.system.in
>     349927           +44.3%     504820        meminfo.AnonHugePages
>     507807           +27.1%     645169        meminfo.AnonPages
>    1499332           +10.2%    1652612        meminfo.Inactive(anon)
>       8.67 ą 62%    +184.6%      24.67 ą 25%  turbostat.C10
>       1.50            -0.1        1.45        turbostat.C1E%
>       3.30            -3.2%       3.20        turbostat.RAMWatt
>       1.40 ą 14%      -0.3        1.09 ą 13%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault
>       1.44 ą 12%      -0.3        1.12 ą 13%  perf-profile.children.cycles-pp.asm_exc_page_fault
>       0.03 ą141%      +0.1        0.10 ą 30%  perf-profile.children.cycles-pp.next_uptodate_folio
>       0.02 ą141%      +0.1        0.10 ą 22%  perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup
>       0.02 ą143%      +0.1        0.10 ą 25%  perf-profile.self.cycles-pp.next_uptodate_folio
>       0.01 ą223%      +0.1        0.09 ą 19%  perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup
>      19806            -3.5%      19109        phoronix-test-suite.ramspeed.Average.Integer.mb_s
>     283.70            +3.8%     294.50        phoronix-test-suite.time.elapsed_time
>     283.70            +3.8%     294.50        phoronix-test-suite.time.elapsed_time.max
>     120454            +1.6%     122334        phoronix-test-suite.time.maximum_resident_set_size
>     281337           -54.8%     127194        phoronix-test-suite.time.minor_page_faults
>     259.13            +4.1%     269.81        phoronix-test-suite.time.user_time
>     126951           +27.0%     161291        proc-vmstat.nr_anon_pages
>     170.86           +44.3%     246.49        proc-vmstat.nr_anon_transparent_hugepages
>     355917            -1.0%     352250        proc-vmstat.nr_dirty_background_threshold
>     712705            -1.0%     705362        proc-vmstat.nr_dirty_threshold
>    3265201            -1.1%    3228465        proc-vmstat.nr_free_pages
>     374833           +10.2%     413153        proc-vmstat.nr_inactive_anon
>       1767            +4.8%       1853        proc-vmstat.nr_page_table_pages
>     374833           +10.2%     413153        proc-vmstat.nr_zone_inactive_anon
>     854665           -34.3%     561406        proc-vmstat.numa_hit
>     854632           -34.3%     561397        proc-vmstat.numa_local
>    5548755            +1.1%    5610598        proc-vmstat.pgalloc_normal
>    1083315           -26.2%     799129        proc-vmstat.pgfault
>     113425            +3.7%     117656        proc-vmstat.pgreuse
>       9025            +7.6%       9714        proc-vmstat.thp_fault_alloc
>       3.38            +0.1        3.45        perf-stat.i.branch-miss-rate%
>  4.135e+08            -3.2%  4.003e+08        perf-stat.i.cache-misses
>  5.341e+08            -2.7%  5.197e+08        perf-stat.i.cache-references
>       6832            -3.4%       6600        perf-stat.i.context-switches
>       4.06            +3.1%       4.19        perf-stat.i.cpi
>     438639 ą  5%     -18.7%     356730 ą  6%  perf-stat.i.dTLB-load-misses
>  1.119e+09            -3.8%  1.077e+09        perf-stat.i.dTLB-loads
>       0.02 ą 15%      -0.0        0.01 ą 26%  perf-stat.i.dTLB-store-miss-rate%
>      80407 ą 10%     -63.5%      29387 ą 23%  perf-stat.i.dTLB-store-misses
>  7.319e+08            -3.8%  7.043e+08        perf-stat.i.dTLB-stores
>      57.72            +0.8       58.52        perf-stat.i.iTLB-load-miss-rate%
>     129846            -3.8%     124973        perf-stat.i.iTLB-load-misses
>     144448            -5.3%     136837        perf-stat.i.iTLB-loads
>  2.389e+09            -3.5%  2.305e+09        perf-stat.i.instructions
>       0.28            -2.9%       0.27        perf-stat.i.ipc
>     220.59            -3.4%     213.11        perf-stat.i.metric.M/sec
>       3610           -31.2%       2483        perf-stat.i.minor-faults
>   49238342            +1.1%   49776834        perf-stat.i.node-loads
>   98106028            -3.1%   95018390        perf-stat.i.node-stores
>       3615           -31.2%       2487        perf-stat.i.page-faults
>       3.65            +3.7%       3.78        perf-stat.overall.cpi
>      21.08            +3.3%      21.79        perf-stat.overall.cycles-between-cache-misses
>       0.04 ą  5%      -0.0        0.03 ą  6%  perf-stat.overall.dTLB-load-miss-rate%
>       0.01 ą 10%      -0.0        0.00 ą 23%  perf-stat.overall.dTLB-store-miss-rate%
>       0.27            -3.6%       0.26        perf-stat.overall.ipc
>  4.122e+08            -3.2%   3.99e+08        perf-stat.ps.cache-misses
>  5.324e+08            -2.7%  5.181e+08        perf-stat.ps.cache-references
>       6809            -3.4%       6580        perf-stat.ps.context-switches
>     437062 ą  5%     -18.7%     355481 ą  6%  perf-stat.ps.dTLB-load-misses
>  1.115e+09            -3.8%  1.073e+09        perf-stat.ps.dTLB-loads
>      80134 ą 10%     -63.5%      29283 ą 23%  perf-stat.ps.dTLB-store-misses
>  7.295e+08            -3.8%  7.021e+08        perf-stat.ps.dTLB-stores
>     129362            -3.7%     124535        perf-stat.ps.iTLB-load-misses
>     143865            -5.2%     136338        perf-stat.ps.iTLB-loads
>  2.381e+09            -3.5%  2.297e+09        perf-stat.ps.instructions
>       3596           -31.2%       2473        perf-stat.ps.minor-faults
>   49081949            +1.1%   49621463        perf-stat.ps.node-loads
>   97795918            -3.1%   94724831        perf-stat.ps.node-stores
>       3600           -31.2%       2477        perf-stat.ps.page-faults
>
>
>
> ***************************************************************************************************
> lkp-cfl-d1: 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> commit:
>   30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
>   1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>     167.28 ą  5%     -13.1%     145.32 ą  6%  sched_debug.cfs_rq:/.util_est_enqueued.avg
>       6845            -2.5%       6674        vmstat.system.cs
>     351910 ą  2%     +40.2%     493341        meminfo.AnonHugePages
>     505908           +27.2%     643328        meminfo.AnonPages
>    1497656           +10.2%    1650453        meminfo.Inactive(anon)
>      18957 ą 13%     +26.3%      23947 ą 17%  turbostat.C1
>       1.52            -0.0        1.48        turbostat.C1E%
>       3.32            -2.9%       3.23        turbostat.RAMWatt
>      19978            -3.0%      19379        phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s
>     280.71            +3.3%     289.93        phoronix-test-suite.time.elapsed_time
>     280.71            +3.3%     289.93        phoronix-test-suite.time.elapsed_time.max
>     120465            +1.5%     122257        phoronix-test-suite.time.maximum_resident_set_size
>     281047           -54.7%     127190        phoronix-test-suite.time.minor_page_faults
>     257.03            +3.5%     265.95        phoronix-test-suite.time.user_time
>     126473           +27.2%     160831        proc-vmstat.nr_anon_pages
>     171.83 ą  2%     +40.2%     240.89        proc-vmstat.nr_anon_transparent_hugepages
>     355973            -1.0%     352304        proc-vmstat.nr_dirty_background_threshold
>     712818            -1.0%     705471        proc-vmstat.nr_dirty_threshold
>    3265800            -1.1%    3228879        proc-vmstat.nr_free_pages
>     374410           +10.2%     412613        proc-vmstat.nr_inactive_anon
>       1770            +4.4%       1848        proc-vmstat.nr_page_table_pages
>     374410           +10.2%     412613        proc-vmstat.nr_zone_inactive_anon
>     852082           -34.9%     555093        proc-vmstat.numa_hit
>     852125           -34.9%     555018        proc-vmstat.numa_local
>    1078293           -26.6%     791038        proc-vmstat.pgfault
>     112693            +2.9%     116004        proc-vmstat.pgreuse
>       9025            +7.6%       9713        proc-vmstat.thp_fault_alloc
>       3.63 ą  6%      +0.6        4.25 ą  9%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
>       0.25 ą 55%      -0.2        0.08 ą 68%  perf-profile.children.cycles-pp.ret_from_fork_asm
>       0.25 ą 55%      -0.2        0.08 ą 68%  perf-profile.children.cycles-pp.ret_from_fork
>       0.23 ą 56%      -0.2        0.07 ą 69%  perf-profile.children.cycles-pp.kthread
>       0.14 ą 36%      -0.1        0.05 ą120%  perf-profile.children.cycles-pp.do_anonymous_page
>       0.14 ą 35%      -0.1        0.05 ą 76%  perf-profile.children.cycles-pp.copy_mc_enhanced_fast_string
>       0.04 ą 72%      +0.0        0.08 ą 19%  perf-profile.children.cycles-pp.try_to_wake_up
>       0.04 ą118%      +0.1        0.10 ą 36%  perf-profile.children.cycles-pp.update_rq_clock
>       0.07 ą 79%      +0.1        0.17 ą 21%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>       7.99 ą 11%      +1.0        9.02 ą  5%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>       0.23 ą 28%      -0.1        0.14 ą 49%  perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
>       0.14 ą 35%      -0.1        0.05 ą 76%  perf-profile.self.cycles-pp.copy_mc_enhanced_fast_string
>       0.06 ą 79%      +0.1        0.16 ą 21%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>       0.21 ą 34%      +0.2        0.36 ą 18%  perf-profile.self.cycles-pp.ktime_get
>  1.187e+08            -4.6%  1.133e+08        perf-stat.i.branch-instructions
>       3.36            +0.1        3.42        perf-stat.i.branch-miss-rate%
>    5492420            -3.9%    5275592        perf-stat.i.branch-misses
>  4.148e+08            -2.8%  4.034e+08        perf-stat.i.cache-misses
>  5.251e+08            -2.6%  5.114e+08        perf-stat.i.cache-references
>       6880            -2.5%       6711        perf-stat.i.context-switches
>       4.30            +2.9%       4.43        perf-stat.i.cpi
>       0.10 ą  7%      -0.0        0.09 ą  2%  perf-stat.i.dTLB-load-miss-rate%
>     472268 ą  6%     -19.9%     378489        perf-stat.i.dTLB-load-misses
>  8.107e+08            -3.4%  7.831e+08        perf-stat.i.dTLB-loads
>       0.02 ą 16%      -0.0        0.01 ą  2%  perf-stat.i.dTLB-store-miss-rate%
>      90535 ą 11%     -59.8%      36371 ą  2%  perf-stat.i.dTLB-store-misses
>  5.323e+08            -3.3%  5.145e+08        perf-stat.i.dTLB-stores
>     129981            -3.0%     126061        perf-stat.i.iTLB-load-misses
>     143662            -3.1%     139223        perf-stat.i.iTLB-loads
>  2.253e+09            -3.6%  2.172e+09        perf-stat.i.instructions
>       0.26            -3.2%       0.25        perf-stat.i.ipc
>       4.71 ą  2%      -6.4%       4.41 ą  2%  perf-stat.i.major-faults
>     180.03            -3.0%     174.57        perf-stat.i.metric.M/sec
>       3627           -30.8%       2510 ą  2%  perf-stat.i.minor-faults
>       3632           -30.8%       2514 ą  2%  perf-stat.i.page-faults
>       3.88            +3.6%       4.02        perf-stat.overall.cpi
>      21.08            +2.7%      21.65        perf-stat.overall.cycles-between-cache-misses
>       0.06 ą  6%      -0.0        0.05        perf-stat.overall.dTLB-load-miss-rate%
>       0.02 ą 11%      -0.0        0.01 ą  2%  perf-stat.overall.dTLB-store-miss-rate%
>       0.26            -3.5%       0.25        perf-stat.overall.ipc
>  1.182e+08            -4.6%  1.128e+08        perf-stat.ps.branch-instructions
>    5468166            -4.0%    5251939        perf-stat.ps.branch-misses
>  4.135e+08            -2.7%  4.021e+08        perf-stat.ps.cache-misses
>  5.234e+08            -2.6%  5.098e+08        perf-stat.ps.cache-references
>       6859            -2.5%       6685        perf-stat.ps.context-switches
>     470567 ą  6%     -19.9%     377127        perf-stat.ps.dTLB-load-misses
>  8.079e+08            -3.4%  7.805e+08        perf-stat.ps.dTLB-loads
>      90221 ą 11%     -59.8%      36239 ą  2%  perf-stat.ps.dTLB-store-misses
>  5.305e+08            -3.3%  5.128e+08        perf-stat.ps.dTLB-stores
>     129499            -3.0%     125601        perf-stat.ps.iTLB-load-misses
>     143121            -3.1%     138638        perf-stat.ps.iTLB-loads
>  2.246e+09            -3.6%  2.165e+09        perf-stat.ps.instructions
>       4.69 ą  2%      -6.3%       4.39 ą  2%  perf-stat.ps.major-faults
>       3613           -30.8%       2500 ą  2%  perf-stat.ps.minor-faults
>       3617           -30.8%       2504 ą  2%  perf-stat.ps.page-faults
>
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-20  5:27 ` Yang Shi
@ 2023-12-20  8:29   ` Yin Fengwei
  2023-12-20 15:42     ` Christoph Lameter (Ampere)
  2023-12-20 20:09     ` Yang Shi
  0 siblings, 2 replies; 24+ messages in thread
From: Yin Fengwei @ 2023-12-20  8:29 UTC (permalink / raw)
  To: Yang Shi, kernel test robot
  Cc: Rik van Riel, oe-lkp, lkp, Linux Memory Management List,
	Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang,
	feng.tang



On 2023/12/20 13:27, Yang Shi wrote:
> On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote:
>>
>>
>>
>> Hello,
>>
>> for this commit, we reported
>> "[mm]  96db82a66d:  will-it-scale.per_process_ops -95.3% regression"
>> in Aug, 2022 when it's in linux-next/master
>> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/
>>
>> later, we reported
>> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression"
>> in Oct, 2022 when it's in linus/master
>> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/
>>
>> and the commit was reverted finally by
>> commit 0ba09b1733878afe838fe35c310715fda3d46428
>> Author: Linus Torvalds <torvalds@linux-foundation.org>
>> Date:   Sun Dec 4 12:51:59 2022 -0800
>>
>> now we noticed it goes into linux-next/master again.
>>
>> we are not sure if there is an agreement that the benefit of this commit
>> has already overweight performance drop in some mirco benchmark.
>>
>> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/
>> that
>> "This patch was applied to v6.1, but was reverted due to a regression
>> report.  However it turned out the regression was not due to this patch.
>> I ping'ed Andrew to reapply this patch, Andrew may forget it.  This
>> patch helps promote THP, so I rebased it onto the latest mm-unstable."
> 
> IIRC, Huang Ying's analysis showed the regression for will-it-scale
> micro benchmark is fine, it was actually reverted due to kernel build
> regression with LLVM reported by Nathan Chancellor. Then the
> regression was resolved by commit
> 81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out
> if page in deferred queue already"). And this patch did improve kernel
> build with GCC by ~3% if I remember correctly.
> 
>>
>> however, unfortunately, in our latest tests, we still observed below regression
>> upon this commit. just FYI.
>>
>>
>>
>> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:
> 
> Interesting, wasn't the same regression seen last time? And I'm a
> little bit confused about how pthread got regressed. I didn't see the
> pthread benchmark do any intensive memory alloc/free operations. Do
> the pthread APIs do any intensive memory operations? I saw the
> benchmark does allocate memory for thread stack, but it should be just
> 8K per thread, so it should not trigger what this patch does. With
> 1024 threads, the thread stacks may get merged into one single VMA (8M
> total), but it may do so even though the patch is not applied.
The stress-ng.pthread test code is odd here:

https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573

It sets up its own stack in a pthread_attr_t, but that attr is never
passed to pthread_create(). So it is still glibc that allocates the
thread stack, which is 8M in size. This is why this patch can affect
the stress-ng.pthread test.
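
Roughly this pattern, as a hypothetical sketch (not the actual stress-ng
source; the names and sizes are illustrative):

#include <pthread.h>
#include <stdlib.h>

static void *worker(void *arg)
{
        (void)arg;
        return NULL;
}

int main(void)
{
        pthread_attr_t attr;
        pthread_t tid;
        size_t sz = 8UL << 20;
        void *stack = calloc(1, sz);    /* big request, served via mmap() */

        pthread_attr_init(&attr);
        pthread_attr_setstack(&attr, stack, sz); /* custom stack prepared... */

        /*
         * ...but NULL is passed instead of &attr, so glibc ignores the
         * custom stack and mmap()s its own 8M default stack, exactly the
         * kind of large anonymous mapping this patch THP-aligns.
         */
        pthread_create(&tid, NULL, worker, NULL);
        pthread_join(tid, NULL);
        free(stack);
        return 0;
}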


My understanding is that this is a different regression (if it's a valid
regression at all). The previous hotspot was in:
    deferred_split_huge_page
       spin_lock

while this time the hotspot is in the pmd_lock taken from do_madvise, I
suppose (see the sketch after the profile):
    - 55.02% zap_pmd_range.isra.0
       - 53.42% __split_huge_pmd
          - 51.74% _raw_spin_lock
             - 51.73% native_queued_spin_lock_slowpath
                + 3.03% asm_sysvec_call_function
          - 1.67% __split_huge_pmd_locked
             - 0.87% pmdp_invalidate
                + 0.86% flush_tlb_mm_range
       - 1.60% zap_pte_range
          - 1.04% page_remove_rmap
               0.55% __mod_lruvec_page_state
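
A minimal reproducer sketch of that path (an illustration under stated
assumptions: THP enabled, and the 8M mapping THP-aligned by this patch;
error handling omitted):

#include <string.h>
#include <sys/mman.h>

int main(void)
{
        size_t len = 8UL << 20; /* the glibc default thread stack size */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        memset(p, 1, len);      /* fault in; a THP-aligned VMA gets 2M pages */

        /*
         * Zap a sub-range that is not PMD-aligned, roughly what the glibc
         * thread-exit path does to a dead stack: the kernel must take the
         * pmd lock and __split_huge_pmd() before it can zap 4K ptes.
         */
        madvise(p + 4096, len - 2 * 4096, MADV_DONTNEED);

        munmap(p, len);
        return 0;
}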


> 
>>
>>
>> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>
>> testcase: stress-ng
>> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
>> parameters:
>>
>>          nr_threads: 1
>>          disk: 1HDD
>>          testtime: 60s
>>          fs: ext4
>>          class: os
>>          test: pthread
>>          cpufreq_governor: performance
>>
>>
>> In addition to that, the commit also has significant impact on the following tests:
>>
>> +------------------+-----------------------------------------------------------------------------------------------+
>> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression                                         |
>> | test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory   |
>> | test parameters  | array_size=50000000                                                                           |
>> |                  | cpufreq_governor=performance                                                                  |
>> |                  | iterations=10x                                                                                |
>> |                  | loop=100                                                                                      |
>> |                  | nr_threads=25%                                                                                |
>> |                  | omp=true                                                                                      |
>> +------------------+-----------------------------------------------------------------------------------------------+
>> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression       |
>> | test machine     | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory    |
>> | test parameters  | cpufreq_governor=performance                                                                  |
>> |                  | option_a=Average                                                                              |
>> |                  | option_b=Integer                                                                              |
>> |                  | test=ramspeed-1.4.3                                                                           |
>> +------------------+-----------------------------------------------------------------------------------------------+
>> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression |
>> | test machine     | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory    |
>> | test parameters  | cpufreq_governor=performance                                                                  |
>> |                  | option_a=Average                                                                              |
>> |                  | option_b=Floating Point                                                                       |
>> |                  | test=ramspeed-1.4.3                                                                           |
>> +------------------+-----------------------------------------------------------------------------------------------+
>>
>>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com
>>
>>
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>>
>>
>> The kernel config and materials to reproduce are available at:
>> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com
>>
>> =========================================================================================
>> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>>    os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
>>
>> commit:
>>    30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
>>    1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>>
>> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
>> ---------------- ---------------------------
>>           %stddev     %change         %stddev
>>               \          |                \
>>    13405796           -65.5%    4620124        cpuidle..usage
>>        8.00            +8.2%       8.66 ą  2%  iostat.cpu.system
>>        1.61           -60.6%       0.63        iostat.cpu.user
>>      597.50 ą 14%     -64.3%     213.50 ą 14%  perf-c2c.DRAM.local
>>        1882 ą 14%     -74.7%     476.83 ą  7%  perf-c2c.HITM.local
>>     3768436           -12.9%    3283395        vmstat.memory.cache
>>      355105           -75.7%      86344 ą  3%  vmstat.system.cs
>>      385435           -20.7%     305714 ą  3%  vmstat.system.in
>>        1.13            -0.2        0.88        mpstat.cpu.all.irq%
>>        0.29            -0.2        0.10 ą  2%  mpstat.cpu.all.soft%
>>        6.76 ą  2%      +1.1        7.88 ą  2%  mpstat.cpu.all.sys%
>>        1.62            -1.0        0.62 ą  2%  mpstat.cpu.all.usr%
>>     2234397           -84.3%     350161 ą  5%  stress-ng.pthread.ops
>>       37237           -84.3%       5834 ą  5%  stress-ng.pthread.ops_per_sec
>>      294706 ą  2%     -68.0%      94191 ą  6%  stress-ng.time.involuntary_context_switches
>>       41442 ą  2%   +5023.4%    2123284        stress-ng.time.maximum_resident_set_size
>>     4466457           -83.9%     717053 ą  5%  stress-ng.time.minor_page_faults
> 
> The larger RSS and fewer page faults are expected.
> 
>>      243.33           +13.5%     276.17 ą  3%  stress-ng.time.percent_of_cpu_this_job_got
>>      131.64           +27.7%     168.11 ą  3%  stress-ng.time.system_time
>>       19.73           -82.1%       3.53 ą  4%  stress-ng.time.user_time
> 
> Much less user time. And it seems to match the drop of the pthread metric.
> 
>>     7715609           -80.2%    1530125 ą  4%  stress-ng.time.voluntary_context_switches
>>       76728           -80.8%      14724 ą  4%  perf-stat.i.minor-faults
>>     5600408           -61.4%    2160997 ą  5%  perf-stat.i.node-loads
>>     8873996           +52.1%   13499744 ą  5%  perf-stat.i.node-stores
>>      112409           -81.9%      20305 ą  4%  perf-stat.i.page-faults
>>        2.55           +89.6%       4.83        perf-stat.overall.MPKI
> 
> Much more TLB misses.
> 
>>        1.51            -0.4        1.13        perf-stat.overall.branch-miss-rate%
>>       19.26           +24.5       43.71        perf-stat.overall.cache-miss-rate%
>>        1.70           +56.4%       2.65        perf-stat.overall.cpi
>>      665.84           -17.5%     549.51 ą  2%  perf-stat.overall.cycles-between-cache-misses
>>        0.12 ą  4%      -0.1        0.04        perf-stat.overall.dTLB-load-miss-rate%
>>        0.08 ą  2%      -0.0        0.03        perf-stat.overall.dTLB-store-miss-rate%
>>       59.16            +0.9       60.04        perf-stat.overall.iTLB-load-miss-rate%
>>        1278           +86.1%       2379 ą  2%  perf-stat.overall.instructions-per-iTLB-miss
>>        0.59           -36.1%       0.38        perf-stat.overall.ipc
> 
> Worse IPC and CPI.
> 
>>   2.078e+09           -48.3%  1.074e+09 ą  4%  perf-stat.ps.branch-instructions
>>    31292687           -61.2%   12133349 ą  2%  perf-stat.ps.branch-misses
>>    26057291            -5.9%   24512034 ą  4%  perf-stat.ps.cache-misses
>>   1.353e+08           -58.6%   56072195 ą  4%  perf-stat.ps.cache-references
>>      365254           -75.8%      88464 ą  3%  perf-stat.ps.context-switches
>>   1.735e+10           -22.4%  1.346e+10 ą  2%  perf-stat.ps.cpu-cycles
>>       60838           -79.1%      12727 ą  6%  perf-stat.ps.cpu-migrations
>>     3056601 ą  4%     -81.5%     565354 ą  4%  perf-stat.ps.dTLB-load-misses
>>   2.636e+09           -50.7%    1.3e+09 ą  4%  perf-stat.ps.dTLB-loads
>>     1155253 ą  2%     -83.0%     196581 ą  5%  perf-stat.ps.dTLB-store-misses
>>   1.473e+09           -57.4%  6.268e+08 ą  3%  perf-stat.ps.dTLB-stores
>>     7997726           -73.3%    2131477 ą  3%  perf-stat.ps.iTLB-load-misses
>>     5521346           -74.3%    1418623 ą  2%  perf-stat.ps.iTLB-loads
>>   1.023e+10           -50.4%  5.073e+09 ą  4%  perf-stat.ps.instructions
>>       75671           -80.9%      14479 ą  4%  perf-stat.ps.minor-faults
>>     5549722           -61.4%    2141750 ą  4%  perf-stat.ps.node-loads
>>     8769156           +51.6%   13296579 ą  5%  perf-stat.ps.node-stores
>>      110795           -82.0%      19977 ą  4%  perf-stat.ps.page-faults
>>   6.482e+11           -50.7%  3.197e+11 ą  4%  perf-stat.total.instructions
>>        0.00 ą 37%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
>>        0.01 ą 18%   +8373.1%       0.73 ą 49%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
>>        0.01 ą 16%   +4600.0%       0.38 ą 24%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
> 
> More time spent in madvise and munmap, but I'm not sure whether this
> is caused by tearing down the address space when exiting the test. If
> so it should not count in the regression.
It's not the whole address space being torn down. It's the pthread stack
being torn down when a pthread exits (which I suppose can be treated as
address-space teardown):

https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576

Another question is whether it's worthwhile to back stacks with THP at
all. It may be useful for some apps which need a large stack size? See
the sketch below.
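
If an application really wants a THP-backed stack, it can already opt in
explicitly, along these lines (a hypothetical, untested sketch; the
MADV_HUGEPAGE call plus actually passing the attr are the point):

#define _GNU_SOURCE
#include <pthread.h>
#include <sys/mman.h>

static void *deep_worker(void *arg)
{
        (void)arg;              /* imagine deep recursion here */
        return NULL;
}

int main(void)
{
        size_t sz = 32UL << 20; /* large stack for a deep-recursion app */
        pthread_attr_t attr;
        pthread_t tid;

        void *stack = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
        madvise(stack, sz, MADV_HUGEPAGE);      /* opt this VMA into THP */

        pthread_attr_init(&attr);
        pthread_attr_setstack(&attr, stack, sz);        /* and pass it */
        pthread_create(&tid, &attr, deep_worker, NULL);
        pthread_join(tid, NULL);

        munmap(stack, sz);
        return 0;
}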


Regards
Yin, Fengwei


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-20  8:29   ` Yin Fengwei
@ 2023-12-20 15:42     ` Christoph Lameter (Ampere)
  2023-12-20 20:14       ` Yang Shi
  2023-12-20 20:09     ` Yang Shi
  1 sibling, 1 reply; 24+ messages in thread
From: Christoph Lameter (Ampere) @ 2023-12-20 15:42 UTC (permalink / raw)
  To: Yin Fengwei
  Cc: Yang Shi, kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	ying.huang, feng.tang

On Wed, 20 Dec 2023, Yin Fengwei wrote:

>> Interesting, wasn't the same regression seen last time? And I'm a
>> little bit confused about how pthread got regressed. I didn't see the
>> pthread benchmark do any intensive memory alloc/free operations. Do
>> the pthread APIs do any intensive memory operations? I saw the
>> benchmark does allocate memory for thread stack, but it should be just
>> 8K per thread, so it should not trigger what this patch does. With
>> 1024 threads, the thread stacks may get merged into one single VMA (8M
>> total), but it may do so even though the patch is not applied.
> The stress-ng.pthread test code is odd here:
>
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573
>
> It sets up its own stack in a pthread_attr_t, but that attr is never
> passed to pthread_create(). So it is still glibc that allocates the
> thread stack, which is 8M in size. This is why this patch can affect
> the stress-ng.pthread test.

Hmmm... The use of calloc() for 8M triggers an mmap, I guess.

Why is that memory slower if we align the address to a 2M boundary? Because
THP kicks in faster and creates more overhead?
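
For reference: glibc malloc serves requests above M_MMAP_THRESHOLD (128K by
default) with a fresh private anonymous mmap(), so an 8M calloc() behaves
roughly like this simplified sketch (not the real glibc code):

#include <sys/mman.h>

int main(void)
{
        /*
         * An 8M request bypasses the heap and becomes its own mapping;
         * with this patch, any such mapping of 2M or more is THP-aligned.
         * calloc() can even skip the zeroing, since freshly mapped
         * anonymous pages are already zero.
         */
        void *p = mmap(NULL, 8UL << 20, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        munmap(p, 8UL << 20);
        return 0;
}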

> while this time, the hotspot is in (pmd_lock from do_madvise I suppose):
>   - 55.02% zap_pmd_range.isra.0
>      - 53.42% __split_huge_pmd
>         - 51.74% _raw_spin_lock
>            - 51.73% native_queued_spin_lock_slowpath
>               + 3.03% asm_sysvec_call_function
>         - 1.67% __split_huge_pmd_locked
>            - 0.87% pmdp_invalidate
>               + 0.86% flush_tlb_mm_range
>      - 1.60% zap_pte_range
>         - 1.04% page_remove_rmap
>              0.55% __mod_lruvec_page_state

OK, so we have 2M mappings and they are split because of some action on 4K
segments? I guess because of the guard pages?
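
A sketch of that guard-page mechanism (illustrative, not the glibc code;
assumes the mapping is THP-backed):

#include <string.h>
#include <sys/mman.h>

int main(void)
{
        size_t len = 8UL << 20;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        memset(p, 1, len);      /* a THP-aligned mapping faults in 2M pages */

        /*
         * A 4K PROT_NONE guard page like the one glibc places at the
         * stack boundary: the permission change covers only part of a
         * 2M page, so the kernel has to split the huge PMD first.
         */
        mprotect(p, 4096, PROT_NONE);

        munmap(p, len);
        return 0;
}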

>> More time spent in madvise and munmap, but I'm not sure whether this
>> is caused by tearing down the address space when exiting the test. If
>> so it should not count in the regression.
> It's not for the whole address space tearing down. It's for pthread
> stack tearing down when pthread exit (can be treated as address space
> tearing down? I suppose so).
>
> https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
> https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576
>
> Another thing is whether it's worthy to make stack use THP? It may be
> useful for some apps which need large stack size?

No can do, since a calloc() is used to allocate the stack. How can the
kernel distinguish that allocation?



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-20  8:29   ` Yin Fengwei
  2023-12-20 15:42     ` Christoph Lameter (Ampere)
@ 2023-12-20 20:09     ` Yang Shi
  2023-12-21  0:26       ` Yang Shi
  1 sibling, 1 reply; 24+ messages in thread
From: Yang Shi @ 2023-12-20 20:09 UTC (permalink / raw)
  To: Yin Fengwei
  Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	Christopher Lameter, ying.huang, feng.tang

On Wed, Dec 20, 2023 at 12:34 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
>
>
>
> On 2023/12/20 13:27, Yang Shi wrote:
> > On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote:
> >>
> >>
> >>
> >> Hello,
> >>
> >> for this commit, we reported
> >> "[mm]  96db82a66d:  will-it-scale.per_process_ops -95.3% regression"
> >> in Aug, 2022 when it's in linux-next/master
> >> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/
> >>
> >> later, we reported
> >> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression"
> >> in Oct, 2022 when it's in linus/master
> >> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/
> >>
> >> and the commit was reverted finally by
> >> commit 0ba09b1733878afe838fe35c310715fda3d46428
> >> Author: Linus Torvalds <torvalds@linux-foundation.org>
> >> Date:   Sun Dec 4 12:51:59 2022 -0800
> >>
> >> now we noticed it goes into linux-next/master again.
> >>
> >> we are not sure if there is an agreement that the benefit of this commit
> >> already outweighs the performance drop in some micro benchmarks.
> >>
> >> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/
> >> that
> >> "This patch was applied to v6.1, but was reverted due to a regression
> >> report.  However it turned out the regression was not due to this patch.
> >> I ping'ed Andrew to reapply this patch, Andrew may forget it.  This
> >> patch helps promote THP, so I rebased it onto the latest mm-unstable."
> >
> > IIRC, Huang Ying's analysis showed the regression for will-it-scale
> > micro benchmark is fine, it was actually reverted due to kernel build
> > regression with LLVM reported by Nathan Chancellor. Then the
> > regression was resolved by commit
> > 81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out
> > if page in deferred queue already"). And this patch did improve kernel
> > build with GCC by ~3% if I remember correctly.
> >
> >>
> >> however, unfortunately, in our latest tests, we still observed below regression
> >> upon this commit. just FYI.
> >>
> >>
> >>
> >> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:
> >
> > Interesting, wasn't the same regression seen last time? And I'm a
> > little bit confused about how pthread got regressed. I didn't see the
> > pthread benchmark do any intensive memory alloc/free operations. Do
> > the pthread APIs do any intensive memory operations? I saw the
> > benchmark does allocate memory for thread stack, but it should be just
> > 8K per thread, so it should not trigger what this patch does. With
> > 1024 threads, the thread stacks may get merged into one single VMA (8M
> > total), but it may do so even though the patch is not applied.
> stress-ng.pthread test code is strange here:
>
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573
>
> Even though it allocates its own stack, that attr is not passed
> to pthread_create. So it's still glibc that allocates the stack
> for the pthread, and that stack is 8M in size. This is why this
> patch can impact the stress-ng.pthread testing.

Aha, nice catch, I overlooked that.
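
A minimal sketch of the pattern (my hypothetical simplification of the
stress-ng code linked above, not the actual source):

#include <pthread.h>
#include <stdlib.h>

static void *worker(void *arg)
{
	(void)arg;
	return NULL;
}

/*
 * An attr with a small caller-allocated stack is prepared, but NULL
 * (not &attr) is what reaches pthread_create(), so glibc ignores it
 * and mmap()s its own 8M default stack -- the mapping that this patch
 * then THP-aligns.
 */
static int spawn_one(pthread_t *tid)
{
	pthread_attr_t attr;
	void *stack = calloc(1, 64 * 1024);

	pthread_attr_init(&attr);
	pthread_attr_setstack(&attr, stack, 64 * 1024);

	return pthread_create(tid, NULL, worker, NULL);
}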

>
>
> My understanding is that this is a different regression (if it's a
> valid regression). The previous hotspot was in:
>     deferred_split_huge_page
>        deferred_split_huge_page
>           deferred_split_huge_page
>              spin_lock
>
> while this time, the hotspot is in (pmd_lock from do_madvise I suppose):
>     - 55.02% zap_pmd_range.isra.0
>        - 53.42% __split_huge_pmd
>           - 51.74% _raw_spin_lock
>              - 51.73% native_queued_spin_lock_slowpath
>                 + 3.03% asm_sysvec_call_function
>           - 1.67% __split_huge_pmd_locked
>              - 0.87% pmdp_invalidate
>                 + 0.86% flush_tlb_mm_range
>        - 1.60% zap_pte_range
>           - 1.04% page_remove_rmap
>                0.55% __mod_lruvec_page_state
>
>
> >
> >>
> >>
> >> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries")
> >> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >>
> >> testcase: stress-ng
> >> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
> >> parameters:
> >>
> >>          nr_threads: 1
> >>          disk: 1HDD
> >>          testtime: 60s
> >>          fs: ext4
> >>          class: os
> >>          test: pthread
> >>          cpufreq_governor: performance
> >>
> >>
> >> In addition to that, the commit also has significant impact on the following tests:
> >>
> >> +------------------+-----------------------------------------------------------------------------------------------+
> >> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression                                         |
> >> | test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory   |
> >> | test parameters  | array_size=50000000                                                                           |
> >> |                  | cpufreq_governor=performance                                                                  |
> >> |                  | iterations=10x                                                                                |
> >> |                  | loop=100                                                                                      |
> >> |                  | nr_threads=25%                                                                                |
> >> |                  | omp=true                                                                                      |
> >> +------------------+-----------------------------------------------------------------------------------------------+
> >> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression       |
> >> | test machine     | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory    |
> >> | test parameters  | cpufreq_governor=performance                                                                  |
> >> |                  | option_a=Average                                                                              |
> >> |                  | option_b=Integer                                                                              |
> >> |                  | test=ramspeed-1.4.3                                                                           |
> >> +------------------+-----------------------------------------------------------------------------------------------+
> >> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression |
> >> | test machine     | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory    |
> >> | test parameters  | cpufreq_governor=performance                                                                  |
> >> |                  | option_a=Average                                                                              |
> >> |                  | option_b=Floating Point                                                                       |
> >> |                  | test=ramspeed-1.4.3                                                                           |
> >> +------------------+-----------------------------------------------------------------------------------------------+
> >>
> >>
> >> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> >> the same patch/commit), kindly add following tags
> >> | Reported-by: kernel test robot <oliver.sang@intel.com>
> >> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com
> >>
> >>
> >> Details are as below:
> >> -------------------------------------------------------------------------------------------------->
> >>
> >>
> >> The kernel config and materials to reproduce are available at:
> >> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com
> >>
> >> =========================================================================================
> >> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> >>    os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
> >>
> >> commit:
> >>    30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
> >>    1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
> >>
> >> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> >> ---------------- ---------------------------
> >>           %stddev     %change         %stddev
> >>               \          |                \
> >>    13405796           -65.5%    4620124        cpuidle..usage
> >>        8.00            +8.2%       8.66 ±  2%  iostat.cpu.system
> >>        1.61           -60.6%       0.63        iostat.cpu.user
> >>      597.50 ± 14%     -64.3%     213.50 ± 14%  perf-c2c.DRAM.local
> >>        1882 ± 14%     -74.7%     476.83 ±  7%  perf-c2c.HITM.local
> >>     3768436           -12.9%    3283395        vmstat.memory.cache
> >>      355105           -75.7%      86344 ±  3%  vmstat.system.cs
> >>      385435           -20.7%     305714 ±  3%  vmstat.system.in
> >>        1.13            -0.2        0.88        mpstat.cpu.all.irq%
> >>        0.29            -0.2        0.10 ±  2%  mpstat.cpu.all.soft%
> >>        6.76 ±  2%      +1.1        7.88 ±  2%  mpstat.cpu.all.sys%
> >>        1.62            -1.0        0.62 ±  2%  mpstat.cpu.all.usr%
> >>     2234397           -84.3%     350161 ±  5%  stress-ng.pthread.ops
> >>       37237           -84.3%       5834 ±  5%  stress-ng.pthread.ops_per_sec
> >>      294706 ±  2%     -68.0%      94191 ±  6%  stress-ng.time.involuntary_context_switches
> >>       41442 ±  2%   +5023.4%    2123284        stress-ng.time.maximum_resident_set_size
> >>     4466457           -83.9%     717053 ±  5%  stress-ng.time.minor_page_faults
> >
> > The larger RSS and fewer page faults are expected.
> >
> >>      243.33           +13.5%     276.17 ±  3%  stress-ng.time.percent_of_cpu_this_job_got
> >>      131.64           +27.7%     168.11 ±  3%  stress-ng.time.system_time
> >>       19.73           -82.1%       3.53 ±  4%  stress-ng.time.user_time
> >
> > Much less user time. And it seems to match the drop of the pthread metric.
> >
> >>     7715609           -80.2%    1530125 ±  4%  stress-ng.time.voluntary_context_switches
> >>       76728           -80.8%      14724 ±  4%  perf-stat.i.minor-faults
> >>     5600408           -61.4%    2160997 ±  5%  perf-stat.i.node-loads
> >>     8873996           +52.1%   13499744 ±  5%  perf-stat.i.node-stores
> >>      112409           -81.9%      20305 ±  4%  perf-stat.i.page-faults
> >>        2.55           +89.6%       4.83        perf-stat.overall.MPKI
> >
> > Much more TLB misses.
> >
> >>        1.51            -0.4        1.13        perf-stat.overall.branch-miss-rate%
> >>       19.26           +24.5       43.71        perf-stat.overall.cache-miss-rate%
> >>        1.70           +56.4%       2.65        perf-stat.overall.cpi
> >>      665.84           -17.5%     549.51 ±  2%  perf-stat.overall.cycles-between-cache-misses
> >>        0.12 ±  4%      -0.1        0.04        perf-stat.overall.dTLB-load-miss-rate%
> >>        0.08 ±  2%      -0.0        0.03        perf-stat.overall.dTLB-store-miss-rate%
> >>       59.16            +0.9       60.04        perf-stat.overall.iTLB-load-miss-rate%
> >>        1278           +86.1%       2379 ±  2%  perf-stat.overall.instructions-per-iTLB-miss
> >>        0.59           -36.1%       0.38        perf-stat.overall.ipc
> >
> > Worse IPC and CPI.
> >
> >>   2.078e+09           -48.3%  1.074e+09 ±  4%  perf-stat.ps.branch-instructions
> >>    31292687           -61.2%   12133349 ±  2%  perf-stat.ps.branch-misses
> >>    26057291            -5.9%   24512034 ±  4%  perf-stat.ps.cache-misses
> >>   1.353e+08           -58.6%   56072195 ±  4%  perf-stat.ps.cache-references
> >>      365254           -75.8%      88464 ±  3%  perf-stat.ps.context-switches
> >>   1.735e+10           -22.4%  1.346e+10 ±  2%  perf-stat.ps.cpu-cycles
> >>       60838           -79.1%      12727 ±  6%  perf-stat.ps.cpu-migrations
> >>     3056601 ±  4%     -81.5%     565354 ±  4%  perf-stat.ps.dTLB-load-misses
> >>   2.636e+09           -50.7%    1.3e+09 ±  4%  perf-stat.ps.dTLB-loads
> >>     1155253 ±  2%     -83.0%     196581 ±  5%  perf-stat.ps.dTLB-store-misses
> >>   1.473e+09           -57.4%  6.268e+08 ±  3%  perf-stat.ps.dTLB-stores
> >>     7997726           -73.3%    2131477 ±  3%  perf-stat.ps.iTLB-load-misses
> >>     5521346           -74.3%    1418623 ±  2%  perf-stat.ps.iTLB-loads
> >>   1.023e+10           -50.4%  5.073e+09 ±  4%  perf-stat.ps.instructions
> >>       75671           -80.9%      14479 ±  4%  perf-stat.ps.minor-faults
> >>     5549722           -61.4%    2141750 ±  4%  perf-stat.ps.node-loads
> >>     8769156           +51.6%   13296579 ±  5%  perf-stat.ps.node-stores
> >>      110795           -82.0%      19977 ±  4%  perf-stat.ps.page-faults
> >>   6.482e+11           -50.7%  3.197e+11 ±  4%  perf-stat.total.instructions
> >>        0.00 ± 37%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
> >>        0.01 ± 18%   +8373.1%       0.73 ± 49%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
> >>        0.01 ± 16%   +4600.0%       0.38 ± 24%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
> >
> > More time spent in madvise and munmap. but I'm not sure whether this
> > is caused by tearing down the address space when exiting the test. If
> > so it should not count in the regression.
> It's not for tearing down the whole address space. It's for tearing
> down the pthread stack when a pthread exits (can that be treated as
> address space tearing down? I suppose so).
>
> https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
> https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576

That explains the problem. madvise() does have some extra overhead
for handling THP (splitting the pmd, the deferred split queue, etc).

>
> Another question is whether it's worthwhile to make the stack use THP.
> It may be useful for some apps which need a large stack size?

The kernel actually doesn't apply THP to stacks (see
vma_is_temporary_stack()). But the kernel can only tell whether a VMA
is a stack by checking the VM_GROWSDOWN | VM_GROWSUP flags. So if
glibc doesn't set the proper flags to tell the kernel the area is a
stack, the kernel just treats it as a normal anonymous area. So glibc
should set up the stack properly IMHO.
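
Roughly, the check looks like this (paraphrased from include/linux/mm.h;
the exact code may differ by kernel version):

static inline bool vma_is_temporary_stack(struct vm_area_struct *vma)
{
	int maybe_stack = vma->vm_flags & (VM_GROWSDOWN | VM_GROWSUP);

	if (!maybe_stack)
		return false;

	/* Without the grows flags, this check is never even reached. */
	if ((vma->vm_flags & VM_STACK_INCOMPLETE_SETUP) ==
					VM_STACK_INCOMPLETE_SETUP)
		return true;

	return false;
}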

>
>
> Regards
> Yin, Fengwei


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-20 15:42     ` Christoph Lameter (Ampere)
@ 2023-12-20 20:14       ` Yang Shi
  0 siblings, 0 replies; 24+ messages in thread
From: Yang Shi @ 2023-12-20 20:14 UTC (permalink / raw)
  To: Christoph Lameter (Ampere)
  Cc: Yin Fengwei, kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	ying.huang, feng.tang

On Wed, Dec 20, 2023 at 7:42 AM Christoph Lameter (Ampere) <cl@linux.com> wrote:
>
> On Wed, 20 Dec 2023, Yin Fengwei wrote:
>
> >> Interesting, wasn't the same regression seen last time? And I'm a
> >> little bit confused about how pthread got regressed. I didn't see the
> >> pthread benchmark do any intensive memory alloc/free operations. Do
> >> the pthread APIs do any intensive memory operations? I saw the
> >> benchmark does allocate memory for thread stack, but it should be just
> >> 8K per thread, so it should not trigger what this patch does. With
> >> 1024 threads, the thread stacks may get merged into one single VMA (8M
> >> total), but it may do so even though the patch is not applied.
> > stress-ng.pthread test code is strange here:
> >
> > https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573
> >
> > Even though it allocates its own stack, that attr is not passed
> > to pthread_create. So it's still glibc that allocates the stack
> > for the pthread, and that stack is 8M in size. This is why this
> > patch can impact the stress-ng.pthread testing.
>
> Hmmm... The use of calloc() for 8M triggers an mmap, I guess.
>
> Why is that memory slower if we align the address to a 2M boundary? Because
> THP kicks in and creates more overhead?

glibc calls madvise() to free the unused stack, and that may have a
higher cost due to THP (splitting the pmd, the deferred split queue, etc).

>
> > while this time, the hotspot is in (pmd_lock from do_madvise I suppose):
> >   - 55.02% zap_pmd_range.isra.0
> >      - 53.42% __split_huge_pmd
> >         - 51.74% _raw_spin_lock
> >            - 51.73% native_queued_spin_lock_slowpath
> >               + 3.03% asm_sysvec_call_function
> >         - 1.67% __split_huge_pmd_locked
> >            - 0.87% pmdp_invalidate
> >               + 0.86% flush_tlb_mm_range
> >      - 1.60% zap_pte_range
> >         - 1.04% page_remove_rmap
> >              0.55% __mod_lruvec_page_state
>
> Ok, so we have 2M mappings and they are split because of some action on 4K
> segments? Presumably because of the guard pages?

It should not be related to the guard pages; it's just due to freeing
the unused stack, which may be a partial 2M range.

>
> >> More time spent in madvise and munmap. but I'm not sure whether this
> >> is caused by tearing down the address space when exiting the test. If
> >> so it should not count in the regression.
> > It's not for tearing down the whole address space. It's for tearing
> > down the pthread stack when a pthread exits (can that be treated as
> > address space tearing down? I suppose so).
> >
> > https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
> > https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576
> >
> > Another question is whether it's worthwhile to make the stack use THP.
> > It may be useful for some apps which need a large stack size?
>
> No can do, since a calloc() is used to allocate the stack. How can the kernel
> distinguish the allocation?

Just by VM_GROWSDOWN | VM_GROWSUP. User space needs to tell the kernel
this area is a stack by setting the proper flags. For example,

ffffca1df000-ffffca200000 rw-p 00000000 00:00 0                          [stack]
Size:                132 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  60 kB
Pss:                  60 kB
Pss_Dirty:            60 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        60 kB
Referenced:           60 kB
Anonymous:            60 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:    0
VmFlags: rd wr mr mw me gd ac

The "gd" flag means GROWSDOWN. But it totally depends on glibc in
terms of how it considers about "stack". So glibc just uses calloc()
to allocate stack area.

>
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-20 20:09     ` Yang Shi
@ 2023-12-21  0:26       ` Yang Shi
  2023-12-21  0:58         ` Yin Fengwei
  0 siblings, 1 reply; 24+ messages in thread
From: Yang Shi @ 2023-12-21  0:26 UTC (permalink / raw)
  To: Yin Fengwei
  Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	Christopher Lameter, ying.huang, feng.tang

On Wed, Dec 20, 2023 at 12:09 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Wed, Dec 20, 2023 at 12:34 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
> >
> >
> >
> > On 2023/12/20 13:27, Yang Shi wrote:
> > > On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote:
> > >>
> > >>
> > >>
> > >> Hello,
> > >>
> > >> for this commit, we reported
> > >> "[mm]  96db82a66d:  will-it-scale.per_process_ops -95.3% regression"
> > >> in Aug, 2022 when it's in linux-next/master
> > >> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/
> > >>
> > >> later, we reported
> > >> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression"
> > >> in Oct, 2022 when it's in linus/master
> > >> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/
> > >>
> > >> and the commit was reverted finally by
> > >> commit 0ba09b1733878afe838fe35c310715fda3d46428
> > >> Author: Linus Torvalds <torvalds@linux-foundation.org>
> > >> Date:   Sun Dec 4 12:51:59 2022 -0800
> > >>
> > >> now we noticed it goes into linux-next/master again.
> > >>
> > >> we are not sure if there is an agreement that the benefit of this commit
> > >> already outweighs the performance drop in some micro benchmarks.
> > >>
> > >> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/
> > >> that
> > >> "This patch was applied to v6.1, but was reverted due to a regression
> > >> report.  However it turned out the regression was not due to this patch.
> > >> I ping'ed Andrew to reapply this patch, Andrew may forget it.  This
> > >> patch helps promote THP, so I rebased it onto the latest mm-unstable."
> > >
> > > IIRC, Huang Ying's analysis showed the regression for will-it-scale
> > > micro benchmark is fine, it was actually reverted due to kernel build
> > > regression with LLVM reported by Nathan Chancellor. Then the
> > > regression was resolved by commit
> > > 81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out
> > > if page in deferred queue already"). And this patch did improve kernel
> > > build with GCC by ~3% if I remember correctly.
> > >
> > >>
> > >> however, unfortunately, in our latest tests, we still observed below regression
> > >> upon this commit. just FYI.
> > >>
> > >>
> > >>
> > >> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:
> > >
> > > Interesting, wasn't the same regression seen last time? And I'm a
> > > little bit confused about how pthread got regressed. I didn't see the
> > > pthread benchmark do any intensive memory alloc/free operations. Do
> > > the pthread APIs do any intensive memory operations? I saw the
> > > benchmark does allocate memory for thread stack, but it should be just
> > > 8K per thread, so it should not trigger what this patch does. With
> > > 1024 threads, the thread stacks may get merged into one single VMA (8M
> > > total), but it may do so even though the patch is not applied.
> > stress-ng.pthread test code is strange here:
> >
> > https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573
> >
> > Even though it allocates its own stack, that attr is not passed
> > to pthread_create. So it's still glibc that allocates the stack
> > for the pthread, and that stack is 8M in size. This is why this
> > patch can impact the stress-ng.pthread testing.
>
> Aha, nice catch, I overlooked that.
>
> >
> >
> > My understanding is that this is a different regression (if it's a
> > valid regression). The previous hotspot was in:
> >     deferred_split_huge_page
> >        deferred_split_huge_page
> >           deferred_split_huge_page
> >              spin_lock
> >
> > while this time, the hotspot is in (pmd_lock from do_madvise I suppose):
> >     - 55.02% zap_pmd_range.isra.0
> >        - 53.42% __split_huge_pmd
> >           - 51.74% _raw_spin_lock
> >              - 51.73% native_queued_spin_lock_slowpath
> >                 + 3.03% asm_sysvec_call_function
> >           - 1.67% __split_huge_pmd_locked
> >              - 0.87% pmdp_invalidate
> >                 + 0.86% flush_tlb_mm_range
> >        - 1.60% zap_pte_range
> >           - 1.04% page_remove_rmap
> >                0.55% __mod_lruvec_page_state
> >
> >
> > >
> > >>
> > >>
> > >> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries")
> > >> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > >>
> > >> testcase: stress-ng
> > >> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
> > >> parameters:
> > >>
> > >>          nr_threads: 1
> > >>          disk: 1HDD
> > >>          testtime: 60s
> > >>          fs: ext4
> > >>          class: os
> > >>          test: pthread
> > >>          cpufreq_governor: performance
> > >>
> > >>
> > >> In addition to that, the commit also has significant impact on the following tests:
> > >>
> > >> +------------------+-----------------------------------------------------------------------------------------------+
> > >> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression                                         |
> > >> | test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory   |
> > >> | test parameters  | array_size=50000000                                                                           |
> > >> |                  | cpufreq_governor=performance                                                                  |
> > >> |                  | iterations=10x                                                                                |
> > >> |                  | loop=100                                                                                      |
> > >> |                  | nr_threads=25%                                                                                |
> > >> |                  | omp=true                                                                                      |
> > >> +------------------+-----------------------------------------------------------------------------------------------+
> > >> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression       |
> > >> | test machine     | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory    |
> > >> | test parameters  | cpufreq_governor=performance                                                                  |
> > >> |                  | option_a=Average                                                                              |
> > >> |                  | option_b=Integer                                                                              |
> > >> |                  | test=ramspeed-1.4.3                                                                           |
> > >> +------------------+-----------------------------------------------------------------------------------------------+
> > >> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression |
> > >> | test machine     | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory    |
> > >> | test parameters  | cpufreq_governor=performance                                                                  |
> > >> |                  | option_a=Average                                                                              |
> > >> |                  | option_b=Floating Point                                                                       |
> > >> |                  | test=ramspeed-1.4.3                                                                           |
> > >> +------------------+-----------------------------------------------------------------------------------------------+
> > >>
> > >>
> > >> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > >> the same patch/commit), kindly add following tags
> > >> | Reported-by: kernel test robot <oliver.sang@intel.com>
> > >> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com
> > >>
> > >>
> > >> Details are as below:
> > >> -------------------------------------------------------------------------------------------------->
> > >>
> > >>
> > >> The kernel config and materials to reproduce are available at:
> > >> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com
> > >>
> > >> =========================================================================================
> > >> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> > >>    os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
> > >>
> > >> commit:
> > >>    30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
> > >>    1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
> > >>
> > >> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
> > >> ---------------- ---------------------------
> > >>           %stddev     %change         %stddev
> > >>               \          |                \
> > >>    13405796           -65.5%    4620124        cpuidle..usage
> > >>        8.00            +8.2%       8.66 ±  2%  iostat.cpu.system
> > >>        1.61           -60.6%       0.63        iostat.cpu.user
> > >>      597.50 ± 14%     -64.3%     213.50 ± 14%  perf-c2c.DRAM.local
> > >>        1882 ± 14%     -74.7%     476.83 ±  7%  perf-c2c.HITM.local
> > >>     3768436           -12.9%    3283395        vmstat.memory.cache
> > >>      355105           -75.7%      86344 ±  3%  vmstat.system.cs
> > >>      385435           -20.7%     305714 ±  3%  vmstat.system.in
> > >>        1.13            -0.2        0.88        mpstat.cpu.all.irq%
> > >>        0.29            -0.2        0.10 ±  2%  mpstat.cpu.all.soft%
> > >>        6.76 ±  2%      +1.1        7.88 ±  2%  mpstat.cpu.all.sys%
> > >>        1.62            -1.0        0.62 ±  2%  mpstat.cpu.all.usr%
> > >>     2234397           -84.3%     350161 ±  5%  stress-ng.pthread.ops
> > >>       37237           -84.3%       5834 ±  5%  stress-ng.pthread.ops_per_sec
> > >>      294706 ±  2%     -68.0%      94191 ±  6%  stress-ng.time.involuntary_context_switches
> > >>       41442 ±  2%   +5023.4%    2123284        stress-ng.time.maximum_resident_set_size
> > >>     4466457           -83.9%     717053 ±  5%  stress-ng.time.minor_page_faults
> > >
> > > The larger RSS and fewer page faults are expected.
> > >
> > >>      243.33           +13.5%     276.17 ±  3%  stress-ng.time.percent_of_cpu_this_job_got
> > >>      131.64           +27.7%     168.11 ±  3%  stress-ng.time.system_time
> > >>       19.73           -82.1%       3.53 ±  4%  stress-ng.time.user_time
> > >
> > > Much less user time. And it seems to match the drop of the pthread metric.
> > >
> > >>     7715609           -80.2%    1530125 ±  4%  stress-ng.time.voluntary_context_switches
> > >>       76728           -80.8%      14724 ±  4%  perf-stat.i.minor-faults
> > >>     5600408           -61.4%    2160997 ±  5%  perf-stat.i.node-loads
> > >>     8873996           +52.1%   13499744 ±  5%  perf-stat.i.node-stores
> > >>      112409           -81.9%      20305 ±  4%  perf-stat.i.page-faults
> > >>        2.55           +89.6%       4.83        perf-stat.overall.MPKI
> > >
> > > Much more TLB misses.
> > >
> > >>        1.51            -0.4        1.13        perf-stat.overall.branch-miss-rate%
> > >>       19.26           +24.5       43.71        perf-stat.overall.cache-miss-rate%
> > >>        1.70           +56.4%       2.65        perf-stat.overall.cpi
> > >>      665.84           -17.5%     549.51 ±  2%  perf-stat.overall.cycles-between-cache-misses
> > >>        0.12 ±  4%      -0.1        0.04        perf-stat.overall.dTLB-load-miss-rate%
> > >>        0.08 ±  2%      -0.0        0.03        perf-stat.overall.dTLB-store-miss-rate%
> > >>       59.16            +0.9       60.04        perf-stat.overall.iTLB-load-miss-rate%
> > >>        1278           +86.1%       2379 ±  2%  perf-stat.overall.instructions-per-iTLB-miss
> > >>        0.59           -36.1%       0.38        perf-stat.overall.ipc
> > >
> > > Worse IPC and CPI.
> > >
> > >>   2.078e+09           -48.3%  1.074e+09 ±  4%  perf-stat.ps.branch-instructions
> > >>    31292687           -61.2%   12133349 ±  2%  perf-stat.ps.branch-misses
> > >>    26057291            -5.9%   24512034 ±  4%  perf-stat.ps.cache-misses
> > >>   1.353e+08           -58.6%   56072195 ±  4%  perf-stat.ps.cache-references
> > >>      365254           -75.8%      88464 ±  3%  perf-stat.ps.context-switches
> > >>   1.735e+10           -22.4%  1.346e+10 ±  2%  perf-stat.ps.cpu-cycles
> > >>       60838           -79.1%      12727 ±  6%  perf-stat.ps.cpu-migrations
> > >>     3056601 ±  4%     -81.5%     565354 ±  4%  perf-stat.ps.dTLB-load-misses
> > >>   2.636e+09           -50.7%    1.3e+09 ±  4%  perf-stat.ps.dTLB-loads
> > >>     1155253 ±  2%     -83.0%     196581 ±  5%  perf-stat.ps.dTLB-store-misses
> > >>   1.473e+09           -57.4%  6.268e+08 ±  3%  perf-stat.ps.dTLB-stores
> > >>     7997726           -73.3%    2131477 ±  3%  perf-stat.ps.iTLB-load-misses
> > >>     5521346           -74.3%    1418623 ±  2%  perf-stat.ps.iTLB-loads
> > >>   1.023e+10           -50.4%  5.073e+09 ±  4%  perf-stat.ps.instructions
> > >>       75671           -80.9%      14479 ±  4%  perf-stat.ps.minor-faults
> > >>     5549722           -61.4%    2141750 ±  4%  perf-stat.ps.node-loads
> > >>     8769156           +51.6%   13296579 ±  5%  perf-stat.ps.node-stores
> > >>      110795           -82.0%      19977 ±  4%  perf-stat.ps.page-faults
> > >>   6.482e+11           -50.7%  3.197e+11 ±  4%  perf-stat.total.instructions
> > >>        0.00 ± 37%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
> > >>        0.01 ± 18%   +8373.1%       0.73 ± 49%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
> > >>        0.01 ± 16%   +4600.0%       0.38 ± 24%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
> > >
> > > More time spent in madvise and munmap. but I'm not sure whether this
> > > is caused by tearing down the address space when exiting the test. If
> > > so it should not count in the regression.
> > It's not for tearing down the whole address space. It's for tearing
> > down the pthread stack when a pthread exits (can that be treated as
> > address space tearing down? I suppose so).
> >
> > https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
> > https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576
>
> That explains the problem. madvise() does have some extra overhead
> for handling THP (splitting the pmd, the deferred split queue, etc).
>
> >
> > Another question is whether it's worthwhile to make the stack use THP.
> > It may be useful for some apps which need a large stack size?
>
> The kernel actually doesn't apply THP to stacks (see
> vma_is_temporary_stack()). But the kernel can only tell whether a VMA
> is a stack by checking the VM_GROWSDOWN | VM_GROWSUP flags. So if
> glibc doesn't set the proper flags to tell the kernel the area is a
> stack, the kernel just treats it as a normal anonymous area. So glibc
> should set up the stack properly IMHO.

If I read the code correctly, nptl allocates the stack with the code below:

mem = __mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);

See https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L563

MAP_STACK is used, but it is a no-op on Linux. So the alternative is
to make MAP_STACK meaningful on Linux instead of changing glibc. But
the blast radius seems much wider.

>
> >
> >
> > Regards
> > Yin, Fengwei


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-21  0:26       ` Yang Shi
@ 2023-12-21  0:58         ` Yin Fengwei
  2023-12-21  1:02           ` Yin Fengwei
                             ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Yin Fengwei @ 2023-12-21  0:58 UTC (permalink / raw)
  To: Yang Shi
  Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	Christopher Lameter, ying.huang, feng.tang



On 2023/12/21 08:26, Yang Shi wrote:
> On Wed, Dec 20, 2023 at 12:09 PM Yang Shi <shy828301@gmail.com> wrote:
>>
>> On Wed, Dec 20, 2023 at 12:34 AM Yin Fengwei <fengwei.yin@intel.com> wrote:
>>>
>>>
>>>
>>> On 2023/12/20 13:27, Yang Shi wrote:
>>>> On Tue, Dec 19, 2023 at 7:41 AM kernel test robot <oliver.sang@intel.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> for this commit, we reported
>>>>> "[mm]  96db82a66d:  will-it-scale.per_process_ops -95.3% regression"
>>>>> in Aug, 2022 when it's in linux-next/master
>>>>> https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/
>>>>>
>>>>> later, we reported
>>>>> "[mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression"
>>>>> in Oct, 2022 when it's in linus/master
>>>>> https://lore.kernel.org/all/202210181535.7144dd15-yujie.liu@intel.com/
>>>>>
>>>>> and the commit was reverted finally by
>>>>> commit 0ba09b1733878afe838fe35c310715fda3d46428
>>>>> Author: Linus Torvalds <torvalds@linux-foundation.org>
>>>>> Date:   Sun Dec 4 12:51:59 2022 -0800
>>>>>
>>>>> now we noticed it goes into linux-next/master again.
>>>>>
>>>>> we are not sure if there is an agreement that the benefit of this commit
>>>>> already outweighs the performance drop in some micro benchmarks.
>>>>>
>>>>> we also noticed from https://lore.kernel.org/all/20231214223423.1133074-1-yang@os.amperecomputing.com/
>>>>> that
>>>>> "This patch was applied to v6.1, but was reverted due to a regression
>>>>> report.  However it turned out the regression was not due to this patch.
>>>>> I ping'ed Andrew to reapply this patch, Andrew may forget it.  This
>>>>> patch helps promote THP, so I rebased it onto the latest mm-unstable."
>>>>
>>>> IIRC, Huang Ying's analysis showed the regression for will-it-scale
>>>> micro benchmark is fine, it was actually reverted due to kernel build
>>>> regression with LLVM reported by Nathan Chancellor. Then the
>>>> regression was resolved by commit
>>>> 81e506bec9be1eceaf5a2c654e28ba5176ef48d8 ("mm/thp: check and bail out
>>>> if page in deferred queue already"). And this patch did improve kernel
>>>> build with GCC by ~3% if I remember correctly.
>>>>
>>>>>
>>>>> however, unfortunately, in our latest tests, we still observed below regression
>>>>> upon this commit. just FYI.
>>>>>
>>>>>
>>>>>
>>>>> kernel test robot noticed a -84.3% regression of stress-ng.pthread.ops_per_sec on:
>>>>
>>>> Interesting, wasn't the same regression seen last time? And I'm a
>>>> little bit confused about how pthread got regressed. I didn't see the
>>>> pthread benchmark do any intensive memory alloc/free operations. Do
>>>> the pthread APIs do any intensive memory operations? I saw the
>>>> benchmark does allocate memory for thread stack, but it should be just
>>>> 8K per thread, so it should not trigger what this patch does. With
>>>> 1024 threads, the thread stacks may get merged into one single VMA (8M
>>>> total), but it may do so even though the patch is not applied.
>>> stress-ng.pthread test code is strange here:
>>>
>>> https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573
>>>
>>> Even though it allocates its own stack, that attr is not passed
>>> to pthread_create. So it's still glibc that allocates the stack
>>> for the pthread, and that stack is 8M in size. This is why this
>>> patch can impact the stress-ng.pthread testing.
>>
>> Aha, nice catch, I overlooked that.
>>
>>>
>>>
>>> My understanding is that this is a different regression (if it's a
>>> valid regression). The previous hotspot was in:
>>>      deferred_split_huge_page
>>>         deferred_split_huge_page
>>>            deferred_split_huge_page
>>>               spin_lock
>>>
>>> while this time, the hotspot is in (pmd_lock from do_madvise I suppose):
>>>      - 55.02% zap_pmd_range.isra.0
>>>         - 53.42% __split_huge_pmd
>>>            - 51.74% _raw_spin_lock
>>>               - 51.73% native_queued_spin_lock_slowpath
>>>                  + 3.03% asm_sysvec_call_function
>>>            - 1.67% __split_huge_pmd_locked
>>>               - 0.87% pmdp_invalidate
>>>                  + 0.86% flush_tlb_mm_range
>>>         - 1.60% zap_pte_range
>>>            - 1.04% page_remove_rmap
>>>                 0.55% __mod_lruvec_page_state
>>>
>>>
>>>>
>>>>>
>>>>>
>>>>> commit: 1111d46b5cbad57486e7a3fab75888accac2f072 ("mm: align larger anonymous mappings on THP boundaries")
>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>>>
>>>>> testcase: stress-ng
>>>>> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
>>>>> parameters:
>>>>>
>>>>>           nr_threads: 1
>>>>>           disk: 1HDD
>>>>>           testtime: 60s
>>>>>           fs: ext4
>>>>>           class: os
>>>>>           test: pthread
>>>>>           cpufreq_governor: performance
>>>>>
>>>>>
>>>>> In addition to that, the commit also has significant impact on the following tests:
>>>>>
>>>>> +------------------+-----------------------------------------------------------------------------------------------+
>>>>> | testcase: change | stream: stream.triad_bandwidth_MBps -12.1% regression                                         |
>>>>> | test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory   |
>>>>> | test parameters  | array_size=50000000                                                                           |
>>>>> |                  | cpufreq_governor=performance                                                                  |
>>>>> |                  | iterations=10x                                                                                |
>>>>> |                  | loop=100                                                                                      |
>>>>> |                  | nr_threads=25%                                                                                |
>>>>> |                  | omp=true                                                                                      |
>>>>> +------------------+-----------------------------------------------------------------------------------------------+
>>>>> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.Integer.mb_s -3.5% regression       |
>>>>> | test machine     | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory    |
>>>>> | test parameters  | cpufreq_governor=performance                                                                  |
>>>>> |                  | option_a=Average                                                                              |
>>>>> |                  | option_b=Integer                                                                              |
>>>>> |                  | test=ramspeed-1.4.3                                                                           |
>>>>> +------------------+-----------------------------------------------------------------------------------------------+
>>>>> | testcase: change | phoronix-test-suite: phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s -3.0% regression |
>>>>> | test machine     | 12 threads 1 sockets Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with 16G memory    |
>>>>> | test parameters  | cpufreq_governor=performance                                                                  |
>>>>> |                  | option_a=Average                                                                              |
>>>>> |                  | option_b=Floating Point                                                                       |
>>>>> |                  | test=ramspeed-1.4.3                                                                           |
>>>>> +------------------+-----------------------------------------------------------------------------------------------+
>>>>>
>>>>>
>>>>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>>>>> the same patch/commit), kindly add following tags
>>>>> | Reported-by: kernel test robot <oliver.sang@intel.com>
>>>>> | Closes: https://lore.kernel.org/oe-lkp/202312192310.56367035-oliver.sang@intel.com
>>>>>
>>>>>
>>>>> Details are as below:
>>>>> -------------------------------------------------------------------------------------------------->
>>>>>
>>>>>
>>>>> The kernel config and materials to reproduce are available at:
>>>>> https://download.01.org/0day-ci/archive/20231219/202312192310.56367035-oliver.sang@intel.com
>>>>>
>>>>> =========================================================================================
>>>>> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>>>>>     os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
>>>>>
>>>>> commit:
>>>>>     30749e6fbb ("mm/memory: replace kmap() with kmap_local_page()")
>>>>>     1111d46b5c ("mm: align larger anonymous mappings on THP boundaries")
>>>>>
>>>>> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75
>>>>> ---------------- ---------------------------
>>>>>            %stddev     %change         %stddev
>>>>>                \          |                \
>>>>>     13405796           -65.5%    4620124        cpuidle..usage
>>>>>         8.00            +8.2%       8.66 ±  2%  iostat.cpu.system
>>>>>         1.61           -60.6%       0.63        iostat.cpu.user
>>>>>       597.50 ± 14%     -64.3%     213.50 ± 14%  perf-c2c.DRAM.local
>>>>>         1882 ± 14%     -74.7%     476.83 ±  7%  perf-c2c.HITM.local
>>>>>      3768436           -12.9%    3283395        vmstat.memory.cache
>>>>>       355105           -75.7%      86344 ±  3%  vmstat.system.cs
>>>>>       385435           -20.7%     305714 ±  3%  vmstat.system.in
>>>>>         1.13            -0.2        0.88        mpstat.cpu.all.irq%
>>>>>         0.29            -0.2        0.10 ±  2%  mpstat.cpu.all.soft%
>>>>>         6.76 ±  2%      +1.1        7.88 ±  2%  mpstat.cpu.all.sys%
>>>>>         1.62            -1.0        0.62 ±  2%  mpstat.cpu.all.usr%
>>>>>      2234397           -84.3%     350161 ±  5%  stress-ng.pthread.ops
>>>>>        37237           -84.3%       5834 ±  5%  stress-ng.pthread.ops_per_sec
>>>>>       294706 ±  2%     -68.0%      94191 ±  6%  stress-ng.time.involuntary_context_switches
>>>>>        41442 ±  2%   +5023.4%    2123284        stress-ng.time.maximum_resident_set_size
>>>>>      4466457           -83.9%     717053 ±  5%  stress-ng.time.minor_page_faults
>>>>
>>>> The larger RSS and fewer page faults are expected.
>>>>
>>>>>       243.33           +13.5%     276.17 ±  3%  stress-ng.time.percent_of_cpu_this_job_got
>>>>>       131.64           +27.7%     168.11 ±  3%  stress-ng.time.system_time
>>>>>        19.73           -82.1%       3.53 ±  4%  stress-ng.time.user_time
>>>>
>>>> Much less user time. And it seems to match the drop of the pthread metric.
>>>>
>>>>>      7715609           -80.2%    1530125 ±  4%  stress-ng.time.voluntary_context_switches
>>>>>        76728           -80.8%      14724 ±  4%  perf-stat.i.minor-faults
>>>>>      5600408           -61.4%    2160997 ±  5%  perf-stat.i.node-loads
>>>>>      8873996           +52.1%   13499744 ±  5%  perf-stat.i.node-stores
>>>>>       112409           -81.9%      20305 ±  4%  perf-stat.i.page-faults
>>>>>         2.55           +89.6%       4.83        perf-stat.overall.MPKI
>>>>
>>>> Much more TLB misses.
>>>>
>>>>>         1.51            -0.4        1.13        perf-stat.overall.branch-miss-rate%
>>>>>        19.26           +24.5       43.71        perf-stat.overall.cache-miss-rate%
>>>>>         1.70           +56.4%       2.65        perf-stat.overall.cpi
>>>>>       665.84           -17.5%     549.51 ±  2%  perf-stat.overall.cycles-between-cache-misses
>>>>>         0.12 ±  4%      -0.1        0.04        perf-stat.overall.dTLB-load-miss-rate%
>>>>>         0.08 ±  2%      -0.0        0.03        perf-stat.overall.dTLB-store-miss-rate%
>>>>>        59.16            +0.9       60.04        perf-stat.overall.iTLB-load-miss-rate%
>>>>>         1278           +86.1%       2379 ±  2%  perf-stat.overall.instructions-per-iTLB-miss
>>>>>         0.59           -36.1%       0.38        perf-stat.overall.ipc
>>>>
>>>> Worse IPC and CPI.
>>>>
>>>>>    2.078e+09           -48.3%  1.074e+09 ±  4%  perf-stat.ps.branch-instructions
>>>>>     31292687           -61.2%   12133349 ±  2%  perf-stat.ps.branch-misses
>>>>>     26057291            -5.9%   24512034 ±  4%  perf-stat.ps.cache-misses
>>>>>    1.353e+08           -58.6%   56072195 ±  4%  perf-stat.ps.cache-references
>>>>>       365254           -75.8%      88464 ±  3%  perf-stat.ps.context-switches
>>>>>    1.735e+10           -22.4%  1.346e+10 ±  2%  perf-stat.ps.cpu-cycles
>>>>>        60838           -79.1%      12727 ±  6%  perf-stat.ps.cpu-migrations
>>>>>      3056601 ±  4%     -81.5%     565354 ±  4%  perf-stat.ps.dTLB-load-misses
>>>>>    2.636e+09           -50.7%    1.3e+09 ±  4%  perf-stat.ps.dTLB-loads
>>>>>      1155253 ±  2%     -83.0%     196581 ±  5%  perf-stat.ps.dTLB-store-misses
>>>>>    1.473e+09           -57.4%  6.268e+08 ±  3%  perf-stat.ps.dTLB-stores
>>>>>      7997726           -73.3%    2131477 ±  3%  perf-stat.ps.iTLB-load-misses
>>>>>      5521346           -74.3%    1418623 ±  2%  perf-stat.ps.iTLB-loads
>>>>>    1.023e+10           -50.4%  5.073e+09 ±  4%  perf-stat.ps.instructions
>>>>>        75671           -80.9%      14479 ±  4%  perf-stat.ps.minor-faults
>>>>>      5549722           -61.4%    2141750 ±  4%  perf-stat.ps.node-loads
>>>>>      8769156           +51.6%   13296579 ±  5%  perf-stat.ps.node-stores
>>>>>       110795           -82.0%      19977 ±  4%  perf-stat.ps.page-faults
>>>>>    6.482e+11           -50.7%  3.197e+11 ±  4%  perf-stat.total.instructions
>>>>>         0.00 ± 37%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__kmem_cache_alloc_node.__kmalloc_node.memcg_alloc_slab_cgroups.allocate_slab
>>>>>         0.01 ± 18%   +8373.1%       0.73 ± 49%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.do_madvise.__x64_sys_madvise.do_syscall_64
>>>>>         0.01 ± 16%   +4600.0%       0.38 ± 24%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.exit_mm.do_exit.__x64_sys_exit
>>>>
>>>> More time spent in madvise and munmap. but I'm not sure whether this
>>>> is caused by tearing down the address space when exiting the test. If
>>>> so it should not count in the regression.
>>> It's not for tearing down the whole address space. It's for tearing
>>> down the pthread stack when a pthread exits (can that be treated as
>>> address space tearing down? I suppose so).
>>>
>>> https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
>>> https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576
>>
>> That explains the problem. madvise() does have some extra overhead
>> for handling THP (splitting the pmd, the deferred split queue, etc).
>>
>>>
>>> Another question is whether it's worthwhile to make the stack use THP.
>>> It may be useful for some apps which need a large stack size?
>>
>> The kernel actually doesn't apply THP to stacks (see
>> vma_is_temporary_stack()). But the kernel can only tell whether a VMA
>> is a stack by checking the VM_GROWSDOWN | VM_GROWSUP flags. So if
>> glibc doesn't set the proper flags to tell the kernel the area is a
>> stack, the kernel just treats it as a normal anonymous area. So glibc
>> should set up the stack properly IMHO.
> 
> If I read the code correctly, nptl allocates the stack with the code below:
> 
> mem = __mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE,
>                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
> 
> See https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L563
> 
> MAP_STACK is used, but it is a no-op on Linux. So the alternative is
> to make MAP_STACK meaningful on Linux instead of changing glibc. But
> the blast radius seems much wider.
Yes. MAP_STACK is also mentioned in the mmap manpage. I did a test to
filter out the MAP_STACK mappings on top of this patch. The regression
in stress-ng.pthread was gone. I suppose this is kind of safe because
the madvise call is only applied to the glibc-allocated stack.


But what I am not sure about is whether it's worthwhile to make such a
change, as the regression is only clearly visible in a micro-benchmark.
No evidence shows the other regressions in this report are related to
madvise, at least from the perf statistics. Need to check more on
stream/ramspeed. Thanks.


Regards
Yin, Fengwei

> 
>>
>>>
>>>
>>> Regards
>>> Yin, Fengwei


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-21  0:58         ` Yin Fengwei
@ 2023-12-21  1:02           ` Yin Fengwei
  2023-12-21  4:49           ` Matthew Wilcox
  2023-12-21 13:39           ` Yin, Fengwei
  2 siblings, 0 replies; 24+ messages in thread
From: Yin Fengwei @ 2023-12-21  1:02 UTC (permalink / raw)
  To: Yang Shi
  Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	Christopher Lameter, ying.huang, feng.tang


>>>>
>>>> https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
>>>> https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576
>>>
>>> That explains the problem. madvise() does have some extra overhead
>>> for handling THP (splitting the pmd, the deferred split queue, etc).
>>>
>>>>
>>>> Another question is whether it's worthwhile to make the stack use THP.
>>>> It may be useful for some apps which need a large stack size?
>>>
>>> The kernel actually doesn't apply THP to stacks (see
>>> vma_is_temporary_stack()). But the kernel can only tell whether a VMA
>>> is a stack by checking the VM_GROWSDOWN | VM_GROWSUP flags. So if
>>> glibc doesn't set the proper flags to tell the kernel the area is a
>>> stack, the kernel just treats it as a normal anonymous area. So glibc
>>> should set up the stack properly IMHO.
>>
>> If I read the code correctly, nptl allocates the stack with the code below:
>>
>> mem = __mmap (NULL, size, (guardsize == 0) ? prot : PROT_NONE,
>>                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
>>
>> See 
>> https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L563
>>
>> MAP_STACK is used, but it is a no-op on Linux. So the alternative is
>> to make MAP_STACK meaningful on Linux instead of changing glibc. But
>> the blast radius seems much wider.
> Yes. MAP_STACK is also mentioned in the mmap manpage. I did a test to
> filter out the MAP_STACK mappings on top of this patch. The regression
> in stress-ng.pthread was gone. I suppose this is kind of safe because
> the madvise call is only applied to the glibc-allocated stack.

The patch I tested against stress-ng.pthread:

diff --git a/mm/mmap.c b/mm/mmap.c
index b78e83d351d2..1fd510aef82e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1829,7 +1829,8 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
                  */
                 pgoff = 0;
                 get_area = shmem_get_unmapped_area;
-       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+       } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+                               !(flags & MAP_STACK)) {
                 /* Ensures that larger anonymous mappings are THP aligned. */
                 get_area = thp_get_unmapped_area;
         }
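
Since glibc already passes MAP_STACK when it mmap()s thread stacks
(see the allocatestack.c snippet quoted earlier in the thread), keying
off MAP_STACK here should confine the change to thread-stack mappings.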


> 
> 
> But what I am not sure was whether it's worthy to do such kind of change
> as the regression only is seen obviously in micro-benchmark. No evidence
> showed the other regressionsin this report is related with madvise. At
> least from the perf statstics. Need to check more on stream/ramspeed. 
> Thanks.
> 
> 
> Regards
> Yin, Fengwei
> 
>>
>>>
>>>>
>>>>
>>>> Regards
>>>> Yin, Fengwei


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-21  0:58         ` Yin Fengwei
  2023-12-21  1:02           ` Yin Fengwei
@ 2023-12-21  4:49           ` Matthew Wilcox
  2023-12-21  4:58             ` Yin Fengwei
  2023-12-21 18:07             ` Yang Shi
  2023-12-21 13:39           ` Yin, Fengwei
  2 siblings, 2 replies; 24+ messages in thread
From: Matthew Wilcox @ 2023-12-21  4:49 UTC (permalink / raw)
  To: Yin Fengwei
  Cc: Yang Shi, kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Christopher Lameter,
	ying.huang, feng.tang

On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote:
> Yes. MAP_STACK is also mentioned in the mmap manpage. I did a test to
> filter out the MAP_STACK mappings on top of this patch. The regression
> in stress-ng.pthread was gone. I suppose this is kind of safe because
> the madvise call is only applied to the glibc-allocated stack.
> 
> 
> But what I am not sure about is whether it's worthwhile to make such a
> change, as the regression is only clearly visible in a micro-benchmark.
> No evidence shows the other regressions in this report are related to
> madvise, at least from the perf statistics. Need to check more on
> stream/ramspeed.

FWIW, we had a customer report a significant performance problem when
inadvertently using 2MB pages for stacks.  They were able to avoid it by
using 2044KiB sized stacks ...
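For reference, that workaround boils down to requesting a per-thread stack
size that is not a 2MB multiple, along the lines of this sketch (a
hypothetical example built with -pthread, not the customer's code):

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
        (void)arg;
        return NULL;
}

int main(void)
{
        pthread_attr_t attr;
        pthread_t tid;

        pthread_attr_init(&attr);
        /* 2044KiB instead of 2MiB: the stack mapping is no longer THP-sized */
        pthread_attr_setstacksize(&attr, 2044 * 1024);
        if (pthread_create(&tid, &attr, worker, NULL) != 0) {
                fprintf(stderr, "pthread_create failed\n");
                return 1;
        }
        pthread_join(tid, NULL);
        pthread_attr_destroy(&attr);
        return 0;
}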


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-21  4:49           ` Matthew Wilcox
@ 2023-12-21  4:58             ` Yin Fengwei
  2023-12-21 18:07             ` Yang Shi
  1 sibling, 0 replies; 24+ messages in thread
From: Yin Fengwei @ 2023-12-21  4:58 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yang Shi, kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Christopher Lameter,
	ying.huang, feng.tang



On 2023/12/21 12:49, Matthew Wilcox wrote:
> On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote:
>> Yes. MAP_STACK is also mentioned in the mmap manpage. I did a test to
>> filter out the MAP_STACK mappings based on this patch. The regression
>> in stress-ng.pthread was gone. I suppose this is kind of safe because
>> the madvise call is only applied to glibc-allocated stacks.
>>
>>
>> But what I am not sure about is whether it's worth making such a change,
>> as the regression is only seen clearly in micro-benchmarks. No evidence
>> showed the other regressions in this report are related to madvise, at
>> least from the perf statistics. Need to check more on stream/ramspeed.
> 
> FWIW, we had a customer report a significant performance problem when
> inadvertently using 2MB pages for stacks.  They were able to avoid it by
> using 2044KiB sized stacks ...
Looks like it's related to this regression. So we may need to consider
avoiding THP for stacks.


Regards
Yin, Fengwei



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-21  0:58         ` Yin Fengwei
  2023-12-21  1:02           ` Yin Fengwei
  2023-12-21  4:49           ` Matthew Wilcox
@ 2023-12-21 13:39           ` Yin, Fengwei
  2023-12-21 18:11             ` Yang Shi
  2 siblings, 1 reply; 24+ messages in thread
From: Yin, Fengwei @ 2023-12-21 13:39 UTC (permalink / raw)
  To: Yang Shi
  Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	Christopher Lameter, ying.huang, feng.tang



On 12/21/2023 8:58 AM, Yin Fengwei wrote:
> But what I am not sure about is whether it's worth making such a change,
> as the regression is only seen clearly in micro-benchmarks. No evidence
> showed the other regressions in this report are related to madvise, at
> least from the perf statistics. Need to check more on stream/ramspeed.
> Thanks.

With the debugging patch (filtering the stack mappings out of THP
alignment), the stream result can be restored to within around 2%:

commit:
   30749e6fbb3d391a7939ac347e9612afe8c26e94
   1111d46b5cbad57486e7a3fab75888accac2f072
   89f60532d82b9ecd39303a74589f76e4758f176f  -> 1111d46b5cbad with debugging patch

30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589
---------------- --------------------------- ---------------------------
     350993           -15.6%     296081 ±  2%      -1.5%     345689        stream.add_bandwidth_MBps
     349830           -16.1%     293492 ±  2%      -2.3%     341860 ±  2%  stream.add_bandwidth_MBps_harmonicMean
     333973           -20.5%     265439 ±  3%      -1.7%     328403        stream.copy_bandwidth_MBps
     332930           -21.7%     260548 ±  3%      -2.5%     324711 ±  2%  stream.copy_bandwidth_MBps_harmonicMean
     302788           -16.2%     253817 ±  2%      -1.4%     298421        stream.scale_bandwidth_MBps
     302157           -17.1%     250577 ±  2%      -2.0%     296054        stream.scale_bandwidth_MBps_harmonicMean
     339047           -12.1%     298061            -1.4%     334206        stream.triad_bandwidth_MBps
     338186           -12.4%     296218            -2.0%     331469        stream.triad_bandwidth_MBps_harmonicMean


The regression of ramspeed is still there.


Regards
Yin, Fengwei


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-21  4:49           ` Matthew Wilcox
  2023-12-21  4:58             ` Yin Fengwei
@ 2023-12-21 18:07             ` Yang Shi
  2023-12-21 18:14               ` Matthew Wilcox
  1 sibling, 1 reply; 24+ messages in thread
From: Yang Shi @ 2023-12-21 18:07 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yin Fengwei, kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Christopher Lameter,
	ying.huang, feng.tang

On Wed, Dec 20, 2023 at 8:49 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote:
> > Yes. MAP_STACK is also mentioned in the mmap manpage. I did a test to
> > filter out the MAP_STACK mappings based on this patch. The regression
> > in stress-ng.pthread was gone. I suppose this is kind of safe because
> > the madvise call is only applied to glibc-allocated stacks.
> >
> >
> > But what I am not sure about is whether it's worth making such a change,
> > as the regression is only seen clearly in micro-benchmarks. No evidence
> > showed the other regressions in this report are related to madvise, at
> > least from the perf statistics. Need to check more on stream/ramspeed.
>
> FWIW, we had a customer report a significant performance problem when
> inadvertently using 2MB pages for stacks.  They were able to avoid it by
> using 2044KiB sized stacks ...

Thanks for the report. This provides more justification for
honoring MAP_STACK on Linux. Some applications, for example pthread,
just allocate a fixed-size area for the stack. This confuses the kernel,
because the kernel identifies a stack by VM_GROWSDOWN | VM_GROWSUP.

But I'm still a little confused about why THP for stacks could result in
significant performance problems, unless the applications resize the
stack quite often.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-21 13:39           ` Yin, Fengwei
@ 2023-12-21 18:11             ` Yang Shi
  2023-12-22  1:13               ` Yin, Fengwei
  0 siblings, 1 reply; 24+ messages in thread
From: Yang Shi @ 2023-12-21 18:11 UTC (permalink / raw)
  To: Yin, Fengwei
  Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	Christopher Lameter, ying.huang, feng.tang

On Thu, Dec 21, 2023 at 5:40 AM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>
>
>
> On 12/21/2023 8:58 AM, Yin Fengwei wrote:
> > But what I am not sure about is whether it's worth making such a change,
> > as the regression is only seen clearly in micro-benchmarks. No evidence
> > showed the other regressions in this report are related to madvise, at
> > least from the perf statistics. Need to check more on stream/ramspeed.
> > Thanks.
>
> With debugging patch (filter out the stack mapping from THP aligned),
> the result of stream can be restored to around 2%:
>
> commit:
>    30749e6fbb3d391a7939ac347e9612afe8c26e94
>    1111d46b5cbad57486e7a3fab75888accac2f072
>    89f60532d82b9ecd39303a74589f76e4758f176f  -> 1111d46b5cbad with debugging patch
>
> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589
> ---------------- --------------------------- ---------------------------
>      350993           -15.6%     296081 ±  2%      -1.5%     345689        stream.add_bandwidth_MBps
>      349830           -16.1%     293492 ±  2%      -2.3%     341860 ±  2%  stream.add_bandwidth_MBps_harmonicMean
>      333973           -20.5%     265439 ±  3%      -1.7%     328403        stream.copy_bandwidth_MBps
>      332930           -21.7%     260548 ±  3%      -2.5%     324711 ±  2%  stream.copy_bandwidth_MBps_harmonicMean
>      302788           -16.2%     253817 ±  2%      -1.4%     298421        stream.scale_bandwidth_MBps
>      302157           -17.1%     250577 ±  2%      -2.0%     296054        stream.scale_bandwidth_MBps_harmonicMean
>      339047           -12.1%     298061            -1.4%     334206        stream.triad_bandwidth_MBps
>      338186           -12.4%     296218            -2.0%     331469        stream.triad_bandwidth_MBps_harmonicMean
>
>
> The regression of ramspeed is still there.

Thanks for the debugging patch and the test. If no one has an objection
to honoring MAP_STACK, I'm going to come up with a more formal patch.
Even though thp_get_unmapped_area() is not called for MAP_STACK, the stack
area may still, theoretically, be allocated at a 2M-aligned address. And
it may be worse with multi-sized THP, for example 1M.

Do you have any instructions regarding how to run ramspeed? Anyway I
may not have time to debug it until after the holidays.

>
>
> Regards
> Yin, Fengwei


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-21 18:07             ` Yang Shi
@ 2023-12-21 18:14               ` Matthew Wilcox
  2023-12-22  1:06                 ` Yin, Fengwei
  0 siblings, 1 reply; 24+ messages in thread
From: Matthew Wilcox @ 2023-12-21 18:14 UTC (permalink / raw)
  To: Yang Shi
  Cc: Yin Fengwei, kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Christopher Lameter,
	ying.huang, feng.tang

On Thu, Dec 21, 2023 at 10:07:09AM -0800, Yang Shi wrote:
> On Wed, Dec 20, 2023 at 8:49 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote:
> > > Yes. MAP_STACK is also mentioned in the mmap manpage. I did a test to
> > > filter out the MAP_STACK mappings based on this patch. The regression
> > > in stress-ng.pthread was gone. I suppose this is kind of safe because
> > > the madvise call is only applied to glibc-allocated stacks.
> > >
> > >
> > > But what I am not sure about is whether it's worth making such a change,
> > > as the regression is only seen clearly in micro-benchmarks. No evidence
> > > showed the other regressions in this report are related to madvise, at
> > > least from the perf statistics. Need to check more on stream/ramspeed.
> >
> > FWIW, we had a customer report a significant performance problem when
> > inadvertently using 2MB pages for stacks.  They were able to avoid it by
> > using 2044KiB sized stacks ...
> 
> Thanks for the report. This provides more justification for
> honoring MAP_STACK on Linux. Some applications, for example pthread,
> just allocate a fixed-size area for the stack. This confuses the kernel,
> because the kernel identifies a stack by VM_GROWSDOWN | VM_GROWSUP.
> 
> But I'm still a little confused about why THP for stacks could result in
> significant performance problems, unless the applications resize the
> stack quite often.

We didn't delve into what was causing the problem, only that it was
happening.  The application had many threads, so it could have been as
simple as consuming all the available THP and leaving fewer available
for other uses.  Or it could have been a memory consumption problem;
maybe the app would only have been using 16-32kB per thread but was
now using 2MB per thread and if there were, say, 100 threads, that's an
extra 199MB of memory in use.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-21 18:14               ` Matthew Wilcox
@ 2023-12-22  1:06                 ` Yin, Fengwei
  2023-12-22  2:23                   ` Huang, Ying
  0 siblings, 1 reply; 24+ messages in thread
From: Yin, Fengwei @ 2023-12-22  1:06 UTC (permalink / raw)
  To: Matthew Wilcox, Yang Shi
  Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Christopher Lameter,
	ying.huang, feng.tang



On 12/22/2023 2:14 AM, Matthew Wilcox wrote:
> On Thu, Dec 21, 2023 at 10:07:09AM -0800, Yang Shi wrote:
>> On Wed, Dec 20, 2023 at 8:49 PM Matthew Wilcox <willy@infradead.org> wrote:
>>>
>>> On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote:
>>>> Yes. MAP_STACK is also mentioned in the mmap manpage. I did a test to
>>>> filter out the MAP_STACK mappings based on this patch. The regression
>>>> in stress-ng.pthread was gone. I suppose this is kind of safe because
>>>> the madvise call is only applied to glibc-allocated stacks.
>>>>
>>>>
>>>> But what I am not sure about is whether it's worth making such a change,
>>>> as the regression is only seen clearly in micro-benchmarks. No evidence
>>>> showed the other regressions in this report are related to madvise, at
>>>> least from the perf statistics. Need to check more on stream/ramspeed.
>>>
>>> FWIW, we had a customer report a significant performance problem when
>>> inadvertently using 2MB pages for stacks.  They were able to avoid it by
>>> using 2044KiB sized stacks ...
>>
>> Thanks for the report. This provides more justification for
>> honoring MAP_STACK on Linux. Some applications, for example pthread,
>> just allocate a fixed-size area for the stack. This confuses the kernel,
>> because the kernel identifies a stack by VM_GROWSDOWN | VM_GROWSUP.
>>
>> But I'm still a little confused about why THP for stacks could result in
>> significant performance problems, unless the applications resize the
>> stack quite often.
> 
> We didn't delve into what was causing the problem, only that it was
> happening.  The application had many threads, so it could have been as
> simple as consuming all the available THP and leaving fewer available
> for other uses.  Or it could have been a memory consumption problem;
> maybe the app would only have been using 16-32kB per thread but was
> now using 2MB per thread and if there were, say, 100 threads, that's an
> extra 199MB of memory in use.
One thing I know is related to memory zeroing. This is from
the perf data in this report:

      0.00           +16.7       16.69 ±  7%  perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault

Zeroing 2M of memory costs much more CPU than zeroing 16-32KB if
there are many threads.
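To put rough numbers on it: a thread that only ever touches 16KB of stack
zeroes four 4K pages without THP, but a single fault zeroes the full 2M with
THP, roughly 128x the zeroing work. A crude probe of the per-fault cost, as a
sketch of my own (assumes a 2M PMD size, that MADV_HUGEPAGE gets a PMD-sized
fault, and trims error handling):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>

/* Time a single first touch of a fresh mapping, with and without THP. */
static double first_touch_us(int advice)
{
        size_t len = 8UL << 20, huge = 2UL << 20;
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        /* touch inside a 2M-aligned slot so a PMD-sized fault is possible */
        volatile char *q = (char *)(((uintptr_t)p + huge - 1) & ~(huge - 1));
        struct timespec t0, t1;

        madvise(p, len, advice);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        *q = 1;         /* one fault: one 4K clear, or one 2M clear_huge_page() */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        munmap(p, len);
        return (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
}

int main(void)
{
        printf("first touch, THP:    %.1f us\n", first_touch_us(MADV_HUGEPAGE));
        printf("first touch, no THP: %.1f us\n", first_touch_us(MADV_NOHUGEPAGE));
        return 0;
}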


Regards
Yin, Fengwei


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-21 18:11             ` Yang Shi
@ 2023-12-22  1:13               ` Yin, Fengwei
  2024-01-04  1:32                 ` Yang Shi
  0 siblings, 1 reply; 24+ messages in thread
From: Yin, Fengwei @ 2023-12-22  1:13 UTC (permalink / raw)
  To: Yang Shi
  Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	Christopher Lameter, ying.huang, feng.tang



On 12/22/2023 2:11 AM, Yang Shi wrote:
> On Thu, Dec 21, 2023 at 5:40 AM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>
>>
>>
>> On 12/21/2023 8:58 AM, Yin Fengwei wrote:
>>> But what I am not sure about is whether it's worth making such a change,
>>> as the regression is only seen clearly in micro-benchmarks. No evidence
>>> showed the other regressions in this report are related to madvise, at
>>> least from the perf statistics. Need to check more on stream/ramspeed.
>>> Thanks.
>>
>> With debugging patch (filter out the stack mapping from THP aligned),
>> the result of stream can be restored to around 2%:
>>
>> commit:
>>     30749e6fbb3d391a7939ac347e9612afe8c26e94
>>     1111d46b5cbad57486e7a3fab75888accac2f072
>>     89f60532d82b9ecd39303a74589f76e4758f176f  -> 1111d46b5cbad with debugging patch
>>
>> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589
>> ---------------- --------------------------- ---------------------------
>>       350993           -15.6%     296081 ±  2%      -1.5%     345689        stream.add_bandwidth_MBps
>>       349830           -16.1%     293492 ±  2%      -2.3%     341860 ±  2%  stream.add_bandwidth_MBps_harmonicMean
>>       333973           -20.5%     265439 ±  3%      -1.7%     328403        stream.copy_bandwidth_MBps
>>       332930           -21.7%     260548 ±  3%      -2.5%     324711 ±  2%  stream.copy_bandwidth_MBps_harmonicMean
>>       302788           -16.2%     253817 ±  2%      -1.4%     298421        stream.scale_bandwidth_MBps
>>       302157           -17.1%     250577 ±  2%      -2.0%     296054        stream.scale_bandwidth_MBps_harmonicMean
>>       339047           -12.1%     298061            -1.4%     334206        stream.triad_bandwidth_MBps
>>       338186           -12.4%     296218            -2.0%     331469        stream.triad_bandwidth_MBps_harmonicMean
>>
>>
>> The regression of ramspeed is still there.
> 
> Thanks for the debugging patch and the test. If no one has an objection
> to honoring MAP_STACK, I'm going to come up with a more formal patch.
> Even though thp_get_unmapped_area() is not called for MAP_STACK, the stack
> area may still, theoretically, be allocated at a 2M-aligned address. And
> it may be worse with multi-sized THP, for example 1M.
Right. Filtering out MAP_STACK can't guarantee no THP for the stack; it
just reduces the possibility of using THP for the stack.

> 
> Do you have any instructions regarding how to run ramspeed? Anyway I
> may not have time to debug it until after the holidays.
0Day leverages phoronix-test-suite to run ramspeed, so I don't have a
direct answer here.

I suppose we could check the configuration of ramspeed in phoronix-test-
suite to understand what the build options and command options are to run
ramspeed:
https://openbenchmarking.org/test/pts/ramspeed


Regards
Yin, Fengwei

> 
>>
>>
>> Regards
>> Yin, Fengwei


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-22  1:06                 ` Yin, Fengwei
@ 2023-12-22  2:23                   ` Huang, Ying
  0 siblings, 0 replies; 24+ messages in thread
From: Huang, Ying @ 2023-12-22  2:23 UTC (permalink / raw)
  To: Yin, Fengwei
  Cc: Matthew Wilcox, Yang Shi, kernel test robot, Rik van Riel,
	oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Christopher Lameter, feng.tang

"Yin, Fengwei" <fengwei.yin@intel.com> writes:

> On 12/22/2023 2:14 AM, Matthew Wilcox wrote:
>> On Thu, Dec 21, 2023 at 10:07:09AM -0800, Yang Shi wrote:
>>> On Wed, Dec 20, 2023 at 8:49 PM Matthew Wilcox <willy@infradead.org> wrote:
>>>>
>>>> On Thu, Dec 21, 2023 at 08:58:42AM +0800, Yin Fengwei wrote:
>>>>> Yes. MAP_STACK is also mentioned in the mmap manpage. I did a test to
>>>>> filter out the MAP_STACK mappings based on this patch. The regression
>>>>> in stress-ng.pthread was gone. I suppose this is kind of safe because
>>>>> the madvise call is only applied to glibc-allocated stacks.
>>>>>
>>>>>
>>>>> But what I am not sure about is whether it's worth making such a change,
>>>>> as the regression is only seen clearly in micro-benchmarks. No evidence
>>>>> showed the other regressions in this report are related to madvise, at
>>>>> least from the perf statistics. Need to check more on stream/ramspeed.
>>>>
>>>> FWIW, we had a customer report a significant performance problem when
>>>> inadvertently using 2MB pages for stacks.  They were able to avoid it by
>>>> using 2044KiB sized stacks ...
>>>
>>> Thanks for the report. This provides more justification for
>>> honoring MAP_STACK on Linux. Some applications, for example pthread,
>>> just allocate a fixed-size area for the stack. This confuses the kernel,
>>> because the kernel identifies a stack by VM_GROWSDOWN | VM_GROWSUP.
>>>
>>> But I'm still a little confused about why THP for stacks could result in
>>> significant performance problems, unless the applications resize the
>>> stack quite often.
>> We didn't delve into what was causing the problem, only that it was
>> happening.  The application had many threads, so it could have been as
>> simple as consuming all the available THP and leaving fewer available
>> for other uses.  Or it could have been a memory consumption problem;
>> maybe the app would only have been using 16-32kB per thread but was
>> now using 2MB per thread and if there were, say, 100 threads, that's an
>> extra 199MB of memory in use.
> One thing I know is related to memory zeroing. This is from
> the perf data in this report:
>
>       0.00           +16.7       16.69 ±  7%  perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
>
> Zeroing 2M of memory costs much more CPU than zeroing 16-32KB if
> there are many threads.

Using a 2M stack may hurt the performance of short-lived threads with
shallow stack depth.  Imagine a network server which creates a new thread
for each incoming connection.  I understand that the performance will not
be great that way anyway.  IIUC we should not make it too bad.

But whether this is important depends on whether the use case is
important.  TBH, I don't know that.

--
Best Regards,
Huang, Ying


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2023-12-22  1:13               ` Yin, Fengwei
@ 2024-01-04  1:32                 ` Yang Shi
  2024-01-04  8:18                   ` Yin Fengwei
  0 siblings, 1 reply; 24+ messages in thread
From: Yang Shi @ 2024-01-04  1:32 UTC (permalink / raw)
  To: Yin, Fengwei
  Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	Christopher Lameter, ying.huang, feng.tang

On Thu, Dec 21, 2023 at 5:13 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>
>
>
> On 12/22/2023 2:11 AM, Yang Shi wrote:
> > On Thu, Dec 21, 2023 at 5:40 AM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> >>
> >>
> >>
> >> On 12/21/2023 8:58 AM, Yin Fengwei wrote:
> >>> But what I am not sure about is whether it's worth making such a change,
> >>> as the regression is only seen clearly in micro-benchmarks. No evidence
> >>> showed the other regressions in this report are related to madvise, at
> >>> least from the perf statistics. Need to check more on stream/ramspeed.
> >>> Thanks.
> >>
> >> With debugging patch (filter out the stack mapping from THP aligned),
> >> the result of stream can be restored to around 2%:
> >>
> >> commit:
> >>     30749e6fbb3d391a7939ac347e9612afe8c26e94
> >>     1111d46b5cbad57486e7a3fab75888accac2f072
> >>     89f60532d82b9ecd39303a74589f76e4758f176f  -> 1111d46b5cbad with debugging patch
> >>
> >> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589
> >> ---------------- --------------------------- ---------------------------
> >>       350993           -15.6%     296081 ±  2%      -1.5%     345689        stream.add_bandwidth_MBps
> >>       349830           -16.1%     293492 ±  2%      -2.3%     341860 ±  2%  stream.add_bandwidth_MBps_harmonicMean
> >>       333973           -20.5%     265439 ±  3%      -1.7%     328403        stream.copy_bandwidth_MBps
> >>       332930           -21.7%     260548 ±  3%      -2.5%     324711 ±  2%  stream.copy_bandwidth_MBps_harmonicMean
> >>       302788           -16.2%     253817 ±  2%      -1.4%     298421        stream.scale_bandwidth_MBps
> >>       302157           -17.1%     250577 ±  2%      -2.0%     296054        stream.scale_bandwidth_MBps_harmonicMean
> >>       339047           -12.1%     298061            -1.4%     334206        stream.triad_bandwidth_MBps
> >>       338186           -12.4%     296218            -2.0%     331469        stream.triad_bandwidth_MBps_harmonicMean
> >>
> >>
> >> The regression of ramspeed is still there.
> >
> > Thanks for the debugging patch and the test. If no one has an objection
> > to honoring MAP_STACK, I'm going to come up with a more formal patch.
> > Even though thp_get_unmapped_area() is not called for MAP_STACK, the stack
> > area may still, theoretically, be allocated at a 2M-aligned address. And
> > it may be worse with multi-sized THP, for example 1M.
> Right. Filtering out MAP_STACK can't guarantee no THP for the stack; it
> just reduces the possibility of using THP for the stack.

Can you please help test the below patch?

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 40d94411d492..dc7048824be8 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
        return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
               _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
               _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
+              _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
               arch_calc_vm_flag_bits(flags);
 }

But I can't reproduce the pthread regression on my aarch64 VM. It
might be due to the guard stack (the 64K guard stack is 2M-aligned;
the 8M stack is right next to it, starting at 2M + 64K). But I can
see the stack area is no longer THP eligible with this patch. See:

fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
Size:               8192 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  12 kB
Pss:                  12 kB
Pss_Dirty:            12 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        12 kB
Referenced:           12 kB
Anonymous:            12 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac nh

The "nh" flag is set.

>
> >
> > Do you have any instructions regarding how to run ramspeed? Anyway I
> > may not have time to debug it until after the holidays.
> 0Day leverages phoronix-test-suite to run ramspeed, so I don't have a
> direct answer here.
>
> I suppose we could check the configuration of ramspeed in phoronix-test-
> suite to understand what the build options and command options are to run
> ramspeed:
> https://openbenchmarking.org/test/pts/ramspeed

Downloaded the test suite. It looks like phoronix just runs test 3 (int)
and 6 (float). They basically do 4 sub-tests to benchmark memory
bandwidth:

 * copy
 * scale copy
 * add copy
 * triad copy

The source buffer is initialized (page faults are triggered), but the
destination area is not. So the page fault + page clear time is
accounted to the result. Clearing a huge page may take a little more
time. But I didn't see a noticeable regression on my aarch64 VM either.
Anyway, I suppose such a test should be run with THP off.
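In other words the timed loop looks roughly like the sketch below (my
illustration of the pattern, not ramspeed's actual source); the destination's
first-touch faults, including any huge-page clears, land inside the measured
region:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define N (8UL << 20)   /* 8M doubles = 64MB per buffer */

int main(void)
{
        double *src = malloc(N * sizeof(double));
        double *dst = malloc(N * sizeof(double));
        struct timespec t0, t1;

        memset(src, 1, N * sizeof(double));     /* src is pre-faulted... */

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < N; i++)          /* ...dst faults inside the timed loop */
                dst[i] = src[i];
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("copy: %.0f MB/s\n", 2.0 * N * sizeof(double) / secs / 1e6);
        free(src);
        free(dst);
        return 0;
}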

>
>
> Regards
> Yin, Fengwei
>
> >
> >>
> >>
> >> Regards
> >> Yin, Fengwei


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2024-01-04  1:32                 ` Yang Shi
@ 2024-01-04  8:18                   ` Yin Fengwei
  2024-01-04  8:39                     ` Oliver Sang
  0 siblings, 1 reply; 24+ messages in thread
From: Yin Fengwei @ 2024-01-04  8:18 UTC (permalink / raw)
  To: Yang Shi
  Cc: kernel test robot, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	Christopher Lameter, ying.huang, feng.tang



On 2024/1/4 09:32, Yang Shi wrote:
> On Thu, Dec 21, 2023 at 5:13 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>
>>
>>
>> On 12/22/2023 2:11 AM, Yang Shi wrote:
>>> On Thu, Dec 21, 2023 at 5:40 AM Yin, Fengwei <fengwei.yin@intel.com> wrote:
>>>>
>>>>
>>>>
>>>> On 12/21/2023 8:58 AM, Yin Fengwei wrote:
>>>>> But what I am not sure about is whether it's worth making such a change,
>>>>> as the regression is only seen clearly in micro-benchmarks. No evidence
>>>>> showed the other regressions in this report are related to madvise, at
>>>>> least from the perf statistics. Need to check more on stream/ramspeed.
>>>>> Thanks.
>>>>
>>>> With debugging patch (filter out the stack mapping from THP aligned),
>>>> the result of stream can be restored to around 2%:
>>>>
>>>> commit:
>>>>      30749e6fbb3d391a7939ac347e9612afe8c26e94
>>>>      1111d46b5cbad57486e7a3fab75888accac2f072
>>>>      89f60532d82b9ecd39303a74589f76e4758f176f  -> 1111d46b5cbad with debugging patch
>>>>
>>>> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589
>>>> ---------------- --------------------------- ---------------------------
>>>>        350993           -15.6%     296081 ±  2%      -1.5%     345689        stream.add_bandwidth_MBps
>>>>        349830           -16.1%     293492 ±  2%      -2.3%     341860 ±  2%  stream.add_bandwidth_MBps_harmonicMean
>>>>        333973           -20.5%     265439 ±  3%      -1.7%     328403        stream.copy_bandwidth_MBps
>>>>        332930           -21.7%     260548 ±  3%      -2.5%     324711 ±  2%  stream.copy_bandwidth_MBps_harmonicMean
>>>>        302788           -16.2%     253817 ±  2%      -1.4%     298421        stream.scale_bandwidth_MBps
>>>>        302157           -17.1%     250577 ±  2%      -2.0%     296054        stream.scale_bandwidth_MBps_harmonicMean
>>>>        339047           -12.1%     298061            -1.4%     334206        stream.triad_bandwidth_MBps
>>>>        338186           -12.4%     296218            -2.0%     331469        stream.triad_bandwidth_MBps_harmonicMean
>>>>
>>>>
>>>> The regression of ramspeed is still there.
>>>
>>> Thanks for the debugging patch and the test. If no one has an objection
>>> to honoring MAP_STACK, I'm going to come up with a more formal patch.
>>> Even though thp_get_unmapped_area() is not called for MAP_STACK, the stack
>>> area may still, theoretically, be allocated at a 2M-aligned address. And
>>> it may be worse with multi-sized THP, for example 1M.
>> Right. Filtering out MAP_STACK can't guarantee no THP for the stack; it
>> just reduces the possibility of using THP for the stack.
> 
> Can you please help test the below patch?
I can't access the testing box now. Oliver will help to test your patch.


Regards
Yin, Fengwei

> 
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 40d94411d492..dc7048824be8 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
>          return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
>                 _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
>                 _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> +              _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
>                 arch_calc_vm_flag_bits(flags);
>   }
> 
> But I can't reproduce the pthread regression on my aarch64 VM. It
> might be due to the guard stack (the 64K guard stack is 2M-aligned;
> the 8M stack is right next to it, starting at 2M + 64K). But I can
> see the stack area is no longer THP eligible with this patch. See:
> 
> fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
> Size:               8192 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Rss:                  12 kB
> Pss:                  12 kB
> Pss_Dirty:            12 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:        12 kB
> Referenced:           12 kB
> Anonymous:            12 kB
> KSM:                   0 kB
> LazyFree:              0 kB
> AnonHugePages:         0 kB
> ShmemPmdMapped:        0 kB
> FilePmdMapped:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> Locked:                0 kB
> THPeligible:           0
> VmFlags: rd wr mr mw me ac nh
> 
> The "nh" flag is set.
> 
>>
>>>
>>> Do you have any instructions regarding how to run ramspeed? Anyway I
>>> may not have time to debug it until after the holidays.
>> 0Day leverages phoronix-test-suite to run ramspeed, so I don't have a
>> direct answer here.
>>
>> I suppose we could check the configuration of ramspeed in phoronix-test-
>> suite to understand what the build options and command options are to run
>> ramspeed:
>> https://openbenchmarking.org/test/pts/ramspeed
> 
> Downloaded the test suite. It looks like phoronix just runs test 3 (int)
> and 6 (float). They basically do 4 sub-tests to benchmark memory
> bandwidth:
> 
>   * copy
>   * scale copy
>   * add copy
>   * triad copy
> 
> The source buffer is initialized (page faults are triggered), but the
> destination area is not. So the page fault + page clear time is
> accounted to the result. Clearing a huge page may take a little more
> time. But I didn't see a noticeable regression on my aarch64 VM either.
> Anyway, I suppose such a test should be run with THP off.
> 
>>
>>
>> Regards
>> Yin, Fengwei
>>
>>>
>>>>
>>>>
>>>> Regards
>>>> Yin, Fengwei


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2024-01-04  8:18                   ` Yin Fengwei
@ 2024-01-04  8:39                     ` Oliver Sang
  2024-01-05  9:29                       ` Oliver Sang
  0 siblings, 1 reply; 24+ messages in thread
From: Oliver Sang @ 2024-01-04  8:39 UTC (permalink / raw)
  To: Yin Fengwei
  Cc: Yang Shi, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	Christopher Lameter, ying.huang, feng.tang, oliver.sang

hi, Fengwei, hi, Yang Shi,

On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote:
> 
> On 2024/1/4 09:32, Yang Shi wrote:

...

> > Can you please help test the below patch?
> I can't access the testing box now. Oliver will help to test your patch.
> 

Since the commit-id of
  'mm: align larger anonymous mappings on THP boundaries'
in linux-next/master is now efa7df3e3bb5d,
I applied the patch like below:

* d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
* efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries
* 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi

Our auto-bisect captured the new efa7df3e3b as the first bad commit for quite a
number of regressions so far, so I will test d8d7b1dae6f03 for all these tests. Thanks



commit d8d7b1dae6f0311d528b289cda7b317520f9a984
Author: 0day robot <lkp@intel.com>
Date:   Thu Jan 4 12:51:10 2024 +0800

    fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 40d94411d4920..91197bd387730 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
        return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
               _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
               _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
+              _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
               arch_calc_vm_flag_bits(flags);
 }


> 
> Regards
> Yin, Fengwei
> 
> > 
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 40d94411d492..dc7048824be8 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> >          return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> >                 _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> >                 _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> > +              _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
> >                 arch_calc_vm_flag_bits(flags);
> >   }
> > 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2024-01-04  8:39                     ` Oliver Sang
@ 2024-01-05  9:29                       ` Oliver Sang
  2024-01-05 14:52                         ` Yin, Fengwei
  2024-01-05 18:49                         ` Yang Shi
  0 siblings, 2 replies; 24+ messages in thread
From: Oliver Sang @ 2024-01-05  9:29 UTC (permalink / raw)
  To: Yang Shi
  Cc: Yin Fengwei, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	Christopher Lameter, ying.huang, feng.tang, oliver.sang

[-- Attachment #1: Type: text/plain, Size: 16841 bytes --]

hi, Yang Shi,

On Thu, Jan 04, 2024 at 04:39:50PM +0800, Oliver Sang wrote:
> hi, Fengwei, hi, Yang Shi,
> 
> On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote:
> > 
> > On 2024/1/4 09:32, Yang Shi wrote:
> 
> ...
> 
> > > Can you please help test the below patch?
> > I can't access the testing box now. Oliver will help to test your patch.
> > 
> 
> Since the commit-id of
>   'mm: align larger anonymous mappings on THP boundaries'
> in linux-next/master is now efa7df3e3bb5d,
> I applied the patch like below:
> 
> * d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> * efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries
> * 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi
> 
> Our auto-bisect captured the new efa7df3e3b as the first bad commit for quite a
> number of regressions so far, so I will test d8d7b1dae6f03 for all these tests. Thanks
>

We got 12 regression and 1 improvement results for efa7df3e3b so far
(4 regressions are similar to what we reported for 1111d46b5c).
With your patch, 6 of those regressions are fixed; the others are not impacted.

below is a summary:

No.  testsuite       test                            status-on-efa7df3e3b  fix-by-d8d7b1dae6 ?
===  =========       ====                            ====================  ===================
(1)  stress-ng       numa                            regression            NO
(2)                  pthread                         regression            yes (on a Ice Lake server)
(3)                  pthread                         regression            yes (on a Cascade Lake desktop)
(4)  will-it-scale   malloc1                         regression            NO
(5)                  page_fault1                     improvement           no (so still improvement)
(6)  vm-scalability  anon-w-seq-mt                   regression            yes
(7)  stream          nr_threads=25%                  regression            yes
(8)                  nr_threads=50%                  regression            yes
(9)  phoronix        osbench.CreateThreads           regression            yes (on a Cascade Lake server)
(10)                 ramspeed.Add.Integer            regression            NO (and the 3 below, on a Coffee Lake desktop)
(11)                 ramspeed.Average.FloatingPoint  regression            NO
(12)                 ramspeed.Triad.Integer          regression            NO
(13)                 ramspeed.Average.Integer        regression            NO


Below are details; for those regressions not fixed by d8d7b1dae6, the
full comparisons are attached.


(1) detail comparison is attached as 'stress-ng-regression'

Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  cpu/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/numa/stress-ng/60s

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    251.12           -48.2%     130.00           -47.9%     130.75        stress-ng.numa.ops
      4.10           -49.4%       2.08           -49.2%       2.09        stress-ng.numa.ops_per_sec


(2)
Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/pthread/stress-ng/60s

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   3272223           -87.8%     400430            +0.5%    3287322        stress-ng.pthread.ops
     54516           -87.8%       6664            +0.5%      54772        stress-ng.pthread.ops_per_sec  

 
(3)
Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with memory: 128G
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   2250845           -85.2%     332370 ±  6%      -0.8%    2232820        stress-ng.pthread.ops
     37510           -85.2%       5538 ±  6%      -0.8%      37209        stress-ng.pthread.ops_per_sec  


(4) full comparison attached as 'will-it-scale-regression'

Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/malloc1/will-it-scale

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     10994           -86.7%       1466           -86.7%       1460        will-it-scale.per_process_ops
   1231431           -86.7%     164315           -86.7%     163624        will-it-scale.workload

	  
(5)
Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/page_fault1/will-it-scale

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
  18858970           +44.8%   27298921           +44.9%   27330479        will-it-scale.224.threads
     56.06           +13.3%      63.53           +13.8%      63.81        will-it-scale.224.threads_idle
     84191           +44.8%     121869           +44.9%     122010        will-it-scale.per_thread_ops
  18858970           +44.8%   27298921           +44.9%   27330479        will-it-scale.workload


(6)
Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/8T/lkp-cpl-4sp2/anon-w-seq-mt/vm-scalability

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    345968            -6.5%     323566            +0.1%     346304        vm-scalability.median
      1.91 ± 10%      -0.5        1.38 ± 20%      -0.2        1.75 ± 13%  vm-scalability.median_stddev%
  79708409            -7.4%   73839640            -0.1%   79613742        vm-scalability.throughput


(7)
Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G
=========================================================================================
array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
  50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    349414           -16.2%     292854 ±  2%      -0.4%     348048        stream.add_bandwidth_MBps
    347727 ±  2%     -16.5%     290470 ±  2%      -0.6%     345750 ±  2%  stream.add_bandwidth_MBps_harmonicMean
    332206           -21.6%     260428 ±  3%      -0.4%     330838        stream.copy_bandwidth_MBps
    330746 ±  2%     -22.6%     255915 ±  3%      -0.6%     328725 ±  2%  stream.copy_bandwidth_MBps_harmonicMean
    301178           -16.9%     250209 ±  2%      -0.4%     299920        stream.scale_bandwidth_MBps
    300262           -17.7%     247151 ±  2%      -0.6%     298586 ±  2%  stream.scale_bandwidth_MBps_harmonicMean
    337408           -12.5%     295287 ±  2%      -0.3%     336304        stream.triad_bandwidth_MBps
    336153           -12.7%     293621            -0.5%     334624 ±  2%  stream.triad_bandwidth_MBps_harmonicMean			 


(8)
Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G
=========================================================================================
array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
  50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/50%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    345632           -19.7%     277550 ±  3%      +0.4%     347067 ±  2%  stream.add_bandwidth_MBps
    342263 ±  2%     -19.7%     274704 ±  2%      +0.4%     343609 ±  2%  stream.add_bandwidth_MBps_harmonicMean
    343820           -17.3%     284428 ±  3%      +0.1%     344248        stream.copy_bandwidth_MBps
    341759 ±  2%     -17.8%     280934 ±  3%      +0.1%     342025 ±  2%  stream.copy_bandwidth_MBps_harmonicMean
    343270           -17.8%     282330 ±  3%      +0.3%     344276 ±  2%  stream.scale_bandwidth_MBps
    340812 ±  2%     -18.3%     278284 ±  3%      +0.3%     341672 ±  2%  stream.scale_bandwidth_MBps_harmonicMean
    364596           -19.7%     292831 ±  3%      +0.4%     366145 ±  2%  stream.triad_bandwidth_MBps
    360643 ±  2%     -19.9%     289034 ±  3%      +0.4%     362004 ±  2%  stream.triad_bandwidth_MBps_harmonicMean			 


(9)
Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with memory: 512G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Create Threads/debian-x86_64-phoronix/lkp-csl-2sp7/osbench-1.0.2/phoronix-test-suite

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     26.82         +1348.4%     388.43            +4.0%      27.88        phoronix-test-suite.osbench.CreateThreads.us_per_event


**** For (10) - (13) below, the full comparison is attached as phoronix-regressions
(they all happen on a Coffee Lake desktop)
(10)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Add/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     20115            -4.5%      19211            -4.5%      19217        phoronix-test-suite.ramspeed.Add.Integer.mb_s


(11)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     19960            -2.9%      19378            -3.0%      19366        phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s


(12)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Triad/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     19667            -6.4%      18399            -6.4%      18413        phoronix-test-suite.ramspeed.Triad.Integer.mb_s


(13)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     19799            -3.5%      19106            -3.4%      19117        phoronix-test-suite.ramspeed.Average.Integer.mb_s



> 
> 
> commit d8d7b1dae6f0311d528b289cda7b317520f9a984
> Author: 0day robot <lkp@intel.com>
> Date:   Thu Jan 4 12:51:10 2024 +0800
> 
>     fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> 
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 40d94411d4920..91197bd387730 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
>         return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
>                _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
>                _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> +              _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
>                arch_calc_vm_flag_bits(flags);
>  }
> 
> 
> > 
> > Regards
> > Yin, Fengwei
> > 
> > > 
> > > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > index 40d94411d492..dc7048824be8 100644
> > > --- a/include/linux/mman.h
> > > +++ b/include/linux/mman.h
> > > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > >          return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> > >                 _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> > >                 _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> > > +              _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
> > >                 arch_calc_vm_flag_bits(flags);
> > >   }
> > > 

[-- Attachment #2: stress-ng-regression --]
[-- Type: text/plain, Size: 15787 bytes --]

(1)
Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  cpu/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/numa/stress-ng/60s

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     55848 ± 28%    +236.5%     187927 ±  3%    +259.4%     200733 ±  2%  meminfo.AnonHugePages
      1.80 ±  5%      -0.2        1.60 ±  5%      -0.2        1.60 ±  7%  mpstat.cpu.all.usr%
      8077 ±  7%     +11.8%       9030 ±  5%      +4.6%       8451 ±  7%  numa-vmstat.node0.nr_kernel_stack
    120605 ±  3%     -10.0%     108597 ±  3%     -10.5%     107928 ±  3%  vmstat.system.in
      1868 ± 32%     +75.1%       3271 ± 14%     +87.1%       3495 ± 20%  turbostat.C1
   9123408 ±  5%     -13.8%    7863298 ±  7%     -14.0%    7846843 ±  6%  turbostat.IRQ
     59.62 ± 49%    +125.4%     134.38 ± 88%    +267.9%     219.38 ± 85%  turbostat.POLL
     24.33 ± 43%     +69.1%      41.14 ± 35%      +9.0%      26.51 ± 53%  sched_debug.cfs_rq:/.removed.load_avg.avg
    104.44 ± 21%     +29.2%     134.94 ± 17%      +3.2%     107.78 ± 26%  sched_debug.cfs_rq:/.removed.load_avg.stddev
    106.26 ± 16%     -17.6%      87.53 ± 21%     -24.6%      80.11 ± 21%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
     35387 ± 59%    +127.7%      80580 ± 53%    +249.2%     123565 ± 57%  sched_debug.cpu.avg_idle.min
      1156 ±  7%     -21.9%     903.06 ±  5%     -23.2%     888.25 ± 15%  sched_debug.cpu.nr_switches.min
     20719 ±111%     -51.1%      10123 ± 71%     -56.6%       8996 ± 29%  numa-meminfo.node0.Active
     20639 ±111%     -51.5%      10001 ± 72%     -56.8%       8916 ± 29%  numa-meminfo.node0.Active(anon)
     31253 ± 70%    +142.7%      75839 ± 20%    +214.1%      98180 ± 22%  numa-meminfo.node0.AnonHugePages
      8076 ±  7%     +11.8%       9029 ±  5%      +4.7%       8451 ±  7%  numa-meminfo.node0.KernelStack
     24260 ± 62%    +360.8%     111783 ± 17%    +321.2%     102184 ± 21%  numa-meminfo.node1.AnonHugePages
    283702 ± 16%     +40.9%     399633 ± 18%     +35.9%     385485 ± 11%  numa-meminfo.node1.AnonPages.max
    251.12           -48.2%     130.00           -47.9%     130.75        stress-ng.numa.ops
      4.10           -49.4%       2.08           -49.2%       2.09        stress-ng.numa.ops_per_sec
     61658           -53.5%      28697           -53.3%      28768        stress-ng.time.minor_page_faults
      3727            +2.8%       3832            +2.9%       3833        stress-ng.time.system_time
     10.41           -48.6%       5.35           -48.7%       5.34        stress-ng.time.user_time
      4313 ±  4%     -47.0%       2285 ±  8%     -48.3%       2230 ±  7%  stress-ng.time.voluntary_context_switches
     63.61            +2.5%      65.20            +2.7%      65.30        time.elapsed_time
     63.61            +2.5%      65.20            +2.7%      65.30        time.elapsed_time.max
     61658           -53.5%      28697           -53.3%      28768        time.minor_page_faults
      3727            +2.8%       3832            +2.9%       3833        time.system_time
     10.41           -48.6%       5.35           -48.7%       5.34        time.user_time
      4313 ±  4%     -47.0%       2285 ±  8%     -48.3%       2230 ±  7%  time.voluntary_context_switches
    120325            +6.1%     127672 ±  6%      +0.9%     121431        proc-vmstat.nr_anon_pages
     27.33 ± 29%    +236.0%      91.83 ±  3%    +258.6%      98.02 ±  2%  proc-vmstat.nr_anon_transparent_hugepages
    148677            +6.2%     157844 ±  4%      +0.7%     149763        proc-vmstat.nr_inactive_anon
     98.10 ± 25%     -52.8%      46.30 ± 69%     -55.3%      43.82 ± 64%  proc-vmstat.nr_isolated_file
      2809            +9.0%       3063 ± 28%      -3.9%       2698 ±  2%  proc-vmstat.nr_page_table_pages
    148670            +6.2%     157837 ±  4%      +0.7%     149765        proc-vmstat.nr_zone_inactive_anon
   2580003            -5.8%    2431297            -5.8%    2431173        proc-vmstat.numa_hit
   1488693            -5.8%    1402808            -5.8%    1401633        proc-vmstat.numa_local
   1091291            -5.8%    1028489            -5.7%    1029540        proc-vmstat.numa_other
  9.56e+08            +2.1%  9.757e+08            +2.1%  9.761e+08        proc-vmstat.pgalloc_normal
    469554            -7.6%     433894            -7.3%     435076        proc-vmstat.pgfault
 9.559e+08            +2.1%  9.756e+08            +2.1%   9.76e+08        proc-vmstat.pgfree
     17127 ± 21%     -55.4%       7647 ± 64%     -55.0%       7700 ± 52%  proc-vmstat.pgmigrate_fail
 9.554e+08            +2.1%  9.751e+08            +2.1%  9.754e+08        proc-vmstat.pgmigrate_success
   1865641            +2.1%    1904388            +2.1%    1905158        proc-vmstat.thp_migration_success
      0.43 ±  8%      -0.1        0.30 ± 10%      -0.2        0.28 ± 12%  perf-profile.children.cycles-pp.queue_pages_range
      0.43 ±  8%      -0.1        0.30 ± 10%      -0.2        0.28 ± 12%  perf-profile.children.cycles-pp.walk_page_range
      0.32 ±  8%      -0.1        0.21 ± 11%      -0.1        0.19 ± 13%  perf-profile.children.cycles-pp.__walk_page_range
      0.30 ±  8%      -0.1        0.19 ± 12%      -0.1        0.17 ± 13%  perf-profile.children.cycles-pp.walk_pud_range
      0.31 ±  9%      -0.1        0.20 ± 12%      -0.1        0.19 ± 12%  perf-profile.children.cycles-pp.walk_pgd_range
      0.30 ±  8%      -0.1        0.20 ± 11%      -0.1        0.18 ± 13%  perf-profile.children.cycles-pp.walk_p4d_range
      0.29 ±  8%      -0.1        0.18 ± 11%      -0.1        0.17 ± 13%  perf-profile.children.cycles-pp.walk_pmd_range
      0.28 ±  8%      -0.1        0.17 ± 11%      -0.1        0.16 ± 13%  perf-profile.children.cycles-pp.queue_folios_pte_range
      0.13 ± 12%      -0.1        0.07 ± 11%      -0.1        0.06 ± 17%  perf-profile.children.cycles-pp.vm_normal_folio
      0.18 ±  4%      -0.0        0.15 ±  3%      -0.0        0.16 ±  3%  perf-profile.children.cycles-pp.add_page_for_migration
      0.12 ±  4%      -0.0        0.12 ±  5%      -0.0        0.11 ±  4%  perf-profile.children.cycles-pp.__cond_resched
     98.65            +0.2       98.82            +0.2       98.88        perf-profile.children.cycles-pp.migrate_pages_batch
     98.66            +0.2       98.83            +0.2       98.89        perf-profile.children.cycles-pp.migrate_pages_sync
     98.68            +0.2       98.85            +0.2       98.91        perf-profile.children.cycles-pp.migrate_pages
      0.10 ± 11%      -0.0        0.05 ± 12%      -0.1        0.04 ± 79%  perf-profile.self.cycles-pp.vm_normal_folio
      0.13 ±  8%      -0.0        0.08 ± 14%      -0.0        0.08 ± 14%  perf-profile.self.cycles-pp.queue_folios_pte_range
      0.17 ± 89%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages
      0.45 ± 59%    +124.4%       1.01 ± 81%   +1094.5%       5.40 ±120%  perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
     27.27 ± 95%     -75.2%       6.77 ± 83%     -48.4%      14.08 ± 77%  perf-sched.sch_delay.max.ms.__cond_resched.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch
      2.00 ± 88%    -100.0%       0.00          -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages
      4.30 ± 86%     -50.9%       2.11 ± 67%     -90.0%       0.43 ±261%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
      3.31 ± 53%     -55.8%       1.46 ±218%     -81.0%       0.63 ±182%  perf-sched.sch_delay.max.ms.synchronize_rcu_expedited.lru_cache_disable.do_pages_move.kernel_move_pages
    190.22 ± 41%    +125.2%     428.42 ± 60%     +72.7%     328.46 ± 21%  perf-sched.wait_and_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
    294.56 ± 10%     +44.0%     424.28 ± 16%     +62.5%     478.70 ± 13%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.__flush_work.isra.0
    322.33 ±  5%     +46.1%     470.78 ± 10%     +40.8%     453.90 ± 10%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    117.25 ± 11%     -13.3%     101.62 ± 34%     -24.6%      88.38 ± 17%  perf-sched.wait_and_delay.count.__cond_resched.down_read.add_page_for_migration.do_pages_move.kernel_move_pages
    307.25 ±  7%     -54.6%     139.62 ±  4%     -55.2%     137.62 ±  5%  perf-sched.wait_and_delay.count.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_pages_move.kernel_move_pages
    406.25 ±  3%     -57.7%     171.88 ± 10%     -59.0%     166.75 ±  3%  perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.__flush_work.isra.0
    142.50 ± 33%     -76.8%      33.00 ±139%     -65.8%      48.75 ± 83%  perf-sched.wait_and_delay.count.synchronize_rcu_expedited.lru_cache_disable.do_pages_move.kernel_move_pages
      1196 ±  3%     -37.9%     743.38 ± 10%     -38.5%     736.00 ±  9%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1749 ± 19%     +45.1%       2537 ±  6%     +76.0%       3078 ± 18%  perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.__flush_work.isra.0
      2691 ± 15%     +48.8%       4003 ±  6%     +44.6%       3892 ± 11%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.82 ± 14%    -100.0%       0.00           -81.1%       0.53 ±264%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.migrate_to_node.do_migrate_pages.kernel_migrate_pages
    199.40 ± 29%    +114.8%     428.41 ± 60%     +64.7%     328.44 ± 21%  perf-sched.wait_time.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
      3.09 ± 16%    -100.0%       0.00           -84.4%       0.48 ±264%  perf-sched.wait_time.avg.ms.__cond_resched.queue_folios_pte_range.walk_pmd_range.isra.0
      1.94 ± 50%    -100.0%       0.00           -74.2%       0.50 ±264%  perf-sched.wait_time.avg.ms.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages
    294.30 ± 10%     +44.1%     424.17 ± 16%     +62.6%     478.57 ± 13%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.__flush_work.isra.0
      0.98 ±107%    -100.0%       0.00           -95.8%       0.04 ±264%  perf-sched.wait_time.avg.ms.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages
    321.84 ±  5%     +46.1%     470.35 ± 10%     +40.8%     453.02 ± 10%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      7.31 ± 53%    -100.0%       0.00           -87.7%       0.90 ±264%  perf-sched.wait_time.max.ms.__cond_resched.down_read.migrate_to_node.do_migrate_pages.kernel_migrate_pages
      6.45 ± 16%    -100.0%       0.00           -84.5%       1.00 ±264%  perf-sched.wait_time.max.ms.__cond_resched.queue_folios_pte_range.walk_pmd_range.isra.0
      6.17 ± 45%    -100.0%       0.00           -91.9%       0.50 ±264%  perf-sched.wait_time.max.ms.__cond_resched.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages
     11.63 ±118%     -93.3%       0.78 ±178%     -89.3%       1.24 ±245%  perf-sched.wait_time.max.ms.exp_funnel_lock.synchronize_rcu_expedited.lru_cache_disable.do_pages_move
      1749 ± 19%     +45.1%       2537 ±  6%     +76.0%       3078 ± 18%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.__flush_work.isra.0
      2.49 ± 88%    -100.0%       0.00           -98.4%       0.04 ±264%  perf-sched.wait_time.max.ms.synchronize_rcu_expedited.lru_cache_disable.do_migrate_pages.kernel_migrate_pages
      2691 ± 15%     +48.8%       4003 ±  6%     +44.6%       3892 ± 11%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    340.81           +38.9%     473.47           +38.4%     471.58        perf-stat.i.MPKI
 1.131e+09           -25.0%  8.485e+08           -25.2%  8.465e+08 ±  2%  perf-stat.i.branch-instructions
     68.31            +1.1       69.37            +1.1       69.37        perf-stat.i.cache-miss-rate%
     46.16           +38.1%      63.73           +37.5%      63.45        perf-stat.i.cpi
    157.48            -7.7%     145.30 ±  2%      -8.1%     144.76 ±  2%  perf-stat.i.cpu-migrations
      0.02 ±  2%      +0.0        0.02 ± 16%      +0.0        0.02        perf-stat.i.dTLB-load-miss-rate%
    165432 ±  2%      -2.9%     160583 ± 12%      -8.3%     151664        perf-stat.i.dTLB-load-misses
 1.133e+09           -21.9%  8.846e+08           -22.1%  8.823e+08 ±  2%  perf-stat.i.dTLB-loads
      0.02            -0.0        0.01 ±  3%      -0.0        0.01        perf-stat.i.dTLB-store-miss-rate%
     98452           -31.8%      67127 ±  2%     -32.2%      66739 ±  2%  perf-stat.i.dTLB-store-misses
 5.668e+08           -13.7%  4.891e+08           -13.9%  4.879e+08        perf-stat.i.dTLB-stores
 5.684e+09           -24.5%  4.292e+09           -24.7%  4.282e+09 ±  2%  perf-stat.i.instructions
      0.07 ±  2%     -14.5%       0.06 ±  3%     -14.6%       0.06 ±  5%  perf-stat.i.ipc
     88.20           -10.7%      78.73           -11.0%      78.53        perf-stat.i.metric.M/sec
 1.242e+08            +0.9%  1.254e+08            +1.0%  1.255e+08        perf-stat.i.node-load-misses
  76214273            +1.0%   76999051            +1.2%   77103845        perf-stat.i.node-loads
    247.93           +32.1%     327.57 ±  2%     +32.1%     327.56 ±  2%  perf-stat.overall.MPKI
      0.92 ±  4%      +0.2        1.13 ±  5%      +0.2        1.12 ±  5%  perf-stat.overall.branch-miss-rate%
     69.51            +0.9       70.45            +1.0       70.50        perf-stat.overall.cache-miss-rate%
     33.77           +31.3%      44.35 ±  2%     +31.3%      44.35 ±  2%  perf-stat.overall.cpi
      0.01 ±  2%      +0.0        0.02 ± 13%      +0.0        0.02 ±  2%  perf-stat.overall.dTLB-load-miss-rate%
      0.02            -0.0        0.01 ±  2%      -0.0        0.01        perf-stat.overall.dTLB-store-miss-rate%
      0.03           -23.9%       0.02 ±  2%     -23.9%       0.02        perf-stat.overall.ipc
 1.084e+09           -24.2%  8.217e+08 ±  2%     -24.2%  8.216e+08 ±  2%  perf-stat.ps.branch-instructions
    154.44            -8.0%     142.02 ±  2%      -8.6%     141.20 ±  2%  perf-stat.ps.cpu-migrations
    163178 ±  3%      -3.1%     158185 ± 12%      -8.0%     150107 ±  2%  perf-stat.ps.dTLB-load-misses
 1.089e+09           -21.1%  8.585e+08           -21.2%  8.581e+08        perf-stat.ps.dTLB-loads
     96861           -31.9%      65975 ±  2%     -32.1%      65796 ±  2%  perf-stat.ps.dTLB-store-misses
 5.503e+08           -13.1%  4.781e+08           -13.2%  4.776e+08        perf-stat.ps.dTLB-stores
 5.447e+09           -23.7%  4.157e+09           -23.7%  4.157e+09        perf-stat.ps.instructions
 1.223e+08            +1.0%  1.235e+08            +1.0%  1.235e+08        perf-stat.ps.node-load-misses
  75118302            +1.1%   75929311            +1.1%   75927016        perf-stat.ps.node-loads
 3.496e+11           -21.7%  2.737e+11           -21.7%  2.739e+11 ±  2%  perf-stat.total.instructions

[-- Attachment #3: will-it-scale-regression --]
[-- Type: text/plain, Size: 57536 bytes --]

(4)
Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/malloc1/will-it-scale
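
(Note: will-it-scale's malloc1 testcase repeatedly allocates, touches,
and frees an anonymous region per process that is large enough to be
THP-aligned by the commit, judging by the thp_fault_alloc jump below;
first-touch faults are then served with 2MB huge pages and
clear_huge_page() dominates the perf-profile rows near the end of this
table. A minimal, illustrative sketch of that pattern, assuming a
100MB region rather than the testcase's real parameters:

    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
            const size_t sz = 100UL << 20;  /* assumed ~100MB region */

            for (;;) {
                    char *p = malloc(sz);   /* large: glibc uses mmap() */
                    if (!p)
                            return 1;
                    memset(p, 0, sz);       /* first touch faults it in */
                    free(p);                /* unmaps; then repeat */
            }
    }
)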

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
      3161           +46.4%       4627           +47.5%       4662        vmstat.system.cs
      0.58 ±  2%      +0.7        1.27            +0.7        1.26        mpstat.cpu.all.irq%
      0.55 ±  3%      -0.5        0.09 ±  2%      -0.5        0.09 ±  2%  mpstat.cpu.all.soft%
      1.00 ± 13%      -0.7        0.29            -0.7        0.28        mpstat.cpu.all.usr%
   1231431           -86.7%     164315           -86.7%     163624        will-it-scale.112.processes
     10994           -86.7%       1466           -86.7%       1460        will-it-scale.per_process_ops
   1231431           -86.7%     164315           -86.7%     163624        will-it-scale.workload
      0.03           -66.7%       0.01           -66.7%       0.01        turbostat.IPC
     81.38            -2.8%      79.12            -2.2%      79.62        turbostat.PkgTmp
    764.02           +17.1%     894.78           +17.0%     893.81        turbostat.PkgWatt
     19.80          +135.4%      46.59          +135.1%      46.53        turbostat.RAMWatt
    771.38 ±  5%    +249.5%       2696 ± 14%    +231.9%       2560 ± 10%  perf-c2c.DRAM.local
      3050 ±  5%     -69.8%     922.75 ±  6%     -71.5%     869.88 ±  8%  perf-c2c.DRAM.remote
     11348 ±  4%     -90.2%       1107 ±  5%     -90.6%       1065 ±  3%  perf-c2c.HITM.local
    357.50 ± 21%     -44.0%     200.38 ±  7%     -48.2%     185.25 ± 13%  perf-c2c.HITM.remote
     11706 ±  4%     -88.8%       1307 ±  4%     -89.3%       1250 ±  3%  perf-c2c.HITM.total
 1.717e+08 ±  9%     -85.5%   24955542           -85.5%   24880885        numa-numastat.node0.local_node
 1.718e+08 ±  9%     -85.4%   25046901           -85.5%   24972867        numa-numastat.node0.numa_hit
 1.945e+08 ±  7%     -87.0%   25203631           -87.1%   25104844        numa-numastat.node1.local_node
 1.946e+08 ±  7%     -87.0%   25300536           -87.1%   25180465        numa-numastat.node1.numa_hit
 2.001e+08 ±  2%     -87.5%   25098699           -87.5%   25011079        numa-numastat.node2.local_node
 2.002e+08 ±  2%     -87.4%   25173132           -87.5%   25119438        numa-numastat.node2.numa_hit
 1.956e+08 ±  6%     -87.3%   24922332           -87.3%   24784408        numa-numastat.node3.local_node
 1.957e+08 ±  6%     -87.2%   25008002           -87.3%   24874399        numa-numastat.node3.numa_hit
    766959           -45.9%     414816           -46.2%     412898        meminfo.Active
    766881           -45.9%     414742           -46.2%     412824        meminfo.Active(anon)
    391581           +12.1%     438946            +8.4%     424669        meminfo.AnonPages
    421982           +20.7%     509155           +14.8%     484430        meminfo.Inactive
    421800           +20.7%     508969           +14.8%     484244        meminfo.Inactive(anon)
     68496 ±  7%     +88.9%     129357 ±  2%     +82.9%     125252 ±  2%  meminfo.Mapped
    569270           -24.0%     432709           -24.1%     431884        meminfo.SUnreclaim
    797185           -40.2%     476420           -40.8%     471912        meminfo.Shmem
    730111           -18.8%     593041           -18.9%     592400        meminfo.Slab
    148082 ±  2%     -20.3%     118055 ±  4%     -21.7%     115994 ±  6%  numa-meminfo.node0.SUnreclaim
    197311 ± 16%     -22.5%     152829 ± 19%     -29.8%     138546 ±  9%  numa-meminfo.node0.Slab
    144635 ±  5%     -25.8%     107254 ±  4%     -25.3%     107973 ±  6%  numa-meminfo.node1.SUnreclaim
    137974 ±  2%     -24.5%     104205 ±  6%     -25.7%     102563 ±  4%  numa-meminfo.node2.SUnreclaim
    167889 ± 13%     -26.1%     124127 ±  9%     -15.0%     142771 ± 18%  numa-meminfo.node2.Slab
    607639 ± 20%     -46.2%     326998 ± 15%     -46.8%     323458 ± 13%  numa-meminfo.node3.Active
    607611 ± 20%     -46.2%     326968 ± 15%     -46.8%     323438 ± 13%  numa-meminfo.node3.Active(anon)
    679476 ± 21%     -31.3%     466619 ± 19%     -38.5%     418074 ± 16%  numa-meminfo.node3.FilePages
     20150 ± 22%    +128.4%      46020 ± 11%    +123.0%      44932 ±  8%  numa-meminfo.node3.Mapped
    138148 ±  2%     -25.3%     103148 ±  4%     -23.8%     105326 ±  7%  numa-meminfo.node3.SUnreclaim
    631930 ± 20%     -40.9%     373456 ± 15%     -41.5%     369883 ± 13%  numa-meminfo.node3.Shmem
    166777 ±  7%     -19.6%     134013 ±  9%     -20.7%     132332 ±  7%  numa-meminfo.node3.Slab
     37030 ±  2%     -20.3%      29511 ±  4%     -21.7%      28993 ±  6%  numa-vmstat.node0.nr_slab_unreclaimable
 1.718e+08 ±  9%     -85.4%   25047066           -85.5%   24973455        numa-vmstat.node0.numa_hit
 1.717e+08 ±  9%     -85.5%   24955707           -85.5%   24881472        numa-vmstat.node0.numa_local
     36158 ±  5%     -25.8%      26811 ±  4%     -25.4%      26990 ±  6%  numa-vmstat.node1.nr_slab_unreclaimable
 1.946e+08 ±  7%     -87.0%   25300606           -87.1%   25181038        numa-vmstat.node1.numa_hit
 1.945e+08 ±  7%     -87.0%   25203699           -87.1%   25105417        numa-vmstat.node1.numa_local
     34499 ±  2%     -24.5%      26050 ±  6%     -25.7%      25638 ±  4%  numa-vmstat.node2.nr_slab_unreclaimable
 2.002e+08 ±  2%     -87.4%   25173363           -87.5%   25119830        numa-vmstat.node2.numa_hit
 2.001e+08 ±  2%     -87.5%   25098930           -87.5%   25011471        numa-vmstat.node2.numa_local
    151851 ± 20%     -46.2%      81720 ± 15%     -46.8%      80848 ± 13%  numa-vmstat.node3.nr_active_anon
    169827 ± 21%     -31.3%     116645 ± 19%     -38.5%     104502 ± 16%  numa-vmstat.node3.nr_file_pages
      4991 ± 23%    +131.5%      11555 ± 11%    +125.4%      11249 ±  8%  numa-vmstat.node3.nr_mapped
    157941 ± 20%     -40.9%      93355 ± 15%     -41.5%      92454 ± 13%  numa-vmstat.node3.nr_shmem
     34570 ±  2%     -25.4%      25780 ±  4%     -23.8%      26327 ±  7%  numa-vmstat.node3.nr_slab_unreclaimable
    151851 ± 20%     -46.2%      81720 ± 15%     -46.8%      80848 ± 13%  numa-vmstat.node3.nr_zone_active_anon
 1.957e+08 ±  6%     -87.2%   25008117           -87.3%   24874649        numa-vmstat.node3.numa_hit
 1.956e+08 ±  6%     -87.3%   24922447           -87.3%   24784657        numa-vmstat.node3.numa_local
    191746           -45.9%     103734           -46.2%     103228        proc-vmstat.nr_active_anon
     97888           +12.1%     109757            +8.5%     106185        proc-vmstat.nr_anon_pages
    947825            -8.5%     867659            -8.6%     866533        proc-vmstat.nr_file_pages
    105444           +20.7%     127227           +14.9%     121113        proc-vmstat.nr_inactive_anon
     17130 ±  7%     +88.9%      32365 ±  2%     +83.4%      31420 ±  2%  proc-vmstat.nr_mapped
      4007            +4.2%       4176            +4.1%       4170        proc-vmstat.nr_page_table_pages
    199322           -40.2%     119155           -40.8%     118031        proc-vmstat.nr_shmem
    142294           -24.0%     108161           -24.1%     107954        proc-vmstat.nr_slab_unreclaimable
    191746           -45.9%     103734           -46.2%     103228        proc-vmstat.nr_zone_active_anon
    105444           +20.7%     127223           +14.9%     121106        proc-vmstat.nr_zone_inactive_anon
     40186 ± 13%     +65.0%      66320 ±  5%     +60.2%      64374 ± 13%  proc-vmstat.numa_hint_faults
     20248 ± 39%    +108.3%      42185 ± 12%    +102.6%      41033 ± 10%  proc-vmstat.numa_hint_faults_local
 7.623e+08           -86.8%  1.005e+08           -86.9%  1.002e+08        proc-vmstat.numa_hit
  7.62e+08           -86.9%  1.002e+08           -86.9%   99786408        proc-vmstat.numa_local
    181538 ±  6%     +49.5%     271428 ±  3%     +48.9%     270328 ±  6%  proc-vmstat.numa_pte_updates
    152652 ±  7%     -28.6%     108996           -29.6%     107396        proc-vmstat.pgactivate
 7.993e+08         +3068.4%  2.533e+10         +3055.6%  2.522e+10        proc-vmstat.pgalloc_normal
  3.72e+08           -86.4%   50632612           -86.4%   50429200        proc-vmstat.pgfault
  7.99e+08         +3069.7%  2.533e+10         +3056.9%  2.522e+10        proc-vmstat.pgfree
     48.75 ±  2%    +1e+08%   49362627          +1e+08%   49162408        proc-vmstat.thp_fault_alloc
  21789703 ± 10%     -20.1%   17410551 ±  7%     -18.9%   17673460 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.max
    427573 ± 99%   +1126.7%    5245182 ± 17%   +1104.4%    5149659 ± 13%  sched_debug.cfs_rq:/.avg_vruntime.min
   4757464 ± 10%     -48.3%    2458136 ± 19%     -46.6%    2539001 ± 11%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.44 ±  2%     -15.9%       0.37 ±  2%     -16.6%       0.37 ±  3%  sched_debug.cfs_rq:/.h_nr_running.stddev
    299205 ± 38%     +59.3%     476493 ± 27%     +50.6%     450561 ± 42%  sched_debug.cfs_rq:/.load.max
  21789703 ± 10%     -20.1%   17410551 ±  7%     -18.9%   17673460 ±  4%  sched_debug.cfs_rq:/.min_vruntime.max
    427573 ± 99%   +1126.7%    5245182 ± 17%   +1104.4%    5149659 ± 13%  sched_debug.cfs_rq:/.min_vruntime.min
   4757464 ± 10%     -48.3%    2458136 ± 19%     -46.6%    2539001 ± 11%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.44 ±  2%     -16.0%       0.37 ±  2%     -17.2%       0.36 ±  2%  sched_debug.cfs_rq:/.nr_running.stddev
    446.75 ±  2%     -18.4%     364.71 ±  2%     -19.3%     360.46 ±  2%  sched_debug.cfs_rq:/.runnable_avg.stddev
    445.25 ±  2%     -18.4%     363.46 ±  2%     -19.3%     359.33 ±  2%  sched_debug.cfs_rq:/.util_avg.stddev
    946.71 ±  3%     -14.7%     807.54 ±  4%     -15.4%     800.58 ±  7%  sched_debug.cfs_rq:/.util_est_enqueued.max
    281.39 ±  7%     -31.2%     193.63 ±  4%     -32.0%     191.24 ±  7%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
   1131635 ±  7%     +73.7%    1965577 ±  6%     +76.5%    1997455 ±  7%  sched_debug.cpu.avg_idle.max
    223539 ± 16%    +165.4%     593172 ±  7%    +146.0%     549906 ± 11%  sched_debug.cpu.avg_idle.min
     83325 ±  4%     +64.3%     136927 ±  9%     +69.7%     141399 ± 11%  sched_debug.cpu.avg_idle.stddev
     17.57 ±  6%    +594.5%     122.01 ±  3%    +588.0%     120.88 ±  3%  sched_debug.cpu.clock.stddev
    873.33           -11.1%     776.19           -11.8%     770.20        sched_debug.cpu.clock_task.stddev
      2870           -18.1%       2351           -17.4%       2371        sched_debug.cpu.curr->pid.avg
      3003           -12.5%       2627           -12.4%       2630        sched_debug.cpu.curr->pid.stddev
    550902 ±  6%     +74.4%     960871 ±  6%     +79.8%     990291 ±  8%  sched_debug.cpu.max_idle_balance_cost.max
      4451 ± 59%   +1043.9%      50917 ± 15%   +1129.4%      54721 ± 15%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.00 ± 17%    +385.8%       0.00 ± 34%    +315.7%       0.00 ±  3%  sched_debug.cpu.next_balance.stddev
      0.43           -17.5%       0.35           -16.8%       0.35        sched_debug.cpu.nr_running.avg
      1.15 ±  8%     +25.0%       1.44 ±  8%     +30.4%       1.50 ± 13%  sched_debug.cpu.nr_running.max
      0.45           -14.4%       0.39           -14.2%       0.39 ±  2%  sched_debug.cpu.nr_running.stddev
      3280 ±  5%     +32.5%       4345           +34.5%       4412        sched_debug.cpu.nr_switches.avg
    846.82 ± 11%    +109.9%       1777 ± 12%    +112.4%       1799 ±  4%  sched_debug.cpu.nr_switches.min
      0.03 ±173%    +887.2%       0.30 ± 73%    +521.1%       0.19 ± 35%  sched_debug.rt_rq:.rt_time.avg
      6.79 ±173%    +887.2%      67.01 ± 73%    +521.1%      42.16 ± 35%  sched_debug.rt_rq:.rt_time.max
      0.45 ±173%    +887.2%       4.47 ± 73%    +521.1%       2.81 ± 35%  sched_debug.rt_rq:.rt_time.stddev
      4.65           +28.0%       5.96           +28.5%       5.98        perf-stat.i.MPKI
 8.721e+09           -71.0%  2.532e+09           -71.1%  2.523e+09        perf-stat.i.branch-instructions
      0.34            +0.1        0.48            +0.1        0.48        perf-stat.i.branch-miss-rate%
  30145441           -58.6%   12471062           -58.6%   12487542        perf-stat.i.branch-misses
     33.52           -15.3       18.20           -15.2       18.27        perf-stat.i.cache-miss-rate%
 1.819e+08           -58.8%   74947458           -58.8%   74903072        perf-stat.i.cache-misses
 5.429e+08 ±  2%     -24.1%  4.123e+08           -24.4%  4.103e+08        perf-stat.i.cache-references
      3041           +48.6%       4518           +49.7%       4552        perf-stat.i.context-switches
     10.96          +212.9%      34.28          +214.1%      34.41        perf-stat.i.cpi
    309.29           -11.2%     274.59           -11.3%     274.20        perf-stat.i.cpu-migrations
      2354          +144.6%       5758          +144.7%       5761        perf-stat.i.cycles-between-cache-misses
      0.13            -0.1        0.01 ±  3%      -0.1        0.01 ±  3%  perf-stat.i.dTLB-load-miss-rate%
  12852209 ±  2%     -98.0%     261197 ±  3%     -97.9%     263864 ±  3%  perf-stat.i.dTLB-load-misses
  9.56e+09           -69.3%  2.932e+09           -69.4%  2.922e+09        perf-stat.i.dTLB-loads
      0.12            -0.1        0.03            -0.1        0.03        perf-stat.i.dTLB-store-miss-rate%
   5083186           -86.3%     693971           -86.4%     690328        perf-stat.i.dTLB-store-misses
 4.209e+09           -44.9%  2.317e+09           -45.2%  2.308e+09        perf-stat.i.dTLB-stores
     76.33           -39.7       36.61           -39.7       36.59        perf-stat.i.iTLB-load-miss-rate%
  18717931           -80.1%    3715941           -80.2%    3698121        perf-stat.i.iTLB-load-misses
   5758034            +7.7%    6202790            +7.4%    6183041        perf-stat.i.iTLB-loads
 3.914e+10           -67.8%  1.261e+10           -67.9%  1.256e+10        perf-stat.i.instructions
      2107           +73.9%       3663           +73.6%       3658        perf-stat.i.instructions-per-iTLB-miss
      0.09           -67.9%       0.03           -68.1%       0.03        perf-stat.i.ipc
    269.39           +10.6%     297.91           +10.7%     298.33        perf-stat.i.metric.K/sec
    102.78           -64.5%      36.54           -64.6%      36.40        perf-stat.i.metric.M/sec
   1234832           -86.4%     167556           -86.5%     166848        perf-stat.i.minor-faults
     87.25           -41.9       45.32           -42.2       45.09        perf-stat.i.node-load-miss-rate%
  25443233           -83.0%    4326696 ±  3%     -83.4%    4227985 ±  2%  perf-stat.i.node-load-misses
   3723342 ±  3%     +45.4%    5414430           +44.3%    5372545        perf-stat.i.node-loads
     79.20           -74.4        4.78           -74.5        4.74        perf-stat.i.node-store-miss-rate%
  14161911 ±  2%     -83.1%    2394469           -83.2%    2382317        perf-stat.i.node-store-misses
   3727955 ±  3%   +1181.6%   47776544         +1188.5%   48035797        perf-stat.i.node-stores
   1234832           -86.4%     167556           -86.5%     166849        perf-stat.i.page-faults
      4.65           +28.0%       5.95           +28.4%       5.97        perf-stat.overall.MPKI
      0.35            +0.1        0.49            +0.1        0.49        perf-stat.overall.branch-miss-rate%
     33.51           -15.3       18.19           -15.3       18.26        perf-stat.overall.cache-miss-rate%
     10.94          +212.3%      34.16          +213.4%      34.28        perf-stat.overall.cpi
      2354          +143.9%       5741          +144.1%       5746        perf-stat.overall.cycles-between-cache-misses
      0.13            -0.1        0.01 ±  3%      -0.1        0.01 ±  5%  perf-stat.overall.dTLB-load-miss-rate%
      0.12            -0.1        0.03            -0.1        0.03        perf-stat.overall.dTLB-store-miss-rate%
     76.49           -39.2       37.31           -39.2       37.29        perf-stat.overall.iTLB-load-miss-rate%
      2090           +63.4%       3416           +63.5%       3417        perf-stat.overall.instructions-per-iTLB-miss
      0.09           -68.0%       0.03           -68.1%       0.03        perf-stat.overall.ipc
     87.22           -43.1       44.12 ±  2%     -43.5       43.76        perf-stat.overall.node-load-miss-rate%
     79.16           -74.4        4.77           -74.4        4.72        perf-stat.overall.node-store-miss-rate%
   9549728          +140.9%   23005172          +141.1%   23022843        perf-stat.overall.path-length
 8.691e+09           -71.0%  2.519e+09           -71.1%   2.51e+09        perf-stat.ps.branch-instructions
  30118940           -59.1%   12319517           -59.1%   12327993        perf-stat.ps.branch-misses
 1.813e+08           -58.8%   74623919           -58.9%   74563289        perf-stat.ps.cache-misses
  5.41e+08 ±  2%     -24.2%  4.103e+08           -24.5%  4.085e+08        perf-stat.ps.cache-references
      3031           +47.9%       4485           +49.1%       4519        perf-stat.ps.context-switches
    307.72           -12.7%     268.59           -12.7%     268.66        perf-stat.ps.cpu-migrations
  12806734 ±  2%     -98.0%     260740 ±  4%     -97.9%     267782 ±  5%  perf-stat.ps.dTLB-load-misses
 9.528e+09           -69.4%  2.917e+09           -69.5%  2.907e+09        perf-stat.ps.dTLB-loads
   5063992           -86.4%     690720           -86.4%     687415        perf-stat.ps.dTLB-store-misses
 4.195e+09           -45.0%  2.306e+09           -45.2%  2.297e+09        perf-stat.ps.dTLB-stores
  18661026           -80.3%    3672024           -80.4%    3658006        perf-stat.ps.iTLB-load-misses
   5735379            +7.6%    6169096            +7.3%    6151755        perf-stat.ps.iTLB-loads
 3.901e+10           -67.8%  1.254e+10           -68.0%   1.25e+10        perf-stat.ps.instructions
   1230175           -86.4%     166708           -86.5%     166045        perf-stat.ps.minor-faults
  25346347           -83.0%    4299946 ±  2%     -83.4%    4203636 ±  2%  perf-stat.ps.node-load-misses
   3713652 ±  3%     +46.6%    5444481           +45.5%    5401831        perf-stat.ps.node-loads
  14107969 ±  2%     -83.1%    2381707           -83.2%    2368146        perf-stat.ps.node-store-misses
   3716359 ±  3%   +1179.6%   47556224         +1186.1%   47797289        perf-stat.ps.node-stores
   1230175           -86.4%     166708           -86.5%     166046        perf-stat.ps.page-faults
 1.176e+13           -67.9%   3.78e+12           -68.0%  3.767e+12        perf-stat.total.instructions
      0.01 ± 42%    +385.1%       0.03 ±  8%    +566.0%       0.04 ± 42%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.01 ± 17%    +354.3%       0.05 ±  8%    +402.1%       0.06 ±  8%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ± 19%    +323.1%       0.06 ± 27%    +347.1%       0.06 ± 17%  perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      0.01 ± 14%  +2.9e+05%      25.06 ±172%  +1.6e+05%      13.94 ±263%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.00 ±129%   +7133.3%       0.03 ±  7%   +7200.0%       0.03 ±  4%  perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
      0.01 ±  8%    +396.8%       0.06 ±  2%    +402.1%       0.06 ±  2%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.01 ±  9%    +256.9%       0.03 ± 10%    +232.8%       0.02 ± 13%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      0.01 ± 15%    +324.0%       0.05 ± 17%    +320.8%       0.05 ± 17%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
      0.01 ± 19%    +338.6%       0.06 ±  7%    +305.0%       0.05 ±  8%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.01 ±  9%    +298.4%       0.03 ±  2%    +304.8%       0.03        perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.01 ±  7%    +265.8%       0.03 ±  5%  +17282.9%       1.65 ±258%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      0.19 ± 11%     -89.3%       0.02 ± 10%     -89.4%       0.02 ± 10%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ± 28%    +319.8%       0.05 ± 19%    +303.0%       0.05 ± 18%  perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
      0.01 ± 14%    +338.9%       0.03 ±  9%    +318.5%       0.03 ±  4%  perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
      0.02 ± 20%    +674.2%       0.12 ±137%    +267.5%       0.06 ± 15%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.01 ± 46%    +256.9%       0.03 ± 11%   +1095.8%       0.11 ±112%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.02 ± 28%    +324.6%       0.07 ±  8%    +353.2%       0.07 ±  9%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.02 ± 21%    +318.4%       0.07 ± 25%    +389.6%       0.08 ± 26%  perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      0.01 ± 26%  +1.9e+06%     250.13 ±173%  +9.7e+05%     125.09 ±264%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.02 ± 25%    +585.6%       0.11 ± 63%    +454.5%       0.09 ± 31%  perf-sched.sch_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
      0.04 ± 39%    +159.0%       0.11 ±  6%    +190.0%       0.13 ± 10%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.01 ± 29%    +312.9%       0.06 ± 19%    +401.7%       0.07 ± 13%  perf-sched.sch_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
      0.02 ± 25%    +216.8%       0.06 ± 36%    +166.4%       0.05 ±  7%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      0.01 ± 21%    +345.8%       0.07 ± 26%    +298.3%       0.06 ± 18%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
      0.03 ± 35%    +190.2%       0.07 ± 16%    +187.8%       0.07 ± 11%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.02 ± 19%    +220.8%       0.07 ± 23%  +2.9e+05%      63.06 ±263%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      4.60 ±  5%     -10.7%       4.11 ±  8%     -13.4%       3.99        perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ± 32%    +368.0%       0.07 ± 25%    +346.9%       0.07 ± 20%  perf-sched.sch_delay.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
    189.60           -32.9%     127.16           -33.0%     126.98        perf-sched.total_wait_and_delay.average.ms
     11265 ±  3%     +73.7%      19568 ±  3%     +71.1%      19274        perf-sched.total_wait_and_delay.count.ms
    189.18           -32.9%     126.97           -33.0%     126.81        perf-sched.total_wait_time.average.ms
      0.50 ± 20%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      0.50 ± 11%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop
      0.43 ± 16%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.unmap_vmas.unmap_region.constprop.0
     52.33 ± 31%    +223.4%     169.23 ±  7%    +226.5%     170.86 ±  2%  perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
      0.51 ± 18%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
     28.05 ±  4%     +27.8%      35.84 ±  4%     +26.0%      35.34 ±  8%  perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      2.08 ±  3%     +33.2%       2.76           +32.9%       2.76 ±  2%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    491.80           -53.6%     227.96 ±  3%     -53.5%     228.58 ±  2%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    222.00 ±  9%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      8.75 ± 33%     -84.3%       1.38 ±140%     -82.9%       1.50 ± 57%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1065 ±  3%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop
    538.25 ±  9%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.unmap_vmas.unmap_region.constprop.0
    307.75 ±  6%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
      2458 ±  3%     -20.9%       1944 ±  4%     -20.5%       1954 ±  7%  perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
      2577 ±  5%    +168.6%       6921 ±  4%    +165.0%       6829 ±  2%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      7.07 ±172%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      1730 ± 24%     -77.9%     382.66 ±117%     -50.1%     862.68 ± 89%  perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     34.78 ± 43%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop
      8.04 ±179%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.unmap_vmas.unmap_region.constprop.0
      9.47 ±134%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
      3.96 ±  6%     +60.6%       6.36 ±  5%     +58.3%       6.27 ±  6%  perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.42 ± 27%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
      0.50 ± 20%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      0.51 ± 17%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault
      0.59 ± 17%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault
      0.46 ± 31%     -63.3%       0.17 ± 18%     -67.7%       0.15 ± 15%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.vm_area_alloc.mmap_region.do_mmap
      0.50 ± 11%     -67.8%       0.16 ±  8%     -67.6%       0.16 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop
      0.43 ± 16%     -63.5%       0.16 ± 10%     -62.6%       0.16 ±  4%  perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.unmap_region.constprop.0
      0.50 ± 19%     -67.0%       0.17 ±  5%     -69.0%       0.16 ± 11%  perf-sched.wait_time.avg.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
      1.71 ±  5%     +55.9%       2.66 ±  3%     +47.3%       2.52 ±  6%  perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
     52.33 ± 31%    +223.4%     169.20 ±  7%    +226.5%     170.83 ±  2%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
      0.51 ± 18%     -67.7%       0.16 ±  5%     -68.0%       0.16 ±  6%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
      0.53 ± 17%     -65.4%       0.18 ± 56%     -66.5%       0.18 ± 10%  perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
     27.63 ±  4%     +29.7%      35.83 ±  4%     +27.6%      35.27 ±  8%  perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      2.07 ±  3%     +32.1%       2.73           +31.9%       2.73 ±  2%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    491.61           -53.6%     227.94 ±  3%     -53.5%     228.56 ±  2%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1.72 ±  5%     +58.1%       2.73 ±  3%     +50.4%       2.59 ±  7%  perf-sched.wait_time.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
      1.42 ± 21%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
      7.07 ±172%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      1.66 ± 27%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault
      2.05 ± 57%    -100.0%       0.00          -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault
      1.69 ± 20%     -84.6%       0.26 ± 25%     -86.0%       0.24 ±  6%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.vm_area_alloc.mmap_region.do_mmap
      1730 ± 24%     -76.3%     409.21 ±104%     -50.1%     862.65 ± 89%  perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     34.78 ± 43%     -98.9%       0.38 ± 12%     -98.8%       0.41 ± 10%  perf-sched.wait_time.max.ms.__cond_resched.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.constprop
      8.04 ±179%     -96.0%       0.32 ± 18%     -95.7%       0.35 ± 19%  perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.unmap_region.constprop.0
      4.68 ±155%     -93.4%       0.31 ± 24%     -93.9%       0.28 ± 21%  perf-sched.wait_time.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
      3.42 ±  5%     +55.9%       5.33 ±  3%     +47.3%       5.03 ±  6%  perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
      9.47 ±134%     -96.3%       0.35 ± 17%     -96.1%       0.37 ±  8%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_exc_page_fault
      1.87 ± 10%     -60.9%       0.73 ±164%     -85.3%       0.28 ± 24%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
      2.39 ±185%     -97.8%       0.05 ±165%     -98.0%       0.05 ±177%  perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi
      3.95 ±  6%     +59.9%       6.32 ±  5%     +57.6%       6.23 ±  6%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      3.45 ±  5%     +58.1%       5.45 ±  3%     +50.4%       5.19 ±  7%  perf-sched.wait_time.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
     56.55 ±  2%     -55.1        1.45 ±  2%     -55.1        1.44 ±  2%  perf-profile.calltrace.cycles-pp.__munmap
     56.06 ±  2%     -55.1        0.96 ±  2%     -55.1        0.96 ±  2%  perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
     56.50 ±  2%     -55.1        1.44           -55.1        1.44 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     56.50 ±  2%     -55.1        1.44 ±  2%     -55.1        1.43 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     56.47 ±  2%     -55.0        1.43           -55.0        1.42 ±  2%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     56.48 ±  2%     -55.0        1.44 ±  2%     -55.0        1.43 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     56.45 ±  2%     -55.0        1.42           -55.0        1.42 ±  2%  perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     56.40 ±  2%     -55.0        1.40 ±  2%     -55.0        1.39 ±  2%  perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     35.28           -34.6        0.66           -34.6        0.66        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
     35.17           -34.6        0.57           -34.6        0.57 ±  2%  perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
     35.11           -34.5        0.57           -34.5        0.56        perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
     18.40 ±  7%     -18.4        0.00           -18.4        0.00        perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     17.42 ±  7%     -17.4        0.00           -17.4        0.00        perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
     17.42 ±  7%     -17.4        0.00           -17.4        0.00        perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.do_vmi_align_munmap.do_vmi_munmap
     17.41 ±  7%     -17.4        0.00           -17.4        0.00        perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region.do_vmi_align_munmap
     17.23 ±  6%     -17.2        0.00           -17.2        0.00        perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
     16.09 ±  8%     -16.1        0.00           -16.1        0.00        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
     16.02 ±  8%     -16.0        0.00           -16.0        0.00        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region
     15.95 ±  8%     -16.0        0.00           -16.0        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
     15.89 ±  8%     -15.9        0.00           -15.9        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush
     15.86 ±  8%     -15.9        0.00           -15.9        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
     15.82 ±  8%     -15.8        0.00           -15.8        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
      9.32 ±  9%      -9.3        0.00            -9.3        0.00        perf-profile.calltrace.cycles-pp.uncharge_folio.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
      8.52 ±  8%      -8.5        0.00            -8.5        0.00        perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      7.90 ±  4%      -7.9        0.00            -7.9        0.00        perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
      7.56 ±  6%      -7.6        0.00            -7.6        0.00        perf-profile.calltrace.cycles-pp.__pte_alloc.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      7.55 ±  6%      -7.6        0.00            -7.6        0.00        perf-profile.calltrace.cycles-pp.pte_alloc_one.__pte_alloc.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      6.51 ±  8%      -6.5        0.00            -6.5        0.00        perf-profile.calltrace.cycles-pp.alloc_pages_mpol.pte_alloc_one.__pte_alloc.do_anonymous_page.__handle_mm_fault
      6.51 ±  8%      -6.5        0.00            -6.5        0.00        perf-profile.calltrace.cycles-pp.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc.do_anonymous_page
      6.41 ±  8%      -6.4        0.00            -6.4        0.00        perf-profile.calltrace.cycles-pp.__memcg_kmem_charge_page.__alloc_pages.alloc_pages_mpol.pte_alloc_one.__pte_alloc
      0.00            +0.5        0.54 ±  4%      +0.6        0.55 ±  3%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page
      0.00            +0.7        0.70 ±  3%      +0.7        0.71 ±  2%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault
      0.00            +1.4        1.39            +1.4        1.38 ±  3%  perf-profile.calltrace.cycles-pp.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
     19.16 ±  6%     +57.0       76.21           +57.5       76.66        perf-profile.calltrace.cycles-pp.asm_exc_page_fault
     19.09 ±  6%     +57.1       76.16           +57.5       76.61        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     19.10 ±  6%     +57.1       76.17           +57.5       76.61        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
     18.99 ±  6%     +57.1       76.14           +57.6       76.58        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     18.43 ±  7%     +57.7       76.11           +58.1       76.56        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.00           +73.0       73.00           +73.5       73.46        perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
      0.00           +75.1       75.15           +75.6       75.60        perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.00           +75.9       75.92           +76.4       76.37        perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     58.03 ±  2%     -56.0        2.05           -56.0        2.03        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     58.02 ±  2%     -56.0        2.04           -56.0        2.02        perf-profile.children.cycles-pp.do_syscall_64
     56.57 ±  2%     -55.1        1.45 ±  2%     -55.1        1.45 ±  2%  perf-profile.children.cycles-pp.__munmap
     56.06 ±  2%     -55.1        0.97           -55.1        0.96        perf-profile.children.cycles-pp.unmap_region
     56.51 ±  2%     -55.1        1.43           -55.1        1.42 ±  2%  perf-profile.children.cycles-pp.do_vmi_munmap
     56.48 ±  2%     -55.0        1.43 ±  2%     -55.0        1.43 ±  2%  perf-profile.children.cycles-pp.__vm_munmap
     56.48 ±  2%     -55.0        1.44 ±  2%     -55.0        1.43 ±  2%  perf-profile.children.cycles-pp.__x64_sys_munmap
     56.40 ±  2%     -55.0        1.40           -55.0        1.39 ±  2%  perf-profile.children.cycles-pp.do_vmi_align_munmap
     35.28           -34.6        0.66           -34.6        0.66        perf-profile.children.cycles-pp.tlb_finish_mmu
     35.18           -34.6        0.58           -34.6        0.57        perf-profile.children.cycles-pp.tlb_batch_pages_flush
     35.16           -34.6        0.57           -34.6        0.57        perf-profile.children.cycles-pp.release_pages
     32.12 ±  8%     -32.1        0.05           -32.1        0.04 ± 37%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     31.85 ±  8%     -31.8        0.06           -31.8        0.06 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     31.74 ±  8%     -31.7        0.00           -31.7        0.00        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     18.40 ±  7%     -18.4        0.00           -18.4        0.00        perf-profile.children.cycles-pp.do_anonymous_page
     17.43 ±  7%     -17.4        0.00           -17.4        0.00        perf-profile.children.cycles-pp.lru_add_drain
     17.43 ±  7%     -17.4        0.00           -17.4        0.00        perf-profile.children.cycles-pp.lru_add_drain_cpu
     17.43 ±  7%     -17.3        0.10 ±  5%     -17.3        0.10 ±  3%  perf-profile.children.cycles-pp.folio_batch_move_lru
     17.23 ±  6%     -17.2        0.00           -17.2        0.00        perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
      9.32 ±  9%      -9.3        0.00            -9.3        0.00        perf-profile.children.cycles-pp.uncharge_folio
      8.57 ±  8%      -8.4        0.16 ±  4%      -8.4        0.15 ±  4%  perf-profile.children.cycles-pp.__mem_cgroup_charge
      7.90 ±  4%      -7.8        0.14 ±  5%      -7.8        0.14 ±  4%  perf-profile.children.cycles-pp.uncharge_batch
      7.57 ±  6%      -7.6        0.00            -7.6        0.00        perf-profile.children.cycles-pp.__pte_alloc
      7.55 ±  6%      -7.4        0.16 ±  3%      -7.4        0.16 ±  3%  perf-profile.children.cycles-pp.pte_alloc_one
      6.54 ±  2%      -6.5        0.00            -6.5        0.00        perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      6.59 ±  8%      -6.4        0.22 ±  2%      -6.4        0.22 ±  3%  perf-profile.children.cycles-pp.alloc_pages_mpol
      6.58 ±  8%      -6.4        0.21 ±  2%      -6.4        0.22 ±  2%  perf-profile.children.cycles-pp.__alloc_pages
      6.41 ±  8%      -6.3        0.07 ±  5%      -6.3        0.07 ±  5%  perf-profile.children.cycles-pp.__memcg_kmem_charge_page
      4.48 ±  2%      -4.3        0.18 ±  4%      -4.3        0.18 ±  3%  perf-profile.children.cycles-pp.__mod_lruvec_page_state
      3.08 ±  4%      -3.0        0.09 ±  7%      -3.0        0.09 ±  6%  perf-profile.children.cycles-pp.page_counter_uncharge
      1.74 ±  8%      -1.6        0.10            -1.6        0.10 ±  4%  perf-profile.children.cycles-pp.kmem_cache_alloc
      1.72 ±  2%      -1.5        0.23 ±  2%      -1.5        0.23 ±  4%  perf-profile.children.cycles-pp.unmap_vmas
      1.71 ±  2%      -1.5        0.22 ±  3%      -1.5        0.22 ±  4%  perf-profile.children.cycles-pp.unmap_page_range
      1.70 ±  2%      -1.5        0.21 ±  3%      -1.5        0.21 ±  4%  perf-profile.children.cycles-pp.zap_pmd_range
      1.36 ± 16%      -1.3        0.09 ±  4%      -1.3        0.09 ±  4%  perf-profile.children.cycles-pp.native_irq_return_iret
      1.18 ±  2%      -1.1        0.08 ±  7%      -1.1        0.08 ±  5%  perf-profile.children.cycles-pp.page_remove_rmap
      1.16 ±  2%      -1.1        0.08 ±  4%      -1.1        0.07 ±  6%  perf-profile.children.cycles-pp.folio_add_new_anon_rmap
      1.45 ±  6%      -1.0        0.44 ±  2%      -1.0        0.44 ±  2%  perf-profile.children.cycles-pp.__mmap
      1.05            -1.0        0.06 ±  7%      -1.0        0.06 ±  7%  perf-profile.children.cycles-pp.lru_add_fn
      1.03 ±  7%      -1.0        0.04 ± 37%      -1.0        0.04 ± 37%  perf-profile.children.cycles-pp.__anon_vma_prepare
      1.38 ±  6%      -1.0        0.42 ±  3%      -1.0        0.42 ±  2%  perf-profile.children.cycles-pp.vm_mmap_pgoff
      1.33 ±  6%      -0.9        0.40 ±  2%      -0.9        0.40 ±  2%  perf-profile.children.cycles-pp.do_mmap
      0.93 ± 11%      -0.9        0.03 ± 77%      -0.9        0.02 ±100%  perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
      1.17 ±  7%      -0.8        0.34 ±  2%      -0.8        0.34 ±  2%  perf-profile.children.cycles-pp.mmap_region
      0.87 ±  5%      -0.8        0.06 ±  5%      -0.8        0.06 ±  9%  perf-profile.children.cycles-pp.kmem_cache_free
      0.89 ±  5%      -0.7        0.19 ±  4%      -0.7        0.20 ±  2%  perf-profile.children.cycles-pp.rcu_do_batch
      0.89 ±  5%      -0.7        0.20 ±  4%      -0.7        0.20 ±  3%  perf-profile.children.cycles-pp.rcu_core
      0.90 ±  5%      -0.7        0.21 ±  4%      -0.7        0.21 ±  2%  perf-profile.children.cycles-pp.__do_softirq
      0.74 ±  6%      -0.7        0.06 ±  5%      -0.7        0.06 ±  8%  perf-profile.children.cycles-pp.irq_exit_rcu
      0.72 ± 10%      -0.7        0.06 ±  5%      -0.7        0.06 ±  7%  perf-profile.children.cycles-pp.vm_area_alloc
      1.01 ±  4%      -0.4        0.61 ±  4%      -0.4        0.61 ±  2%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.14 ±  5%      -0.1        0.02 ±100%      -0.1        0.02 ±100%  perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
      0.16 ±  9%      -0.1        0.07 ±  7%      -0.1        0.07        perf-profile.children.cycles-pp.__slab_free
      0.15 ±  3%      -0.1        0.06 ±  5%      -0.1        0.06 ±  5%  perf-profile.children.cycles-pp.get_unmapped_area
      0.08 ± 22%      -0.0        0.05 ± 41%      -0.0        0.04 ± 37%  perf-profile.children.cycles-pp.generic_perform_write
      0.08 ± 22%      -0.0        0.05 ± 41%      -0.0        0.04 ± 38%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.09 ± 22%      -0.0        0.05 ± 43%      -0.0        0.05 ±  9%  perf-profile.children.cycles-pp.record__pushfn
      0.09 ± 22%      -0.0        0.05 ± 43%      -0.0        0.05 ±  9%  perf-profile.children.cycles-pp.writen
      0.09 ± 22%      -0.0        0.05 ± 43%      -0.0        0.05 ±  9%  perf-profile.children.cycles-pp.__libc_write
      0.11 ±  8%      -0.0        0.07 ±  6%      -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.rcu_cblist_dequeue
      0.16 ±  7%      -0.0        0.13 ±  4%      -0.0        0.13 ±  3%  perf-profile.children.cycles-pp.try_charge_memcg
      0.09 ± 22%      -0.0        0.07 ± 18%      -0.0        0.06 ±  8%  perf-profile.children.cycles-pp.vfs_write
      0.09 ± 22%      -0.0        0.07 ± 18%      -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.ksys_write
      0.15 ±  4%      -0.0        0.13 ±  3%      -0.0        0.13 ±  2%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.09            -0.0        0.08 ±  4%      -0.0        0.08        perf-profile.children.cycles-pp.flush_tlb_mm_range
      0.06            +0.0        0.09 ±  4%      +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.rcu_all_qs
      0.17 ±  6%      +0.0        0.20 ±  4%      +0.0        0.20 ±  3%  perf-profile.children.cycles-pp.kthread
      0.17 ±  6%      +0.0        0.20 ±  4%      +0.0        0.20 ±  3%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.17 ±  6%      +0.0        0.20 ±  4%      +0.0        0.20 ±  3%  perf-profile.children.cycles-pp.ret_from_fork
      0.12 ±  4%      +0.0        0.16 ±  3%      +0.0        0.16 ±  2%  perf-profile.children.cycles-pp.mas_store_prealloc
      0.08 ±  6%      +0.0        0.12 ±  2%      +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.vma_alloc_folio
      0.00            +0.0        0.04 ± 37%      +0.1        0.05        perf-profile.children.cycles-pp.memcg_check_events
      0.00            +0.0        0.04 ± 37%      +0.1        0.05        perf-profile.children.cycles-pp.thp_get_unmapped_area
      0.00            +0.1        0.05            +0.0        0.04 ± 57%  perf-profile.children.cycles-pp.free_tail_page_prepare
      0.00            +0.1        0.05            +0.1        0.05        perf-profile.children.cycles-pp.mas_destroy
      0.00            +0.1        0.05 ±  9%      +0.1        0.05 ±  9%  perf-profile.children.cycles-pp.update_load_avg
      0.00            +0.1        0.06 ±  7%      +0.1        0.07 ±  7%  perf-profile.children.cycles-pp.native_flush_tlb_one_user
      0.00            +0.1        0.07 ±  7%      +0.1        0.07 ±  6%  perf-profile.children.cycles-pp.__page_cache_release
      0.00            +0.1        0.07 ±  4%      +0.1        0.07 ±  5%  perf-profile.children.cycles-pp.mas_topiary_replace
      0.08 ±  5%      +0.1        0.16 ±  3%      +0.1        0.15 ±  3%  perf-profile.children.cycles-pp.mas_alloc_nodes
      0.00            +0.1        0.08 ±  4%      +0.1        0.08 ±  6%  perf-profile.children.cycles-pp.prep_compound_page
      0.08 ±  6%      +0.1        0.17 ±  5%      +0.1        0.18 ±  5%  perf-profile.children.cycles-pp.task_tick_fair
      0.00            +0.1        0.10 ±  5%      +0.1        0.10 ±  4%  perf-profile.children.cycles-pp.folio_add_lru_vma
      0.00            +0.1        0.11 ±  4%      +0.1        0.11 ±  5%  perf-profile.children.cycles-pp.__kmem_cache_alloc_bulk
      0.00            +0.1        0.12 ±  2%      +0.1        0.12 ±  3%  perf-profile.children.cycles-pp.kmem_cache_alloc_bulk
      0.00            +0.1        0.13 ±  3%      +0.1        0.13 ±  2%  perf-profile.children.cycles-pp.mas_split
      0.00            +0.1        0.13            +0.1        0.13 ±  3%  perf-profile.children.cycles-pp._raw_spin_lock
      0.11 ±  4%      +0.1        0.24 ±  3%      +0.1        0.25 ±  4%  perf-profile.children.cycles-pp.scheduler_tick
      0.00            +0.1        0.14 ±  4%      +0.1        0.14 ±  5%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge
      0.00            +0.1        0.14 ±  3%      +0.1        0.14 ±  3%  perf-profile.children.cycles-pp.mas_wr_bnode
      0.00            +0.1        0.14 ±  5%      +0.1        0.14 ±  3%  perf-profile.children.cycles-pp.destroy_large_folio
      0.00            +0.1        0.15 ±  4%      +0.1        0.15 ±  4%  perf-profile.children.cycles-pp.mas_spanning_rebalance
      0.00            +0.1        0.15 ±  2%      +0.2        0.15 ±  4%  perf-profile.children.cycles-pp.zap_huge_pmd
      0.00            +0.2        0.17 ±  3%      +0.2        0.17 ±  3%  perf-profile.children.cycles-pp.do_huge_pmd_anonymous_page
      0.19 ±  3%      +0.2        0.38            +0.2        0.38 ±  2%  perf-profile.children.cycles-pp.mas_store_gfp
      0.00            +0.2        0.19 ±  3%      +0.2        0.18 ±  4%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.00            +0.2        0.20 ±  3%      +0.2        0.20 ±  4%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.12 ±  3%      +0.2        0.35            +0.2        0.36 ±  3%  perf-profile.children.cycles-pp.update_process_times
      0.12 ±  3%      +0.2        0.36 ±  2%      +0.2        0.36 ±  2%  perf-profile.children.cycles-pp.tick_sched_handle
      0.14 ±  3%      +0.2        0.39            +0.3        0.40 ±  4%  perf-profile.children.cycles-pp.tick_nohz_highres_handler
      0.27 ±  2%      +0.3        0.52 ±  3%      +0.3        0.52 ±  3%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.27 ±  2%      +0.3        0.52 ±  4%      +0.3        0.53 ±  3%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.21 ±  4%      +0.3        0.48 ±  3%      +0.3        0.48 ±  2%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.00            +0.3        0.31 ±  2%      +0.3        0.31 ±  3%  perf-profile.children.cycles-pp.mas_wr_spanning_store
      0.00            +0.4        0.38            +0.4        0.38 ±  2%  perf-profile.children.cycles-pp.free_unref_page_prepare
      0.00            +0.4        0.39            +0.4        0.40        perf-profile.children.cycles-pp.free_unref_page
      0.13 ±  4%      +1.3        1.42            +1.3        1.41 ±  3%  perf-profile.children.cycles-pp.__cond_resched
     19.19 ±  6%     +57.0       76.23           +57.5       76.68        perf-profile.children.cycles-pp.asm_exc_page_fault
     19.11 ±  6%     +57.1       76.18           +57.5       76.63        perf-profile.children.cycles-pp.exc_page_fault
     19.10 ±  6%     +57.1       76.18           +57.5       76.62        perf-profile.children.cycles-pp.do_user_addr_fault
     19.00 ±  6%     +57.1       76.15           +57.6       76.59        perf-profile.children.cycles-pp.handle_mm_fault
     18.44 ±  7%     +57.7       76.12           +58.1       76.57        perf-profile.children.cycles-pp.__handle_mm_fault
      0.06 ±  9%     +73.3       73.38           +73.8       73.84        perf-profile.children.cycles-pp.clear_page_erms
      0.00           +75.2       75.25           +75.7       75.70        perf-profile.children.cycles-pp.clear_huge_page
      0.00           +75.9       75.92           +76.4       76.37        perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
     31.74 ±  8%     -31.7        0.00           -31.7        0.00        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      9.22 ±  9%      -9.2        0.00            -9.2        0.00        perf-profile.self.cycles-pp.uncharge_folio
      6.50 ±  2%      -6.5        0.00            -6.5        0.00        perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      5.56 ±  9%      -5.6        0.00            -5.6        0.00        perf-profile.self.cycles-pp.__memcg_kmem_charge_page
      1.94 ±  4%      -1.9        0.08 ±  8%      -1.9        0.08 ±  7%  perf-profile.self.cycles-pp.page_counter_uncharge
      1.36 ± 16%      -1.3        0.09 ±  4%      -1.3        0.09 ±  4%  perf-profile.self.cycles-pp.native_irq_return_iret
      0.16 ±  9%      -0.1        0.07 ±  7%      -0.1        0.07        perf-profile.self.cycles-pp.__slab_free
      0.10 ±  8%      -0.0        0.07 ±  6%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.rcu_cblist_dequeue
      0.07 ±  7%      +0.0        0.08 ±  5%      +0.0        0.08 ±  7%  perf-profile.self.cycles-pp.page_counter_try_charge
      0.00            +0.1        0.06 ±  7%      +0.1        0.07 ±  7%  perf-profile.self.cycles-pp.native_flush_tlb_one_user
      0.01 ±264%      +0.1        0.07 ±  4%      +0.1        0.07        perf-profile.self.cycles-pp.rcu_all_qs
      0.00            +0.1        0.07 ±  4%      +0.1        0.07 ±  4%  perf-profile.self.cycles-pp.__do_huge_pmd_anonymous_page
      0.00            +0.1        0.08 ±  6%      +0.1        0.08 ±  6%  perf-profile.self.cycles-pp.prep_compound_page
      0.00            +0.1        0.08 ±  5%      +0.1        0.08 ±  6%  perf-profile.self.cycles-pp.__kmem_cache_alloc_bulk
      0.00            +0.1        0.13 ±  2%      +0.1        0.13 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock
      0.00            +0.2        0.18 ±  3%      +0.2        0.18 ±  4%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.00            +0.3        0.30 ±  2%      +0.3        0.30        perf-profile.self.cycles-pp.free_unref_page_prepare
      0.00            +0.6        0.58 ±  3%      +0.6        0.58 ±  5%  perf-profile.self.cycles-pp.clear_huge_page
      0.08 ±  4%      +1.2        1.25            +1.2        1.24 ±  4%  perf-profile.self.cycles-pp.__cond_resched
      0.05 ±  9%     +72.8       72.81           +73.2       73.26        perf-profile.self.cycles-pp.clear_page_erms

[-- Attachment #4: phoronix-regressions --]
[-- Type: text/plain, Size: 37812 bytes --]

(10)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Add/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
      6787            -2.9%       6592            -2.9%       6589        vmstat.system.cs
      0.18 ± 23%      -0.0        0.15 ± 44%      -0.1        0.12 ± 23%  perf-profile.children.cycles-pp.get_next_timer_interrupt
      0.08 ± 49%      +0.1        0.15 ± 16%      +0.0        0.08 ± 61%  perf-profile.self.cycles-pp.ct_kernel_enter
    352936           +42.1%     501525            +6.9%     377117        meminfo.AnonHugePages
    518885           +26.2%     654716            -2.1%     508198        meminfo.AnonPages
   1334861           +11.4%    1486492            -0.9%    1322775        meminfo.Inactive(anon)
      1.51            -0.1        1.45            -0.1        1.46        turbostat.C1E%
     24.23            -1.2%      23.93            -0.7%      24.05        turbostat.CorWatt
      2.64            -4.4%       2.52            -4.3%       2.53        turbostat.Pkg%pc2
     25.40            -1.3%      25.06            -0.9%      25.18        turbostat.PkgWatt
      3.30            -2.8%       3.20            -2.9%       3.20        turbostat.RAMWatt
     20115            -4.5%      19211            -4.5%      19217        phoronix-test-suite.ramspeed.Add.Integer.mb_s
    284.00            +3.5%     293.95            +3.5%     293.96        phoronix-test-suite.time.elapsed_time
    284.00            +3.5%     293.95            +3.5%     293.96        phoronix-test-suite.time.elapsed_time.max
    120322            +1.6%     122291            -0.2%     120098        phoronix-test-suite.time.maximum_resident_set_size
    281626           -54.7%     127627           -54.7%     127530        phoronix-test-suite.time.minor_page_faults
    259.16            +4.2%     270.02            +4.1%     269.86        phoronix-test-suite.time.user_time
    284.00            +3.5%     293.95            +3.5%     293.96        time.elapsed_time
    284.00            +3.5%     293.95            +3.5%     293.96        time.elapsed_time.max
    120322            +1.6%     122291            -0.2%     120098        time.maximum_resident_set_size
    281626           -54.7%     127627           -54.7%     127530        time.minor_page_faults
      1.72            -7.6%       1.59            -7.2%       1.60        time.system_time
    259.16            +4.2%     270.02            +4.1%     269.86        time.user_time
    129720           +26.2%     163681            -2.1%     127047        proc-vmstat.nr_anon_pages
    172.33           +42.1%     244.89            +6.8%     184.14        proc-vmstat.nr_anon_transparent_hugepages
    360027            -1.0%     356428            +0.1%     360507        proc-vmstat.nr_dirty_background_threshold
    720935            -1.0%     713729            +0.1%     721897        proc-vmstat.nr_dirty_threshold
   3328684            -1.1%    3292559            +0.1%    3333390        proc-vmstat.nr_free_pages
    333715           +11.4%     371625            -0.9%     330692        proc-vmstat.nr_inactive_anon
      1732            +5.1%       1820            +4.8%       1816        proc-vmstat.nr_page_table_pages
    333715           +11.4%     371625            -0.9%     330692        proc-vmstat.nr_zone_inactive_anon
    855883           -34.6%     560138           -34.9%     557459        proc-vmstat.numa_hit
    855859           -34.6%     560157           -34.9%     557429        proc-vmstat.numa_local
   5552895            +1.1%    5611662            +0.1%    5559236        proc-vmstat.pgalloc_normal
   1080638           -26.7%     792254           -27.0%     788881        proc-vmstat.pgfault
    109646            +3.0%     112918            +2.6%     112483        proc-vmstat.pgreuse
      9026            +7.6%       9714            +6.6%       9619        proc-vmstat.thp_fault_alloc
 1.165e+08            -3.6%  1.123e+08            -3.3%  1.126e+08        perf-stat.i.branch-instructions
      3.38            +0.1        3.45            +0.1        3.49        perf-stat.i.branch-miss-rate%
  4.13e+08            -2.7%  4.018e+08            -2.9%  4.011e+08        perf-stat.i.cache-misses
 5.336e+08            -2.3%  5.212e+08            -2.4%  5.206e+08        perf-stat.i.cache-references
      6824            -2.9%       6629            -2.9%       6624        perf-stat.i.context-switches
      4.05            +3.8%       4.20            +3.7%       4.20        perf-stat.i.cpi
    447744 ±  3%     -17.3%     370369 ±  3%     -15.0%     380580        perf-stat.i.dTLB-load-misses
 1.119e+09            -3.3%  1.082e+09            -3.4%  1.081e+09        perf-stat.i.dTLB-loads
      0.02 ± 10%      -0.0        0.01 ± 14%      -0.0        0.01 ±  3%  perf-stat.i.dTLB-store-miss-rate%
     84207 ±  7%     -58.4%      35034 ± 13%     -55.8%      37210 ±  2%  perf-stat.i.dTLB-store-misses
 7.312e+08            -3.3%  7.069e+08            -3.4%  7.065e+08        perf-stat.i.dTLB-stores
    127863            -2.8%     124330            -3.6%     123263        perf-stat.i.iTLB-load-misses
    145042            -2.5%     141459            -3.0%     140719        perf-stat.i.iTLB-loads
 2.393e+09            -3.3%  2.313e+09            -3.4%  2.313e+09        perf-stat.i.instructions
      0.28            -3.9%       0.27            -3.7%       0.27        perf-stat.i.ipc
    220.56            -3.0%     213.92            -3.1%     213.80        perf-stat.i.metric.M/sec
      3580           -31.0%       2470           -30.9%       2476        perf-stat.i.minor-faults
  49017829            +2.1%   50065997            +2.1%   50037948        perf-stat.i.node-loads
  98043570            -2.7%   95377592            -2.9%   95180579        perf-stat.i.node-stores
      3585           -31.0%       2474           -30.8%       2480        perf-stat.i.page-faults
      3.64            +3.8%       3.78            +3.8%       3.78        perf-stat.overall.cpi
     21.10            +3.2%      21.77            +3.3%      21.79        perf-stat.overall.cycles-between-cache-misses
      0.04 ±  3%      -0.0        0.03 ±  3%      -0.0        0.04        perf-stat.overall.dTLB-load-miss-rate%
      0.01 ±  7%      -0.0        0.00 ± 13%      -0.0        0.01 ±  2%  perf-stat.overall.dTLB-store-miss-rate%
      0.27            -3.7%       0.26            -3.7%       0.26        perf-stat.overall.ipc
  1.16e+08            -3.6%  1.119e+08            -3.3%  1.121e+08        perf-stat.ps.branch-instructions
 4.117e+08            -2.7%  4.006e+08            -2.9%  3.999e+08        perf-stat.ps.cache-misses
 5.319e+08            -2.3%  5.195e+08            -2.4%   5.19e+08        perf-stat.ps.cache-references
      6798            -2.8%       6605            -2.9%       6600        perf-stat.ps.context-switches
    446139 ±  3%     -17.3%     369055 ±  3%     -15.0%     379224        perf-stat.ps.dTLB-load-misses
 1.115e+09            -3.3%  1.078e+09            -3.4%  1.078e+09        perf-stat.ps.dTLB-loads
     83922 ±  7%     -58.4%      34908 ± 13%     -55.8%      37075 ±  2%  perf-stat.ps.dTLB-store-misses
 7.288e+08            -3.3%  7.047e+08            -3.4%  7.042e+08        perf-stat.ps.dTLB-stores
    127384            -2.7%     123884            -3.6%     122817        perf-stat.ps.iTLB-load-misses
    144399            -2.4%     140903            -2.9%     140152        perf-stat.ps.iTLB-loads
 2.385e+09            -3.3%  2.306e+09            -3.4%  2.305e+09        perf-stat.ps.instructions
      3566           -31.0%       2460           -30.9%       2465        perf-stat.ps.minor-faults
  48864755            +2.1%   49912372            +2.1%   49884745        perf-stat.ps.node-loads
  97730481            -2.7%   95083043            -2.9%   94887981        perf-stat.ps.node-stores
      3571           -31.0%       2465           -30.8%       2470        perf-stat.ps.page-faults


(11)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
      6853            -2.6%       6678            -2.7%       6668        vmstat.system.cs
    353760           +40.0%     495232            +6.4%     376514        meminfo.AnonHugePages
    519691           +25.5%     652412            -2.1%     508766        meminfo.AnonPages
   1335612           +11.1%    1484265            -0.9%    1323541        meminfo.Inactive(anon)
      1.52            -0.0        1.48            -0.0        1.48        turbostat.C1E%
      2.65            -3.0%       2.57            -2.8%       2.58        turbostat.Pkg%pc2
      3.32            -2.6%       3.23            -2.6%       3.23        turbostat.RAMWatt
     19960            -2.9%      19378            -3.0%      19366        phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s
    281.37            +3.0%     289.87            +3.1%     290.12        phoronix-test-suite.time.elapsed_time
    281.37            +3.0%     289.87            +3.1%     290.12        phoronix-test-suite.time.elapsed_time.max
    120220            +1.6%     122163            -0.1%     120158        phoronix-test-suite.time.maximum_resident_set_size
    281853           -54.7%     127777           -54.7%     127780        phoronix-test-suite.time.minor_page_faults
    257.32            +3.4%     265.97            +3.4%     265.99        phoronix-test-suite.time.user_time
    281.37            +3.0%     289.87            +3.1%     290.12        time.elapsed_time
    281.37            +3.0%     289.87            +3.1%     290.12        time.elapsed_time.max
    120220            +1.6%     122163            -0.1%     120158        time.maximum_resident_set_size
    281853           -54.7%     127777           -54.7%     127780        time.minor_page_faults
      1.74            -8.5%       1.59            -9.1%       1.58        time.system_time
    257.32            +3.4%     265.97            +3.4%     265.99        time.user_time
      0.80 ± 23%      -0.4        0.41 ± 78%      -0.3        0.54 ± 40%  perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      0.79 ± 21%      -0.4        0.40 ± 77%      -0.3        0.54 ± 39%  perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.77 ± 20%      -0.4        0.40 ± 77%      -0.3        0.52 ± 39%  perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
      1.39 ± 15%      -0.3        1.04 ± 22%      -0.2        1.20 ± 14%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
      1.39 ± 15%      -0.3        1.04 ± 21%      -0.2        1.20 ± 14%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.80 ± 23%      -0.3        0.55 ± 29%      -0.2        0.60 ± 16%  perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
      0.79 ± 21%      -0.3        0.54 ± 28%      -0.2        0.60 ± 16%  perf-profile.children.cycles-pp.clear_huge_page
      0.79 ± 20%      -0.2        0.58 ± 31%      -0.2        0.58 ± 17%  perf-profile.children.cycles-pp.clear_page_erms
      0.78 ± 20%      -0.2        0.58 ± 31%      -0.2        0.58 ± 17%  perf-profile.self.cycles-pp.clear_page_erms
    129919           +25.5%     163102            -2.1%     127191        proc-vmstat.nr_anon_pages
    172.73           +40.0%     241.81            +6.4%     183.84        proc-vmstat.nr_anon_transparent_hugepages
   3328013            -1.1%    3291433            +0.1%    3332863        proc-vmstat.nr_free_pages
    333903           +11.1%     371065            -0.9%     330885        proc-vmstat.nr_inactive_anon
      1740            +4.5%       1819            +4.4%       1817        proc-vmstat.nr_page_table_pages
    333903           +11.1%     371065            -0.9%     330885        proc-vmstat.nr_zone_inactive_anon
    853676           -34.9%     556019           -34.7%     557219        proc-vmstat.numa_hit
    853653           -34.9%     555977           -34.7%     557192        proc-vmstat.numa_local
   5551461            +1.0%    5607022            +0.1%    5559594        proc-vmstat.pgalloc_normal
   1075659           -27.0%     785124           -26.9%     786363        proc-vmstat.pgfault
    108727            +2.6%     111582            +2.6%     111546        proc-vmstat.pgreuse
      9027            +7.6%       9714            +6.6%       9619        proc-vmstat.thp_fault_alloc
 1.184e+08            -3.3%  1.145e+08            -3.2%  1.146e+08        perf-stat.i.branch-instructions
   5500836            -2.4%    5367239            -2.4%    5368946        perf-stat.i.branch-misses
 4.139e+08            -2.5%  4.036e+08            -2.6%  4.034e+08        perf-stat.i.cache-misses
 5.246e+08            -2.5%  5.114e+08            -2.5%  5.117e+08        perf-stat.i.cache-references
      6889            -2.6%       6710            -2.6%       6710        perf-stat.i.context-switches
      4.31            +2.6%       4.42            +2.7%       4.43        perf-stat.i.cpi
      0.10 ±  2%      -0.0        0.09 ±  2%      -0.0        0.08 ±  3%  perf-stat.i.dTLB-load-miss-rate%
    454444           -16.1%     381426           -18.4%     370782 ±  3%  perf-stat.i.dTLB-load-misses
 8.087e+08            -3.0%  7.841e+08            -3.1%  7.839e+08        perf-stat.i.dTLB-loads
      0.02            -0.0        0.01 ±  2%      -0.0        0.01 ± 14%  perf-stat.i.dTLB-store-miss-rate%
     86294           -57.1%      36992 ±  2%     -59.7%      34809 ± 13%  perf-stat.i.dTLB-store-misses
 5.311e+08            -3.0%  5.151e+08            -3.1%  5.149e+08        perf-stat.i.dTLB-stores
    129929            -4.0%     124682            -3.3%     125639        perf-stat.i.iTLB-load-misses
    146749            -3.3%     141975            -3.7%     141337        perf-stat.i.iTLB-loads
 2.249e+09            -3.1%   2.18e+09            -3.1%  2.179e+09        perf-stat.i.instructions
      0.26            -3.0%       0.25            -2.9%       0.25        perf-stat.i.ipc
    179.65            -2.7%     174.83            -2.7%     174.79        perf-stat.i.metric.M/sec
      3614           -31.4%       2478           -31.1%       2490        perf-stat.i.minor-faults
  65665882            -0.5%   65367211            -0.8%   65111743        perf-stat.i.node-loads
      3618           -31.4%       2483           -31.1%       2494        perf-stat.i.page-faults
      3.88            +3.3%       4.01            +3.3%       4.01        perf-stat.overall.cpi
     21.10            +2.7%      21.67            +2.7%      21.67        perf-stat.overall.cycles-between-cache-misses
      0.06            -0.0        0.05            -0.0        0.05 ±  3%  perf-stat.overall.dTLB-load-miss-rate%
      0.02            -0.0        0.01 ±  2%      -0.0        0.01 ± 13%  perf-stat.overall.dTLB-store-miss-rate%
      0.26            -3.2%       0.25            -3.2%       0.25        perf-stat.overall.ipc
 1.179e+08            -3.3%   1.14e+08            -3.2%  1.141e+08        perf-stat.ps.branch-instructions
   5473781            -2.4%    5340720            -2.4%    5344770        perf-stat.ps.branch-misses
 4.126e+08            -2.5%  4.023e+08            -2.5%  4.021e+08        perf-stat.ps.cache-misses
 5.229e+08            -2.5%  5.098e+08            -2.5%    5.1e+08        perf-stat.ps.cache-references
      6864            -2.6%       6687            -2.6%       6687        perf-stat.ps.context-switches
    452799           -16.1%     380049           -18.4%     369456 ±  3%  perf-stat.ps.dTLB-load-misses
  8.06e+08            -3.0%  7.815e+08            -3.1%  7.814e+08        perf-stat.ps.dTLB-loads
     85997           -57.1%      36856 ±  2%     -59.7%      34683 ± 13%  perf-stat.ps.dTLB-store-misses
 5.294e+08            -3.0%  5.135e+08            -3.0%  5.133e+08        perf-stat.ps.dTLB-stores
    129440            -4.0%     124225            -3.3%     125181        perf-stat.ps.iTLB-load-misses
    146145            -3.2%     141400            -3.7%     140780        perf-stat.ps.iTLB-loads
 2.241e+09            -3.1%  2.172e+09            -3.1%  2.172e+09        perf-stat.ps.instructions
      3599           -31.4%       2468           -31.1%       2479        perf-stat.ps.minor-faults
  65457458            -0.5%   65162312            -0.8%   64909293        perf-stat.ps.node-loads
      3604           -31.4%       2472           -31.1%       2484        perf-stat.ps.page-faults


(12)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Triad/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    607.38 ± 15%     -24.4%     459.12 ± 24%      -6.0%     570.75 ±  5%  perf-c2c.DRAM.local
      6801            -3.4%       6570            -3.1%       6587        vmstat.system.cs
     15155            -0.9%      15024            -0.7%      15046        vmstat.system.in
    353771           +43.0%     505977 ±  3%      +7.1%     378972        meminfo.AnonHugePages
    518698           +26.5%     656280            -1.7%     509920        meminfo.AnonPages
   1334737           +11.5%    1487919            -0.8%    1324549        meminfo.Inactive(anon)
      1.50            -0.1        1.45            -0.1        1.45        turbostat.C1E%
      2.64            -4.0%       2.54            -2.8%       2.57        turbostat.Pkg%pc2
     25.32            -1.1%      25.06            -0.6%      25.17        turbostat.PkgWatt
      3.30            -3.0%       3.20            -2.8%       3.20        turbostat.RAMWatt
      1.25 ±  8%      -0.3        0.96 ± 16%      -0.1        1.15 ± 22%  perf-profile.children.cycles-pp.do_user_addr_fault
      1.25 ±  8%      -0.3        0.96 ± 16%      -0.1        1.15 ± 22%  perf-profile.children.cycles-pp.exc_page_fault
      1.15 ±  9%      -0.3        0.88 ± 16%      -0.1        1.02 ± 22%  perf-profile.children.cycles-pp.__handle_mm_fault
      1.18 ±  9%      -0.3        0.91 ± 15%      -0.1        1.06 ± 21%  perf-profile.children.cycles-pp.handle_mm_fault
      0.23 ± 19%      +0.1        0.32 ± 18%      +0.1        0.33 ± 20%  perf-profile.children.cycles-pp.exit_mmap
      0.23 ± 19%      +0.1        0.32 ± 18%      +0.1        0.33 ± 20%  perf-profile.children.cycles-pp.__mmput
     19667            -6.4%      18399            -6.4%      18413        phoronix-test-suite.ramspeed.Triad.Integer.mb_s
    284.07            +3.7%     294.53            +3.4%     293.86        phoronix-test-suite.time.elapsed_time
    284.07            +3.7%     294.53            +3.4%     293.86        phoronix-test-suite.time.elapsed_time.max
    120102            +1.8%     122256            +0.1%     120265        phoronix-test-suite.time.maximum_resident_set_size
    281737           -54.7%     127624           -54.7%     127574        phoronix-test-suite.time.minor_page_faults
    259.49            +4.1%     270.20            +4.1%     270.14        phoronix-test-suite.time.user_time
    284.07            +3.7%     294.53            +3.4%     293.86        time.elapsed_time
    284.07            +3.7%     294.53            +3.4%     293.86        time.elapsed_time.max
    120102            +1.8%     122256            +0.1%     120265        time.maximum_resident_set_size
    281737           -54.7%     127624           -54.7%     127574        time.minor_page_faults
      1.72            -8.1%       1.58            -8.4%       1.58        time.system_time
    259.49            +4.1%     270.20            +4.1%     270.14        time.user_time
    129673           +26.5%     164074            -1.7%     127482        proc-vmstat.nr_anon_pages
    172.74           +43.0%     247.07 ±  3%      +7.1%     185.05        proc-vmstat.nr_anon_transparent_hugepages
    360059            -1.0%     356437            +0.1%     360424        proc-vmstat.nr_dirty_background_threshold
    720999            -1.0%     713747            +0.1%     721730        proc-vmstat.nr_dirty_threshold
   3328170            -1.1%    3291542            +0.1%    3330837        proc-vmstat.nr_free_pages
    333684           +11.5%     371981            -0.8%     331138        proc-vmstat.nr_inactive_anon
      1735            +5.0%       1822            +4.9%       1819        proc-vmstat.nr_page_table_pages
    333684           +11.5%     371981            -0.8%     331138        proc-vmstat.nr_zone_inactive_anon
    857533           -34.7%     559940           -34.6%     560503        proc-vmstat.numa_hit
    857463           -34.7%     560233           -34.6%     560504        proc-vmstat.numa_local
   1082386           -26.7%     793742           -26.9%     791272        proc-vmstat.pgfault
    109917            +2.8%     113044            +2.4%     112517        proc-vmstat.pgreuse
      9028            +7.5%       9707            +6.5%       9619        proc-vmstat.thp_fault_alloc
 1.168e+08            -6.9%  1.087e+08 ±  9%      -3.5%  1.127e+08        perf-stat.i.branch-instructions
      3.39            +0.1        3.47            +0.1        3.47        perf-stat.i.branch-miss-rate%
   5431805            -8.1%    4990354 ± 15%      -2.7%    5285279        perf-stat.i.branch-misses
  4.13e+08            -3.1%  4.004e+08            -2.8%  4.015e+08        perf-stat.i.cache-misses
 5.338e+08            -2.6%  5.196e+08            -2.4%  5.211e+08        perf-stat.i.cache-references
      6835            -3.4%       6604            -3.1%       6623        perf-stat.i.context-switches
      4.05            +3.8%       4.21            +3.6%       4.20        perf-stat.i.cpi
     60.96 ±  7%      +0.4%      61.20 ± 12%      -7.7%      56.27 ±  3%  perf-stat.i.cycles-between-cache-misses
      0.08 ±  3%      -0.0        0.08 ±  6%      -0.0        0.08 ±  4%  perf-stat.i.dTLB-load-miss-rate%
    455317           -16.9%     378574           -16.7%     379148        perf-stat.i.dTLB-load-misses
 1.118e+09            -3.8%  1.076e+09            -3.3%  1.082e+09        perf-stat.i.dTLB-loads
      0.02            -0.0        0.01 ±  6%      -0.0        0.01 ±  2%  perf-stat.i.dTLB-store-miss-rate%
     86796           -57.3%      37100 ±  2%     -57.3%      37097 ±  2%  perf-stat.i.dTLB-store-misses
  7.31e+08            -3.7%   7.04e+08            -3.3%  7.068e+08        perf-stat.i.dTLB-stores
    128995            -3.1%     125030 ±  2%      -4.4%     123280        perf-stat.i.iTLB-load-misses
    145739            -4.0%     139945            -3.7%     140348        perf-stat.i.iTLB-loads
 2.395e+09            -4.3%  2.291e+09 ±  2%      -3.4%  2.314e+09        perf-stat.i.instructions
      0.28            -4.2%       0.27            -3.9%       0.27        perf-stat.i.ipc
     30.30 ±  6%     -11.5%      26.81 ±  6%     -21.3%      23.84 ± 12%  perf-stat.i.metric.K/sec
    220.55            -3.5%     212.73            -3.0%     213.94        perf-stat.i.metric.M/sec
      3598           -31.3%       2473           -31.5%       2466        perf-stat.i.minor-faults
  49026239            +1.9%   49938429            +2.0%   50024868        perf-stat.i.node-loads
  98013334            -3.0%   95053521            -2.8%   95291354        perf-stat.i.node-stores
      3602           -31.2%       2477           -31.4%       2470        perf-stat.i.page-faults
      3.64            +4.6%       3.81            +3.9%       3.78        perf-stat.overall.cpi
     21.09            +3.2%      21.76            +3.3%      21.78        perf-stat.overall.cycles-between-cache-misses
      0.04            -0.0        0.04            -0.0        0.04        perf-stat.overall.dTLB-load-miss-rate%
      0.01            -0.0        0.01 ±  2%      -0.0        0.01 ±  2%  perf-stat.overall.dTLB-store-miss-rate%
      0.27            -4.3%       0.26            -3.7%       0.26        perf-stat.overall.ipc
 1.163e+08            -6.9%  1.083e+08 ±  9%      -3.5%  1.122e+08        perf-stat.ps.branch-instructions
   5405065            -8.1%    4967211 ± 15%      -2.7%    5259197        perf-stat.ps.branch-misses
 4.117e+08            -3.0%  3.992e+08            -2.8%  4.003e+08        perf-stat.ps.cache-misses
 5.321e+08            -2.6%   5.18e+08            -2.4%  5.195e+08        perf-stat.ps.cache-references
      6810            -3.4%       6579            -3.1%       6599        perf-stat.ps.context-switches
    453677           -16.9%     377215           -16.7%     377792        perf-stat.ps.dTLB-load-misses
 1.115e+09            -3.8%  1.072e+09            -3.3%  1.078e+09        perf-stat.ps.dTLB-loads
     86500           -57.3%      36965 ±  2%     -57.3%      36962 ±  2%  perf-stat.ps.dTLB-store-misses
 7.286e+08            -3.7%  7.019e+08            -3.3%  7.045e+08        perf-stat.ps.dTLB-stores
    128515            -3.1%     124573 ±  2%      -4.4%     122831        perf-stat.ps.iTLB-load-misses
    145145            -4.0%     139336            -3.7%     139772        perf-stat.ps.iTLB-loads
 2.386e+09            -4.3%  2.283e+09 ±  2%      -3.4%  2.306e+09        perf-stat.ps.instructions
      3583           -31.3%       2462           -31.5%       2455        perf-stat.ps.minor-faults
  48873391            +1.9%   49781212            +2.0%   49874192        perf-stat.ps.node-loads
  97704914            -3.0%   94765417            -2.8%   94999974        perf-stat.ps.node-stores
      3588           -31.2%       2467           -31.4%       2460        perf-stat.ps.page-faults


(13)
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
      6786            -2.9%       6587            -2.9%       6586        vmstat.system.cs
    355264 ±  2%     +41.1%     501244            +6.5%     378393        meminfo.AnonHugePages
    520377           +25.7%     654330            -2.1%     509644        meminfo.AnonPages
   1336461           +11.2%    1486141            -0.9%    1324302        meminfo.Inactive(anon)
      1.50            -0.0        1.46            -0.1        1.45        turbostat.C1E%
     24.20            -1.2%      23.90            -0.9%      23.98        turbostat.CorWatt
      2.62            -2.4%       2.56            -3.7%       2.53        turbostat.Pkg%pc2
     25.37            -1.3%      25.03            -1.0%      25.12        turbostat.PkgWatt
      3.30            -3.1%       3.20            -3.0%       3.20        turbostat.RAMWatt
     19799            -3.5%      19106            -3.4%      19117        phoronix-test-suite.ramspeed.Average.Integer.mb_s
    283.91            +3.7%     294.40            +3.6%     294.12        phoronix-test-suite.time.elapsed_time
    283.91            +3.7%     294.40            +3.6%     294.12        phoronix-test-suite.time.elapsed_time.max
    120150            +1.7%     122196            +0.2%     120373        phoronix-test-suite.time.maximum_resident_set_size
    281692           -54.7%     127689           -54.7%     127587        phoronix-test-suite.time.minor_page_faults
    259.47            +4.1%     270.04            +4.0%     269.86        phoronix-test-suite.time.user_time
    283.91            +3.7%     294.40            +3.6%     294.12        time.elapsed_time
    283.91            +3.7%     294.40            +3.6%     294.12        time.elapsed_time.max
    120150            +1.7%     122196            +0.2%     120373        time.maximum_resident_set_size
    281692           -54.7%     127689           -54.7%     127587        time.minor_page_faults
      1.72            -7.9%       1.58            -8.4%       1.58        time.system_time
    259.47            +4.1%     270.04            +4.0%     269.86        time.user_time
    130092           +25.7%     163578            -2.1%     127411        proc-vmstat.nr_anon_pages
    173.47 ±  2%     +41.1%     244.74            +6.5%     184.76        proc-vmstat.nr_anon_transparent_hugepages
   3328419            -1.1%    3292662            +0.1%    3332791        proc-vmstat.nr_free_pages
    334114           +11.2%     371530            -0.9%     331076        proc-vmstat.nr_inactive_anon
      1732            +4.7%       1814            +5.2%       1823        proc-vmstat.nr_page_table_pages
    334114           +11.2%     371530            -0.9%     331076        proc-vmstat.nr_zone_inactive_anon
    853734           -34.6%     558669           -34.2%     562087        proc-vmstat.numa_hit
    853524           -34.6%     558628           -34.1%     562074        proc-vmstat.numa_local
   5551673            +1.0%    5609595            +0.2%    5564708        proc-vmstat.pgalloc_normal
   1077693           -26.6%     791019           -26.3%     794706        proc-vmstat.pgfault
    109591            +3.1%     112941            +2.9%     112795        proc-vmstat.pgreuse
      9027            +7.6%       9714            +6.6%       9619        proc-vmstat.thp_fault_alloc
      1.58 ± 16%      -0.5        1.08 ±  8%      -0.4        1.16 ± 24%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault
      1.42 ± 14%      -0.4        0.97 ±  9%      -0.4        1.05 ± 24%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      1.42 ± 14%      -0.4        0.98 ±  8%      -0.4        1.05 ± 24%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
      1.32 ± 14%      -0.4        0.91 ± 12%      -0.3        0.98 ± 26%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      1.30 ± 13%      -0.4        0.88 ± 13%      -0.4        0.94 ± 26%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      1.64 ± 16%      -0.5        1.12 ±  9%      -0.4        1.24 ± 22%  perf-profile.children.cycles-pp.asm_exc_page_fault
      1.48 ± 15%      -0.5        1.01 ± 10%      -0.4        1.12 ± 21%  perf-profile.children.cycles-pp.do_user_addr_fault
      1.49 ± 14%      -0.5        1.02 ±  9%      -0.4        1.12 ± 21%  perf-profile.children.cycles-pp.exc_page_fault
      1.37 ± 14%      -0.4        0.94 ± 12%      -0.3        1.05 ± 22%  perf-profile.children.cycles-pp.handle_mm_fault
      1.34 ± 13%      -0.4        0.91 ± 13%      -0.3        1.00 ± 23%  perf-profile.children.cycles-pp.__handle_mm_fault
      0.78 ± 20%      -0.3        0.50 ± 20%      -0.2        0.54 ± 33%  perf-profile.children.cycles-pp.clear_page_erms
      0.76 ± 20%      -0.3        0.50 ± 22%      -0.2        0.53 ± 34%  perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
      0.75 ± 20%      -0.2        0.50 ± 23%      -0.2        0.53 ± 33%  perf-profile.children.cycles-pp.clear_huge_page
      0.25 ± 28%      +0.0        0.28 ± 77%      -0.1        0.11 ± 52%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.24 ± 28%      +0.0        0.28 ± 77%      -0.1        0.11 ± 52%  perf-profile.children.cycles-pp.ret_from_fork
      0.23 ± 31%      +0.0        0.28 ± 78%      -0.1        0.09 ± 59%  perf-profile.children.cycles-pp.kthread
      0.77 ± 20%      -0.3        0.50 ± 18%      -0.2        0.54 ± 33%  perf-profile.self.cycles-pp.clear_page_erms
 1.166e+08            -3.3%  1.127e+08            -3.0%  1.131e+08        perf-stat.i.branch-instructions
      3.39            +0.1        3.49            +0.1        3.46        perf-stat.i.branch-miss-rate%
   5415570            -2.0%    5304890            -2.0%    5306531        perf-stat.i.branch-misses
 4.133e+08            -3.1%  4.005e+08            -2.9%  4.014e+08        perf-stat.i.cache-misses
 5.335e+08            -2.5%  5.203e+08            -2.4%  5.209e+08        perf-stat.i.cache-references
      6825            -3.1%       6616            -3.1%       6614        perf-stat.i.context-switches
      4.06            +3.5%       4.20            +3.3%       4.19        perf-stat.i.cpi
      0.08 ±  3%      -0.0        0.08 ±  2%      -0.0        0.08 ±  2%  perf-stat.i.dTLB-load-miss-rate%
    451852           -17.2%     374167 ±  4%     -16.1%     378935        perf-stat.i.dTLB-load-misses
  1.12e+09            -3.7%  1.079e+09            -3.5%  1.081e+09        perf-stat.i.dTLB-loads
      0.02            -0.0        0.01 ± 13%      -0.0        0.01        perf-stat.i.dTLB-store-miss-rate%
     86119           -59.0%      35274 ± 13%     -57.5%      36598        perf-stat.i.dTLB-store-misses
 7.319e+08            -3.7%  7.049e+08            -3.5%  7.066e+08        perf-stat.i.dTLB-stores
    128297            -2.6%     124925            -3.6%     123631        perf-stat.i.iTLB-load-misses
 2.395e+09            -3.6%  2.309e+09            -3.4%  2.315e+09        perf-stat.i.instructions
      0.28            -3.4%       0.27            -3.4%       0.27        perf-stat.i.ipc
    220.76            -3.3%     213.44            -3.1%     213.87        perf-stat.i.metric.M/sec
      3575           -30.9%       2470           -30.4%       2487        perf-stat.i.minor-faults
  49267237            +1.1%   49805411            +1.4%   49954320        perf-stat.i.node-loads
  98097080            -3.1%   95014639            -2.8%   95307489        perf-stat.i.node-stores
      3579           -30.9%       2475           -30.4%       2492        perf-stat.i.page-faults
      4.64            +0.1        4.71            +0.0        4.69        perf-stat.overall.branch-miss-rate%
      3.64            +3.8%       3.78            +3.7%       3.78        perf-stat.overall.cpi
     21.10            +3.3%      21.80            +3.2%      21.78        perf-stat.overall.cycles-between-cache-misses
      0.04            -0.0        0.03 ±  4%      -0.0        0.04        perf-stat.overall.dTLB-load-miss-rate%
      0.01            -0.0        0.01 ± 13%      -0.0        0.01        perf-stat.overall.dTLB-store-miss-rate%
      0.27            -3.7%       0.26            -3.6%       0.26        perf-stat.overall.ipc
 1.161e+08            -3.3%  1.122e+08            -3.0%  1.126e+08        perf-stat.ps.branch-instructions
   5390667            -2.1%    5280037            -2.0%    5282651        perf-stat.ps.branch-misses
  4.12e+08            -3.1%  3.993e+08            -2.9%  4.001e+08        perf-stat.ps.cache-misses
 5.318e+08            -2.5%  5.187e+08            -2.3%  5.193e+08        perf-stat.ps.cache-references
      6801            -3.1%       6593            -3.0%       6595        perf-stat.ps.context-switches
    450236           -17.2%     372836 ±  4%     -16.1%     377601        perf-stat.ps.dTLB-load-misses
 1.117e+09            -3.7%  1.075e+09            -3.5%  1.078e+09        perf-stat.ps.dTLB-loads
     85824           -59.0%      35147 ± 13%     -57.5%      36467        perf-stat.ps.dTLB-store-misses
 7.295e+08            -3.7%  7.027e+08            -3.4%  7.044e+08        perf-stat.ps.dTLB-stores
    127825            -2.6%     124475            -3.6%     123194        perf-stat.ps.iTLB-load-misses
 2.387e+09            -3.6%  2.302e+09            -3.3%  2.307e+09        perf-stat.ps.instructions
      3561           -30.9%       2460           -30.4%       2478        perf-stat.ps.minor-faults
  49109319            +1.1%   49654078            +1.4%   49800339        perf-stat.ps.node-loads
  97782680            -3.1%   94720369            -2.8%   95009401        perf-stat.ps.node-stores
      3566           -30.9%       2465           -30.4%       2482        perf-stat.ps.page-faults


* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2024-01-05  9:29                       ` Oliver Sang
@ 2024-01-05 14:52                         ` Yin, Fengwei
  2024-01-05 18:49                         ` Yang Shi
  1 sibling, 0 replies; 24+ messages in thread
From: Yin, Fengwei @ 2024-01-05 14:52 UTC (permalink / raw)
  To: Oliver Sang, Yang Shi
  Cc: Rik van Riel, oe-lkp, lkp, Linux Memory Management List,
	Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang,
	feng.tang



On 1/5/2024 5:29 PM, Oliver Sang wrote:
> hi, Yang Shi,
> 
> On Thu, Jan 04, 2024 at 04:39:50PM +0800, Oliver Sang wrote:
>> hi, Fengwei, hi, Yang Shi,
>>
>> On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote:
>>>
>>> On 2024/1/4 09:32, Yang Shi wrote:
>>
>> ...
>>
>>>> Can you please help test the below patch?
>>> I can't access the testing box now. Oliver will help to test your patch.
>>>
>>
>> since the commit-id of
>>    'mm: align larger anonymous mappings on THP boundaries'
>> in linux-next/master is now efa7df3e3bb5d,
>> I applied the patch as below:
>>
>> * d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
>> * efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries
>> * 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi
>>
>> our auto-bisect captured the new efa7df3e3b as the first bad commit for
>> quite a number of regressions so far. I will test d8d7b1dae6f03 for all
>> these tests. Thanks
>>
> 
> we got 12 regression results and 1 improvement result for efa7df3e3b so far
> (4 of the regressions are similar to what we reported for 1111d46b5c).
> with your patch, 6 of those regressions are fixed; the others are not
> impacted.
> 
> below is a summary:
> 
> No.  testsuite       test                            status-on-efa7df3e3b  fix-by-d8d7b1dae6 ?
> ===  =========       ====                            ====================  ===================
> (1)  stress-ng       numa                            regression            NO
> (2)                  pthread                         regression            yes (on an Ice Lake server)
> (3)                  pthread                         regression            yes (on a Cascade Lake desktop)
> (4)  will-it-scale   malloc1                         regression            NO
> (5)                  page_fault1                     improvement           no (so still improvement)
> (6)  vm-scalability  anon-w-seq-mt                   regression            yes
> (7)  stream          nr_threads=25%                  regression            yes
> (8)                  nr_threads=50%                  regression            yes
> (9)  phoronix        osbench.CreateThreads           regression            yes (on a Cascade Lake server)
> (10)                 ramspeed.Add.Integer            regression            NO (and below 3, on a Coffee Lake desktop)
> (11)                 ramspeed.Average.FloatingPoint  regression            NO
> (12)                 ramspeed.Triad.Integer          regression            NO
> (13)                 ramspeed.Average.Integer        regression            NO

Hints on ramspeed just for your reference:
I did standalone ramspeed (not phoronix) testing on an Ice Lake 48C/96T
machine with 192GB memory and didn't see the regressions on that testing
box (the box was retired at the end of last year and can't be accessed
anymore).


Regards
Yin, Fengwei



* Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
  2024-01-05  9:29                       ` Oliver Sang
  2024-01-05 14:52                         ` Yin, Fengwei
@ 2024-01-05 18:49                         ` Yang Shi
  1 sibling, 0 replies; 24+ messages in thread
From: Yang Shi @ 2024-01-05 18:49 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Yin Fengwei, Rik van Riel, oe-lkp, lkp,
	Linux Memory Management List, Andrew Morton, Matthew Wilcox,
	Christopher Lameter, ying.huang, feng.tang

On Fri, Jan 5, 2024 at 1:29 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Yang Shi,
>
> On Thu, Jan 04, 2024 at 04:39:50PM +0800, Oliver Sang wrote:
> > hi, Fengwei, hi, Yang Shi,
> >
> > On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote:
> > >
> > > On 2024/1/4 09:32, Yang Shi wrote:
> >
> > ...
> >
> > > > Can you please help test the below patch?
> > > I can't access the testing box now. Oliver will help to test your patch.
> > >
> >
> > since the commit-id of
> >   'mm: align larger anonymous mappings on THP boundaries'
> > in linux-next/master is now efa7df3e3bb5d,
> > I applied the patch as below:
> >
> > * d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> > * efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries
> > * 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi
> >
> > our auto-bisect captured the new efa7df3e3b as the first bad commit for
> > quite a number of regressions so far. I will test d8d7b1dae6f03 for all
> > these tests. Thanks
> >
>

Hi Oliver,

Thanks for running the test. Please see the inline comments.

> we got 12 regression results and 1 improvement result for efa7df3e3b so far
> (4 of the regressions are similar to what we reported for 1111d46b5c).
> with your patch, 6 of those regressions are fixed; the others are not
> impacted.
>
> below is a summary:
>
> No.  testsuite       test                            status-on-efa7df3e3b  fix-by-d8d7b1dae6 ?
> ===  =========       ====                            ====================  ===================
> (1)  stress-ng       numa                            regression            NO
> (2)                  pthread                         regression            yes (on an Ice Lake server)
> (3)                  pthread                         regression            yes (on a Cascade Lake desktop)
> (4)  will-it-scale   malloc1                         regression            NO

I think this was reported earlier, when Rik submitted the patch in the
first place. IIRC, Huang Ying did some analysis on this one and
concluded it could be ignored.

> (5)                  page_fault1                     improvement           no (so still improvement)
> (6)  vm-scalability  anon-w-seq-mt                   regression            yes
> (7)  stream          nr_threads=25%                  regression            yes
> (8)                  nr_threads=50%                  regression            yes
> (9)  phoronix        osbench.CreateThreads           regression            yes (on a Cascade Lake server)
> (10)                 ramspeed.Add.Integer            regression            NO (and below 3, on a Coffee Lake desktop)
> (11)                 ramspeed.Average.FloatingPoint  regression            NO
> (12)                 ramspeed.Triad.Integer          regression            NO
> (13)                 ramspeed.Average.Integer        regression            NO

Not fixing the ramspeed regression is expected. But it seems that
neither Fengwei nor I can reproduce the regression when running
ramspeed alone.

>
>
> below are the details. for those regressions not fixed by d8d7b1dae6, the
> full comparison is attached.
>
>
> (1) detail comparison is attached as 'stress-ng-regression'
>
> Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
> =========================================================================================
> class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   cpu/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/numa/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     251.12           -48.2%     130.00           -47.9%     130.75        stress-ng.numa.ops
>       4.10           -49.4%       2.08           -49.2%       2.09        stress-ng.numa.ops_per_sec

This is a new one. I did some analysis, and it seems it is not
related to the THP patch, since I can reproduce it on a kernel (on an
aarch64 VM) w/o the THP patch if I set THP to "always".

The profiling showed the regression was caused by the move_pages()
syscall. The test calls a bunch of NUMA syscalls, for example
set_mempolicy(), mbind(), move_pages(), migrate_pages(), etc., with
different parameters. When calling move_pages(), it tries to move the
pages (at base-page granularity) to different nodes, walking the nodes
in a circular list. On my 2-node NUMA VM, it actually moves:

0th page to node #1
1st page to node #0
2nd page to node #1
3rd page to node #0
....
1023rd page to node #0

But with THP, this actually bounces the same THP between the two nodes 512 times.

The pgmigrate_success counter in /proc/vmstat also reflects this:

for base pages the delta is 1928431, but for the THP case it is 218466402.

The kernel already does the node check to skip the move if the page is
already on the target node, but the test case does the bounce on
purpose since it assumes base pages. So I think this case should
be run with THP disabled.
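
For reference, below is a minimal userspace sketch of the alternating
move_pages() pattern described above. This is my own reconstruction,
not the stress-ng source; the buffer size, alignment, and 2-node
layout are assumptions. The point it illustrates: the requests are
made per 4K page, so when the buffer is backed by 2MB THPs, requests
targeting alternating nodes migrate the same huge page back and forth.

/* build with: gcc -O2 bounce.c -lnuma -o bounce (needs libnuma headers) */
#include <numaif.h>     /* move_pages(), MPOL_MF_MOVE */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NR_PAGES 1024UL

int main(void)
{
        long psz = sysconf(_SC_PAGESIZE);
        char *buf;
        void *pages[NR_PAGES];
        int nodes[NR_PAGES], status[NR_PAGES];
        unsigned long i;

        /* 2MB-aligned anonymous buffer, eligible for THP */
        if (posix_memalign((void **)&buf, 2UL << 20, NR_PAGES * psz))
                return 1;
        memset(buf, 0, NR_PAGES * psz);         /* fault the pages in */

        for (i = 0; i < NR_PAGES; i++) {
                pages[i] = buf + i * psz;
                nodes[i] = (i + 1) % 2;  /* 0th -> node 1, 1st -> node 0, ... */
        }

        /*
         * One syscall, but at base-page granularity: with THP set to
         * "always", every request migrates a whole 2MB huge page,
         * bouncing it between node 0 and node 1 instead of moving
         * 4K at a time.
         */
        if (move_pages(0 /* self */, NR_PAGES, pages, nodes, status,
                       MPOL_MF_MOVE) < 0)
                perror("move_pages");
        return 0;
}

Sampling pgmigrate_success in /proc/vmstat before and after a run with
THP set to "always" vs. "never" should show the same kind of blow-up in
migrated pages as the deltas above.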

>
>
> (2)
> Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/pthread/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>    3272223           -87.8%     400430            +0.5%    3287322        stress-ng.pthread.ops
>      54516           -87.8%       6664            +0.5%      54772        stress-ng.pthread.ops_per_sec
>
>
> (3)
> Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with memory: 128G
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>    2250845           -85.2%     332370 ±  6%      -0.8%    2232820        stress-ng.pthread.ops
>      37510           -85.2%       5538 ±  6%      -0.8%      37209        stress-ng.pthread.ops_per_sec
>
>
> (4) full comparison attached as 'will-it-scale-regression'
>
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/malloc1/will-it-scale
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      10994           -86.7%       1466           -86.7%       1460        will-it-scale.per_process_ops
>    1231431           -86.7%     164315           -86.7%     163624        will-it-scale.workload
>
>
> (5)
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/page_fault1/will-it-scale
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>   18858970           +44.8%   27298921           +44.9%   27330479        will-it-scale.224.threads
>      56.06           +13.3%      63.53           +13.8%      63.81        will-it-scale.224.threads_idle
>      84191           +44.8%     121869           +44.9%     122010        will-it-scale.per_thread_ops
>   18858970           +44.8%   27298921           +44.9%   27330479        will-it-scale.workload
>
>
> (6)
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/8T/lkp-cpl-4sp2/anon-w-seq-mt/vm-scalability
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     345968            -6.5%     323566            +0.1%     346304        vm-scalability.median
>       1.91 ± 10%      -0.5        1.38 ± 20%      -0.2        1.75 ± 13%  vm-scalability.median_stddev%
>   79708409            -7.4%   73839640            -0.1%   79613742        vm-scalability.throughput
>
>
> (7)
> Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G
> =========================================================================================
> array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
>   50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     349414           -16.2%     292854 ±  2%      -0.4%     348048        stream.add_bandwidth_MBps
>     347727 ±  2%     -16.5%     290470 ±  2%      -0.6%     345750 ±  2%  stream.add_bandwidth_MBps_harmonicMean
>     332206           -21.6%     260428 ±  3%      -0.4%     330838        stream.copy_bandwidth_MBps
>     330746 ±  2%     -22.6%     255915 ±  3%      -0.6%     328725 ±  2%  stream.copy_bandwidth_MBps_harmonicMean
>     301178           -16.9%     250209 ±  2%      -0.4%     299920        stream.scale_bandwidth_MBps
>     300262           -17.7%     247151 ±  2%      -0.6%     298586 ±  2%  stream.scale_bandwidth_MBps_harmonicMean
>     337408           -12.5%     295287 ±  2%      -0.3%     336304        stream.triad_bandwidth_MBps
>     336153           -12.7%     293621            -0.5%     334624 ±  2%  stream.triad_bandwidth_MBps_harmonicMean
>
>
> (8)
> Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G
> =========================================================================================
> array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
>   50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/50%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>     345632           -19.7%     277550 ±  3%      +0.4%     347067 ±  2%  stream.add_bandwidth_MBps
>     342263 ±  2%     -19.7%     274704 ±  2%      +0.4%     343609 ±  2%  stream.add_bandwidth_MBps_harmonicMean
>     343820           -17.3%     284428 ±  3%      +0.1%     344248        stream.copy_bandwidth_MBps
>     341759 ±  2%     -17.8%     280934 ±  3%      +0.1%     342025 ±  2%  stream.copy_bandwidth_MBps_harmonicMean
>     343270           -17.8%     282330 ±  3%      +0.3%     344276 ±  2%  stream.scale_bandwidth_MBps
>     340812 ±  2%     -18.3%     278284 ±  3%      +0.3%     341672 ±  2%  stream.scale_bandwidth_MBps_harmonicMean
>     364596           -19.7%     292831 ±  3%      +0.4%     366145 ±  2%  stream.triad_bandwidth_MBps
>     360643 ±  2%     -19.9%     289034 ±  3%      +0.4%     362004 ±  2%  stream.triad_bandwidth_MBps_harmonicMean
>
>
> (9)
> Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with memory: 512G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Create Threads/debian-x86_64-phoronix/lkp-csl-2sp7/osbench-1.0.2/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      26.82         +1348.4%     388.43            +4.0%      27.88        phoronix-test-suite.osbench.CreateThreads.us_per_event
>
>
> **** for (10) - (13) below, the full comparison is attached as 'phoronix-regressions'
> (they all happen on a Coffee Lake desktop)
> (10)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Add/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      20115            -4.5%      19211            -4.5%      19217        phoronix-test-suite.ramspeed.Add.Integer.mb_s
>
>
> (11)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      19960            -2.9%      19378            -3.0%      19366        phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s
>
>
> (12)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Triad/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      19667            -6.4%      18399            -6.4%      18413        phoronix-test-suite.ramspeed.Triad.Integer.mb_s
>
>
> (13)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      19799            -3.5%      19106            -3.4%      19117        phoronix-test-suite.ramspeed.Average.Integer.mb_s
>
>
>
> >
> >
> > commit d8d7b1dae6f0311d528b289cda7b317520f9a984
> > Author: 0day robot <lkp@intel.com>
> > Date:   Thu Jan 4 12:51:10 2024 +0800
> >
> >     fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> >
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 40d94411d4920..91197bd387730 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> >         return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> >                _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> >                _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> > +              _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
> >                arch_calc_vm_flag_bits(flags);
> >  }
> >
> >
> > >
> > > Regards
> > > Yin, Fengwei
> > >
> > > >
> > > > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > > index 40d94411d492..dc7048824be8 100644
> > > > --- a/include/linux/mman.h
> > > > +++ b/include/linux/mman.h
> > > > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > > >          return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> > > >                 _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> > > >                 _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> > > > +              _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
> > > >                 arch_calc_vm_flag_bits(flags);
> > > >   }
> > > >
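
For context on how the one-liner works: _calc_vm_trans() translates an
mmap() MAP_* flag into the corresponding VM_* vma flag with a
branch-free shift. Below is a sketch of the macro as found in recent
include/linux/mman.h (the exact guard may differ by kernel version,
so treat this as illustrative rather than authoritative):

/*
 * Equivalent to: (x & bit1) ? bit2 : 0
 * bit1 and bit2 must be single bits; if either is 0,
 * the whole translation compiles away to 0.
 */
#define _calc_vm_trans(x, bit1, bit2) \
  ((!(bit1) || !(bit2)) ? 0 : \
  ((bit1) <= (bit2) ? ((x) & (bit1)) * ((bit2) / (bit1)) \
   : ((x) & (bit1)) / ((bit1) / (bit2))))

With the extra line, a glibc thread stack (nptl allocates thread
stacks with MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK) ends up with
VM_NOHUGEPAGE set, so the kernel stops placing huge pages on pthread
stacks, which matches results (2) and (3) above where the
stress-ng.pthread numbers are restored.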



end of thread

Thread overview: 24+ messages
2023-12-19 15:41 [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression kernel test robot
2023-12-20  5:27 ` Yang Shi
2023-12-20  8:29   ` Yin Fengwei
2023-12-20 15:42     ` Christoph Lameter (Ampere)
2023-12-20 20:14       ` Yang Shi
2023-12-20 20:09     ` Yang Shi
2023-12-21  0:26       ` Yang Shi
2023-12-21  0:58         ` Yin Fengwei
2023-12-21  1:02           ` Yin Fengwei
2023-12-21  4:49           ` Matthew Wilcox
2023-12-21  4:58             ` Yin Fengwei
2023-12-21 18:07             ` Yang Shi
2023-12-21 18:14               ` Matthew Wilcox
2023-12-22  1:06                 ` Yin, Fengwei
2023-12-22  2:23                   ` Huang, Ying
2023-12-21 13:39           ` Yin, Fengwei
2023-12-21 18:11             ` Yang Shi
2023-12-22  1:13               ` Yin, Fengwei
2024-01-04  1:32                 ` Yang Shi
2024-01-04  8:18                   ` Yin Fengwei
2024-01-04  8:39                     ` Oliver Sang
2024-01-05  9:29                       ` Oliver Sang
2024-01-05 14:52                         ` Yin, Fengwei
2024-01-05 18:49                         ` Yang Shi
