Subject: [linux-next:master] [memcg] 01d37228d3: netperf.Throughput_Mbps 37.9% regression
From: kernel test robot @ 2025-03-10  5:50 UTC
  To: Alexei Starovoitov
  Cc: oe-lkp, lkp, Michal Hocko, Vlastimil Babka, Shakeel Butt,
	cgroups, linux-mm, oliver.sang



Hello,

kernel test robot noticed a 37.9% regression of netperf.Throughput_Mbps on:


commit: 01d37228d331047a0bbbd1026cec2ccabef6d88d ("memcg: Use trylock to access memcg stock_lock.")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

[test failed on linux-next/master 7ec162622e66a4ff886f8f28712ea1b13069e1aa]
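For readers unfamiliar with the change, the commit title describes taking the
per-CPU memcg charge cache ("stock") lock with a trylock instead of an
unconditional lock. The userspace C sketch below is purely illustrative and is
NOT the mm/memcontrol.c code; the names consume_stock, stock_lock and
charge_slowpath only mirror the kernel terminology, and the bodies are
hypothetical. It shows the general pattern: if the trylock fails, the cached
stock is skipped and the charge is taken on the slower path instead.

/*
 * Illustrative sketch of a trylock fast path with a slow-path fallback.
 * Compile with: cc -pthread sketch.c
 */
#include <stdbool.h>
#include <stdio.h>
#include <pthread.h>

static pthread_mutex_t stock_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int stock_pages = 64;	/* locally cached, pre-charged pages */

/* Slow path stand-in: would charge the shared counter under its own locking. */
static bool charge_slowpath(unsigned int nr_pages)
{
	printf("slow path: charging %u pages against the counter\n", nr_pages);
	return true;
}

/* Fast path: only succeeds if the trylock succeeds and the stock suffices. */
static bool consume_stock(unsigned int nr_pages)
{
	bool ok = false;

	if (pthread_mutex_trylock(&stock_lock) != 0)
		return false;		/* lock busy: caller falls back */

	if (stock_pages >= nr_pages) {
		stock_pages -= nr_pages;
		ok = true;
	}
	pthread_mutex_unlock(&stock_lock);
	return ok;
}

static bool try_charge(unsigned int nr_pages)
{
	if (consume_stock(nr_pages))
		return true;		/* served from the cached stock */
	return charge_slowpath(nr_pages);
}

int main(void)
{
	try_charge(8);			/* fast path */
	try_charge(1024);		/* stock too small: slow path */
	return 0;
}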

testcase: netperf
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:

	ip: ipv4
	runtime: 300s
	nr_threads: 50%
	cluster: cs-localhost
	test: TCP_MAERTS
	cpufreq_governor: performance


In addition, the commit also has a significant impact on the following tests:

+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.mmapfork.ops_per_sec  63.5% regression                                        |
| test machine     | 256 threads 4 sockets INTEL(R) XEON(R) PLATINUM 8592+ (Emerald Rapids) with 256G memory            |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | nr_threads=100%                                                                                    |
|                  | test=mmapfork                                                                                      |
|                  | testtime=60s                                                                                       |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | hackbench: hackbench.throughput  26.6% regression                                                  |
| test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory         |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | ipc=socket                                                                                         |
|                  | iterations=4                                                                                       |
|                  | mode=threads                                                                                       |
|                  | nr_threads=100%                                                                                    |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | lmbench3: lmbench3.TCP.socket.bandwidth.64B.MB/sec  33.0% regression                               |
| test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory        |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | mode=development                                                                                   |
|                  | nr_threads=100%                                                                                    |
|                  | test=TCP                                                                                           |
|                  | test_memory_size=50%                                                                               |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | vm-scalability: vm-scalability.throughput  86.8% regression                                        |
| test machine     | 256 threads 4 sockets INTEL(R) XEON(R) PLATINUM 8592+ (Emerald Rapids) with 256G memory            |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | runtime=300s                                                                                       |
|                  | size=1T                                                                                            |
|                  | test=lru-shm                                                                                       |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | netperf: netperf.Throughput_Mbps 39.9% improvement                                                 |
| test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory         |
| test parameters  | cluster=cs-localhost                                                                               |
|                  | cpufreq_governor=performance                                                                       |
|                  | ip=ipv4                                                                                            |
|                  | nr_threads=200%                                                                                    |
|                  | runtime=300s                                                                                       |
|                  | test=TCP_MAERTS                                                                                    |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_thread_ops  68.8% regression                                      |
| test machine     | 104 threads 2 sockets (Skylake) with 192G memory                                                   |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | mode=thread                                                                                        |
|                  | nr_task=100%                                                                                       |
|                  | test=fallocate1                                                                                    |
+------------------+----------------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202503101254.cfd454df-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250310/202503101254.cfd454df-lkp@intel.com

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
  cs-localhost/gcc-12/performance/ipv4/x86_64-rhel-9.4/50%/debian-12-x86_64-20240206.cgz/300s/lkp-icl-2sp2/TCP_MAERTS/netperf

commit: 
  8c57b687e8 ("mm, bpf: Introduce free_pages_nolock()")
  01d37228d3 ("memcg: Use trylock to access memcg stock_lock.")

8c57b687e8331eb8 01d37228d331047a0bbbd1026ce 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     88798 ±  2%     +11.3%      98788        perf-c2c.HITM.total
     11324 ±  9%     +29.0%      14612        uptime.idle
 5.698e+09           +62.0%  9.228e+09        cpuidle..time
 6.409e+08 ±  2%     -13.9%  5.517e+08 ±  4%  cpuidle..usage
     12.79 ±  2%     +10.0       22.80        mpstat.cpu.all.idle%
      2.92 ±  2%      -0.4        2.55        mpstat.cpu.all.irq%
     68.81            -8.5       60.34        mpstat.cpu.all.sys%
      2.75            -1.1        1.61 ±  2%  mpstat.cpu.all.usr%
 8.542e+08           -36.5%  5.424e+08        numa-numastat.node0.local_node
 8.541e+08           -36.5%  5.425e+08        numa-numastat.node0.numa_hit
 8.262e+08           -39.5%  4.995e+08 ±  3%  numa-numastat.node1.local_node
 8.262e+08           -39.5%  4.996e+08 ±  3%  numa-numastat.node1.numa_hit
     13.41 ±  2%     +73.9%      23.32        vmstat.cpu.id
    110.55           -13.7%      95.45        vmstat.procs.r
   4461013 ±  2%     -12.9%    3883497 ±  4%  vmstat.system.cs
   2363200 ±  2%     -12.3%    2073470 ±  4%  vmstat.system.in
   6829101 ±  4%     -59.5%    2765741 ±  7%  numa-meminfo.node1.Active
   6829101 ±  4%     -59.5%    2765741 ±  7%  numa-meminfo.node1.Active(anon)
   7426985 ± 22%     -32.0%    5051150 ± 27%  numa-meminfo.node1.FilePages
   2764830 ± 11%     -92.4%     209706 ± 21%  numa-meminfo.node1.Mapped
   8991931 ± 18%     -29.4%    6351136 ± 20%  numa-meminfo.node1.MemUsed
     14170 ±  8%     -41.4%       8302 ±  5%  numa-meminfo.node1.PageTables
   6214806 ±  3%     -63.5%    2266447 ±  7%  numa-meminfo.node1.Shmem
   7077695 ±  2%     -57.6%    2999270 ±  5%  meminfo.Active
   7077695 ±  2%     -57.6%    2999270 ±  5%  meminfo.Active(anon)
   9791069           -41.0%    5777962 ±  2%  meminfo.Cached
   7238271 ±  2%     -56.5%    3151650 ±  5%  meminfo.Committed_AS
   2812962 ± 11%     -92.1%     223137 ± 10%  meminfo.Mapped
  12548272           -35.9%    8045050 ±  2%  meminfo.Memused
     22784 ±  3%     -27.4%      16539        meminfo.PageTables
   6286416 ±  2%     -63.8%    2273611 ±  7%  meminfo.Shmem
  12766197           -32.4%    8626151 ±  2%  meminfo.max_used_kB
 8.541e+08           -36.5%  5.427e+08        numa-vmstat.node0.numa_hit
 8.542e+08           -36.5%  5.425e+08        numa-vmstat.node0.numa_local
   1707145 ±  4%     -59.5%     691093 ±  7%  numa-vmstat.node1.nr_active_anon
   1856614 ± 22%     -32.0%    1262417 ± 27%  numa-vmstat.node1.nr_file_pages
    691174 ± 11%     -92.3%      52918 ± 21%  numa-vmstat.node1.nr_mapped
      3544 ±  8%     -41.3%       2080 ±  5%  numa-vmstat.node1.nr_page_table_pages
   1553569 ±  3%     -63.6%     566242 ±  7%  numa-vmstat.node1.nr_shmem
   1707145 ±  4%     -59.5%     691093 ±  7%  numa-vmstat.node1.nr_zone_active_anon
 8.262e+08           -39.5%  4.997e+08 ±  3%  numa-vmstat.node1.numa_hit
 8.262e+08           -39.5%  4.997e+08 ±  3%  numa-vmstat.node1.numa_local
     22880           -37.9%      14205 ±  2%  netperf.ThroughputBoth_Mbps
   1464367           -37.9%     909168 ±  2%  netperf.ThroughputBoth_total_Mbps
     22880           -37.9%      14205 ±  2%  netperf.Throughput_Mbps
   1464367           -37.9%     909168 ±  2%  netperf.Throughput_total_Mbps
     94030 ± 15%    +799.5%     845847 ± 16%  netperf.time.involuntary_context_switches
     35098           +11.3%      39072 ±  3%  netperf.time.minor_page_faults
      3619           -30.6%       2511        netperf.time.percent_of_cpu_this_job_got
     10591           -31.1%       7296        netperf.time.system_time
    307.43           -12.8%     268.12        netperf.time.user_time
 6.797e+08 ±  2%     -12.9%  5.922e+08 ±  4%  netperf.time.voluntary_context_switches
 3.352e+09           -37.9%  2.081e+09 ±  2%  netperf.workload
   1768827 ±  2%     -57.6%     749641 ±  5%  proc-vmstat.nr_active_anon
    198757            -8.2%     182368        proc-vmstat.nr_anon_pages
   6242276            +1.8%    6354594        proc-vmstat.nr_dirty_background_threshold
  12499816            +1.8%   12724725        proc-vmstat.nr_dirty_threshold
   2447152           -41.0%    1444280 ±  2%  proc-vmstat.nr_file_pages
  62798979            +1.8%   63923764        proc-vmstat.nr_free_pages
    703005 ± 11%     -92.0%      56220 ± 12%  proc-vmstat.nr_mapped
      5711 ±  3%     -27.3%       4153        proc-vmstat.nr_page_table_pages
   1570988 ±  2%     -63.8%     568192 ±  7%  proc-vmstat.nr_shmem
     33010            -7.1%      30660        proc-vmstat.nr_slab_reclaimable
     70932            -3.7%      68338        proc-vmstat.nr_slab_unreclaimable
   1768827 ±  2%     -57.6%     749641 ±  5%  proc-vmstat.nr_zone_active_anon
    351363 ± 32%     -79.4%      72278 ± 16%  proc-vmstat.numa_hint_faults
    337005 ± 34%     -82.9%      57525 ± 24%  proc-vmstat.numa_hint_faults_local
 1.679e+09           -37.9%  1.042e+09 ±  2%  proc-vmstat.numa_hit
 1.679e+09           -37.9%  1.042e+09 ±  2%  proc-vmstat.numa_local
    411756 ± 21%     -63.5%     150280 ± 34%  proc-vmstat.numa_pte_updates
  1.34e+10           -37.9%  8.324e+09 ±  2%  proc-vmstat.pgalloc_normal
   1393623 ±  8%     -23.4%    1067508        proc-vmstat.pgfault
  1.34e+10           -37.9%  8.323e+09 ±  2%  proc-vmstat.pgfree
  11265047           -21.1%    8884763        sched_debug.cfs_rq:/.avg_vruntime.avg
  13067285 ±  2%     -26.3%    9630862 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.max
  10675424           -23.9%    8119367 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.min
      0.78           -13.5%       0.67 ±  2%  sched_debug.cfs_rq:/.h_nr_queued.avg
      0.37           +13.0%       0.42 ±  2%  sched_debug.cfs_rq:/.h_nr_queued.stddev
      0.77           -13.5%       0.67 ±  2%  sched_debug.cfs_rq:/.h_nr_runnable.avg
      0.37 ±  2%     +12.8%       0.42 ±  2%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
      8980 ± 10%     -14.6%       7667 ±  7%  sched_debug.cfs_rq:/.load.avg
  11265047           -21.1%    8884763        sched_debug.cfs_rq:/.min_vruntime.avg
  13067285 ±  2%     -26.3%    9630862 ±  2%  sched_debug.cfs_rq:/.min_vruntime.max
  10675424           -23.9%    8119367 ±  3%  sched_debug.cfs_rq:/.min_vruntime.min
      0.75           -11.7%       0.66        sched_debug.cfs_rq:/.nr_queued.avg
      0.33 ±  3%     +23.5%       0.40 ±  2%  sched_debug.cfs_rq:/.nr_queued.stddev
    265.16 ±  2%     +18.4%     313.92        sched_debug.cfs_rq:/.util_avg.stddev
    628.03           -17.3%     519.63 ±  2%  sched_debug.cfs_rq:/.util_est.avg
    313.28 ±  2%     +20.2%     376.59        sched_debug.cfs_rq:/.util_est.stddev
      5339 ±  5%     +79.9%       9606 ±  3%  sched_debug.cpu.avg_idle.min
      1441 ±  7%     -17.8%       1185 ± 17%  sched_debug.cpu.clock_task.stddev
      2929           -10.2%       2632        sched_debug.cpu.curr->pid.avg
      1380 ±  2%     +17.9%       1628        sched_debug.cpu.curr->pid.stddev
      0.76           -14.0%       0.66 ±  2%  sched_debug.cpu.nr_running.avg
      0.39 ±  2%     +11.2%       0.43        sched_debug.cpu.nr_running.stddev
   5279363           -14.3%    4526246 ±  3%  sched_debug.cpu.nr_switches.avg
 2.297e+10           -28.9%  1.634e+10 ±  2%  perf-stat.i.branch-instructions
      0.81            +0.1        0.93        perf-stat.i.branch-miss-rate%
 1.832e+08           -18.4%  1.495e+08 ±  2%  perf-stat.i.branch-misses
      1.64 ±  7%      +0.6        2.27 ± 13%  perf-stat.i.cache-miss-rate%
 6.943e+09           -34.2%   4.57e+09 ±  2%  perf-stat.i.cache-references
   4494744 ±  2%     -13.0%    3911893 ±  4%  perf-stat.i.context-switches
      2.51           +36.1%       3.42 ±  2%  perf-stat.i.cpi
 2.932e+11            -4.2%   2.81e+11        perf-stat.i.cpu-cycles
      2907 ± 16%   +1723.9%      53022 ± 13%  perf-stat.i.cpu-migrations
 1.167e+11           -29.3%  8.249e+10 ±  2%  perf-stat.i.instructions
      0.40           -25.8%       0.30 ±  2%  perf-stat.i.ipc
      0.04 ± 37%     -81.0%       0.01 ± 47%  perf-stat.i.major-faults
     35.11 ±  2%     -12.9%      30.57 ±  4%  perf-stat.i.metric.K/sec
      4270 ±  8%     -25.2%       3195        perf-stat.i.minor-faults
      4270 ±  8%     -25.2%       3195        perf-stat.i.page-faults
      0.80            +0.1        0.92        perf-stat.overall.branch-miss-rate%
      1.58 ±  7%      +0.6        2.20 ± 13%  perf-stat.overall.cache-miss-rate%
      2.51           +35.7%       3.41 ±  2%  perf-stat.overall.cpi
      0.40           -26.3%       0.29 ±  2%  perf-stat.overall.ipc
     10488           +13.8%      11937        perf-stat.overall.path-length
 2.289e+10           -28.9%  1.628e+10 ±  2%  perf-stat.ps.branch-instructions
 1.826e+08           -18.4%   1.49e+08 ±  2%  perf-stat.ps.branch-misses
  6.92e+09           -34.2%  4.555e+09 ±  2%  perf-stat.ps.cache-references
   4479829 ±  2%     -13.0%    3898858 ±  4%  perf-stat.ps.context-switches
 2.923e+11            -4.2%    2.8e+11        perf-stat.ps.cpu-cycles
      2901 ± 16%   +1721.9%      52859 ± 13%  perf-stat.ps.cpu-migrations
 1.164e+11           -29.3%  8.222e+10 ±  2%  perf-stat.ps.instructions
      0.04 ± 36%     -81.1%       0.01 ± 47%  perf-stat.ps.major-faults
      4246 ±  8%     -25.2%       3175        perf-stat.ps.minor-faults
      4246 ±  8%     -25.2%       3175        perf-stat.ps.page-faults
 3.515e+13           -29.3%  2.484e+13 ±  2%  perf-stat.total.instructions
      0.01 ± 36%   +4025.9%       0.37 ±133%  perf-sched.sch_delay.avg.ms.__cond_resched.__release_sock.release_sock.sk_wait_data.tcp_recvmsg_locked
      0.10 ± 30%    +275.0%       0.38 ± 23%  perf-sched.sch_delay.avg.ms.__cond_resched.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
      0.21 ±134%  +26130.2%      54.60 ± 11%  perf-sched.sch_delay.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
      0.13 ± 11%   +4401.2%       6.02 ± 34%  perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.19 ±150%  +29283.8%      54.51 ±  8%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
      0.02 ± 42%   +1730.4%       0.35 ± 60%  perf-sched.sch_delay.avg.ms.__cond_resched.lock_sock_nested.tcp_recvmsg.inet_recvmsg.sock_recvmsg
      0.09 ± 61%  +83629.3%      76.33 ± 49%  perf-sched.sch_delay.avg.ms.__cond_resched.lock_sock_nested.tcp_sendmsg.__sys_sendto.__x64_sys_sendto
      2.00 ± 67%   +2740.7%      56.88 ± 93%  perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.06 ± 57%  +1.6e+05%      97.93 ± 48%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.37 ± 89%   +4177.4%      15.90 ± 19%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.61 ± 87%  +12221.9%      75.25 ± 56%  perf-sched.sch_delay.avg.ms.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
      0.01 ± 21%    +289.8%       0.03 ±  5%  perf-sched.sch_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      3.12 ±102%   +5221.9%     166.03 ±  7%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.15 ±121%   +4038.3%       6.25 ± 10%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     62.06 ± 52%    +247.8%     215.86 ± 46%  perf-sched.sch_delay.max.ms.__cond_resched.__release_sock.__sk_flush_backlog.tcp_recvmsg_locked.tcp_recvmsg
      0.03 ± 64%  +22004.8%       6.93 ±136%  perf-sched.sch_delay.max.ms.__cond_resched.__release_sock.release_sock.sk_wait_data.tcp_recvmsg_locked
     36.09 ± 57%    +524.7%     225.47 ± 52%  perf-sched.sch_delay.max.ms.__cond_resched.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
    146.95 ± 33%    +643.8%       1092 ± 27%  perf-sched.sch_delay.max.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
      3.47 ± 33%  +29240.5%       1019        perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.56 ± 65%     -88.1%       0.07 ±136%  perf-sched.sch_delay.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
    173.81 ± 42%    +538.7%       1110 ± 46%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
      0.89 ± 92%   +3331.6%      30.64 ± 72%  perf-sched.sch_delay.max.ms.__cond_resched.lock_sock_nested.tcp_recvmsg.inet_recvmsg.sock_recvmsg
      3.17 ±107%  +20724.1%     660.12 ± 82%  perf-sched.sch_delay.max.ms.__cond_resched.lock_sock_nested.tcp_sendmsg.__sys_sendto.__x64_sys_sendto
      0.04 ± 17%     -62.0%       0.01 ± 48%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      2.01 ± 52%  +74985.5%       1509 ± 33%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     63.02 ±125%    +610.9%     448.01 ±  7%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    138.54 ± 15%    +265.7%     506.70 ± 15%  perf-sched.sch_delay.max.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
    783.25 ± 33%    +317.5%       3269 ± 31%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    437.71 ± 91%    +197.5%       1302 ± 34%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ± 44%   +1817.5%       0.26 ±  6%  perf-sched.total_sch_delay.average.ms
      1168 ± 33%    +265.4%       4269 ± 14%  perf-sched.total_sch_delay.max.ms
      0.27 ±  4%    +239.8%       0.93 ±  8%  perf-sched.total_wait_and_delay.average.ms
   6426495 ±  3%     -66.1%    2178159 ±  8%  perf-sched.total_wait_and_delay.count.ms
      4156 ±  6%     +97.9%       8227 ± 21%  perf-sched.total_wait_and_delay.max.ms
      0.26 ±  4%    +159.1%       0.68 ±  9%  perf-sched.total_wait_time.average.ms
      4156 ±  6%     +29.0%       5361 ±  6%  perf-sched.total_wait_time.max.ms
      0.34 ± 18%    +351.4%       1.53 ± 18%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
      0.37 ±168%  +29845.0%     109.35 ± 12%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
      5.01 ± 24%    +307.2%      20.41 ± 18%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.33 ±181%  +32588.3%     109.02 ±  8%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
      0.25 ± 77%  +60202.8%     152.67 ± 49%  perf-sched.wait_and_delay.avg.ms.__cond_resched.lock_sock_nested.tcp_sendmsg.__sys_sendto.__x64_sys_sendto
     37.01 ±  5%    +185.4%     105.64 ± 37%  perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
     22.34 ± 59%   +1096.8%     267.33 ± 51%  perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
     28.51 ± 11%    +657.4%     215.92 ± 65%  perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      5.47 ±111%   +5490.8%     305.70 ± 37%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    491.64 ±  7%    +244.5%       1693 ± 60%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      5.44 ± 13%    +581.2%      37.03 ± 16%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      1.37 ± 79%  +10884.8%     150.49 ± 56%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
      0.02 ± 14%    +561.9%       0.16 ±  5%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
    182.37 ±  9%    +210.0%     565.41 ±  6%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.44 ±113%   +2996.3%      13.52 ± 10%  perf-sched.wait_and_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    696.96 ±  6%     +47.1%       1025 ± 24%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1078 ± 73%    +218.9%       3438 ± 14%  perf-sched.wait_and_delay.count.__cond_resched.__release_sock.__sk_flush_backlog.tcp_recvmsg_locked.tcp_recvmsg
      5253 ± 23%     -64.0%       1891 ± 13%  perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
    663.00 ±  8%     -54.6%     301.17 ± 19%  perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
     12.50 ± 22%     -82.7%       2.17 ± 90%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      4.83 ±  7%     -79.3%       1.00 ±100%  perf-sched.wait_and_delay.count.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      5.67 ±  8%     -79.4%       1.17 ±104%  perf-sched.wait_and_delay.count.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
    115.50 ±  5%     -78.9%      24.33 ± 13%  perf-sched.wait_and_delay.count.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
    117.50 ±  4%     -73.8%      30.83 ±  8%  perf-sched.wait_and_delay.count.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
     11.33 ± 26%     -88.2%       1.33 ±141%  perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
    123.83 ± 49%    +136.5%     292.83 ± 22%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      2233 ±  9%     -85.5%     323.17 ± 94%  perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
     22.67 ±  4%     -63.2%       8.33 ± 49%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     12.50 ±  6%     -61.3%       4.83 ± 36%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     82.50 ±  4%     -71.5%      23.50 ± 17%  perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     19.17 ±  3%     -77.4%       4.33 ± 17%  perf-sched.wait_and_delay.count.schedule_timeout.kcompactd.kthread.ret_from_fork
      1001 ±  6%     -73.0%     270.67 ± 15%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      1790 ± 32%     -99.6%       8.00 ± 19%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
   6330659 ±  3%     -65.9%    2157272 ±  8%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      5075 ± 13%     -64.9%       1781 ±  9%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     38453 ± 49%     -73.5%      10206 ± 16%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    766.67 ±  6%     -70.7%     224.33 ± 17%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    108.98 ± 77%    +421.4%     568.27 ± 31%  perf-sched.wait_and_delay.max.ms.__cond_resched.__release_sock.__sk_flush_backlog.tcp_recvmsg_locked.tcp_recvmsg
     73.56 ± 54%    +619.8%     529.51 ± 35%  perf-sched.wait_and_delay.max.ms.__cond_resched.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
    163.67 ±108%   +1235.6%       2185 ± 27%  perf-sched.wait_and_delay.max.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
      1001          +103.6%       2038        perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    174.95 ±126%   +1169.0%       2220 ± 46%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
      6.13 ±115%  +21438.1%       1320 ± 82%  perf-sched.wait_and_delay.max.ms.__cond_resched.lock_sock_nested.tcp_sendmsg.__sys_sendto.__x64_sys_sendto
      2788 ± 25%     -62.0%       1058 ±102%  perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    741.28 ± 14%    +125.8%       1673 ± 14%  perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      1003          +251.0%       3523 ± 61%  perf-sched.wait_and_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    342.95 ±136%    +927.4%       3523 ± 39%  perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      1008 ± 40%    +374.8%       4787 ± 65%  perf-sched.wait_and_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
    207.61 ± 65%    +333.7%     900.49 ±  7%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    277.11 ± 15%    +268.2%       1020 ± 15%  perf-sched.wait_and_delay.max.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      4119 ±  8%     +73.3%       7139 ± 26%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3146 ± 20%     +89.9%       5976 ± 18%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.37 ±104%    +369.8%       1.76 ± 15%  perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.__sk_flush_backlog.tcp_recvmsg_locked.tcp_recvmsg
      0.02 ± 35%   +9586.0%       1.61 ±149%  perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.sk_wait_data.tcp_recvmsg_locked
      0.24 ± 13%    +383.1%       1.15 ± 17%  perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
      0.23 ±128%  +23772.2%      54.75 ± 12%  perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
      4.88 ± 24%    +195.1%      14.39 ± 14%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.20 ±144%  +26532.7%      54.51 ±  8%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
      0.07 ± 35%   +2174.0%       1.58 ± 52%  perf-sched.wait_time.avg.ms.__cond_resched.lock_sock_nested.tcp_recvmsg.inet_recvmsg.sock_recvmsg
      0.23 ± 29%  +33379.5%      76.33 ± 49%  perf-sched.wait_time.avg.ms.__cond_resched.lock_sock_nested.tcp_sendmsg.__sys_sendto.__x64_sys_sendto
     37.01 ±  5%    +185.4%     105.64 ± 37%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
     22.28 ± 59%    +765.9%     192.89 ± 46%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.39 ±101%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
     26.50 ± 12%    +500.0%     159.04 ± 67%  perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
    498.82 ±  9%     -70.4%     147.68 ±114%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      5.41 ±112%   +3742.6%     207.76 ± 43%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    473.69 ±  3%    +177.8%       1315 ± 45%  perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      5.06 ±  7%    +317.1%      21.12 ± 13%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.76 ± 72%   +9807.3%      75.25 ± 56%  perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
      0.02 ± 10%    +690.9%       0.13 ±  6%  perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
    179.25 ±  8%    +122.8%     399.39 ±  5%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.29 ±108%   +2447.1%       7.27 ± 10%  perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     63.71 ± 51%    +604.4%     448.79 ± 18%  perf-sched.wait_time.max.ms.__cond_resched.__release_sock.__sk_flush_backlog.tcp_recvmsg_locked.tcp_recvmsg
      0.09 ±130%  +36184.9%      33.26 ±175%  perf-sched.wait_time.max.ms.__cond_resched.__release_sock.release_sock.sk_wait_data.tcp_recvmsg_locked
     38.34 ± 49%    +782.5%     338.36 ± 32%  perf-sched.wait_time.max.ms.__cond_resched.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
    146.97 ± 33%    +643.7%       1092 ± 27%  perf-sched.wait_time.max.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
      0.62 ± 57%     -89.2%       0.07 ±136%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
    173.81 ± 42%    +538.7%       1110 ± 46%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
      1.41 ± 42%  +10955.3%     155.36 ± 76%  perf-sched.wait_time.max.ms.__cond_resched.lock_sock_nested.tcp_recvmsg.inet_recvmsg.sock_recvmsg
      4.70 ± 52%  +13934.8%     660.12 ± 82%  perf-sched.wait_time.max.ms.__cond_resched.lock_sock_nested.tcp_sendmsg.__sys_sendto.__x64_sys_sendto
      2788 ± 25%     -62.0%       1058 ±102%  perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    741.28 ± 14%    +125.8%       1673 ± 14%  perf-sched.wait_time.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      1002          +151.0%       2515 ± 44%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.55 ± 78%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      1688 ± 27%     +98.5%       3350 ± 22%  perf-sched.wait_time.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
    341.76 ±136%    +587.1%       2348 ± 31%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    840.47 ± 28%    +309.7%       3443 ± 44%  perf-sched.wait_time.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
    167.71 ± 40%    +174.9%     460.98 ±  6%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    138.56 ± 15%    +284.4%     532.69 ± 12%  perf-sched.wait_time.max.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      4119 ±  8%     +20.5%       4965 ±  7%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      2922 ± 15%     +75.6%       5133 ± 10%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm


***************************************************************************************************
lkp-emr-2sp1: 256 threads 4 sockets INTEL(R) XEON(R) PLATINUM 8592+ (Emerald Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-emr-2sp1/mmapfork/stress-ng/60s

commit: 
  8c57b687e8 ("mm, bpf: Introduce free_pages_nolock()")
  01d37228d3 ("memcg: Use trylock to access memcg stock_lock.")

8c57b687e8331eb8 01d37228d331047a0bbbd1026ce 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
  61300134 ±  4%     -11.2%   54464501 ±  3%  meminfo.max_used_kB
    100.42           +12.5%     112.99 ±  4%  uptime.boot
      1.93 ± 13%      -1.8        0.16 ± 14%  mpstat.cpu.all.soft%
     17.02 ±  2%      -5.5       11.49 ±  2%  mpstat.cpu.all.usr%
      5510 ± 48%    +195.8%      16296 ± 23%  perf-c2c.DRAM.remote
      1484 ± 49%    +138.9%       3545 ± 24%  perf-c2c.HITM.remote
     16.52 ±  2%     -32.4%      11.16 ±  2%  vmstat.cpu.us
     79660           -17.3%      65897 ±  4%  vmstat.system.cs
 3.332e+08 ±  2%     -54.9%  1.502e+08 ± 22%  numa-numastat.node0.local_node
 3.337e+08           -54.9%  1.504e+08 ± 22%  numa-numastat.node0.numa_hit
 3.329e+08           -72.4%   92005595 ± 52%  numa-numastat.node1.local_node
 3.335e+08           -72.4%   92128084 ± 52%  numa-numastat.node1.numa_hit
 3.205e+08           -51.8%  1.546e+08 ± 16%  numa-numastat.node2.local_node
 3.208e+08           -51.8%  1.547e+08 ± 16%  numa-numastat.node2.numa_hit
 3.173e+08 ±  2%     -73.1%   85455746 ± 45%  numa-numastat.node3.local_node
 3.176e+08 ±  2%     -73.1%   85574085 ± 45%  numa-numastat.node3.numa_hit
     12219           -63.1%       4511 ±  6%  stress-ng.mmapfork.ops
    202.59           -63.5%      73.93 ±  6%  stress-ng.mmapfork.ops_per_sec
     64.01           +19.7%      76.59 ±  7%  stress-ng.time.elapsed_time
     64.01           +19.7%      76.59 ±  7%  stress-ng.time.elapsed_time.max
   4100955           +10.0%    4509134 ±  4%  stress-ng.time.involuntary_context_switches
   1.3e+09           -63.1%  4.801e+08 ±  7%  stress-ng.time.minor_page_faults
     24509            +1.4%      24848        stress-ng.time.percent_of_cpu_this_job_got
     12906           +30.3%      16810 ±  6%  stress-ng.time.system_time
      2783 ±  2%     -20.4%       2216 ±  7%  stress-ng.time.user_time
    464362           -60.5%     183317 ±  5%  stress-ng.time.voluntary_context_switches
   5361967            +1.5%    5440921        proc-vmstat.nr_dirty_background_threshold
  10737045            +1.5%   10895145        proc-vmstat.nr_dirty_threshold
   7335955            -7.4%    6793970 ±  6%  proc-vmstat.nr_file_pages
  53947071            +1.4%   54723190        proc-vmstat.nr_free_pages
   6453955 ±  2%      -8.4%    5911969 ±  7%  proc-vmstat.nr_shmem
     60752            -5.1%      57679        proc-vmstat.nr_slab_reclaimable
    210937            -7.3%     195559 ±  3%  proc-vmstat.nr_slab_unreclaimable
 1.306e+09           -63.0%   4.83e+08 ±  7%  proc-vmstat.numa_hit
 1.305e+09           -63.0%  4.824e+08 ±  7%  proc-vmstat.numa_local
 1.505e+09           -63.0%  5.563e+08 ±  7%  proc-vmstat.pgalloc_normal
 1.301e+09           -63.0%  4.807e+08 ±  7%  proc-vmstat.pgfault
 1.504e+09           -63.1%   5.55e+08 ±  7%  proc-vmstat.pgfree
    616613 ± 23%     -54.7%     279591 ± 21%  proc-vmstat.pgreuse
    389483           -63.3%     142929 ±  6%  proc-vmstat.thp_fault_alloc
   5489228 ± 29%     +62.7%    8928447 ± 38%  numa-meminfo.node0.FilePages
  10213877 ± 31%     -50.5%    5060225 ± 37%  numa-meminfo.node3.Active
  10213877 ± 31%     -50.5%    5060225 ± 37%  numa-meminfo.node3.Active(anon)
   1849516 ± 31%     -58.7%     764271 ± 49%  numa-meminfo.node3.AnonHugePages
   2103284 ± 26%     -54.3%     961145 ± 33%  numa-meminfo.node3.AnonPages
   4500176 ± 30%     -62.0%    1708987 ± 31%  numa-meminfo.node3.AnonPages.max
   8719420 ± 29%     -52.1%    4173994 ± 41%  numa-meminfo.node3.FilePages
     28893 ± 23%     -38.5%      17773 ± 19%  numa-meminfo.node3.KernelStack
   6264266 ± 35%     -49.9%    3139363 ± 52%  numa-meminfo.node3.Mapped
  52318919 ±  6%     +11.7%   58419737 ±  3%  numa-meminfo.node3.MemFree
  13602300 ± 23%     -44.9%    7501482 ± 28%  numa-meminfo.node3.MemUsed
    146208 ± 31%     -56.0%      64374 ± 39%  numa-meminfo.node3.PageTables
    217141 ± 16%     -33.3%     144858 ± 12%  numa-meminfo.node3.SUnreclaim
   8085495 ± 33%     -49.3%    4097432 ± 39%  numa-meminfo.node3.Shmem
    279175 ± 13%     -32.4%     188683 ± 16%  numa-meminfo.node3.Slab
 3.341e+08           -55.0%  1.504e+08 ± 22%  numa-vmstat.node0.numa_hit
 3.335e+08 ±  2%     -55.0%  1.502e+08 ± 22%  numa-vmstat.node0.numa_local
 3.338e+08           -72.4%   92136512 ± 52%  numa-vmstat.node1.numa_hit
 3.332e+08           -72.4%   92014024 ± 52%  numa-vmstat.node1.numa_local
 3.211e+08           -51.8%  1.547e+08 ± 16%  numa-vmstat.node2.numa_hit
 3.208e+08           -51.8%  1.546e+08 ± 16%  numa-vmstat.node2.numa_local
   2548625 ± 32%     -50.5%    1261365 ± 41%  numa-vmstat.node3.nr_active_anon
    526621 ± 27%     -54.4%     239946 ± 36%  numa-vmstat.node3.nr_anon_pages
    904.50 ± 32%     -59.0%     370.93 ± 53%  numa-vmstat.node3.nr_anon_transparent_hugepages
   2173840 ± 30%     -52.2%    1040058 ± 45%  numa-vmstat.node3.nr_file_pages
  13073386 ±  6%     +11.8%   14619853 ±  4%  numa-vmstat.node3.nr_free_pages
     28795 ± 23%     -38.2%      17795 ± 20%  numa-vmstat.node3.nr_kernel_stack
     35596 ± 33%     -54.7%      16125 ± 43%  numa-vmstat.node3.nr_page_table_pages
   2015361 ± 33%     -49.3%    1020916 ± 44%  numa-vmstat.node3.nr_shmem
     53949 ± 16%     -32.8%      36254 ± 14%  numa-vmstat.node3.nr_slab_unreclaimable
   2548649 ± 32%     -50.5%    1261287 ± 41%  numa-vmstat.node3.nr_zone_active_anon
 3.179e+08 ±  2%     -73.1%   85558747 ± 45%  numa-vmstat.node3.numa_hit
 3.176e+08 ±  2%     -73.1%   85440409 ± 45%  numa-vmstat.node3.numa_local
  12034713 ±  2%     +27.0%   15288847 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.max
   1438718 ± 32%    +154.6%    3662687 ± 29%  sched_debug.cfs_rq:/.avg_vruntime.stddev
     70491 ±190%     -95.0%       3512 ± 44%  sched_debug.cfs_rq:/.load.avg
    743319 ±202%     -97.4%      19566 ±124%  sched_debug.cfs_rq:/.load.stddev
  12034713 ±  2%     +27.0%   15288847 ±  8%  sched_debug.cfs_rq:/.min_vruntime.max
   1438712 ± 32%    +154.6%    3662687 ± 29%  sched_debug.cfs_rq:/.min_vruntime.stddev
    222.33 ±  6%     -11.2%     197.52 ±  8%  sched_debug.cfs_rq:/.util_avg.stddev
     60.33 ± 23%     -50.9%      29.61 ± 22%  sched_debug.cfs_rq:/.util_est.avg
    139547 ± 15%     +69.7%     236820 ± 23%  sched_debug.cpu.avg_idle.stddev
    179893 ±  2%     -65.7%      61761 ±  3%  sched_debug.cpu.curr->pid.avg
    192433           -64.8%      67681 ±  4%  sched_debug.cpu.curr->pid.max
     38238 ± 26%     -72.7%      10420 ±  8%  sched_debug.cpu.curr->pid.stddev
    868454 ± 22%     +35.9%    1180661 ± 19%  sched_debug.cpu.max_idle_balance_cost.max
     29004 ± 42%    +201.6%      87490 ± 36%  sched_debug.cpu.max_idle_balance_cost.stddev
     10283           -12.3%       9017 ±  3%  sched_debug.cpu.nr_switches.avg
      8342 ±  4%     -24.6%       6290 ±  4%  sched_debug.cpu.nr_switches.min
      1412 ± 47%     +60.5%       2267 ± 35%  sched_debug.cpu.nr_switches.stddev
     57.00 ± 39%     -50.0%      28.50 ± 49%  sched_debug.cpu.nr_uninterruptible.max
      7.07 ±  7%     -34.3%       4.65 ± 18%  sched_debug.cpu.nr_uninterruptible.stddev
 4.289e+10           -70.6%  1.263e+10 ±  2%  perf-stat.i.branch-instructions
      0.27            +0.1        0.40 ±  4%  perf-stat.i.branch-miss-rate%
  86292456 ±  2%     -49.7%   43436183 ±  6%  perf-stat.i.branch-misses
     52.01            -8.7       43.30 ±  5%  perf-stat.i.cache-miss-rate%
  1.09e+09           -72.1%  3.043e+08 ± 15%  perf-stat.i.cache-misses
 2.089e+09           -66.0%  7.105e+08 ± 20%  perf-stat.i.cache-references
     81002           -21.4%      63704 ±  5%  perf-stat.i.context-switches
      3.27          +240.9%      11.16 ±  2%  perf-stat.i.cpi
  7.08e+11            +1.7%    7.2e+11        perf-stat.i.cpu-cycles
      7221 ±  5%     -73.5%       1917 ±  5%  perf-stat.i.cpu-migrations
    642.37          +274.4%       2404 ± 18%  perf-stat.i.cycles-between-cache-misses
 2.136e+11           -70.2%  6.365e+10 ±  2%  perf-stat.i.instructions
      0.32           -63.8%       0.12 ±  3%  perf-stat.i.ipc
      1.64 ± 59%     -94.6%       0.09 ± 88%  perf-stat.i.major-faults
      2.34 ±  4%    -100.0%       0.00        perf-stat.i.metric.K/sec
    314368 ±  2%     -68.1%     100262 ±  2%  perf-stat.i.minor-faults
    314370 ±  2%     -68.1%     100263 ±  2%  perf-stat.i.page-faults
      0.20 ±  2%      +0.1        0.32 ±  9%  perf-stat.overall.branch-miss-rate%
     51.98            -8.5       43.45 ±  5%  perf-stat.overall.cache-miss-rate%
      3.33          +241.0%      11.35 ±  2%  perf-stat.overall.cpi
    651.94          +273.2%       2432 ± 18%  perf-stat.overall.cycles-between-cache-misses
      0.30           -70.7%       0.09 ±  2%  perf-stat.overall.ipc
   4.2e+10           -70.2%  1.251e+10 ±  2%  perf-stat.ps.branch-instructions
  83007046 ±  2%     -52.3%   39570947 ±  8%  perf-stat.ps.branch-misses
 1.068e+09           -71.6%   3.03e+08 ± 16%  perf-stat.ps.cache-misses
 2.054e+09           -65.7%  7.048e+08 ± 20%  perf-stat.ps.cache-references
     79568           -19.1%      64350 ±  3%  perf-stat.ps.context-switches
 6.961e+11            +2.7%  7.152e+11        perf-stat.ps.cpu-cycles
      7079 ±  5%     -73.2%       1894 ±  5%  perf-stat.ps.cpu-migrations
 2.091e+11           -69.9%  6.303e+10 ±  2%  perf-stat.ps.instructions
      1.56 ± 59%     -97.0%       0.05 ± 74%  perf-stat.ps.major-faults
    305825 ±  3%     -68.3%      97054        perf-stat.ps.minor-faults
    305826 ±  3%     -68.3%      97054        perf-stat.ps.page-faults
 1.349e+13           -64.1%  4.846e+12 ±  8%  perf-stat.total.instructions
     28.19 ± 10%    +366.8%     131.59 ± 88%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
     34.75 ± 15%     +77.7%      61.75 ± 36%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pud_alloc
     34.03 ± 29%     +54.9%      52.72 ± 20%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pgd_alloc
     24.31 ±  9%    +198.8%      72.64 ± 33%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
     25.40 ±110%    +244.6%      87.53 ± 42%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.allocate_slab.___slab_alloc
     41.64 ± 10%     +41.3%      58.86 ±  9%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
     33.33 ±  8%    +213.7%     104.54 ± 68%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
     35.00 ± 28%     +67.7%      58.69 ± 26%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_node_noprof.__get_vm_area_node.__vmalloc_node_range_noprof.alloc_thread_stack_node
      6.70 ± 98%    +367.0%      31.28 ± 71%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
     31.30 ± 30%    +619.1%     225.07 ±179%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.__vmalloc_area_node.__vmalloc_node_range_noprof.alloc_thread_stack_node
     34.75 ± 19%    +198.6%     103.76 ± 94%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
     38.41 ± 10%     +65.4%      63.55 ± 24%  perf-sched.sch_delay.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
     44.17 ±  6%     +57.3%      69.45 ± 25%  perf-sched.sch_delay.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
     30.30 ± 17%     +41.9%      43.00 ± 14%  perf-sched.sch_delay.avg.ms.__cond_resched.__vmalloc_area_node.__vmalloc_node_range_noprof.alloc_thread_stack_node.dup_task_struct
      1.48 ± 33%    +818.4%      13.56 ±109%  perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
     32.02 ± 28%     +98.3%      63.48 ± 23%  perf-sched.sch_delay.avg.ms.__cond_resched.cgroup_css_set_fork.cgroup_can_fork.copy_process.kernel_clone
     33.07 ± 11%    +184.3%      94.01 ± 52%  perf-sched.sch_delay.avg.ms.__cond_resched.copy_page_range.dup_mmap.dup_mm.constprop
     30.00 ±  9%    +239.8%     101.93 ± 52%  perf-sched.sch_delay.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
      6.51 ± 31%    +348.8%      29.20 ± 40%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.__mm_populate.vm_mmap_pgoff.do_syscall_64
      1.37 ±123%    +609.8%       9.72 ± 70%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.part
     31.18 ±  6%    +197.2%      92.68 ± 52%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write.anon_vma_clone.anon_vma_fork.dup_mmap
     30.95 ± 10%    +282.4%     118.34 ± 67%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write.anon_vma_fork.dup_mmap.dup_mm
     33.18 ±  9%    +166.0%      88.25 ± 41%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write.dup_mmap.dup_mm.constprop
     39.63 ±  6%     +67.8%      66.50 ± 21%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write.free_pgtables.exit_mmap.__mmput
     38.72 ±  9%     +62.3%      62.86 ± 14%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write.unlink_anon_vmas.free_pgtables.exit_mmap
     37.73 ± 12%     +62.0%      61.14 ± 11%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write.unlink_file_vma_batch_add.free_pgtables.exit_mmap
      8.30 ± 50%    +275.7%      31.20 ± 43%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.67 ± 50%    +290.9%       2.61 ± 46%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      4.39 ±101%    +698.1%      35.06 ± 49%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     38.18 ± 25%     +60.7%      61.36 ± 15%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.path_put.exit_fs.do_exit
      0.48 ±135%   +2338.4%      11.59 ± 64%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.step_into.link_path_walk.part
      0.60 ±152%   +1831.8%      11.56 ± 89%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
     42.64 ±  7%     +52.1%      64.85 ± 20%  perf-sched.sch_delay.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
     21.22 ± 35%     -47.9%      11.05 ± 56%  perf-sched.sch_delay.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     35.21 ± 16%    +339.5%     154.78 ±119%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_bulk_noprof.mas_dup_alloc.isra.0
      8.71 ± 54%    +270.1%      32.24 ± 40%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_pseudo.alloc_file_pseudo
      7.82 ± 51%    +285.6%      30.14 ± 40%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.__shmem_file_setup
      1.73 ±141%    +785.8%      15.36 ± 63%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
     35.74 ± 31%    +484.4%     208.89 ±103%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_pid.copy_process.kernel_clone
     31.90 ± 10%    +214.3%     100.25 ± 54%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.anon_vma_fork.dup_mmap.dup_mm
     25.61 ± 29%    +644.5%     190.67 ±164%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.copy_signal.copy_process.kernel_clone
     37.35 ± 13%     +61.0%      60.16 ± 19%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.dup_mm.constprop.0
      0.75 ±223%   +1417.4%      11.45 ±120%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
      9.62 ± 46%    +243.7%      33.06 ± 36%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
     27.94 ± 13%    +207.2%      85.84 ± 27%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.prepare_creds.copy_creds.copy_process
      7.69 ± 48%    +288.3%      29.87 ± 34%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
      8.08 ± 23%    +264.7%      29.45 ± 42%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      1.21 ±172%   +1936.5%      24.64 ± 77%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
     32.66 ±  9%    +184.8%      93.02 ± 44%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.dup_mmap.dup_mm
     38.89 ± 11%     +85.5%      72.16 ± 34%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
      4.15 ±223%   +1063.1%      48.31 ± 33%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_event_exit_task.do_exit.do_group_exit
     32.51 ± 17%    +291.0%     127.10 ± 66%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.__percpu_counter_init_many.mm_init
     44.38 ± 16%    +262.6%     160.91 ±101%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.mm_init.dup_mm
     35.49 ± 15%     +67.2%      59.34 ± 26%  perf-sched.sch_delay.avg.ms.__cond_resched.remove_vma.exit_mmap.__mmput.exit_mm
     48.78 ± 10%     +24.2%      60.57 ± 13%  perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
     45.39 ±  8%     +53.8%      69.81 ± 12%  perf-sched.sch_delay.avg.ms.__cond_resched.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill
     42.58 ± 30%    +283.5%     163.28 ±140%  perf-sched.sch_delay.avg.ms.__cond_resched.switch_task_namespaces.do_exit.do_group_exit.__x64_sys_exit_group
     14.92 ±101%    +260.8%      53.85 ± 29%  perf-sched.sch_delay.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     35.61 ± 16%    +140.7%      85.71 ± 76%  perf-sched.sch_delay.avg.ms.__cond_resched.uprobe_start_dup_mmap.dup_mm.constprop.0
     29.84 ± 17%     +83.0%      54.61 ±  9%  perf-sched.sch_delay.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     46.38 ±  7%     +35.6%      62.87 ±  8%  perf-sched.sch_delay.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
     11.27 ± 11%    +933.5%     116.43 ±119%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
     11.15 ± 11%    +459.2%      62.34 ± 58%  perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
     34.39 ±  9%     +59.8%      54.95 ±  9%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     30.52 ± 18%     +41.8%      43.27 ± 10%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      2.28 ± 49%   +1270.2%      31.31 ±114%  perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      7.76 ± 35%    +144.3%      18.95 ± 31%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     38.01 ± 23%    +151.3%      95.53 ± 46%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.__put_anon_vma
     29.46 ± 14%    +201.3%      88.77 ± 63%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.anon_vma_clone
     29.91 ± 10%    +134.2%      70.04 ± 23%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.anon_vma_fork
     32.15 ± 24%    +102.8%      65.18 ± 23%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.dup_mmap
     36.21 ± 20%     +56.2%      56.56 ± 27%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.unlink_anon_vmas
     14.35 ± 96%    +344.7%      63.80 ± 23%  perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.unlink_file_vma_batch_final
      0.08 ± 57%   +5932.7%       4.71 ±121%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     37.96 ± 13%     +84.3%      69.97 ± 37%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      4.77 ± 39%     -70.7%       1.40 ± 99%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.ret_from_fork_asm.[unknown]
    111.67 ± 20%     +60.7%     179.44 ± 19%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pgd_alloc
    445.63 ± 20%    +632.3%       3263 ± 57%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
     99.05 ± 17%     +72.4%     170.81 ± 24%  perf-sched.sch_delay.max.ms.__cond_resched.__dentry_kill.dput.__fput.task_work_run
    483.91 ± 16%    +592.2%       3349 ± 56%  perf-sched.sch_delay.max.ms.__cond_resched.__get_user_pages.populate_vma_page_range.__mm_populate.vm_mmap_pgoff
    148.35 ± 21%   +1202.9%       1932 ±113%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_node_noprof.__get_vm_area_node.__vmalloc_node_range_noprof.alloc_thread_stack_node
    362.40 ± 14%    +921.6%       3702 ± 62%  perf-sched.sch_delay.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
    257.63 ±145%   +1627.8%       4451 ±147%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    260.06 ± 33%   +4874.0%      12935 ±112%  perf-sched.sch_delay.max.ms.__cond_resched.copy_page_range.dup_mmap.dup_mm.constprop
    271.64 ± 21%   +5159.3%      14286 ±101%  perf-sched.sch_delay.max.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
     33.52 ± 39%    +228.4%     110.08 ± 37%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.__mm_populate.vm_mmap_pgoff.do_syscall_64
    104.84 ± 33%    +577.7%     710.49 ±173%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
      1.80 ±104%   +2305.8%      43.38 ± 67%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.walk_component.link_path_walk.part
     19.72 ± 46%    +391.9%      97.03 ± 42%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
    222.08 ± 11%   +5794.4%      13090 ±108%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.anon_vma_clone.anon_vma_fork.dup_mmap
    274.25 ±  5%   +5170.7%      14454 ±111%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.dup_mmap.dup_mm.constprop
     73.44 ± 48%    +113.6%     156.88 ± 21%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.unlink_file_vma_batch_final.free_pgtables.exit_mmap
     23.85 ± 50%    +249.4%      83.33 ± 34%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
     92.73 ± 29%     +71.5%     159.00 ± 24%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.vms_gather_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
      4.07 ± 36%    +430.7%      21.62 ± 42%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      5.62 ±121%   +1304.8%      78.91 ± 39%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.vm_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.01 ±140%   +5492.8%      56.59 ± 42%  perf-sched.sch_delay.max.ms.__cond_resched.dput.step_into.link_path_walk.part
      0.75 ±163%   +5258.4%      40.24 ± 77%  perf-sched.sch_delay.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
     34.61 ± 78%    +215.9%     109.35 ± 30%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_pseudo.alloc_file_pseudo
     44.21 ± 76%    +252.4%     155.83 ± 36%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.dup_task_struct.copy_process.kernel_clone
      1.78 ± 95%    +591.1%      12.30 ± 60%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.__khugepaged_enter.do_huge_pmd_anonymous_page.__handle_mm_fault
     35.91 ± 77%    +159.0%      93.00 ± 31%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.__shmem_file_setup
      3.16 ±154%   +1390.5%      47.14 ± 53%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
    330.41 ± 24%   +4614.7%      15577 ±102%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.anon_vma_fork.dup_mmap.dup_mm
      0.75 ±223%   +4317.2%      33.33 ±110%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
     37.23 ± 75%    +142.7%      90.36 ± 33%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
    145.93 ± 24%   +9241.3%      13631 ±108%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.prepare_creds.copy_creds.copy_process
     43.53 ± 79%    +166.1%     115.84 ± 31%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      3.38 ±203%   +1420.1%      51.46 ± 60%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
    224.58 ± 18%    +681.9%       1755 ±105%  perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
      4.15 ±223%   +2488.3%     107.50 ± 30%  perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.perf_event_exit_task.do_exit.do_group_exit
    411.45 ±112%   +3129.7%      13288 ±108%  perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.__percpu_counter_init_many.mm_init
    428.93 ± 16%    +707.9%       3465 ± 50%  perf-sched.sch_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
     40.64 ± 80%    +240.6%     138.43 ± 54%  perf-sched.sch_delay.max.ms.__cond_resched.shmem_inode_unacct_blocks.shmem_undo_range.shmem_evict_inode.evict
    459.95 ± 14%    +804.7%       4161 ± 57%  perf-sched.sch_delay.max.ms.__cond_resched.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill
     91.90 ± 34%    +659.5%     697.98 ±177%  perf-sched.sch_delay.max.ms.__cond_resched.switch_task_namespaces.do_exit.do_group_exit.__x64_sys_exit_group
     45.16 ± 95%    +176.9%     125.05 ± 37%  perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     56.92 ± 47%    +144.0%     138.85 ± 30%  perf-sched.sch_delay.max.ms.__cond_resched.unmap_page_range.unmap_vmas.exit_mmap.__mmput
    319.32 ± 23%    +445.2%       1740 ± 63%  perf-sched.sch_delay.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
    348.62 ± 21%    +721.6%       2864 ± 53%  perf-sched.sch_delay.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
     27.43 ± 75%   +1687.1%     490.13 ±145%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      1363 ± 60%   +1094.4%      16281 ±102%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    317.08 ± 29%    +578.7%       2151 ± 82%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
    451.54 ± 16%    +831.0%       4203 ± 54%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
    142.47 ± 23%     +50.7%     214.69 ± 26%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
    298.70 ± 20%    +754.7%       2553 ± 53%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
    717.11 ± 59%   +1529.8%      11687 ±127%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
     39.28 ± 45%    +302.2%     158.00 ±122%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     17.73 ±112%    +561.1%     117.22 ± 58%  perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.unlink_file_vma_batch_final
     58.28 ± 53%   +2240.3%       1363 ± 71%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     47.17 ± 95%  +15545.3%       7379 ±104%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    314.10 ± 16%   +4088.0%      13154 ±100%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     36.28 ± 78%    +123.3%      81.01 ± 32%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     35.75 ±  9%     +58.8%      56.76 ±  4%  perf-sched.total_sch_delay.average.ms
      1769 ± 70%    +860.7%      17004 ± 96%  perf-sched.total_sch_delay.max.ms
     81.84 ±  8%     +61.4%     132.10 ±  9%  perf-sched.total_wait_and_delay.average.ms
    446898 ±  7%     +98.4%     886795 ± 17%  perf-sched.total_wait_and_delay.count.ms
      3840 ± 62%    +804.8%      34745 ± 92%  perf-sched.total_wait_and_delay.max.ms
     46.09 ±  6%     +63.5%      75.34 ± 13%  perf-sched.total_wait_time.average.ms
      3062 ± 25%    +607.8%      21677 ± 69%  perf-sched.total_wait_time.max.ms
     32.02 ±141%    +823.3%     295.64 ± 79%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
     81.86 ± 10%     +41.9%     116.16 ±  9%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
     28.49 ±141%    +696.0%     226.79 ± 53%  perf-sched.wait_and_delay.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
      9.67 ± 17%    +170.4%      26.14 ± 56%  perf-sched.wait_and_delay.avg.ms.__cond_resched.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
     80.77 ± 79%    +507.2%     490.43 ±109%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_pid.copy_process.kernel_clone
     14.65 ±223%   +1421.7%     222.99 ± 55%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.anon_vma_fork.dup_mmap.dup_mm
     42.27 ±100%    +374.5%     200.59 ± 42%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.dup_mmap.dup_mm
     31.22 ±141%    +755.9%     267.22 ± 63%  perf-sched.wait_and_delay.avg.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.__percpu_counter_init_many.mm_init
    384.40 ± 40%     -75.2%      95.38 ±145%  perf-sched.wait_and_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
     96.32 ± 10%     +24.3%     119.73 ± 13%  perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
     90.77 ±  8%     +50.2%     136.29 ± 11%  perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill
     37.70 ±  7%    +544.1%     242.86 ± 52%  perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.15 ±  4%    +572.3%       0.98 ± 56%  perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
    133.70 ±  9%    +250.2%     468.27 ± 64%  perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    571.39 ±  7%    +162.7%       1501 ± 69%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    457.45           +55.6%     711.90 ± 45%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
    114.14 ±  4%    +453.2%     631.37 ± 45%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     83.38 ± 10%     +86.2%     155.25 ± 39%  perf-sched.wait_and_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     70.48 ±  2%    +337.6%     308.40 ± 49%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     29.17 ±142%   +1403.4%     438.50 ± 28%  perf-sched.wait_and_delay.count.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
      5278 ± 11%     -88.4%     610.67 ±141%  perf-sched.wait_and_delay.count.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
    304.50 ±142%    +923.2%       3115 ± 18%  perf-sched.wait_and_delay.count.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
      3084 ± 17%     -83.9%     495.67 ±142%  perf-sched.wait_and_delay.count.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
      5591 ± 21%    +132.7%      13007 ± 16%  perf-sched.wait_and_delay.count.__cond_resched.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
    286.67 ±223%   +1574.3%       4799 ± 17%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.anon_vma_fork.dup_mmap.dup_mm
    192.50 ±100%    +938.4%       1998 ± 18%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.dup_mmap.dup_mm
     97.50 ±142%    +386.3%     474.17 ± 15%  perf-sched.wait_and_delay.count.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.__percpu_counter_init_many.mm_init
    483.83 ± 26%     -55.7%     214.50 ± 46%  perf-sched.wait_and_delay.count.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.mm_init.dup_mm
     31620 ± 11%   +1464.8%     494787 ± 16%  perf-sched.wait_and_delay.count.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
     74608 ±  7%     -60.6%      29368 ± 25%  perf-sched.wait_and_delay.count.__cond_resched.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill
    690.50 ± 17%     -59.6%     279.17 ±102%  perf-sched.wait_and_delay.count.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
      1929 ± 20%     -59.0%     790.50 ±104%  perf-sched.wait_and_delay.count.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
     34736 ±  7%     -27.1%      25322 ± 17%  perf-sched.wait_and_delay.count.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
     11.00 ±  9%     +87.9%      20.67 ± 17%  perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
     24.17 ±  9%     +98.6%      48.00 ± 15%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     13.50 ±  9%    +300.0%      54.00 ±102%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     43.50 ±  8%    +139.8%     104.33 ± 16%  perf-sched.wait_and_delay.count.schedule_timeout.kcompactd.kthread.ret_from_fork
     20294 ± 11%     -31.3%      13937 ± 19%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    762.80 ±145%   +2865.9%      22623 ±136%  perf-sched.wait_and_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
    891.25 ± 20%    +511.4%       5448 ± 47%  perf-sched.wait_and_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
    967.82 ± 16%    +556.8%       6356 ± 57%  perf-sched.wait_and_delay.max.ms.__cond_resched.__get_user_pages.populate_vma_page_range.__mm_populate.vm_mmap_pgoff
    724.81 ± 14%    +848.3%       6873 ± 59%  perf-sched.wait_and_delay.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
    772.30 ±145%   +3714.2%      29456 ± 96%  perf-sched.wait_and_delay.max.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
    762.95 ±223%   +4012.3%      31374 ±101%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.anon_vma_fork.dup_mmap.dup_mm
      1069 ±132%   +2709.8%      30059 ± 96%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.dup_mmap.dup_mm
    512.94 ±188%   +5166.8%      27015 ±107%  perf-sched.wait_and_delay.max.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.__percpu_counter_init_many.mm_init
    857.86 ± 16%    +679.2%       6684 ± 55%  perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
    919.89 ± 14%    +764.9%       7956 ± 59%  perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill
    914.96 ± 12%   +2199.9%      21043 ± 88%  perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    321.23 ± 19%    +594.3%       2230 ± 74%  perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      2838 ± 53%   +1052.5%      32709 ±102%  perf-sched.wait_and_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    903.08 ± 16%   +1170.0%      11469 ± 67%  perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      1727 ± 17%   +1533.4%      28221 ± 97%  perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      1050         +1308.0%      14792 ±118%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
      1567 ±  8%   +1368.0%      23016 ± 87%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1730 ± 53%   +1443.6%      26709 ± 97%  perf-sched.wait_and_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2200 ± 22%   +1164.5%      27827 ±105%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     53.28 ± 19%    +207.9%     164.05 ± 72%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.__pmd_alloc
     35.30 ± 12%    +208.9%     109.06 ± 48%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
     23.94 ±107%    +265.6%      87.53 ± 42%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.allocate_slab.___slab_alloc
     40.22 ± 10%     +42.5%      57.29 ±  9%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
      1.32 ±206%   +2168.2%      29.87 ± 74%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
     38.38 ± 10%     +59.5%      61.24 ± 23%  perf-sched.wait_time.avg.ms.__cond_resched.__put_anon_vma.unlink_anon_vmas.free_pgtables.exit_mmap
     45.35 ±  9%     +45.5%      65.96 ± 24%  perf-sched.wait_time.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
      4.21 ± 15%    +280.3%      16.00 ± 91%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
     49.28 ±  9%    +132.5%     114.60 ± 61%  perf-sched.wait_time.avg.ms.__cond_resched.copy_page_range.dup_mmap.dup_mm.constprop
     49.10 ±  7%    +154.3%     124.87 ± 53%  perf-sched.wait_time.avg.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
      0.47 ±107%   +5995.1%      28.58 ± 39%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.__mm_populate.vm_mmap_pgoff.do_syscall_64
      0.60 ±177%   +5535.0%      33.81 ± 48%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
     56.26 ± 18%     +95.3%     109.85 ± 54%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.anon_vma_clone.anon_vma_fork.dup_mmap
     45.90 ± 12%    +198.5%     137.01 ± 62%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.anon_vma_fork.dup_mmap.dup_mm
     50.84 ±  9%    +114.6%     109.07 ± 45%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.dup_mmap.dup_mm.constprop
     40.23 ±  7%     +60.7%      64.65 ± 22%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.free_pgtables.exit_mmap.__mmput
     39.00 ±  8%     +57.8%      61.55 ± 14%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.unlink_anon_vmas.free_pgtables.exit_mmap
     37.73 ± 12%     +56.6%      59.08 ± 12%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.unlink_file_vma_batch_add.free_pgtables.exit_mmap
      1.15 ±196%   +2575.9%      30.79 ± 44%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
      0.19 ± 64%    +303.8%       0.76 ± 28%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
     38.18 ± 25%     +60.7%      61.36 ± 15%  perf-sched.wait_time.avg.ms.__cond_resched.dput.path_put.exit_fs.do_exit
      0.46 ±140%   +1957.9%       9.55 ± 63%  perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk.part
      0.14 ±202%   +7445.0%      10.61 ± 94%  perf-sched.wait_time.avg.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
     43.32 ±  7%     +44.6%      62.65 ± 18%  perf-sched.wait_time.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
      0.75 ± 30%   +1444.8%      11.62 ± 63%  perf-sched.wait_time.avg.ms.__cond_resched.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
     49.62 ± 32%     -77.7%      11.05 ± 56%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.48 ± 95%   +6560.8%      32.08 ± 40%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_pseudo.alloc_file_pseudo
      4.89 ±174%    +564.9%      32.54 ± 43%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_lru_noprof.shmem_alloc_inode.alloc_inode.new_inode
      2.54 ±147%   +1068.9%      29.69 ± 41%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.__shmem_file_setup
      0.59 ±138%   +2094.7%      12.94 ± 81%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
     50.54 ±  9%    +142.8%     122.73 ± 56%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.anon_vma_fork.dup_mmap.dup_mm
     28.71 ± 37%    +576.9%     194.34 ±165%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.copy_signal.copy_process.kernel_clone
      0.52 ±130%   +6185.7%      32.63 ± 36%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
     41.66 ± 25%    +150.9%     104.53 ± 22%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.prepare_creds.copy_creds.copy_process
      1.48 ±147%   +1847.1%      28.80 ± 35%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.security_inode_alloc.inode_init_always_gfp.alloc_inode
      0.81 ± 48%   +3501.5%      29.20 ± 42%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      0.31 ±172%   +7598.9%      24.05 ± 81%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
     50.76 ±  8%    +111.9%     107.57 ± 40%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.dup_mmap.dup_mm
     38.89 ± 11%     +79.6%      69.87 ± 36%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
      4.15 ±223%   +1063.1%      48.31 ± 33%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.perf_event_exit_task.do_exit.do_group_exit
     51.54 ± 18%    +171.9%     140.12 ± 61%  perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock_killable.pcpu_alloc_noprof.__percpu_counter_init_many.mm_init
    384.38 ± 40%     -67.3%     125.66 ± 96%  perf-sched.wait_time.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
     35.52 ± 15%     +67.0%      59.33 ± 26%  perf-sched.wait_time.avg.ms.__cond_resched.remove_vma.exit_mmap.__mmput.exit_mm
     47.53 ± 10%     +24.5%      59.17 ± 13%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
     45.37 ±  8%     +46.5%      66.48 ± 11%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill
     37.68 ±  7%    +541.9%     241.86 ± 52%  perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     42.58 ± 30%    +283.5%     163.28 ±140%  perf-sched.wait_time.avg.ms.__cond_resched.switch_task_namespaces.do_exit.do_group_exit.__x64_sys_exit_group
      5.03 ±115%    +947.4%      52.66 ± 29%  perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     31.31 ± 18%     +82.2%      57.03 ±  9%  perf-sched.wait_time.avg.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     46.45 ±  6%     +29.1%      59.95 ±  7%  perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
      0.15 ±  4%    +572.3%       0.98 ± 56%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
    122.43 ±  9%    +187.4%     351.84 ± 46%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.72 ± 52%   +3352.8%      59.24 ± 60%  perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
     32.83 ±  9%     +63.7%      53.76 ± 10%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     12.22 ± 20%    +183.3%      34.61 ± 33%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
     30.44 ± 18%     +35.2%      41.15 ± 13%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
    563.63 ±  7%    +163.0%       1482 ± 70%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     38.02 ± 23%    +151.3%      95.54 ± 46%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.__put_anon_vma
     45.48 ± 12%     +83.7%      83.53 ± 26%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.anon_vma_fork
     46.77 ± 14%    +134.7%     109.78 ± 67%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.dup_mmap
     14.35 ± 96%    +344.7%      63.80 ± 23%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.unlink_file_vma_batch_final
     28.53 ± 31%    +752.9%     243.34 ±109%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    457.42           +55.6%     711.84 ± 45%  perf-sched.wait_time.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      5.01 ±  3%    +105.7%      10.31 ± 60%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    114.07 ±  4%    +452.5%     630.20 ± 45%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     45.42 ±  8%     +87.8%      85.28 ± 40%  perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2.09 ± 77%    +416.8%      10.82 ± 56%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     70.30 ±  2%    +335.3%     306.05 ± 49%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    974.31 ± 50%   +1374.0%      14360 ±104%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.alloc_pages_noprof.pte_alloc_one
    445.63 ± 20%    +511.4%       2724 ± 47%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
     99.05 ± 17%     +72.4%     170.81 ± 24%  perf-sched.wait_time.max.ms.__cond_resched.__dentry_kill.dput.__fput.task_work_run
    483.91 ± 16%    +556.8%       3178 ± 57%  perf-sched.wait_time.max.ms.__cond_resched.__get_user_pages.populate_vma_page_range.__mm_populate.vm_mmap_pgoff
      5.26 ±206%   +1212.1%      69.05 ± 45%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
     35.48 ±172%  +15060.6%       5378 ±205%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.vmstat_start.seq_read_iter.proc_reg_read_iter
    362.40 ± 14%    +848.3%       3436 ± 59%  perf-sched.wait_time.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
      1433 ± 86%    +864.9%      13835 ±102%  perf-sched.wait_time.max.ms.__cond_resched.copy_page_range.dup_mmap.dup_mm.constprop
      1520 ± 41%    +969.1%      16259 ± 90%  perf-sched.wait_time.max.ms.__cond_resched.copy_pte_range.copy_p4d_range.copy_page_range.dup_mmap
      6.37 ± 99%   +1535.9%     104.18 ± 31%  perf-sched.wait_time.max.ms.__cond_resched.down_read.__mm_populate.vm_mmap_pgoff.do_syscall_64
    104.84 ± 33%    +577.7%     710.49 ±173%  perf-sched.wait_time.max.ms.__cond_resched.down_read.acct_collect.do_exit.do_group_exit
      3.60 ±177%   +2595.0%      97.03 ± 42%  perf-sched.wait_time.max.ms.__cond_resched.down_write.__mmap_new_vma.__mmap_region.do_mmap
      1546 ± 50%    +953.2%      16284 ±100%  perf-sched.wait_time.max.ms.__cond_resched.down_write.dup_mmap.dup_mm.constprop
     73.44 ± 48%    +113.6%     156.88 ± 21%  perf-sched.wait_time.max.ms.__cond_resched.down_write.unlink_file_vma_batch_final.free_pgtables.exit_mmap
      5.75 ±183%   +1348.7%      83.33 ± 34%  perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_link_file.__mmap_new_vma.__mmap_region
     92.73 ± 29%     +71.5%     159.00 ± 24%  perf-sched.wait_time.max.ms.__cond_resched.down_write.vms_gather_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
      3.00 ± 54%    +545.4%      19.38 ± 52%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      1.00 ±143%   +5554.3%      56.59 ± 42%  perf-sched.wait_time.max.ms.__cond_resched.dput.step_into.link_path_walk.part
      0.27 ±210%  +13877.5%      38.00 ± 86%  perf-sched.wait_time.max.ms.__cond_resched.dput.terminate_walk.path_openat.do_filp_open
      1935 ± 67%     -87.7%     238.19 ±173%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      6.59 ±123%   +1558.4%     109.35 ± 30%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_lru_noprof.__d_alloc.d_alloc_pseudo.alloc_file_pseudo
     44.21 ± 76%    +269.4%     163.33 ± 33%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.dup_task_struct.copy_process.kernel_clone
     26.78 ±122%    +247.2%      93.00 ± 31%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.alloc_file_pseudo.__shmem_file_setup
      0.89 ±150%   +4891.0%      44.44 ± 64%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.alloc_empty_file.path_openat.do_filp_open
      0.75 ±223%   +2718.3%      21.27 ±180%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.getname_flags.part.0
      4.17 ±127%   +2067.1%      90.36 ± 33%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__mmap_new_vma
    228.57 ± 45%   +5935.1%      13794 ±106%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.prepare_creds.copy_creds.copy_process
     13.81 ± 36%    +738.7%     115.84 ± 31%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      0.49 ±137%  +10424.4%      51.41 ± 60%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vms_gather_munmap_vmas
    224.58 ± 18%    +604.8%       1582 ±108%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.futex_exit_release.exit_mm_release.exit_mm
      4.15 ±223%   +2488.3%     107.50 ± 30%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.perf_event_exit_task.do_exit.do_group_exit
    428.93 ± 16%    +679.2%       3342 ± 55%  perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
     40.64 ± 80%    +240.6%     138.43 ± 54%  perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_unacct_blocks.shmem_undo_range.shmem_evict_inode.evict
    459.95 ± 14%    +764.9%       3978 ± 59%  perf-sched.wait_time.max.ms.__cond_resched.shmem_undo_range.shmem_evict_inode.evict.__dentry_kill
    914.95 ± 12%   +1849.7%      17838 ± 86%  perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     91.90 ± 34%    +659.5%     697.98 ±177%  perf-sched.wait_time.max.ms.__cond_resched.switch_task_namespaces.do_exit.do_group_exit.__x64_sys_exit_group
     24.81 ±116%    +384.4%     120.21 ± 37%  perf-sched.wait_time.max.ms.__cond_resched.task_work_run.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
    270.14 ± 19%   +2674.2%       7494 ±140%  perf-sched.wait_time.max.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
     56.92 ± 47%    +144.0%     138.85 ± 30%  perf-sched.wait_time.max.ms.__cond_resched.unmap_page_range.unmap_vmas.exit_mmap.__mmput
    549.17 ± 71%    +379.4%       2632 ± 89%  perf-sched.wait_time.max.ms.__cond_resched.wp_page_copy.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
    319.32 ± 23%    +361.3%       1472 ± 62%  perf-sched.wait_time.max.ms.__cond_resched.zap_pmd_range.isra.0.unmap_page_range
    481.60 ± 52%    +448.7%       2642 ± 55%  perf-sched.wait_time.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
    321.23 ± 19%    +594.3%       2230 ± 74%  perf-sched.wait_time.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      2179 ± 38%    +675.4%      16899 ± 97%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    317.08 ± 29%    +599.1%       2216 ± 85%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
    451.54 ± 16%   +1653.3%       7916 ±102%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
    277.33 ± 11%    +804.4%       2508 ± 56%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      1523 ± 12%   +1227.5%      20227 ± 69%  perf-sched.wait_time.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      1025         +1338.8%      14757 ±118%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
    645.52 ± 69%    +473.8%       3703 ±157%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.dup_mmap
     17.73 ±112%    +561.1%     117.22 ± 58%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.unlink_file_vma_batch_final
    242.29 ± 40%   +4359.2%      10804 ±131%  perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     54.15 ± 66%  +14372.4%       7836 ±106%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      1567 ±  8%   +1116.1%      19067 ± 84%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1711 ± 54%    +841.3%      16107 ± 85%  perf-sched.wait_time.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     32.68 ± 96%    +147.9%      81.01 ± 32%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      2200 ± 22%    +728.3%      18226 ± 83%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm



***************************************************************************************************
lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
  gcc-12/performance/socket/4/x86_64-rhel-9.4/threads/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/hackbench

commit: 
  8c57b687e8 ("mm, bpf: Introduce free_pages_nolock()")
  01d37228d3 ("memcg: Use trylock to access memcg stock_lock.")

8c57b687e8331eb8 01d37228d331047a0bbbd1026ce 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   2931589 ± 14%     -29.8%    2059056 ± 12%  cpuidle..usage
    274766 ±  4%     -17.8%     225885 ± 13%  numa-meminfo.node0.SUnreclaim
    200.50           +26.2%     253.04 ±  2%  uptime.boot
   1992153          +114.3%    4268815 ±  4%  vmstat.system.cs
      1.88 ± 16%      -0.5        1.36 ±  8%  mpstat.cpu.all.idle%
      0.02 ±  3%      +0.0        0.02 ±  4%  mpstat.cpu.all.soft%
    128556           +31.8%     169412        meminfo.AnonHugePages
    485113           -15.1%     411686        meminfo.SUnreclaim
    614406           -12.0%     540564        meminfo.Slab
    538737 ± 13%    +114.5%    1155509 ± 13%  numa-numastat.node0.local_node
    615940 ± 15%     +95.7%    1205124 ± 11%  numa-numastat.node0.numa_hit
   1008187 ± 19%     +45.5%    1466818 ±  5%  numa-numastat.node1.local_node
   1063633 ± 18%     +45.7%    1549639 ±  4%  numa-numastat.node1.numa_hit
     68502 ±  4%     -17.6%      56417 ± 14%  numa-vmstat.node0.nr_slab_unreclaimable
    615163 ± 15%     +95.9%    1204843 ± 11%  numa-vmstat.node0.numa_hit
    537960 ± 13%    +114.7%    1155228 ± 13%  numa-vmstat.node0.numa_local
   1062574 ± 18%     +45.8%    1548926 ±  4%  numa-vmstat.node1.numa_hit
   1007129 ± 19%     +45.6%    1466105 ±  5%  numa-vmstat.node1.numa_local
     40153 ± 27%     -83.8%       6498 ± 24%  perf-c2c.DRAM.local
      5474 ± 28%     -45.5%       2981 ± 20%  perf-c2c.DRAM.remote
     73336 ± 26%     -74.2%      18885 ± 20%  perf-c2c.HITM.local
      1539 ± 39%     -42.5%     884.83 ± 26%  perf-c2c.HITM.remote
     74875 ± 26%     -73.6%      19769 ± 21%  perf-c2c.HITM.total
    121167           -15.0%     102975        proc-vmstat.nr_slab_unreclaimable
   1682677 ±  9%     +63.9%    2757090 ±  5%  proc-vmstat.numa_hit
   1550029 ± 10%     +69.3%    2624651 ±  5%  proc-vmstat.numa_local
   3252894 ±  7%    +209.5%   10068916 ±  7%  proc-vmstat.pgalloc_normal
   2648888 ±  9%    +262.9%    9612330 ±  8%  proc-vmstat.pgfree
    415614           -26.6%     304910        hackbench.throughput
    405261           -25.9%     300194        hackbench.throughput_avg
    415614           -26.6%     304910        hackbench.throughput_best
    386952           -24.9%     290681        hackbench.throughput_worst
    149.14           +34.8%     201.02        hackbench.time.elapsed_time
    149.14           +34.8%     201.02        hackbench.time.elapsed_time.max
  58196111 ±  6%    +286.5%  2.249e+08 ±  7%  hackbench.time.involuntary_context_switches
    134003 ±  5%     -16.3%     112130        hackbench.time.minor_page_faults
     17596           +35.8%      23894        hackbench.time.system_time
      1136           +28.7%       1463        hackbench.time.user_time
 2.372e+08          +167.5%  6.346e+08 ±  3%  hackbench.time.voluntary_context_switches
      1.42           -34.8%       0.92 ± 13%  perf-stat.i.MPKI
 4.477e+10           -17.2%  3.707e+10        perf-stat.i.branch-instructions
      0.41            +0.1        0.50        perf-stat.i.branch-miss-rate%
 1.744e+08            +3.0%  1.796e+08        perf-stat.i.branch-misses
     23.68            -8.5       15.20 ±  7%  perf-stat.i.cache-miss-rate%
 3.098e+08           -46.4%  1.661e+08 ± 11%  perf-stat.i.cache-misses
 1.318e+09           -16.2%  1.105e+09 ±  4%  perf-stat.i.cache-references
   1972433          +116.9%    4278376 ±  4%  perf-stat.i.context-switches
      1.48           +22.1%       1.81        perf-stat.i.cpi
 3.239e+11            +1.3%  3.283e+11        perf-stat.i.cpu-cycles
     47350 ± 13%     +65.3%      78248 ± 19%  perf-stat.i.cpu-migrations
      1064           +91.4%       2037 ± 12%  perf-stat.i.cycles-between-cache-misses
 2.186e+11           -16.9%  1.816e+11        perf-stat.i.instructions
      0.68           -18.0%       0.56        perf-stat.i.ipc
     15.69          +115.3%      33.79 ±  3%  perf-stat.i.metric.K/sec
      4400 ±  6%     -18.0%       3607        perf-stat.i.minor-faults
      4400 ±  6%     -18.0%       3607        perf-stat.i.page-faults
      1.42           -35.3%       0.92 ± 13%  perf-stat.overall.MPKI
      0.39            +0.1        0.48        perf-stat.overall.branch-miss-rate%
     23.53            -8.5       15.00 ±  7%  perf-stat.overall.cache-miss-rate%
      1.48           +21.9%       1.81        perf-stat.overall.cpi
      1046           +91.4%       2004 ± 11%  perf-stat.overall.cycles-between-cache-misses
      0.67           -17.9%       0.55        perf-stat.overall.ipc
 4.448e+10           -17.0%  3.691e+10        perf-stat.ps.branch-instructions
  1.73e+08            +3.2%  1.786e+08        perf-stat.ps.branch-misses
 3.075e+08           -46.3%  1.653e+08 ± 11%  perf-stat.ps.cache-misses
 1.307e+09           -15.9%  1.099e+09 ±  4%  perf-stat.ps.cache-references
   1958509          +116.8%    4245688 ±  4%  perf-stat.ps.context-switches
 3.219e+11            +1.5%  3.266e+11        perf-stat.ps.cpu-cycles
     46201 ± 13%     +68.6%      77874 ± 19%  perf-stat.ps.cpu-migrations
 2.172e+11           -16.7%  1.808e+11        perf-stat.ps.instructions
      4287 ±  6%     -17.1%       3552        perf-stat.ps.minor-faults
      4287 ±  6%     -17.1%       3552        perf-stat.ps.page-faults
 3.263e+13           +12.0%  3.653e+13        perf-stat.total.instructions
   7909525 ±  2%     +52.6%   12071802        sched_debug.cfs_rq:/.avg_vruntime.avg
  10481951 ±  9%     +61.8%   16959043 ± 11%  sched_debug.cfs_rq:/.avg_vruntime.max
   7141207           +52.0%   10853360 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.min
     22.46           +13.0%      25.38        sched_debug.cfs_rq:/.h_nr_queued.avg
      5.93 ±  5%     +19.3%       7.08 ±  3%  sched_debug.cfs_rq:/.h_nr_queued.stddev
     22.20           +14.1%      25.33        sched_debug.cfs_rq:/.h_nr_runnable.avg
      6.03 ±  5%     +17.9%       7.11 ±  3%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
    355.83           -22.8%     274.75        sched_debug.cfs_rq:/.load_avg.max
   7909525 ±  2%     +52.6%   12071802        sched_debug.cfs_rq:/.min_vruntime.avg
  10481951 ±  9%     +61.8%   16959043 ± 11%  sched_debug.cfs_rq:/.min_vruntime.max
   7141207           +52.0%   10853360 ±  2%  sched_debug.cfs_rq:/.min_vruntime.min
      0.69           +11.8%       0.78        sched_debug.cfs_rq:/.nr_queued.avg
      0.44 ± 35%     +59.4%       0.71 ± 13%  sched_debug.cfs_rq:/.nr_queued.min
      0.12 ± 17%     -28.0%       0.08 ± 14%  sched_debug.cfs_rq:/.nr_queued.stddev
    341.39           -25.0%     256.00        sched_debug.cfs_rq:/.removed.load_avg.max
    174.00           -24.7%     131.00 ±  2%  sched_debug.cfs_rq:/.removed.runnable_avg.max
    174.00           -24.7%     131.00 ±  2%  sched_debug.cfs_rq:/.removed.util_avg.max
    198071 ±125%    +348.4%     888081 ± 62%  sched_debug.cfs_rq:/.runnable_avg.avg
   1977052 ±141%    +219.2%    6310648 ± 51%  sched_debug.cfs_rq:/.runnable_avg.stddev
      1871           +20.4%       2253        sched_debug.cfs_rq:/.util_est.avg
    577483 ±  2%     -12.5%     505023 ±  8%  sched_debug.cpu.avg_idle.avg
    113708           +27.4%     144846 ±  2%  sched_debug.cpu.clock.avg
    115037           +26.7%     145757 ±  2%  sched_debug.cpu.clock.max
    112235           +28.2%     143890 ±  2%  sched_debug.cpu.clock.min
    113310           +27.3%     144279 ±  2%  sched_debug.cpu.clock_task.avg
    114742           +26.6%     145320 ±  2%  sched_debug.cpu.clock_task.max
    103500           +30.2%     134733 ±  2%  sched_debug.cpu.clock_task.min
     12338           +15.5%      14254        sched_debug.cpu.curr->pid.avg
     15248           +12.6%      17174        sched_debug.cpu.curr->pid.max
     22.46           +13.0%      25.37        sched_debug.cpu.nr_running.avg
      5.94 ±  5%     +19.1%       7.07 ±  3%  sched_debug.cpu.nr_running.stddev
    944460 ±  2%    +212.8%    2954732 ±  3%  sched_debug.cpu.nr_switches.avg
   1384690 ±  9%    +173.0%    3780143 ±  3%  sched_debug.cpu.nr_switches.max
    779137 ±  2%    +198.7%    2327029 ±  9%  sched_debug.cpu.nr_switches.min
     94939 ± 18%    +274.7%     355703 ± 41%  sched_debug.cpu.nr_switches.stddev
    112191           +28.2%     143862 ±  2%  sched_debug.cpu_clk
    111036           +28.5%     142706 ±  2%  sched_debug.ktime
    113021           +28.0%     144688 ±  2%  sched_debug.sched_clk
     55.02 ± 69%   +1310.2%     775.94 ± 89%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
     38.12 ± 77%    +943.0%     397.61 ± 59%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
     65.04 ± 17%    +653.0%     489.75 ± 28%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
     20.05 ± 24%    +288.4%      77.88 ± 34%  perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
     10.24 ± 95%     -99.5%       0.05 ±191%  perf-sched.sch_delay.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
     47.60 ± 34%    +735.2%     397.53 ± 42%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
     19.81 ±109%     -75.7%       4.81 ± 60%  perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
     55.61 ± 32%   +1002.8%     613.30 ± 20%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
      0.37 ±217%  +2.4e+05%     895.65 ±222%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     27.14 ± 39%    +347.6%     121.45 ± 42%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     66.58 ± 18%   +1020.4%     745.92 ± 18%  perf-sched.sch_delay.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
      5.48 ± 17%    +213.6%      17.19 ± 16%  perf-sched.sch_delay.avg.ms.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
      8.61 ± 22%    +385.3%      41.78 ± 10%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    330.55 ±116%   +2270.5%       7835 ± 55%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
      5574 ± 16%    +211.6%      17370 ±  8%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
     17.26 ± 88%     -99.7%       0.05 ±191%  perf-sched.sch_delay.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      5162 ± 15%    +236.2%      17354 ± 12%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
      4727 ± 24%    +239.1%      16028 ± 13%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
    209.42 ±202%   +3332.2%       7187 ±117%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
      1343 ± 39%    +152.0%       3384 ± 42%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      5830 ± 16%    +212.2%      18205 ±  8%  perf-sched.sch_delay.max.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
      5836 ± 15%    +202.9%      17679 ±  8%  perf-sched.sch_delay.max.ms.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
      5816 ± 15%    +212.2%      18162 ±  8%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     10.88 ± 19%    +232.3%      36.16 ± 11%  perf-sched.total_sch_delay.average.ms
      5942 ± 13%    +206.8%      18228 ±  8%  perf-sched.total_sch_delay.max.ms
     27.55 ± 19%    +232.7%      91.65 ± 12%  perf-sched.total_wait_and_delay.average.ms
     11902 ± 13%    +206.3%      36457 ±  8%  perf-sched.total_wait_and_delay.max.ms
     16.67 ± 19%    +232.9%      55.49 ± 12%  perf-sched.total_wait_time.average.ms
      6456 ±  4%    +182.5%      18242 ±  8%  perf-sched.total_wait_time.max.ms
    113.95 ± 69%   +1316.6%       1614 ± 82%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
     79.96 ± 73%   +1082.3%     945.34 ± 57%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
    143.75 ± 16%    +611.5%       1022 ± 27%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
     42.41 ± 20%    +275.0%     159.04 ± 32%  perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    101.84 ± 33%    +721.9%     837.03 ± 41%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
    124.93 ± 34%    +929.0%       1285 ± 20%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
     74.56 ± 60%    +705.1%     600.29 ± 41%  perf-sched.wait_and_delay.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
     76.07 ± 36%    +330.8%     327.70 ± 26%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    146.24 ± 18%    +989.6%       1593 ± 18%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
     16.40 ± 18%    +229.2%      54.00 ± 16%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
    739.69 ± 13%    +254.3%       2621 ± 44%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     19.70 ± 23%    +353.8%      89.38 ±  9%  perf-sched.wait_and_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    837.21 ± 24%    +166.9%       2234 ± 13%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     36.50 ± 86%     -96.3%       1.33 ±223%  perf-sched.wait_and_delay.count.__cond_resched.__dentry_kill.shrink_dentry_list.shrink_dcache_parent.d_invalidate
    132.83 ± 58%     -60.0%      53.17 ± 27%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    101624 ± 38%     -81.9%      18404 ± 17%  perf-sched.wait_and_delay.count.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
    733.10 ± 64%     -84.4%     114.45 ±223%  perf-sched.wait_and_delay.max.ms.__cond_resched.__dentry_kill.shrink_dentry_list.shrink_dcache_parent.d_invalidate
    661.77 ±116%   +2433.4%      16765 ± 49%  perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
     11149 ± 16%    +211.6%      34740 ±  8%  perf-sched.wait_and_delay.max.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
     10324 ± 15%    +236.2%      34709 ± 12%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
      9455 ± 24%    +239.0%      32056 ± 13%  perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
    646.51 ± 59%    +503.5%       3902 ± 68%  perf-sched.wait_and_delay.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      2745 ± 39%    +175.6%       7565 ± 33%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     11661 ± 16%    +212.2%      36412 ±  8%  perf-sched.wait_and_delay.max.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
     11674 ± 15%    +203.0%      35373 ±  8%  perf-sched.wait_and_delay.max.ms.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
      6384 ±  4%    +184.5%      18161 ±  7%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     11633 ± 15%    +212.2%      36324 ±  8%  perf-sched.wait_and_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      6098 ±  9%    +192.4%      17831 ±  7%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     58.92 ± 68%   +1322.7%     838.26 ± 76%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
     41.83 ± 69%   +1209.3%     547.73 ± 62%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
     78.71 ± 15%    +577.2%     533.02 ± 26%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
     22.36 ± 17%    +263.0%      81.16 ± 29%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      8.84 ±109%     -99.5%       0.05 ±191%  perf-sched.wait_time.avg.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
     54.24 ± 33%    +710.2%     439.49 ± 40%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
     69.31 ± 35%    +869.9%     672.25 ± 21%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
     51.67 ± 62%    +784.1%     456.81 ± 40%  perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
     48.93 ± 35%    +321.5%     206.25 ± 19%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     79.66 ± 18%    +963.8%     847.45 ± 18%  perf-sched.wait_time.avg.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
     10.92 ± 19%    +237.0%      36.81 ± 16%  perf-sched.wait_time.avg.ms.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
    739.68 ± 13%    +254.3%       2621 ± 44%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     11.09 ± 23%    +329.3%      47.60 ±  8%  perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    833.25 ± 24%    +136.5%       1970 ±  7%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    331.23 ±116%   +2838.5%       9733 ± 52%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
      5575 ± 16%    +212.8%      17440 ±  8%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
     11.46 ± 79%     -99.6%       0.05 ±191%  perf-sched.wait_time.max.ms.__cond_resched.dput.__fput.__x64_sys_close.do_syscall_64
      5162 ± 15%    +236.2%      17354 ± 12%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
      4850 ± 25%    +230.4%      16028 ± 13%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
    216.39 ±194%   +3253.6%       7256 ±115%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
    386.99 ± 62%    +582.5%       2641 ± 54%  perf-sched.wait_time.max.ms.rcu_gp_kthread.kthread.ret_from_fork.ret_from_fork_asm
      1402 ± 39%    +198.2%       4181 ± 27%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      5830 ± 16%    +212.3%      18206 ±  8%  perf-sched.wait_time.max.ms.schedule_timeout.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
      5840 ± 15%    +204.1%      17759 ±  8%  perf-sched.wait_time.max.ms.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
      6384 ±  4%    +184.5%      18161 ±  7%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      5818 ± 15%    +212.2%      18162 ±  8%  perf-sched.wait_time.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      6098 ±  9%    +192.4%      17831 ±  7%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     55.29 ±  2%     -16.0       39.24        perf-profile.calltrace.cycles-pp.read
     50.80 ±  2%     -14.9       35.91        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
     47.07 ±  3%     -14.9       32.19        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
     50.53 ±  2%     -14.8       35.71        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
     48.63 ±  3%     -14.8       33.86        perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
     45.80 ±  3%     -14.6       31.18        perf-profile.calltrace.cycles-pp.sock_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
     44.98 ±  3%     -14.5       30.52        perf-profile.calltrace.cycles-pp.sock_recvmsg.sock_read_iter.vfs_read.ksys_read.do_syscall_64
     44.54 ±  3%     -14.4       30.17        perf-profile.calltrace.cycles-pp.unix_stream_recvmsg.sock_recvmsg.sock_read_iter.vfs_read.ksys_read
     44.16 ±  3%     -14.3       29.87        perf-profile.calltrace.cycles-pp.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter.vfs_read
     13.28 ± 10%      -6.2        7.10 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
     16.20 ±  8%      -5.9       10.30 ±  2%  perf-profile.calltrace.cycles-pp.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
     12.86 ± 11%      -5.2        7.64 ±  3%  perf-profile.calltrace.cycles-pp.skb_release_data.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
     11.84 ± 11%      -4.7        7.19 ±  3%  perf-profile.calltrace.cycles-pp.kfree.skb_release_data.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
      7.28 ± 23%      -4.6        2.64 ±  5%  perf-profile.calltrace.cycles-pp.__put_partials.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      7.06 ± 23%      -4.5        2.53 ±  6%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__put_partials.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg
      6.92 ± 23%      -4.5        2.45 ±  6%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__put_partials.kmem_cache_free.unix_stream_read_generic
      7.38 ± 21%      -4.1        3.27 ±  8%  perf-profile.calltrace.cycles-pp.__put_partials.kfree.skb_release_data.consume_skb.unix_stream_read_generic
      7.17 ± 21%      -4.0        3.16 ±  8%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__put_partials.kfree.skb_release_data.consume_skb
      7.03 ± 22%      -3.9        3.08 ±  8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__put_partials.kfree.skb_release_data
      5.59 ± 20%      -3.4        2.22 ±  4%  perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
      7.82 ±  2%      -3.4        4.45 ±  2%  perf-profile.calltrace.cycles-pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
      7.75 ±  2%      -3.4        4.40 ±  2%  perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      7.60 ±  2%      -3.3        4.28 ±  2%  perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg
      4.94 ± 22%      -3.2        1.70 ±  5%  perf-profile.calltrace.cycles-pp.get_partial_node.___slab_alloc.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags
      4.53 ± 23%      -3.0        1.51 ±  5%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.get_partial_node.___slab_alloc.kmem_cache_alloc_node_noprof.__alloc_skb
      4.50 ± 23%      -3.0        1.50 ±  5%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.get_partial_node.___slab_alloc.kmem_cache_alloc_node_noprof
      5.74 ± 19%      -2.9        2.80 ±  7%  perf-profile.calltrace.cycles-pp.___slab_alloc.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
      5.05 ± 21%      -2.9        2.19 ±  7%  perf-profile.calltrace.cycles-pp.get_partial_node.___slab_alloc.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb
      4.64 ± 22%      -2.7        1.99 ±  8%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.get_partial_node.___slab_alloc.__kmalloc_node_track_caller_noprof.kmalloc_reserve
      4.61 ± 22%      -2.6        1.96 ±  8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.get_partial_node.___slab_alloc.__kmalloc_node_track_caller_noprof
      3.91 ±  3%      -1.8        2.12 ±  2%  perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
      3.03 ±  3%      -1.2        1.79        perf-profile.calltrace.cycles-pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
      2.85 ±  3%      -1.2        1.64        perf-profile.calltrace.cycles-pp.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor
      2.15 ±  4%      -1.0        1.11 ±  2%  perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
      2.25 ± 14%      -0.7        1.57 ±  3%  perf-profile.calltrace.cycles-pp._raw_spin_lock.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
      3.24 ±  4%      -0.6        2.59        perf-profile.calltrace.cycles-pp.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      2.67 ±  3%      -0.6        2.03 ±  2%  perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
      3.10 ±  4%      -0.6        2.48        perf-profile.calltrace.cycles-pp.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
      1.53 ±  2%      -0.6        0.92 ± 28%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
      1.52 ±  3%      -0.6        0.92 ± 28%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.read
      2.90 ±  4%      -0.6        2.32        perf-profile.calltrace.cycles-pp.sock_wfree.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic
      2.98 ±  4%      -0.6        2.43 ± 11%  perf-profile.calltrace.cycles-pp.__memcg_slab_free_hook.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      1.97 ±  3%      -0.5        1.48 ±  2%  perf-profile.calltrace.cycles-pp.clear_bhb_loop.write
      1.96 ±  2%      -0.4        1.53 ±  2%  perf-profile.calltrace.cycles-pp.clear_bhb_loop.read
      1.32 ±  5%      -0.4        0.95 ±  2%  perf-profile.calltrace.cycles-pp.__slab_free.kfree.skb_release_data.consume_skb.unix_stream_read_generic
      1.23 ±  3%      -0.3        0.91 ±  2%  perf-profile.calltrace.cycles-pp._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write
      1.11 ±  3%      -0.3        0.84 ±  2%  perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_write_iter.vfs_write
      0.60 ±  3%      +0.1        0.69        perf-profile.calltrace.cycles-pp.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
      0.61 ±  5%      +0.1        0.71 ±  2%  perf-profile.calltrace.cycles-pp.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
      1.09 ±  5%      +0.4        1.47 ±  8%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.43 ±100%      +0.5        0.91 ±  5%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.skb_queue_tail.unix_stream_sendmsg.sock_write_iter.vfs_write
      0.29 ±100%      +0.5        0.78 ±  5%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__pick_next_task.__schedule.schedule.schedule_timeout
      0.30 ±100%      +0.5        0.80 ±  4%  perf-profile.calltrace.cycles-pp.__pick_next_task.__schedule.schedule.schedule_timeout.unix_stream_data_wait
      0.00            +0.5        0.51        perf-profile.calltrace.cycles-pp._raw_spin_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
      0.08 ±223%      +0.5        0.60        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      0.00            +0.6        0.56 ±  6%  perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule
      0.00            +0.6        0.63 ± 12%  perf-profile.calltrace.cycles-pp.__schedule.schedule.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.7        0.65 ± 12%  perf-profile.calltrace.cycles-pp.schedule.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.17 ±141%      +0.7        0.83 ±  4%  perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.enqueue_task.ttwu_do_activate.try_to_wake_up
      0.00            +0.7        0.67 ±  2%  perf-profile.calltrace.cycles-pp.mutex_unlock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
      0.78 ± 27%      +0.7        1.45 ±  5%  perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
      0.35 ±100%      +0.7        1.04 ±  4%  perf-profile.calltrace.cycles-pp.dequeue_entities.dequeue_task_fair.try_to_block_task.__schedule.schedule
      0.36 ±100%      +0.7        1.08 ±  3%  perf-profile.calltrace.cycles-pp.dequeue_task_fair.try_to_block_task.__schedule.schedule.schedule_timeout
      0.38 ±100%      +0.7        1.12 ±  3%  perf-profile.calltrace.cycles-pp.try_to_block_task.__schedule.schedule.schedule_timeout.unix_stream_data_wait
      0.39 ±100%      +0.8        1.15 ±  4%  perf-profile.calltrace.cycles-pp.enqueue_task_fair.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
      0.40 ±100%      +0.8        1.18 ±  4%  perf-profile.calltrace.cycles-pp.enqueue_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
      1.68 ± 46%      +1.1        2.81 ±  8%  perf-profile.calltrace.cycles-pp.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter.vfs_write
      1.44 ± 42%      +1.2        2.63 ±  7%  perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.sock_def_readable
      1.51 ± 42%      +1.2        2.70 ±  7%  perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg.sock_write_iter
      1.47 ± 42%      +1.2        2.67 ±  7%  perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_sync_key.sock_def_readable.unix_stream_sendmsg
      1.59 ± 29%      +1.3        2.89 ±  3%  perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic
      2.29 ± 45%      +1.3        3.61 ±  7%  perf-profile.calltrace.cycles-pp.sock_def_readable.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
      1.63 ± 29%      +1.3        2.96 ±  3%  perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
      1.68 ± 28%      +1.4        3.02 ±  3%  perf-profile.calltrace.cycles-pp.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      1.96 ± 29%      +1.7        3.65 ±  3%  perf-profile.calltrace.cycles-pp.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
      0.00            +4.8        4.85 ±  2%  perf-profile.calltrace.cycles-pp.page_counter_try_charge.try_charge_memcg.obj_cgroup_charge.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof
      8.90 ± 12%      +5.0       13.87 ±  2%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
      2.16 ±  3%      +8.6       10.80 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
      0.00            +8.7        8.70 ±  2%  perf-profile.calltrace.cycles-pp.try_charge_memcg.obj_cgroup_charge.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb
      0.44 ± 44%      +8.8        9.20 ±  2%  perf-profile.calltrace.cycles-pp.obj_cgroup_charge.__memcg_slab_post_alloc_hook.kmem_cache_alloc_node_noprof.__alloc_skb.alloc_skb_with_frags
      0.00            +9.5        9.52 ±  2%  perf-profile.calltrace.cycles-pp.page_counter_try_charge.try_charge_memcg.obj_cgroup_charge.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof
      9.94 ± 10%     +13.6       23.50        perf-profile.calltrace.cycles-pp.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
      9.38 ± 11%     +13.8       23.19        perf-profile.calltrace.cycles-pp.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
     43.95           +15.4       59.35        perf-profile.calltrace.cycles-pp.write
     37.71           +16.6       54.29        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     35.73           +16.6       52.34        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     39.75           +16.7       56.48        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
     39.48           +16.8       56.28        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      2.49 ±  3%     +16.9       19.37 ±  2%  perf-profile.calltrace.cycles-pp.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
     34.47           +16.9       51.41        perf-profile.calltrace.cycles-pp.sock_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.70 ±  2%     +17.1       17.82 ±  2%  perf-profile.calltrace.cycles-pp.obj_cgroup_charge.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve.__alloc_skb
      0.00           +17.1       17.13 ±  2%  perf-profile.calltrace.cycles-pp.try_charge_memcg.obj_cgroup_charge.__memcg_slab_post_alloc_hook.__kmalloc_node_track_caller_noprof.kmalloc_reserve
     33.13           +17.3       50.43        perf-profile.calltrace.cycles-pp.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write.do_syscall_64
     23.11 ±  7%     +17.7       40.82        perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter.vfs_write.ksys_write
     20.87 ±  9%     +17.9       38.78        perf-profile.calltrace.cycles-pp.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter.vfs_write
     20.58 ±  9%     +18.0       38.56        perf-profile.calltrace.cycles-pp.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_write_iter
     55.43 ±  2%     -15.6       39.81        perf-profile.children.cycles-pp.read
     47.13 ±  3%     -14.9       32.23        perf-profile.children.cycles-pp.vfs_read
     48.70 ±  3%     -14.8       33.92        perf-profile.children.cycles-pp.ksys_read
     45.85 ±  3%     -14.6       31.21        perf-profile.children.cycles-pp.sock_read_iter
     45.04 ±  3%     -14.5       30.57        perf-profile.children.cycles-pp.sock_recvmsg
     44.58 ±  3%     -14.4       30.20        perf-profile.children.cycles-pp.unix_stream_recvmsg
     44.36 ±  3%     -14.3       30.02        perf-profile.children.cycles-pp.unix_stream_read_generic
     23.88 ± 19%     -14.1        9.79 ±  5%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     25.27 ± 18%     -13.7       11.62 ±  4%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     14.68 ± 21%      -8.8        5.92 ±  6%  perf-profile.children.cycles-pp.__put_partials
     11.33 ± 19%      -6.3        5.03 ±  5%  perf-profile.children.cycles-pp.___slab_alloc
     13.34 ± 10%      -6.2        7.14 ±  2%  perf-profile.children.cycles-pp.kmem_cache_free
     16.27 ±  8%      -5.9       10.35 ±  2%  perf-profile.children.cycles-pp.consume_skb
     10.21 ± 21%      -5.9        4.35 ±  6%  perf-profile.children.cycles-pp.get_partial_node
     12.90 ± 11%      -5.2        7.66 ±  3%  perf-profile.children.cycles-pp.skb_release_data
     11.90 ± 11%      -4.7        7.23 ±  3%  perf-profile.children.cycles-pp.kfree
      7.86 ±  2%      -3.4        4.48 ±  2%  perf-profile.children.cycles-pp.unix_stream_read_actor
      7.78 ±  2%      -3.4        4.42 ±  2%  perf-profile.children.cycles-pp.skb_copy_datagram_iter
      7.65 ±  2%      -3.3        4.32 ±  2%  perf-profile.children.cycles-pp.__skb_datagram_iter
      3.93 ±  3%      -1.8        2.14 ±  2%  perf-profile.children.cycles-pp._copy_to_iter
      4.23 ±  3%      -1.5        2.70        perf-profile.children.cycles-pp.__check_object_size
      3.07 ±  3%      -1.3        1.82        perf-profile.children.cycles-pp.simple_copy_to_iter
      2.82 ±  4%      -1.2        1.62        perf-profile.children.cycles-pp.check_heap_object
      3.96 ±  2%      -0.9        3.04 ±  2%  perf-profile.children.cycles-pp.clear_bhb_loop
      2.75 ±  3%      -0.7        2.08 ±  2%  perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
      3.28 ±  4%      -0.7        2.62        perf-profile.children.cycles-pp.skb_release_head_state
      3.16 ±  4%      -0.6        2.53        perf-profile.children.cycles-pp.unix_destruct_scm
      2.93 ±  4%      -0.6        2.34        perf-profile.children.cycles-pp.sock_wfree
      2.40 ±  5%      -0.5        1.94        perf-profile.children.cycles-pp.__slab_free
      1.74 ±  2%      -0.4        1.35 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      1.34 ±  2%      -0.3        1.01 ±  2%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.25 ±  3%      -0.3        0.93 ±  2%  perf-profile.children.cycles-pp._copy_from_iter
      1.08 ±  3%      -0.3        0.79 ±  3%  perf-profile.children.cycles-pp.rw_verify_area
      0.62 ±  3%      -0.3        0.36        perf-profile.children.cycles-pp.__build_skb_around
      0.73 ±  3%      -0.2        0.55 ±  2%  perf-profile.children.cycles-pp.__check_heap_object
      0.63 ±  3%      -0.1        0.48        perf-profile.children.cycles-pp.__cond_resched
      0.40 ±  5%      -0.1        0.28 ±  5%  perf-profile.children.cycles-pp.fsnotify_pre_content
      0.20 ±  6%      -0.1        0.10 ±  4%  perf-profile.children.cycles-pp.put_cpu_partial
      0.41 ±  3%      -0.1        0.31        perf-profile.children.cycles-pp.x64_sys_call
      0.53 ±  3%      -0.1        0.43 ±  4%  perf-profile.children.cycles-pp.__virt_addr_valid
      0.61 ±  4%      -0.1        0.52        perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.27 ±  2%      -0.1        0.20 ±  2%  perf-profile.children.cycles-pp.rcu_all_qs
      0.28 ±  3%      -0.1        0.22 ±  2%  perf-profile.children.cycles-pp.__scm_recv_common
      0.23 ±  3%      -0.1        0.17 ±  2%  perf-profile.children.cycles-pp.security_file_permission
      0.30 ±  3%      -0.0        0.25 ±  3%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
      0.21 ±  5%      -0.0        0.16 ±  6%  perf-profile.children.cycles-pp.kmalloc_size_roundup
      0.16 ±  5%      -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.maybe_add_creds
      0.16 ±  4%      -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.is_vmalloc_addr
      0.17 ±  4%      -0.0        0.13        perf-profile.children.cycles-pp.put_pid
      0.15 ±  2%      -0.0        0.11        perf-profile.children.cycles-pp.security_socket_recvmsg
      0.14 ±  3%      -0.0        0.10        perf-profile.children.cycles-pp.security_socket_getpeersec_dgram
      0.21 ±  5%      -0.0        0.17 ±  4%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.13 ±  3%      -0.0        0.10 ±  3%  perf-profile.children.cycles-pp.security_socket_sendmsg
      0.18 ±  2%      -0.0        0.15 ±  3%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.14 ±  3%      -0.0        0.10 ±  4%  perf-profile.children.cycles-pp.manage_oob
      0.17 ±  4%      -0.0        0.14 ±  2%  perf-profile.children.cycles-pp.check_stack_object
      0.11 ±  3%      -0.0        0.08        perf-profile.children.cycles-pp.wait_for_unix_gc
      0.17 ±  3%      -0.0        0.14 ±  2%  perf-profile.children.cycles-pp.unix_scm_to_skb
      0.13 ±  5%      -0.0        0.10 ±  7%  perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.09 ±  5%      -0.0        0.07        perf-profile.children.cycles-pp.skb_put
      0.07            -0.0        0.05        perf-profile.children.cycles-pp.__x64_sys_read
      0.07            -0.0        0.05        perf-profile.children.cycles-pp.__x64_sys_write
      0.09 ±  5%      -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.skb_free_head
      0.07 ±  6%      -0.0        0.06 ±  6%  perf-profile.children.cycles-pp.kfree_skbmem
      0.08 ± 21%      +0.0        0.10 ±  7%  perf-profile.children.cycles-pp.__get_user_8
      0.07 ± 27%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.avg_vruntime
      0.09 ± 23%      +0.0        0.13 ±  7%  perf-profile.children.cycles-pp.rseq_get_rseq_cs
      0.08 ± 16%      +0.0        0.12 ±  6%  perf-profile.children.cycles-pp.os_xsave
      0.07 ± 25%      +0.0        0.11 ±  6%  perf-profile.children.cycles-pp.sched_clock
      0.08 ± 35%      +0.0        0.12 ±  3%  perf-profile.children.cycles-pp.place_entity
      0.02 ± 99%      +0.0        0.07 ±  8%  perf-profile.children.cycles-pp.__put_user_8
      0.02 ± 99%      +0.0        0.07 ±  8%  perf-profile.children.cycles-pp.__wrgsbase_inactive
      0.06 ± 19%      +0.0        0.11 ± 10%  perf-profile.children.cycles-pp.___perf_sw_event
      0.08 ± 29%      +0.0        0.12 ±  7%  perf-profile.children.cycles-pp.sched_clock_cpu
      0.25 ±  9%      +0.0        0.30 ±  5%  perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
      0.01 ±223%      +0.0        0.06 ± 11%  perf-profile.children.cycles-pp.ktime_get
      0.04 ±100%      +0.1        0.09 ±  5%  perf-profile.children.cycles-pp.update_entity_lag
      0.01 ±223%      +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.update_curr_dl_se
      0.01 ±223%      +0.1        0.06 ± 11%  perf-profile.children.cycles-pp.clockevents_program_event
      0.04 ±100%      +0.1        0.10 ±  5%  perf-profile.children.cycles-pp.native_sched_clock
      0.09 ± 31%      +0.1        0.14 ±  8%  perf-profile.children.cycles-pp.update_rq_clock
      0.00            +0.1        0.06 ±  9%  perf-profile.children.cycles-pp.__rb_insert_augmented
      0.06 ± 14%      +0.1        0.13 ±  9%  perf-profile.children.cycles-pp.vruntime_eligible
      0.08 ± 26%      +0.1        0.14 ± 10%  perf-profile.children.cycles-pp.put_prev_entity
      0.04 ±101%      +0.1        0.11 ±  6%  perf-profile.children.cycles-pp.finish_wait
      0.00            +0.1        0.07 ± 12%  perf-profile.children.cycles-pp.charge_memcg
      0.10 ± 18%      +0.1        0.17 ±  9%  perf-profile.children.cycles-pp.check_preempt_wakeup_fair
      0.04 ± 72%      +0.1        0.12 ±  4%  perf-profile.children.cycles-pp.rseq_update_cpu_node_id
      0.00            +0.1        0.08 ± 14%  perf-profile.children.cycles-pp.__mem_cgroup_charge
      0.64 ±  3%      +0.1        0.72        perf-profile.children.cycles-pp.skb_unlink
      0.00            +0.1        0.08 ± 10%  perf-profile.children.cycles-pp.set_next_buddy
      0.00            +0.1        0.08 ± 13%  perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
      0.00            +0.1        0.09 ± 15%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
      0.00            +0.1        0.09 ± 15%  perf-profile.children.cycles-pp.shmem_write_begin
      0.65 ±  5%      +0.1        0.74        perf-profile.children.cycles-pp.mutex_lock
      0.14 ± 31%      +0.1        0.22 ±  7%  perf-profile.children.cycles-pp.wakeup_preempt
      0.05            +0.1        0.14 ± 10%  perf-profile.children.cycles-pp.cmd_record
      0.17 ± 22%      +0.1        0.26 ±  5%  perf-profile.children.cycles-pp.__update_load_avg_se
      0.05 ±  7%      +0.1        0.15 ±  9%  perf-profile.children.cycles-pp.handle_internal_command
      0.05 ±  7%      +0.1        0.15 ±  9%  perf-profile.children.cycles-pp.main
      0.14 ± 22%      +0.1        0.24 ±  6%  perf-profile.children.cycles-pp.rseq_ip_fixup
      0.05 ±  7%      +0.1        0.15 ±  9%  perf-profile.children.cycles-pp.run_builtin
      0.13 ± 25%      +0.1        0.23 ±  9%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.31 ± 13%      +0.1        0.42 ±  5%  perf-profile.children.cycles-pp.switch_fpu_return
      0.00            +0.1        0.11 ± 11%  perf-profile.children.cycles-pp.generic_perform_write
      0.19 ± 11%      +0.1        0.30 ±  7%  perf-profile.children.cycles-pp.__dequeue_entity
      0.00            +0.1        0.11 ± 13%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.05 ±100%      +0.1        0.16 ±  5%  perf-profile.children.cycles-pp.update_rq_clock_task
      0.00            +0.1        0.12 ± 10%  perf-profile.children.cycles-pp.record__pushfn
      0.00            +0.1        0.12 ± 12%  perf-profile.children.cycles-pp.writen
      0.00            +0.1        0.14 ±  8%  perf-profile.children.cycles-pp.perf_mmap__push
      0.18 ± 15%      +0.1        0.32 ±  6%  perf-profile.children.cycles-pp.__enqueue_entity
      0.16 ± 22%      +0.1        0.30 ±  4%  perf-profile.children.cycles-pp.prepare_task_switch
      0.00            +0.1        0.14 ±  9%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.14 ± 21%      +0.1        0.27 ±  5%  perf-profile.children.cycles-pp.__switch_to
      0.21 ± 23%      +0.2        0.37 ±  6%  perf-profile.children.cycles-pp.__rseq_handle_notify_resume
      0.35 ± 19%      +0.2        0.52 ±  9%  perf-profile.children.cycles-pp.__switch_to_asm
      0.29 ± 15%      +0.2        0.47 ±  6%  perf-profile.children.cycles-pp.set_next_entity
      0.00            +0.2        0.18 ±  4%  perf-profile.children.cycles-pp.page_counter_cancel
      0.00            +0.2        0.19 ±  4%  perf-profile.children.cycles-pp.page_counter_uncharge
      0.00            +0.2        0.19 ±  3%  perf-profile.children.cycles-pp.drain_stock
      0.15 ± 16%      +0.2        0.36 ±  8%  perf-profile.children.cycles-pp.pick_eevdf
      0.22 ± 25%      +0.2        0.45 ±  7%  perf-profile.children.cycles-pp.pick_task_fair
      0.34 ± 31%      +0.2        0.59 ±  6%  perf-profile.children.cycles-pp.dequeue_entity
      0.21 ± 44%      +0.3        0.47 ± 11%  perf-profile.children.cycles-pp.get_any_partial
      0.47 ± 28%      +0.3        0.74 ±  7%  perf-profile.children.cycles-pp.update_load_avg
      0.00            +0.3        0.29 ±  3%  perf-profile.children.cycles-pp.refill_stock
      0.39 ±  6%      +0.3        0.68        perf-profile.children.cycles-pp.mutex_unlock
      0.37 ± 34%      +0.3        0.68 ±  8%  perf-profile.children.cycles-pp.update_curr
      0.69 ±  6%      +0.3        1.03        perf-profile.children.cycles-pp.fput
      0.48 ± 25%      +0.4        0.86 ±  4%  perf-profile.children.cycles-pp.enqueue_entity
      0.39 ±  4%      +0.4        0.82 ± 28%  perf-profile.children.cycles-pp.obj_cgroup_uncharge_pages
      2.39 ±  9%      +0.4        2.82 ±  5%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.61 ± 32%      +0.5        1.07 ±  4%  perf-profile.children.cycles-pp.dequeue_entities
      0.72 ± 31%      +0.5        1.18 ±  4%  perf-profile.children.cycles-pp.enqueue_task_fair
      0.76 ± 31%      +0.5        1.22 ±  5%  perf-profile.children.cycles-pp.enqueue_task
      0.72 ± 25%      +0.5        1.19 ±  6%  perf-profile.children.cycles-pp.pick_next_task_fair
      0.74 ± 25%      +0.5        1.21 ±  6%  perf-profile.children.cycles-pp.__pick_next_task
      0.62 ± 31%      +0.5        1.10 ±  4%  perf-profile.children.cycles-pp.dequeue_task_fair
      0.64 ± 29%      +0.5        1.14 ±  4%  perf-profile.children.cycles-pp.try_to_block_task
      0.32 ±  8%      +0.5        0.84 ± 53%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      0.29 ±  6%      +0.5        0.82 ± 62%  perf-profile.children.cycles-pp.__mod_memcg_state
      0.90 ± 30%      +0.6        1.48 ±  5%  perf-profile.children.cycles-pp.ttwu_do_activate
      0.00            +0.7        0.67 ± 11%  perf-profile.children.cycles-pp.propagate_protected_usage
      1.58 ± 41%      +1.1        2.68 ±  7%  perf-profile.children.cycles-pp.try_to_wake_up
      1.65 ± 41%      +1.1        2.74 ±  8%  perf-profile.children.cycles-pp.__wake_up_common
      1.60 ± 41%      +1.1        2.71 ±  7%  perf-profile.children.cycles-pp.autoremove_wake_function
      1.87 ± 28%      +1.2        3.08 ±  3%  perf-profile.children.cycles-pp.schedule_timeout
      2.31 ± 44%      +1.3        3.63 ±  7%  perf-profile.children.cycles-pp.sock_def_readable
      2.11 ± 29%      +1.5        3.65 ±  4%  perf-profile.children.cycles-pp.__schedule
      2.10 ± 28%      +1.6        3.69 ±  4%  perf-profile.children.cycles-pp.schedule
      1.96 ± 29%      +1.7        3.67 ±  3%  perf-profile.children.cycles-pp.unix_stream_data_wait
     90.76            +1.9       92.65        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     90.24            +2.0       92.23        perf-profile.children.cycles-pp.do_syscall_64
      8.98 ± 12%      +5.0       13.93 ±  2%  perf-profile.children.cycles-pp.kmem_cache_alloc_node_noprof
     10.02 ± 10%     +13.5       23.56        perf-profile.children.cycles-pp.kmalloc_reserve
      9.60 ± 10%     +13.6       23.25        perf-profile.children.cycles-pp.__kmalloc_node_track_caller_noprof
      0.00           +14.4       14.44 ±  2%  perf-profile.children.cycles-pp.page_counter_try_charge
     44.12           +15.9       60.04        perf-profile.children.cycles-pp.write
     37.81           +16.6       54.46        perf-profile.children.cycles-pp.ksys_write
     35.82           +16.7       52.50        perf-profile.children.cycles-pp.vfs_write
     34.55           +16.9       51.46        perf-profile.children.cycles-pp.sock_write_iter
     33.40           +17.2       50.62        perf-profile.children.cycles-pp.unix_stream_sendmsg
     23.17 ±  7%     +17.7       40.86        perf-profile.children.cycles-pp.sock_alloc_send_pskb
     20.93 ±  9%     +17.9       38.82        perf-profile.children.cycles-pp.alloc_skb_with_frags
     20.68 ±  9%     +17.9       38.63        perf-profile.children.cycles-pp.__alloc_skb
      4.77 ±  3%     +25.5       30.26 ±  2%  perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
      1.29 ±  3%     +25.8       27.07 ±  2%  perf-profile.children.cycles-pp.obj_cgroup_charge
      0.13 ±  3%     +25.8       25.92 ±  2%  perf-profile.children.cycles-pp.try_charge_memcg
     23.87 ± 19%     -14.1        9.78 ±  5%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      3.89 ±  3%      -1.8        2.12 ±  2%  perf-profile.self.cycles-pp._copy_to_iter
      3.22 ±  6%      -1.3        1.92 ±  2%  perf-profile.self.cycles-pp.__memcg_slab_free_hook
      2.44 ±  3%      -1.1        1.39 ±  2%  perf-profile.self.cycles-pp.unix_stream_read_generic
      2.13 ±  4%      -1.1        1.07        perf-profile.self.cycles-pp.check_heap_object
      3.92 ±  2%      -0.9        3.00 ±  2%  perf-profile.self.cycles-pp.clear_bhb_loop
      1.89 ±  4%      -0.9        1.02        perf-profile.self.cycles-pp.kmem_cache_free
      2.29 ±  3%      -0.6        1.66        perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
      2.34 ±  3%      -0.6        1.78        perf-profile.self.cycles-pp.sock_wfree
      0.92 ±  6%      -0.5        0.38 ±  3%  perf-profile.self.cycles-pp.skb_release_data
      1.46 ± 16%      -0.5        0.97 ±  5%  perf-profile.self.cycles-pp.unix_stream_sendmsg
      1.11 ±  4%      -0.5        0.63        perf-profile.self.cycles-pp.___slab_alloc
      2.35 ±  5%      -0.4        1.90        perf-profile.self.cycles-pp.__slab_free
      0.76 ± 10%      -0.4        0.38 ±  5%  perf-profile.self.cycles-pp.get_partial_node
      1.28 ±  3%      -0.4        0.92 ±  2%  perf-profile.self.cycles-pp.__kmalloc_node_track_caller_noprof
      1.30 ±  2%      -0.3        0.98 ±  2%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.21 ±  2%      -0.3        0.90 ±  2%  perf-profile.self.cycles-pp._copy_from_iter
      1.10 ±  3%      -0.3        0.81 ±  2%  perf-profile.self.cycles-pp.__alloc_skb
      0.68 ±  3%      -0.3        0.39 ±  2%  perf-profile.self.cycles-pp.__skb_datagram_iter
      1.05 ±  3%      -0.3        0.77 ±  2%  perf-profile.self.cycles-pp.sock_write_iter
      0.99            -0.3        0.72 ±  2%  perf-profile.self.cycles-pp.kmem_cache_alloc_node_noprof
      0.58 ±  3%      -0.3        0.33        perf-profile.self.cycles-pp.__build_skb_around
      0.92 ±  2%      -0.2        0.70 ±  2%  perf-profile.self.cycles-pp.read
      0.92 ±  2%      -0.2        0.70 ±  2%  perf-profile.self.cycles-pp.write
      1.34 ±  3%      -0.2        1.14        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.39 ±  8%      -0.2        0.21 ±  3%  perf-profile.self.cycles-pp.__put_partials
      0.88 ±  2%      -0.2        0.71 ±  4%  perf-profile.self.cycles-pp.obj_cgroup_charge
      0.69 ±  3%      -0.2        0.52 ±  2%  perf-profile.self.cycles-pp.__check_heap_object
      0.70 ±  6%      -0.2        0.54 ±  2%  perf-profile.self.cycles-pp.vfs_write
      0.80 ±  2%      -0.2        0.64 ±  2%  perf-profile.self.cycles-pp.sock_read_iter
      0.79 ±  3%      -0.1        0.65 ±  2%  perf-profile.self.cycles-pp.do_syscall_64
      0.53 ±  3%      -0.1        0.39 ±  2%  perf-profile.self.cycles-pp.rw_verify_area
      0.57 ±  3%      -0.1        0.44 ±  2%  perf-profile.self.cycles-pp.__check_object_size
      0.77 ±  3%      -0.1        0.64 ±  2%  perf-profile.self.cycles-pp.vfs_read
      0.47 ±  5%      -0.1        0.36 ±  2%  perf-profile.self.cycles-pp.kfree
      0.58 ±  3%      -0.1        0.47        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.34 ±  5%      -0.1        0.24        perf-profile.self.cycles-pp.sock_alloc_send_pskb
      0.51 ±  3%      -0.1        0.41 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.33 ±  6%      -0.1        0.23 ±  6%  perf-profile.self.cycles-pp.fsnotify_pre_content
      0.20 ±  6%      -0.1        0.10 ±  3%  perf-profile.self.cycles-pp.put_cpu_partial
      0.48 ±  3%      -0.1        0.39 ±  4%  perf-profile.self.cycles-pp.__virt_addr_valid
      0.20 ±  3%      -0.1        0.11 ± 18%  perf-profile.self.cycles-pp.obj_cgroup_uncharge_pages
      0.36 ±  3%      -0.1        0.28 ±  2%  perf-profile.self.cycles-pp.x64_sys_call
      0.34 ±  2%      -0.1        0.26 ±  2%  perf-profile.self.cycles-pp.__cond_resched
      0.28 ±  3%      -0.1        0.21 ±  2%  perf-profile.self.cycles-pp.ksys_write
      0.43 ±  3%      -0.1        0.35 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.24 ±  3%      -0.1        0.18 ±  5%  perf-profile.self.cycles-pp.kmalloc_reserve
      0.27 ±  3%      -0.1        0.20 ±  2%  perf-profile.self.cycles-pp.alloc_skb_with_frags
      0.30 ±  3%      -0.1        0.24 ±  3%  perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
      0.33 ±  3%      -0.1        0.27 ±  3%  perf-profile.self.cycles-pp.sock_recvmsg
      0.29 ±  3%      -0.1        0.23 ±  2%  perf-profile.self.cycles-pp.ksys_read
      0.21 ±  3%      -0.1        0.16 ±  3%  perf-profile.self.cycles-pp.rcu_all_qs
      0.22 ±  4%      -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.unix_stream_recvmsg
      0.19 ±  3%      -0.0        0.14        perf-profile.self.cycles-pp.security_file_permission
      0.25 ±  3%      -0.0        0.21 ±  3%  perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
      0.21 ±  3%      -0.0        0.16 ±  3%  perf-profile.self.cycles-pp.__scm_recv_common
      0.17 ±  6%      -0.0        0.13 ±  7%  perf-profile.self.cycles-pp.kmalloc_size_roundup
      0.14 ±  3%      -0.0        0.10        perf-profile.self.cycles-pp.skb_unlink
      0.13 ±  3%      -0.0        0.10 ±  4%  perf-profile.self.cycles-pp.skb_queue_tail
      0.12 ±  7%      -0.0        0.09        perf-profile.self.cycles-pp.maybe_add_creds
      0.18 ±  4%      -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.13 ±  3%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.skb_copy_datagram_iter
      0.13 ±  2%      -0.0        0.10 ±  4%  perf-profile.self.cycles-pp.consume_skb
      0.18 ±  3%      -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.unix_destruct_scm
      0.12 ±  5%      -0.0        0.09        perf-profile.self.cycles-pp.is_vmalloc_addr
      0.11 ±  4%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.security_socket_getpeersec_dgram
      0.14 ±  3%      -0.0        0.11 ±  3%  perf-profile.self.cycles-pp.check_stack_object
      0.10 ±  4%      -0.0        0.07        perf-profile.self.cycles-pp.security_socket_sendmsg
      0.17 ±  5%      -0.0        0.14 ±  3%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      0.06            -0.0        0.03 ± 70%  perf-profile.self.cycles-pp.kfree_skbmem
      0.09 ±  4%      -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.wait_for_unix_gc
      0.12 ±  4%      -0.0        0.09 ±  5%  perf-profile.self.cycles-pp.security_socket_recvmsg
      0.15 ±  3%      -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.unix_scm_to_skb
      0.12 ±  4%      -0.0        0.09        perf-profile.self.cycles-pp.manage_oob
      0.10 ±  4%      -0.0        0.08        perf-profile.self.cycles-pp.skb_release_head_state
      0.12 ±  4%      -0.0        0.09        perf-profile.self.cycles-pp.put_pid
      0.08 ±  6%      -0.0        0.06 ±  8%  perf-profile.self.cycles-pp.skb_put
      0.08 ±  6%      -0.0        0.06        perf-profile.self.cycles-pp.skb_free_head
      0.09            -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.simple_copy_to_iter
      0.08 ±  6%      -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.unix_stream_read_actor
      0.07 ± 16%      +0.0        0.10 ±  7%  perf-profile.self.cycles-pp.__get_user_8
      0.08 ± 19%      +0.0        0.11 ±  4%  perf-profile.self.cycles-pp.pick_next_task_fair
      0.05 ± 47%      +0.0        0.08 ±  4%  perf-profile.self.cycles-pp.unix_stream_data_wait
      0.02 ± 99%      +0.0        0.07 ±  7%  perf-profile.self.cycles-pp.__wrgsbase_inactive
      0.08 ± 14%      +0.0        0.12 ±  5%  perf-profile.self.cycles-pp.os_xsave
      0.02 ±141%      +0.0        0.06 ±  6%  perf-profile.self.cycles-pp.place_entity
      0.07 ± 32%      +0.0        0.12 ± 12%  perf-profile.self.cycles-pp.enqueue_task_fair
      0.03 ±100%      +0.0        0.07 ± 10%  perf-profile.self.cycles-pp.___perf_sw_event
      0.05 ± 74%      +0.0        0.10 ±  5%  perf-profile.self.cycles-pp.avg_vruntime
      0.06 ± 13%      +0.1        0.11 ± 11%  perf-profile.self.cycles-pp.vruntime_eligible
      0.25 ±  8%      +0.1        0.30 ±  5%  perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
      0.00            +0.1        0.05 ±  7%  perf-profile.self.cycles-pp.check_preempt_wakeup_fair
      0.02 ±141%      +0.1        0.07 ±  5%  perf-profile.self.cycles-pp.select_task_rq_fair
      0.01 ±223%      +0.1        0.06 ±  6%  perf-profile.self.cycles-pp.schedule
      0.02 ±141%      +0.1        0.07 ±  8%  perf-profile.self.cycles-pp.__put_user_8
      0.00            +0.1        0.06 ±  9%  perf-profile.self.cycles-pp.__rb_insert_augmented
      0.04 ±100%      +0.1        0.09 ±  7%  perf-profile.self.cycles-pp.native_sched_clock
      0.04 ±100%      +0.1        0.10 ±  4%  perf-profile.self.cycles-pp.dequeue_entity
      0.09 ± 21%      +0.1        0.15 ±  7%  perf-profile.self.cycles-pp.dequeue_entities
      0.04 ± 72%      +0.1        0.11 ±  8%  perf-profile.self.cycles-pp.rseq_update_cpu_node_id
      0.00            +0.1        0.07 ±  6%  perf-profile.self.cycles-pp.refill_stock
      0.43 ±  4%      +0.1        0.50 ±  2%  perf-profile.self.cycles-pp.unix_write_space
      0.00            +0.1        0.08 ±  9%  perf-profile.self.cycles-pp.set_next_buddy
      0.04 ±100%      +0.1        0.12 ±  5%  perf-profile.self.cycles-pp.switch_fpu_return
      0.15 ± 22%      +0.1        0.24 ±  6%  perf-profile.self.cycles-pp.__update_load_avg_se
      0.10 ± 24%      +0.1        0.19 ±  4%  perf-profile.self.cycles-pp.prepare_task_switch
      0.15 ± 11%      +0.1        0.24 ±  7%  perf-profile.self.cycles-pp.__dequeue_entity
      0.13 ± 34%      +0.1        0.22 ±  9%  perf-profile.self.cycles-pp.update_curr
      0.12 ± 23%      +0.1        0.21 ±  9%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
      0.04 ±100%      +0.1        0.14 ±  4%  perf-profile.self.cycles-pp.update_rq_clock_task
      0.04 ±100%      +0.1        0.14 ±  5%  perf-profile.self.cycles-pp.prepare_to_wait
      0.28 ± 24%      +0.1        0.40 ±  5%  perf-profile.self.cycles-pp.__schedule
      0.18 ± 14%      +0.1        0.32 ±  6%  perf-profile.self.cycles-pp.__enqueue_entity
      0.13 ± 19%      +0.1        0.27 ±  5%  perf-profile.self.cycles-pp.__switch_to
      0.40 ±  6%      +0.1        0.54 ±  2%  perf-profile.self.cycles-pp.mutex_lock
      0.11 ± 20%      +0.2        0.28 ±  7%  perf-profile.self.cycles-pp.pick_eevdf
      0.35 ± 18%      +0.2        0.52 ±  9%  perf-profile.self.cycles-pp.__switch_to_asm
      0.00            +0.2        0.18 ±  4%  perf-profile.self.cycles-pp.page_counter_cancel
      0.37 ±  6%      +0.3        0.67        perf-profile.self.cycles-pp.mutex_unlock
      0.65 ±  6%      +0.3        0.99        perf-profile.self.cycles-pp.fput
      1.68 ± 14%      +0.4        2.08 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.27 ±  9%      +0.5        0.80 ± 56%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      0.22 ±  7%      +0.5        0.77 ± 66%  perf-profile.self.cycles-pp.__mod_memcg_state
      0.00            +0.7        0.65 ± 11%  perf-profile.self.cycles-pp.propagate_protected_usage
      0.10 ±  4%     +11.3       11.43 ±  3%  perf-profile.self.cycles-pp.try_charge_memcg
      0.00           +13.7       13.72 ±  2%  perf-profile.self.cycles-pp.page_counter_try_charge



***************************************************************************************************
lkp-spr-2sp4: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_threads/rootfs/tbox_group/test/test_memory_size/testcase:
  gcc-12/performance/x86_64-rhel-9.4/development/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/TCP/50%/lmbench3

commit: 
  8c57b687e8 ("mm, bpf: Introduce free_pages_nolock()")
  01d37228d3 ("memcg: Use trylock to access memcg stock_lock.")

8c57b687e8331eb8 01d37228d331047a0bbbd1026ce 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      1594 ±  3%     +19.3%       1901        meminfo.Mlocked
    149.67 ±  4%      +8.9%     163.00 ±  5%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    969.96 ±  6%     +48.3%       1438 ±  2%  uptime.boot
 6.042e+08            -9.6%   5.46e+08 ±  4%  numa-numastat.node0.local_node
 6.043e+08            -9.6%  5.462e+08 ±  4%  numa-numastat.node0.numa_hit
 6.043e+08            -9.6%  5.462e+08 ±  4%  numa-vmstat.node0.numa_hit
 6.042e+08            -9.6%   5.46e+08 ±  4%  numa-vmstat.node0.numa_local
   4526689 ±  6%    +310.8%   18596060 ±  2%  vmstat.system.cs
    303047            -8.1%     278503        vmstat.system.in
     12.93 ±  5%      -4.2        8.72 ±  3%  mpstat.cpu.all.idle%
      4.39 ±  4%      +9.4       13.80 ±  2%  mpstat.cpu.all.soft%
      5.33 ±  3%      -1.1        4.22        mpstat.cpu.all.usr%
    184478           -34.4%     121014 ±  2%  lmbench3.TCP.socket.bandwidth.10MB.MB/sec
     12082 ±  4%     -33.0%       8091        lmbench3.TCP.socket.bandwidth.64B.MB/sec
    915.46 ±  7%     +50.8%       1380 ±  2%  lmbench3.time.elapsed_time
    915.46 ±  7%     +50.8%       1380 ±  2%  lmbench3.time.elapsed_time.max
  44831013 ±  7%     -44.1%   25067866 ±  3%  lmbench3.time.involuntary_context_switches
     11254 ±  4%     -27.5%       8155 ±  3%  lmbench3.time.percent_of_cpu_this_job_got
      6453 ±  6%      +9.1%       7040 ±  2%  lmbench3.time.user_time
 1.802e+09 ±  5%    +597.6%  1.257e+10 ±  2%  lmbench3.time.voluntary_context_switches
    397.85 ±  3%     +19.4%     474.88        proc-vmstat.nr_mlock
      8096            +2.9%       8335        proc-vmstat.nr_page_table_pages
 1.205e+09            -8.9%  1.098e+09        proc-vmstat.numa_hit
 1.205e+09            -8.9%  1.097e+09        proc-vmstat.numa_local
 9.615e+09            -8.9%  8.756e+09        proc-vmstat.pgalloc_normal
   3414882 ±  3%     +32.2%    4513062 ±  2%  proc-vmstat.pgfault
 9.614e+09            -8.9%  8.755e+09        proc-vmstat.pgfree
    146555 ±  3%     +30.1%     190716 ±  4%  proc-vmstat.pgreuse
      0.65 ± 77%      +0.6        1.21 ± 21%  perf-profile.calltrace.cycles-pp.__tcp_cleanup_rbuf.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_recvmsg
      0.58 ± 80%      +1.7        2.23 ±  8%  perf-profile.calltrace.cycles-pp.release_sock.tcp_recvmsg.inet_recvmsg.sock_recvmsg.sock_read_iter
      0.23 ± 51%      +0.1        0.36 ±  2%  perf-profile.children.cycles-pp.record__pushfn
      0.23 ± 51%      +0.1        0.36 ±  2%  perf-profile.children.cycles-pp.writen
      0.21 ± 53%      +0.1        0.35 ±  5%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.21 ± 52%      +0.1        0.35 ±  6%  perf-profile.children.cycles-pp.generic_perform_write
      0.78 ± 49%      +0.5        1.25 ± 21%  perf-profile.children.cycles-pp.__tcp_cleanup_rbuf
      0.17 ± 78%      +0.3        0.45 ± 24%  perf-profile.self.cycles-pp.__tcp_cleanup_rbuf
      0.33 ± 99%      +1.7        2.02 ± 10%  perf-profile.self.cycles-pp.release_sock
  83660333 ±  7%     +37.1%  1.147e+08 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.avg
  97337639 ±  9%     +37.5%  1.338e+08 ±  2%  sched_debug.cfs_rq:/.avg_vruntime.max
  71425435 ±  6%     +42.3%  1.016e+08 ±  5%  sched_debug.cfs_rq:/.avg_vruntime.min
      0.42 ±  4%     +16.7%       0.50 ±  3%  sched_debug.cfs_rq:/.h_nr_queued.stddev
  83660333 ±  7%     +37.1%  1.147e+08 ±  2%  sched_debug.cfs_rq:/.min_vruntime.avg
  97337639 ±  9%     +37.5%  1.338e+08 ±  2%  sched_debug.cfs_rq:/.min_vruntime.max
  71425435 ±  6%     +42.3%  1.016e+08 ±  5%  sched_debug.cfs_rq:/.min_vruntime.min
    104.29 ± 40%     -41.8%      60.70 ± 40%  sched_debug.cfs_rq:/.removed.load_avg.max
    271.46 ±  7%     +17.4%     318.60 ±  4%  sched_debug.cfs_rq:/.util_est.stddev
    839790 ±  5%     -13.7%     724828 ±  3%  sched_debug.cpu.avg_idle.avg
     94203 ± 36%     -46.8%      50142 ± 55%  sched_debug.cpu.avg_idle.min
    312614 ±  8%     +28.3%     401108 ±  3%  sched_debug.cpu.avg_idle.stddev
    500309 ±  5%     +45.6%     728380        sched_debug.cpu.clock.avg
    500675 ±  5%     +45.5%     728646        sched_debug.cpu.clock.max
    499891 ±  5%     +45.7%     728091        sched_debug.cpu.clock.min
    472701 ±  5%     +33.6%     631669 ±  2%  sched_debug.cpu.clock_task.avg
    480686 ±  5%     +34.3%     645473 ±  2%  sched_debug.cpu.clock_task.max
    459106 ±  5%     +33.2%     611651 ±  2%  sched_debug.cpu.clock_task.min
      2864 ±  8%     +95.5%       5599 ± 13%  sched_debug.cpu.clock_task.stddev
     17343 ±  6%     +30.8%      22685 ±  3%  sched_debug.cpu.curr->pid.avg
     20079 ±  3%     +28.2%      25732        sched_debug.cpu.curr->pid.max
     66128 ± 16%     -32.6%      44587 ±  7%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.42 ±  4%     +17.4%       0.50 ±  3%  sched_debug.cpu.nr_running.stddev
  12746739 ±  6%    +370.9%   60025838        sched_debug.cpu.nr_switches.avg
  16962002 ±  6%    +296.5%   67251221 ±  2%  sched_debug.cpu.nr_switches.max
   7092180 ± 12%    +582.2%   48379329 ±  6%  sched_debug.cpu.nr_switches.min
   1956675 ± 10%     +72.0%    3365465 ± 26%  sched_debug.cpu.nr_switches.stddev
    499883 ±  5%     +45.7%     728085        sched_debug.cpu_clk
    498838 ±  5%     +45.7%     727039        sched_debug.ktime
    500718 ±  5%     +45.6%     728919        sched_debug.sched_clk
 7.122e+10 ±  4%      +9.8%  7.823e+10 ±  2%  perf-stat.i.branch-instructions
      0.33 ±  3%      +0.0        0.37        perf-stat.i.branch-miss-rate%
 1.092e+08 ±  3%    +107.4%  2.265e+08 ±  2%  perf-stat.i.branch-misses
     13.88 ±  3%      +7.1       20.93 ±  2%  perf-stat.i.cache-miss-rate%
  2.94e+08 ±  8%     -33.3%  1.962e+08 ±  3%  perf-stat.i.cache-misses
 1.297e+09 ±  4%     -56.4%  5.652e+08 ±  3%  perf-stat.i.cache-references
   4741398 ±  5%    +304.2%   19164589 ±  2%  perf-stat.i.context-switches
 5.582e+11            +5.5%  5.891e+11        perf-stat.i.cpu-cycles
    443841 ±  4%     -36.7%     281171 ±  6%  perf-stat.i.cpu-migrations
     35296 ±  8%     -26.9%      25797 ±  4%  perf-stat.i.cycles-between-cache-misses
  3.65e+11 ±  4%     +10.9%  4.047e+11 ±  2%  perf-stat.i.instructions
      0.65 ±  8%     -42.8%       0.37 ± 24%  perf-stat.i.major-faults
     23.04 ±  5%    +276.7%      86.77 ±  2%  perf-stat.i.metric.K/sec
      3611 ±  4%     -12.8%       3149        perf-stat.i.minor-faults
      3612 ±  4%     -12.8%       3149        perf-stat.i.page-faults
      0.79 ±  5%     -38.0%       0.49 ±  4%  perf-stat.overall.MPKI
      0.15 ±  2%      +0.1        0.29        perf-stat.overall.branch-miss-rate%
     22.67 ±  4%     +12.4       35.07 ±  2%  perf-stat.overall.cache-miss-rate%
      2022 ± 10%     +50.6%       3046 ±  4%  perf-stat.overall.cycles-between-cache-misses
 6.892e+10 ±  4%     +10.9%   7.64e+10 ±  2%  perf-stat.ps.branch-instructions
 1.043e+08 ±  4%    +110.3%  2.194e+08 ±  2%  perf-stat.ps.branch-misses
 2.803e+08 ±  9%     -30.8%   1.94e+08 ±  4%  perf-stat.ps.cache-misses
 1.234e+09 ±  5%     -55.2%  5.533e+08 ±  3%  perf-stat.ps.cache-references
   4509001 ±  6%    +311.1%   18537174 ±  2%  perf-stat.ps.context-switches
 5.613e+11            +5.1%    5.9e+11        perf-stat.ps.cpu-cycles
    412889 ±  4%     -35.3%     267257 ±  6%  perf-stat.ps.cpu-migrations
 3.533e+11 ±  4%     +11.9%  3.952e+11 ±  2%  perf-stat.ps.instructions
      0.69 ±  7%     -41.4%       0.40 ± 23%  perf-stat.ps.major-faults
      3548 ±  4%     -11.9%       3125        perf-stat.ps.minor-faults
      3549 ±  4%     -11.9%       3126        perf-stat.ps.page-faults
 3.226e+14 ±  2%     +69.2%  5.459e+14        perf-stat.total.instructions



***************************************************************************************************
lkp-emr-2sp1: 256 threads 4 sockets INTEL(R) XEON(R) PLATINUM 8592+ (Emerald Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/debian-12-x86_64-20240206.cgz/300s/1T/lkp-emr-2sp1/lru-shm/vm-scalability

commit: 
  8c57b687e8 ("mm, bpf: Introduce free_pages_nolock()")
  01d37228d3 ("memcg: Use trylock to access memcg stock_lock.")

8c57b687e8331eb8 01d37228d331047a0bbbd1026ce 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 4.666e+10           +12.1%  5.231e+10 ±  2%  cpuidle..time
   4452013          +145.5%   10928419 ±  4%  cpuidle..usage
    233.91           +46.5%     342.71        uptime.boot
     54574           +10.4%      60225        uptime.idle
     65451 ± 28%     +86.7%     122214 ± 35%  meminfo.AnonHugePages
   4653322 ±  4%    +338.3%   20396688 ±  6%  meminfo.Mapped
     22987 ±  2%    +150.5%      57574 ±  5%  meminfo.PageTables
      1.00 ±100%  +66550.0%     666.50 ±138%  perf-c2c.DRAM.local
     58.17 ± 21%  +15319.8%       8969 ±127%  perf-c2c.DRAM.remote
     11.83 ± 80%  +14446.5%       1721 ±138%  perf-c2c.HITM.local
     19.17 ± 32%  +11917.4%       2303 ±125%  perf-c2c.HITM.remote
     91.76           -27.2%      66.77 ±  2%  vmstat.cpu.id
     22.81 ±  2%    +280.7%      86.82 ±  3%  vmstat.procs.r
      6701            -9.9%       6040 ±  3%  vmstat.system.cs
     47395 ±  5%    +192.6%     138675 ±  5%  vmstat.system.in
     91.77           -25.2       66.53 ±  2%  mpstat.cpu.all.idle%
      0.07 ± 10%      +0.2        0.24 ± 42%  mpstat.cpu.all.irq%
      0.03            +0.0        0.06 ± 25%  mpstat.cpu.all.soft%
      5.84           +25.4       31.28 ±  4%  mpstat.cpu.all.sys%
      2.29            -0.4        1.89 ±  3%  mpstat.cpu.all.usr%
     45.47          +100.8%      91.30 ±  7%  mpstat.max_utilization_pct
   1186980          +334.3%    5154770 ± 10%  numa-meminfo.node0.Mapped
      6229 ± 11%    +131.7%      14432 ±  3%  numa-meminfo.node0.PageTables
   1183346 ±  4%    +317.9%    4945218 ±  6%  numa-meminfo.node1.Mapped
      5409 ± 14%    +154.5%      13769 ±  4%  numa-meminfo.node1.PageTables
   1213253 ±  5%    +323.8%    5142208 ±  7%  numa-meminfo.node2.Mapped
      6317 ± 17%    +129.9%      14522 ±  6%  numa-meminfo.node2.PageTables
   1219782 ±  4%    +311.7%    5022404 ± 10%  numa-meminfo.node3.Mapped
      5915 ± 10%    +143.6%      14409 ±  9%  numa-meminfo.node3.PageTables
    293807 ±  5%    +333.7%    1274238 ± 10%  numa-vmstat.node0.nr_mapped
      1529 ± 11%    +134.5%       3587 ±  4%  numa-vmstat.node0.nr_page_table_pages
    293261 ±  2%    +316.8%    1222310 ±  6%  numa-vmstat.node1.nr_mapped
      1356 ± 12%    +152.5%       3424 ±  5%  numa-vmstat.node1.nr_page_table_pages
    300064 ±  5%    +324.0%    1272313 ±  6%  numa-vmstat.node2.nr_mapped
      1566 ± 15%    +130.9%       3617 ±  7%  numa-vmstat.node2.nr_page_table_pages
    299769 ±  5%    +317.7%    1252275 ± 10%  numa-vmstat.node3.nr_mapped
      1466 ± 14%    +145.5%       3600 ±  9%  numa-vmstat.node3.nr_page_table_pages
     43026            +4.4%      44910        proc-vmstat.nr_kernel_stack
   1205005 ±  2%    +321.4%    5077813 ±  6%  proc-vmstat.nr_mapped
      5917          +142.4%      14340 ±  5%  proc-vmstat.nr_page_table_pages
     79171            -3.6%      76354        proc-vmstat.nr_slab_reclaimable
     21575 ± 22%    +109.8%      45270 ±  8%  proc-vmstat.numa_hint_faults
 1.071e+09           -12.1%  9.412e+08 ±  4%  proc-vmstat.numa_hit
 1.063e+09           -11.6%  9.402e+08 ±  4%  proc-vmstat.numa_local
    198294 ± 27%    +104.9%     406310 ± 37%  proc-vmstat.numa_pte_updates
 1.063e+09           -11.4%  9.423e+08 ±  4%  proc-vmstat.pgalloc_normal
  1.06e+09           -11.4%  9.389e+08 ±  4%  proc-vmstat.pgfault
 1.063e+09           -11.4%  9.421e+08 ±  4%  proc-vmstat.pgfree
    396442            -5.6%     374046 ±  3%  proc-vmstat.pgreuse
      5826 ±  3%     +42.1%       8276 ±  9%  proc-vmstat.unevictable_pgs_culled
      0.01           +65.2%       0.01 ±  4%  vm-scalability.free_time
   1315486           -86.6%     175811 ±  4%  vm-scalability.median
      0.85 ± 17%      +5.3        6.19 ± 57%  vm-scalability.median_stddev%
      1.24 ± 22%      +4.7        5.92 ± 58%  vm-scalability.stddev%
 3.401e+08           -86.8%   45006220 ±  5%  vm-scalability.throughput
    197.44           +55.0%     305.97        vm-scalability.time.elapsed_time
    197.44           +55.0%     305.97        vm-scalability.time.elapsed_time.max
     30562          +186.6%      87583 ±  7%  vm-scalability.time.involuntary_context_switches
 1.059e+09           -11.5%  9.375e+08 ±  4%  vm-scalability.time.minor_page_faults
      1964          +326.5%       8377 ±  4%  vm-scalability.time.percent_of_cpu_this_job_got
      2761          +777.9%      24242 ±  4%  vm-scalability.time.system_time
      1117           +24.6%       1393 ±  4%  vm-scalability.time.user_time
    107496           -12.3%      94315 ±  6%  vm-scalability.time.voluntary_context_switches
 4.742e+09           -11.5%  4.199e+09 ±  4%  vm-scalability.workload
 3.489e+10           -43.5%  1.972e+10 ±  5%  perf-stat.i.branch-instructions
      0.27            -0.1        0.19 ±  3%  perf-stat.i.branch-miss-rate%
  26446125           -31.2%   18182985 ±  2%  perf-stat.i.branch-misses
 1.417e+08 ±  2%     -39.7%   85414690 ± 11%  perf-stat.i.cache-misses
 3.788e+08           -45.6%   2.06e+08 ± 17%  perf-stat.i.cache-references
      6654           -10.2%       5975 ±  3%  perf-stat.i.context-switches
      0.69 ±  7%    +239.3%       2.34 ±  5%  perf-stat.i.cpi
 6.799e+10          +259.9%  2.447e+11 ±  5%  perf-stat.i.cpu-cycles
    588.49 ±  3%     -16.5%     491.20 ±  4%  perf-stat.i.cpu-migrations
    524.75 ±  9%    +285.2%       2021 ± 13%  perf-stat.i.cycles-between-cache-misses
 1.258e+11           -43.1%  7.162e+10 ±  5%  perf-stat.i.instructions
      1.56 ±  3%     -50.9%       0.77 ±  4%  perf-stat.i.ipc
      1.02 ±  6%     -53.4%       0.47 ± 10%  perf-stat.i.major-faults
     41.68           -44.5%      23.14 ±  6%  perf-stat.i.metric.K/sec
   5383984           -44.8%    2971563 ±  6%  perf-stat.i.minor-faults
   5383985           -44.8%    2971564 ±  6%  perf-stat.i.page-faults
      0.08            +0.0        0.09 ±  2%  perf-stat.overall.branch-miss-rate%
     37.40            +4.6       41.96 ±  8%  perf-stat.overall.cache-miss-rate%
      0.54          +542.1%       3.47 ±  3%  perf-stat.overall.cpi
    480.08          +516.6%       2960 ± 13%  perf-stat.overall.cycles-between-cache-misses
      1.85           -84.4%       0.29 ±  3%  perf-stat.overall.ipc
      5222            +2.4%       5350        perf-stat.overall.path-length
 3.465e+10           -41.7%  2.019e+10 ±  4%  perf-stat.ps.branch-instructions
  26086104           -30.8%   18041050 ±  2%  perf-stat.ps.branch-misses
 1.408e+08 ±  2%     -38.1%   87092040 ± 12%  perf-stat.ps.cache-misses
 3.763e+08           -44.1%  2.104e+08 ± 17%  perf-stat.ps.cache-references
      6606            -9.6%       5969 ±  3%  perf-stat.ps.context-switches
 6.754e+10          +275.5%  2.536e+11 ±  3%  perf-stat.ps.cpu-cycles
    584.17 ±  3%     -16.0%     490.74 ±  4%  perf-stat.ps.cpu-migrations
  1.25e+11           -41.5%  7.316e+10 ±  4%  perf-stat.ps.instructions
      1.01 ±  6%     -53.0%       0.48 ±  9%  perf-stat.ps.major-faults
   5346919           -42.8%    3057237 ±  4%  perf-stat.ps.minor-faults
   5346920           -42.8%    3057238 ±  4%  perf-stat.ps.page-faults
 2.476e+13            -9.3%  2.246e+13 ±  4%  perf-stat.total.instructions
    931893 ± 12%    +892.4%    9248299 ± 18%  sched_debug.cfs_rq:/.avg_vruntime.avg
   1446903 ± 12%    +687.3%   11391069 ± 16%  sched_debug.cfs_rq:/.avg_vruntime.max
    481792 ± 17%   +1257.4%    6539912 ± 25%  sched_debug.cfs_rq:/.avg_vruntime.min
    204874 ± 10%    +338.4%     898146 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.09 ± 67%    +287.2%       0.36 ± 16%  sched_debug.cfs_rq:/.h_nr_queued.avg
      0.09 ± 70%    +296.6%       0.36 ± 16%  sched_debug.cfs_rq:/.h_nr_runnable.avg
    708874 ± 42%    +532.5%    4483644 ± 67%  sched_debug.cfs_rq:/.left_deadline.max
     63364 ± 46%    +468.8%     360399 ± 72%  sched_debug.cfs_rq:/.left_deadline.stddev
    708816 ± 42%    +532.5%    4483542 ± 67%  sched_debug.cfs_rq:/.left_vruntime.max
     63359 ± 46%    +468.8%     360393 ± 72%  sched_debug.cfs_rq:/.left_vruntime.stddev
    931893 ± 12%    +892.4%    9248299 ± 18%  sched_debug.cfs_rq:/.min_vruntime.avg
   1446903 ± 12%    +687.3%   11391069 ± 16%  sched_debug.cfs_rq:/.min_vruntime.max
    481792 ± 17%   +1257.4%    6539912 ± 25%  sched_debug.cfs_rq:/.min_vruntime.min
    204874 ± 10%    +338.4%     898146 ±  8%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.10 ± 64%    +278.4%       0.36 ± 16%  sched_debug.cfs_rq:/.nr_queued.avg
    708816 ± 42%    +532.5%    4483542 ± 67%  sched_debug.cfs_rq:/.right_vruntime.max
     63359 ± 46%    +468.8%     360393 ± 72%  sched_debug.cfs_rq:/.right_vruntime.stddev
    103.73 ± 59%    +264.8%     378.38 ± 16%  sched_debug.cfs_rq:/.util_avg.avg
    195.87 ± 22%     +53.5%     300.66 ± 13%  sched_debug.cfs_rq:/.util_avg.stddev
     31.73 ± 86%    +384.5%     153.73 ± 17%  sched_debug.cfs_rq:/.util_est.avg
     82.97 ± 51%     +78.2%     147.83 ± 10%  sched_debug.cfs_rq:/.util_est.stddev
    183331 ± 20%     +37.5%     252021 ± 19%  sched_debug.cpu.avg_idle.min
    120841 ±  9%     +41.6%     171158 ±  8%  sched_debug.cpu.clock.avg
    120861 ±  9%     +41.7%     171224 ±  8%  sched_debug.cpu.clock.max
    120779 ±  9%     +41.6%     171052 ±  8%  sched_debug.cpu.clock.min
     13.07 ± 15%    +239.4%      44.36 ± 49%  sched_debug.cpu.clock.stddev
    120651 ±  9%     +41.5%     170689 ±  8%  sched_debug.cpu.clock_task.avg
    120788 ±  9%     +41.6%     171006 ±  8%  sched_debug.cpu.clock_task.max
    105855 ± 10%     +45.7%     154255 ±  9%  sched_debug.cpu.clock_task.min
      2562 ± 99%    +325.1%      10895 ± 17%  sched_debug.cpu.curr->pid.avg
      4577 ± 51%     +81.1%       8289 ± 16%  sched_debug.cpu.curr->pid.stddev
     19843 ± 57%    +640.0%     146849 ±124%  sched_debug.cpu.max_idle_balance_cost.stddev
      0.00 ± 65%    +146.2%       0.00 ± 53%  sched_debug.cpu.next_balance.stddev
      0.08 ± 72%    +313.0%       0.35 ± 17%  sched_debug.cpu.nr_running.avg
      0.20 ± 21%     +48.7%       0.30 ± 11%  sched_debug.cpu.nr_running.stddev
      2694 ± 10%     +29.2%       3482 ± 10%  sched_debug.cpu.nr_switches.avg
    946.51 ± 11%     +27.7%       1208 ± 13%  sched_debug.cpu.nr_switches.min
      2562 ±  8%     +40.0%       3586 ± 22%  sched_debug.cpu.nr_switches.stddev
    120822 ±  9%     +41.6%     171065 ±  8%  sched_debug.cpu_clk
    119783 ±  9%     +41.9%     170023 ±  8%  sched_debug.ktime
    122204 ±  9%     +41.1%     172446 ±  8%  sched_debug.sched_clk
      0.19 ± 12%    +115.1%       0.40 ± 40%  perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.05 ± 41%    +346.6%       0.21 ± 27%  perf-sched.sch_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
      0.01 ± 59%    +577.5%       0.05 ± 30%  perf-sched.sch_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.05 ± 92%    +167.9%       0.12 ± 19%  perf-sched.sch_delay.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
      0.01 ± 14%    +623.5%       0.10 ± 74%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.01 ±  5%    +798.7%       0.12 ± 21%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.02 ± 38%    +525.5%       0.16 ± 52%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.01 ± 20%    +223.8%       0.04 ± 47%  perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.04 ± 38%    +852.4%       0.36 ± 66%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.02 ± 18%    +771.7%       0.14 ± 65%  perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.15 ± 11%     -36.6%       0.10 ± 28%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.03 ± 21%    +865.9%       0.28 ± 82%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.15 ± 17%     -66.0%       0.05 ± 35%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      0.20 ± 50%  +24616.3%      50.38 ± 80%  perf-sched.sch_delay.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
      0.02 ± 97%   +1023.3%       0.17 ± 38%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3.42 ± 12%    +101.3%       6.88 ± 61%  perf-sched.sch_delay.max.ms.__cond_resched.ww_mutex_lock.drm_gem_vunmap_unlocked.drm_gem_fb_vunmap.drm_atomic_helper_commit_planes
      0.11 ± 74%    +483.3%       0.62 ± 44%  perf-sched.sch_delay.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
      0.02 ± 15%   +1525.2%       0.29 ± 68%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.02 ± 15%   +2244.7%       0.40 ± 51%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      1.02 ±103%   +1337.3%      14.66 ±118%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.02 ± 25%    +418.0%       0.11 ± 54%  perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.68 ±101%   +6041.8%      41.46 ±129%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      1.12 ± 60%  +49035.6%     551.87 ± 84%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.42 ± 71%  +12165.4%      50.94 ±116%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.27 ± 32%  +31119.3%      84.55 ±221%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      1.98 ± 14%    +800.7%      17.87 ± 72%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.10 ± 18%     +70.7%       0.18 ± 14%  perf-sched.total_sch_delay.average.ms
    120.83 ±144%    +528.3%     759.23 ± 51%  perf-sched.total_sch_delay.max.ms
      0.12 ±109%    +533.6%       0.74 ± 43%  perf-sched.wait_and_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      0.51 ± 43%    +568.9%       3.44 ± 49%  perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.06 ±101%   +1002.8%       0.66 ± 57%  perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      1.39 ± 20%    +425.5%       7.30 ± 12%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      1.48 ±100%    +754.1%      12.63 ± 13%  perf-sched.wait_and_delay.avg.ms.sigsuspend.__x64_sys_rt_sigsuspend.do_syscall_64.entry_SYSCALL_64_after_hwframe
     77.50 ±108%    +341.7%     342.33 ± 10%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     70.33 ±100%    +189.6%     203.67 ± 17%  perf-sched.wait_and_delay.count.sigsuspend.__x64_sys_rt_sigsuspend.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1209 ± 34%     +46.0%       1766 ±  4%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.84 ±101%   +1408.4%      12.67 ± 56%  perf-sched.wait_and_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      1006 ± 36%     -73.0%     272.14 ± 55%  perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      1.16 ±129%   +5576.0%      66.07 ±143%  perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     12.52 ±101%    +426.3%      65.90 ± 75%  perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     10.40 ±102%    +926.8%     106.78 ± 31%  perf-sched.wait_and_delay.max.ms.sigsuspend.__x64_sys_rt_sigsuspend.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.05 ± 36%    +196.2%       0.16 ± 27%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
      0.05 ± 31%    +196.0%       0.16 ± 25%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
      0.16 ± 57%    +311.3%       0.65 ± 43%  perf-sched.wait_time.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
     11.56 ±122%    +807.8%     104.94 ± 93%  perf-sched.wait_time.avg.ms.__cond_resched.ww_mutex_lock.drm_gem_vunmap_unlocked.drm_gem_fb_vunmap.drm_atomic_helper_commit_planes
      0.49 ± 46%    +570.7%       3.29 ± 50%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.01 ±129%    +403.4%       0.05 ± 36%  perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
      0.05 ± 40%    +494.9%       0.29 ± 50%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      1.36 ± 21%    +416.0%       7.02 ±  9%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      1.45 ±100%    +768.6%      12.57 ± 13%  perf-sched.wait_time.avg.ms.sigsuspend.__x64_sys_rt_sigsuspend.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.37 ± 52%    +362.9%       1.72 ± 37%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
      0.23 ± 46%   +4734.9%      11.31 ± 81%  perf-sched.wait_time.max.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
      1.14 ± 46%    +951.6%      12.02 ± 60%  perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
     92.03 ± 49%    +336.8%     402.04 ± 64%  perf-sched.wait_time.max.ms.__cond_resched.ww_mutex_lock.drm_gem_vunmap_unlocked.drm_gem_fb_vunmap.drm_atomic_helper_commit_planes
      1006 ± 36%     -73.0%     272.14 ± 55%  perf-sched.wait_time.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      0.05 ±121%    +406.2%       0.24 ± 54%  perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
      0.76 ± 85%   +3220.5%      25.09 ±190%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
     10.39 ±102%    +927.7%     106.76 ± 31%  perf-sched.wait_time.max.ms.sigsuspend.__x64_sys_rt_sigsuspend.do_syscall_64.entry_SYSCALL_64_after_hwframe



***************************************************************************************************
lkp-icl-2sp2: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
  cs-localhost/gcc-12/performance/ipv4/x86_64-rhel-9.4/200%/debian-12-x86_64-20240206.cgz/300s/lkp-icl-2sp2/TCP_MAERTS/netperf

commit: 
  8c57b687e8 ("mm, bpf: Introduce free_pages_nolock()")
  01d37228d3 ("memcg: Use trylock to access memcg stock_lock.")

8c57b687e8331eb8 01d37228d331047a0bbbd1026ce 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 4.421e+08 ±  2%     +22.9%  5.434e+08 ±  6%  numa-numastat.node0.local_node
 4.421e+08 ±  2%     +22.9%  5.435e+08 ±  6%  numa-numastat.node0.numa_hit
     15990 ± 95%     -97.3%     437.17 ± 20%  perf-c2c.DRAM.local
      9053 ±  9%     -34.0%       5974 ± 20%  perf-c2c.HITM.local
      9888 ±  9%     -31.2%       6799 ± 20%  perf-c2c.HITM.total
    462.20           -23.0%     355.75        vmstat.procs.r
     70900 ±  2%   +9238.1%    6620774        vmstat.system.cs
    172829           +22.5%     211697 ±  2%  vmstat.system.in
      0.71 ±  3%      -0.2        0.46 ±  7%  mpstat.cpu.all.irq%
      5.89 ±  4%     +11.9       17.74        mpstat.cpu.all.soft%
     90.68           -12.1       78.54        mpstat.cpu.all.sys%
      1.76 ± 10%      +0.6        2.37 ±  2%  mpstat.cpu.all.usr%
   1502311 ± 10%     +50.7%    2263993 ± 13%  meminfo.Active
   1502311 ± 10%     +50.7%    2263993 ± 13%  meminfo.Active(anon)
   4135865 ±  3%     +18.9%    4915987 ±  5%  meminfo.Cached
   1777441 ±  9%     +42.6%    2534680 ± 11%  meminfo.Committed_AS
    305395 ± 16%     -31.8%     208208 ± 14%  meminfo.Mapped
    631369 ± 24%    +123.6%    1411635 ± 20%  meminfo.Shmem
    113990 ± 48%     -78.4%      24658 ±134%  numa-meminfo.node0.Mapped
   4394098 ± 35%     -43.4%    2485612 ± 63%  numa-meminfo.node0.MemUsed
    178139 ± 70%     -94.8%       9344 ± 30%  numa-meminfo.node0.Shmem
   1013397 ± 22%     +92.7%    1952777 ± 19%  numa-meminfo.node1.Active
   1013397 ± 22%     +92.7%    1952777 ± 19%  numa-meminfo.node1.Active(anon)
   1654163 ± 81%    +121.6%    3665933 ± 45%  numa-meminfo.node1.FilePages
    452337 ± 52%    +210.1%    1402518 ± 20%  numa-meminfo.node1.Shmem
     28672 ± 48%     -78.2%       6247 ±135%  numa-vmstat.node0.nr_mapped
     44638 ± 70%     -94.8%       2335 ± 30%  numa-vmstat.node0.nr_shmem
 4.422e+08 ±  2%     +22.9%  5.434e+08 ±  6%  numa-vmstat.node0.numa_hit
 4.421e+08 ±  2%     +22.9%  5.434e+08 ±  6%  numa-vmstat.node0.numa_local
    253743 ± 23%     +92.3%     488048 ± 19%  numa-vmstat.node1.nr_active_anon
    413918 ± 81%    +121.4%     916363 ± 45%  numa-vmstat.node1.nr_file_pages
    113461 ± 52%    +208.9%     350509 ± 20%  numa-vmstat.node1.nr_shmem
    253743 ± 23%     +92.3%     488048 ± 19%  numa-vmstat.node1.nr_zone_active_anon
      3009 ±  2%     +39.9%       4209        netperf.ThroughputBoth_Mbps
    770338 ±  2%     +39.9%    1077702        netperf.ThroughputBoth_total_Mbps
      3009 ±  2%     +39.9%       4209        netperf.Throughput_Mbps
    770338 ±  2%     +39.9%    1077702        netperf.Throughput_total_Mbps
   6400961 ±  3%     -88.6%     729834 ±  8%  netperf.time.involuntary_context_switches
    110445 ±  3%      -9.0%     100508 ±  2%  netperf.time.minor_page_faults
      6907           -62.3%       2601        netperf.time.percent_of_cpu_this_job_got
     20894           -64.5%       7412        netperf.time.system_time
    102.48          +346.5%     457.58 ±  3%  netperf.time.user_time
   4016592 ± 11%  +25096.8%  1.012e+09        netperf.time.voluntary_context_switches
 1.763e+09 ±  2%     +39.9%  2.467e+09        netperf.workload
    375497 ± 10%     +50.7%     565977 ± 12%  proc-vmstat.nr_active_anon
   1033912 ±  3%     +18.9%    1228968 ±  5%  proc-vmstat.nr_file_pages
     76489 ± 16%     -31.5%      52364 ± 14%  proc-vmstat.nr_mapped
    157787 ± 24%    +123.6%     352879 ± 20%  proc-vmstat.nr_shmem
     78136            -1.9%      76672        proc-vmstat.nr_slab_unreclaimable
    375497 ± 10%     +50.7%     565977 ± 12%  proc-vmstat.nr_zone_active_anon
  8.82e+08 ±  2%     +11.7%  9.854e+08        proc-vmstat.numa_hit
 8.819e+08 ±  2%     +11.7%  9.853e+08        proc-vmstat.numa_local
 7.042e+09 ±  2%     +11.7%  7.868e+09        proc-vmstat.pgalloc_normal
   1045876            +3.9%    1086806        proc-vmstat.pgfault
 7.042e+09 ±  2%     +11.7%  7.867e+09        proc-vmstat.pgfree
     55939 ±  3%      -7.1%      51982 ±  2%  proc-vmstat.pgreuse
  20348384           -15.8%   17136384        sched_debug.cfs_rq:/.avg_vruntime.avg
  32341494 ±  2%     -23.6%   24701136 ±  3%  sched_debug.cfs_rq:/.avg_vruntime.max
  16723296 ±  3%     -25.4%   12475358        sched_debug.cfs_rq:/.avg_vruntime.min
   2336769 ±  9%     +58.2%    3697913 ±  9%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      3.08           -22.1%       2.40        sched_debug.cfs_rq:/.h_nr_queued.avg
      5.39 ±  4%     -20.1%       4.31 ±  5%  sched_debug.cfs_rq:/.h_nr_queued.max
      1.01 ±  2%     -24.6%       0.76 ±  3%  sched_debug.cfs_rq:/.h_nr_queued.stddev
      4.89 ±  3%     -12.5%       4.28 ±  5%  sched_debug.cfs_rq:/.h_nr_runnable.max
      0.95 ±  2%     -20.3%       0.76 ±  3%  sched_debug.cfs_rq:/.h_nr_runnable.stddev
  20348384           -15.8%   17136384        sched_debug.cfs_rq:/.min_vruntime.avg
  32341494 ±  2%     -23.6%   24701136 ±  3%  sched_debug.cfs_rq:/.min_vruntime.max
  16723296 ±  3%     -25.4%   12475358        sched_debug.cfs_rq:/.min_vruntime.min
   2336769 ±  9%     +58.2%    3697913 ±  9%  sched_debug.cfs_rq:/.min_vruntime.stddev
    709.08 ± 33%     +59.9%       1134 ± 12%  sched_debug.cfs_rq:/.runnable_avg.min
    128.48 ±  8%     -13.5%     111.13 ±  3%  sched_debug.cfs_rq:/.util_avg.stddev
      1775 ±  2%     -20.0%       1420 ±  2%  sched_debug.cfs_rq:/.util_est.avg
      3785 ±  3%     -32.8%       2545 ±  3%  sched_debug.cfs_rq:/.util_est.max
    197.58 ± 45%    +137.3%     468.89 ±  3%  sched_debug.cfs_rq:/.util_est.min
    705.82 ±  3%     -41.9%     410.12 ±  3%  sched_debug.cfs_rq:/.util_est.stddev
    454866 ±  2%     -40.2%     271876 ±  5%  sched_debug.cpu.avg_idle.avg
     36878 ±  7%     -76.2%       8790 ±  7%  sched_debug.cpu.avg_idle.min
    266282 ±  3%     -10.9%     237226 ±  5%  sched_debug.cpu.avg_idle.stddev
     65.17 ± 31%     -50.7%      32.10 ± 13%  sched_debug.cpu.clock.stddev
    192576           -10.0%     173278        sched_debug.cpu.clock_task.avg
    183688           -10.2%     164932        sched_debug.cpu.clock_task.min
    860.66 ±  3%     +39.5%       1200 ± 17%  sched_debug.cpu.clock_task.stddev
      0.00 ± 29%     -44.2%       0.00 ± 12%  sched_debug.cpu.next_balance.stddev
      3.08           -22.8%       2.38        sched_debug.cpu.nr_running.avg
      5.42 ±  3%     -20.5%       4.31 ±  5%  sched_debug.cpu.nr_running.max
      1.00 ±  2%     -23.6%       0.77 ±  4%  sched_debug.cpu.nr_running.stddev
     73412 ±  2%  +10555.4%    7822363        sched_debug.cpu.nr_switches.avg
    127466 ± 11%   +8550.4%   11026390        sched_debug.cpu.nr_switches.max
     61589         +7830.8%    4884533 ±  4%  sched_debug.cpu.nr_switches.min
      8655 ± 12%  +23368.3%    2031246 ±  7%  sched_debug.cpu.nr_switches.stddev
     59.31           -81.0%      11.29 ±  2%  perf-stat.i.MPKI
 7.435e+09 ±  3%    +194.0%  2.186e+10        perf-stat.i.branch-instructions
      0.73            +0.3        1.00        perf-stat.i.branch-miss-rate%
  54199426 ±  3%    +291.1%   2.12e+08        perf-stat.i.branch-misses
     60.28           -12.6       47.65 ±  2%  perf-stat.i.cache-miss-rate%
 2.227e+09 ±  2%     -44.8%  1.228e+09 ±  2%  perf-stat.i.cache-misses
 3.682e+09 ±  2%     -30.3%  2.566e+09        perf-stat.i.cache-references
     68267 ±  5%   +9653.1%    6658139        perf-stat.i.context-switches
      8.72 ±  2%     -65.8%       2.98        perf-stat.i.cpi
    684.12 ±  9%     +84.7%       1263 ±  7%  perf-stat.i.cpu-migrations
    151.44           +80.1%     272.70 ±  2%  perf-stat.i.cycles-between-cache-misses
  3.76e+10 ±  2%    +195.0%  1.109e+11        perf-stat.i.instructions
      0.12 ±  2%    +184.1%       0.35        perf-stat.i.ipc
      0.04 ± 36%     -86.1%       0.01 ± 82%  perf-stat.i.major-faults
      0.07 ± 29%  +73057.1%      51.98        perf-stat.i.metric.K/sec
      3130            +3.6%       3244        perf-stat.i.minor-faults
      3130            +3.6%       3244        perf-stat.i.page-faults
     59.27           -81.3%      11.08 ±  2%  perf-stat.overall.MPKI
      0.73            +0.2        0.97        perf-stat.overall.branch-miss-rate%
     60.49           -12.6       47.88 ±  2%  perf-stat.overall.cache-miss-rate%
      8.72 ±  2%     -66.3%       2.94        perf-stat.overall.cpi
    147.03 ±  2%     +80.5%     265.35 ±  2%  perf-stat.overall.cycles-between-cache-misses
      0.11 ±  2%    +196.4%       0.34        perf-stat.overall.ipc
      6473          +110.0%      13594        perf-stat.overall.path-length
 7.398e+09 ±  2%    +194.3%  2.177e+10        perf-stat.ps.branch-instructions
  53832186 ±  3%    +292.2%  2.111e+08        perf-stat.ps.branch-misses
 2.217e+09 ±  2%     -44.8%  1.224e+09 ±  2%  perf-stat.ps.cache-misses
 3.665e+09           -30.3%  2.556e+09        perf-stat.ps.cache-references
     67640 ±  4%   +9705.3%    6632391        perf-stat.ps.context-switches
    679.53 ± 10%     +84.9%       1256 ±  7%  perf-stat.ps.cpu-migrations
 3.741e+10 ±  2%    +195.4%  1.105e+11        perf-stat.ps.instructions
      0.04 ± 36%     -86.0%       0.01 ± 82%  perf-stat.ps.major-faults
      3080            +4.6%       3223        perf-stat.ps.minor-faults
      3080            +4.6%       3223        perf-stat.ps.page-faults
 1.141e+13 ±  2%    +193.8%  3.353e+13        perf-stat.total.instructions
      4.32 ± 95%    -100.0%       0.00 ±223%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      4.98 ±  6%     -92.3%       0.38 ±116%  perf-sched.sch_delay.avg.ms.__cond_resched.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
      2.74 ± 28%     -59.9%       1.10 ± 21%  perf-sched.sch_delay.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
      0.59 ±  3%   +5402.4%      32.50 ± 48%  perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.79 ± 60%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      1.36 ± 38%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.dput.step_into.link_path_walk.part
      3.70 ±  6%     -71.5%       1.05 ± 20%  perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
      1.57 ± 54%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      7.59 ±  5%     -96.3%       0.28 ±129%  perf-sched.sch_delay.avg.ms.__cond_resched.lock_sock_nested.tcp_recvmsg.inet_recvmsg.sock_recvmsg
      3.45 ± 55%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
      7.85 ± 13%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      1.80 ± 93%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.remove_vma.vms_complete_munmap_vmas.__mmap_region.do_mmap
      1.04 ± 16%     -85.5%       0.15 ±216%  perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      2.26 ± 53%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
      3.58 ± 26%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
      3.26 ± 37%     -99.7%       0.01 ±223%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      4.38 ± 10%     -88.4%       0.51 ±223%  perf-sched.sch_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      1.51 ± 11%     -98.4%       0.02 ±145%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.08 ± 23%     -91.8%       0.01 ±143%  perf-sched.sch_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.61 ± 64%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      1.98 ± 31%     -98.8%       0.02 ±223%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
      5.98 ± 81%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      3.48 ± 67%     -97.9%       0.07 ±195%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
      0.27 ± 10%   +6194.5%      16.97 ±193%  perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      1.25 ± 14%     -97.9%       0.03 ±138%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.06 ± 12%     -76.3%       0.01 ± 54%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      0.11 ± 87%  +98576.8%     104.10 ± 72%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      3.68 ±  2%   +1523.9%      59.75 ± 36%  perf-sched.sch_delay.avg.ms.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
      7.44 ±  3%     -92.5%       0.55 ± 21%  perf-sched.sch_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      5.16 ±  7%     -76.7%       1.20 ± 23%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      1.47 ± 18%     -89.8%       0.15 ±196%  perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      6.65 ± 57%    -100.0%       0.00 ±223%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
     48.97 ± 17%     -87.5%       6.12 ±176%  perf-sched.sch_delay.max.ms.__cond_resched.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
     29.59 ± 49%  +22847.2%       6789 ± 21%  perf-sched.sch_delay.max.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
     12.47 ± 31%   +8381.5%       1057        perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      1.85 ± 63%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      1.89 ± 34%     -86.5%       0.26 ±105%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      2.91 ± 45%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.dput.step_into.link_path_walk.part
     44.99 ± 27%  +14850.3%       6726 ± 27%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
     14.01 ± 12%     -29.2%       9.92 ± 46%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.kmalloc_reserve.__alloc_skb.tcp_stream_alloc_skb
      2.25 ± 44%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
     20.58 ±  6%     -93.4%       1.35 ±137%  perf-sched.sch_delay.max.ms.__cond_resched.lock_sock_nested.tcp_recvmsg.inet_recvmsg.sock_recvmsg
     19.88 ± 29%  +10263.8%       2060 ± 71%  perf-sched.sch_delay.max.ms.__cond_resched.lock_sock_nested.tcp_sendmsg.__sys_sendto.__x64_sys_sendto
      4.85 ± 54%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
     11.41 ± 20%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      2.78 ± 71%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.remove_vma.vms_complete_munmap_vmas.__mmap_region.do_mmap
      4.15 ± 16%     -92.8%       0.30 ±218%  perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
      4.57 ± 32%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
      6.25 ± 35%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
      6.35 ± 20%     -99.9%       0.01 ±223%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      7.68 ± 18%     -93.4%       0.51 ±223%  perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      7.37 ± 18%     -98.1%       0.14 ±136%  perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.19 ± 37%     -95.6%       0.01 ±141%  perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      4.31 ± 65%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      6.15 ± 49%     -99.6%       0.02 ±223%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
      8.42 ± 62%    -100.0%       0.00        perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      8.41 ± 66%     -99.1%       0.07 ±192%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
     10.83 ± 36%     -87.0%       1.41 ±149%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      8.83 ± 15%     -99.1%       0.08 ±128%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      0.16 ± 21%     -87.1%       0.02 ± 61%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
     33.75 ±193%   +4375.3%       1510 ± 74%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    129.45 ±106%   +5168.2%       6819 ± 26%  perf-sched.sch_delay.max.ms.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
     61.49 ± 28%  +11976.0%       7425 ± 20%  perf-sched.sch_delay.max.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
    180.09 ±145%   +4079.9%       7527 ± 26%  perf-sched.sch_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      5.14 ± 21%     -96.7%       0.17 ±169%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      4.61 ±  2%     -79.7%       0.93 ± 19%  perf-sched.total_sch_delay.average.ms
    201.76 ±126%   +3741.3%       7750 ± 22%  perf-sched.total_sch_delay.max.ms
     17.32 ±  2%     -82.6%       3.01 ± 19%  perf-sched.total_wait_and_delay.average.ms
    298061 ±  3%    +450.2%    1640034 ± 23%  perf-sched.total_wait_and_delay.count.ms
      2725 ± 27%    +468.8%      15500 ± 22%  perf-sched.total_wait_and_delay.max.ms
     12.71 ±  3%     -83.6%       2.08 ± 19%  perf-sched.total_wait_time.average.ms
      2725 ± 27%    +198.3%       8130 ± 19%  perf-sched.total_wait_time.max.ms
      8.84 ±  7%     -76.2%       2.11 ± 20%  perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
    359.01 ±198%     -99.1%       3.30 ±223%  perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
    370.11 ± 37%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    229.06 ±  4%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
    422.68 ± 21%     -92.2%      32.86 ±174%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     11.18 ±  2%    +969.4%     119.55 ± 36%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
     15.98 ±  4%     -86.3%       2.19 ± 19%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
     10.75 ±  7%     -77.4%       2.42 ± 23%  perf-sched.wait_and_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      9037 ± 14%     -99.9%       4.67 ±223%  perf-sched.wait_and_delay.count.__cond_resched.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
      5102 ± 29%   +3851.2%     201595 ± 23%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
     11.17 ± 71%     -98.5%       0.17 ±223%  perf-sched.wait_and_delay.count.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
      1.00 ±141%    +816.7%       9.17 ± 26%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      5.00          -100.0%       0.00        perf-sched.wait_and_delay.count.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     18.33 ±  4%    -100.0%       0.00        perf-sched.wait_and_delay.count.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
    649.83 ± 20%     -96.9%      19.83 ± 85%  perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
     28.83 ± 11%     -78.6%       6.17 ± 49%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range.do_poll.constprop.0.do_sys_poll
     19.67 ±  2%     -89.8%       2.00 ± 76%  perf-sched.wait_and_delay.count.schedule_timeout.kcompactd.kthread.ret_from_fork
    122545           -98.2%       2193 ± 30%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
     37034 ± 21%   +2124.3%     823774 ± 23%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      7013 ± 11%     -41.7%       4087 ± 22%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    109813 ±  2%    +271.1%     407480 ± 23%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    732.33           -89.7%      75.50 ± 30%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     62.30 ± 24%  +21492.3%      13452 ± 27%  perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
    521.50 ±141%     -99.4%       3.30 ±223%  perf-sched.wait_and_delay.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
      1007          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    915.75 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
    228.64 ±127%   +5865.5%      13639 ± 26%  perf-sched.wait_and_delay.max.ms.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
     97.53 ± 50%  +15128.8%      14851 ± 20%  perf-sched.wait_and_delay.max.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      1840 ± 10%    +340.7%       8112 ± 19%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1098 ± 17%   +1271.0%      15055 ± 26%  perf-sched.wait_and_delay.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      2293 ± 25%    +218.7%       7309 ± 27%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      4.06 ±105%    -100.0%       0.00 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
      3.65 ± 28%     -69.8%       1.10 ± 21%  perf-sched.wait_time.avg.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
      5.89 ±  6%    +605.9%      41.54 ± 24%  perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      7.37 ± 57%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      0.32 ± 71%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      0.99 ± 53%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.dput.step_into.link_path_walk.part
      5.14 ±  8%     -79.5%       1.06 ± 20%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
      5.59 ±  9%     -64.1%       2.01 ± 80%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_node_noprof.kmalloc_reserve.__alloc_skb.tcp_stream_alloc_skb
      1.48 ± 61%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
      7.85 ±  5%     -88.2%       0.93 ±141%  perf-sched.wait_time.avg.ms.__cond_resched.lock_sock_nested.tcp_recvmsg.inet_recvmsg.sock_recvmsg
      3.45 ± 55%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
      8.04 ± 13%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      1.69 ±103%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.remove_vma.vms_complete_munmap_vmas.__mmap_region.do_mmap
    350.00 ±205%     -99.5%       1.65 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
      1.93 ± 77%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
      3.56 ± 26%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
    366.85 ± 37%    -100.0%       0.01 ±223%  perf-sched.wait_time.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     59.19 ±133%     -99.1%       0.51 ±223%  perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      4.25 ±  9%     -74.7%       1.08 ± 81%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    228.98 ±  4%    -100.0%       0.07 ±141%  perf-sched.wait_time.avg.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.78 ± 35%     -98.6%       0.02 ±223%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
      5.98 ± 81%    -100.0%       0.00        perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      4.51 ± 66%     -98.2%       0.08 ±175%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
    422.15 ± 21%     -95.0%      21.25 ±214%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      7.55 ±  7%   +1856.9%     147.84 ± 40%  perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      7.50 ±  3%    +697.4%      59.80 ± 36%  perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
      8.53 ±  5%     -80.8%       1.64 ± 19%  perf-sched.wait_time.avg.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      5.59 ±  6%     -78.2%       1.22 ± 23%  perf-sched.wait_time.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      1.06 ± 28%     -98.6%       0.02 ±138%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
    130.58 ±116%     -88.6%      14.92 ± 24%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
      6.44 ± 63%    -100.0%       0.00 ±223%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_cache_noprof.perf_event_mmap_event.perf_event_mmap.__mmap_region
     26.97 ± 35%  +25069.5%       6789 ± 21%  perf-sched.wait_time.max.ms.__cond_resched.__release_sock.release_sock.tcp_sendmsg.__sys_sendto
     10.50 ± 46%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.wait_for_completion_state.kernel_clone.__x64_sys_vfork
      1.68 ± 49%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.exec_mmap.begin_new_exec.load_elf_binary
      2.87 ± 47%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.dput.step_into.link_path_walk.part
    531.77 ± 89%     -96.1%      20.68 ± 37%  perf-sched.wait_time.max.ms.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     31.98 ± 26%  +20930.2%       6726 ± 27%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.__alloc_skb.tcp_stream_alloc_skb.tcp_sendmsg_locked
     16.76 ± 28%     -40.7%       9.94 ± 46%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_node_noprof.kmalloc_reserve.__alloc_skb.tcp_stream_alloc_skb
      2.23 ± 45%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.vm_area_alloc.__mmap_new_vma.__mmap_region
     20.58 ±  6%     -77.6%       4.62 ±138%  perf-sched.wait_time.max.ms.__cond_resched.lock_sock_nested.tcp_recvmsg.inet_recvmsg.sock_recvmsg
     20.91 ± 16%   +9755.3%       2060 ± 71%  perf-sched.wait_time.max.ms.__cond_resched.lock_sock_nested.tcp_sendmsg.__sys_sendto.__x64_sys_sendto
      4.85 ± 54%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.mmput.m_stop.seq_read_iter.seq_read
     12.63 ± 32%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.perf_poll.do_poll.constprop
      2.78 ± 71%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.remove_vma.vms_complete_munmap_vmas.__mmap_region.do_mmap
    513.55 ±144%     -99.7%       1.65 ±223%  perf-sched.wait_time.max.ms.__cond_resched.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
      3.68 ± 64%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.vms_clear_ptes.part.0
      6.25 ± 35%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
      1001          -100.0%       0.01 ±223%  perf-sched.wait_time.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    339.35 ±138%     -99.9%       0.51 ±223%  perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
     21.71 ± 47%     -87.3%       2.75 ± 88%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    915.66 ±  9%    -100.0%       0.14 ±141%  perf-sched.wait_time.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      6.15 ± 49%     -99.6%       0.02 ±223%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
      8.42 ± 62%    -100.0%       0.00        perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown]
      9.31 ± 65%     -99.1%       0.08 ±173%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_call_function_single.[unknown].[unknown]
     11.32 ± 34%     -87.5%       1.41 ±149%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
    283.33 ± 21%    +484.6%       1656 ± 66%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    121.31 ±117%   +5522.0%       6819 ± 26%  perf-sched.wait_time.max.ms.schedule_timeout.wait_woken.sk_stream_wait_memory.tcp_sendmsg_locked
     54.73 ± 45%  +14104.3%       7773 ± 22%  perf-sched.wait_time.max.ms.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
      1840 ± 10%    +337.4%       8051 ± 18%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1008          +646.1%       7527 ± 26%  perf-sched.wait_time.max.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      5.09 ± 22%     -99.3%       0.04 ±129%  perf-sched.wait_time.max.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
      2293 ± 25%    +218.7%       7309 ± 27%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm



***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/fallocate1/will-it-scale

commit: 
  8c57b687e8 ("mm, bpf: Introduce free_pages_nolock()")
  01d37228d3 ("memcg: Use trylock to access memcg stock_lock.")

8c57b687e8331eb8 01d37228d331047a0bbbd1026ce 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    125406           -10.4%     112308        meminfo.KReclaimable
    125406           -10.4%     112308        meminfo.SReclaimable
      0.03            -0.0        0.02 ± 10%  mpstat.cpu.all.soft%
      2.85 ±  3%      -1.6        1.27 ± 26%  mpstat.cpu.all.usr%
      3597           -42.0%       2086 ±  5%  vmstat.system.cs
    119319            -1.1%     117948        vmstat.system.in
   4530881 ±  3%     -68.8%    1411930 ± 12%  will-it-scale.104.threads
     43565 ±  3%     -68.8%      13575 ± 12%  will-it-scale.per_thread_ops
   4530881 ±  3%     -68.8%    1411930 ± 12%  will-it-scale.workload
 1.375e+09 ±  3%     -64.9%  4.824e+08 ± 23%  numa-numastat.node0.local_node
 1.376e+09 ±  3%     -64.9%  4.826e+08 ± 23%  numa-numastat.node0.numa_hit
 1.356e+09 ±  3%     -72.8%   3.69e+08 ± 19%  numa-numastat.node1.local_node
 1.357e+09 ±  3%     -72.8%  3.691e+08 ± 19%  numa-numastat.node1.numa_hit
 1.376e+09 ±  3%     -64.9%  4.824e+08 ± 23%  numa-vmstat.node0.numa_hit
 1.375e+09 ±  3%     -64.9%  4.823e+08 ± 23%  numa-vmstat.node0.numa_local
 1.357e+09 ±  3%     -72.8%   3.69e+08 ± 19%  numa-vmstat.node1.numa_hit
 1.356e+09 ±  3%     -72.8%  3.689e+08 ± 19%  numa-vmstat.node1.numa_local
      3542 ± 89%  +1.7e+05%    6202792 ±222%  sched_debug.cfs_rq:/.runnable_avg.max
      6301           -34.5%       4126 ±  3%  sched_debug.cpu.nr_switches.avg
      4167 ±  2%     -62.6%       1558 ± 14%  sched_debug.cpu.nr_switches.min
      2174 ±  7%     +10.9%       2410 ±  6%  sched_debug.cpu.nr_switches.stddev
    136.67 ± 26%    +803.4%       1234 ± 10%  perf-c2c.DRAM.local
    366.17 ± 21%   +1413.7%       5542 ± 10%  perf-c2c.DRAM.remote
      6364 ±  6%    +208.8%      19652 ±  2%  perf-c2c.HITM.local
    154.83 ±  6%    +834.1%       1446 ±  5%  perf-c2c.HITM.remote
      6519 ±  6%    +223.7%      21099 ±  2%  perf-c2c.HITM.total
    300086            -2.3%     293262        proc-vmstat.nr_active_anon
    129160            -5.6%     121896        proc-vmstat.nr_shmem
     31344           -10.4%      28076        proc-vmstat.nr_slab_reclaimable
    300085            -2.3%     293262        proc-vmstat.nr_zone_active_anon
 2.733e+09 ±  3%     -68.8%  8.517e+08 ± 12%  proc-vmstat.numa_hit
 2.732e+09 ±  3%     -68.8%  8.514e+08 ± 12%  proc-vmstat.numa_local
  2.73e+09 ±  3%     -68.8%  8.514e+08 ± 12%  proc-vmstat.pgalloc_normal
  2.73e+09 ±  3%     -68.8%  8.513e+08 ± 12%  proc-vmstat.pgfree
      0.18 ± 39%   +3637.9%       6.73 ± 23%  perf-stat.i.MPKI
 1.074e+10 ±  3%     -57.2%  4.591e+09 ±  5%  perf-stat.i.branch-instructions
  66071729 ±  3%     -55.2%   29625865 ±  8%  perf-stat.i.branch-misses
     11.40 ± 27%     +21.7       33.11 ±  7%  perf-stat.i.cache-miss-rate%
   9285062 ± 35%   +1457.0%  1.446e+08 ± 27%  perf-stat.i.cache-misses
  79584428 ±  8%    +454.4%  4.412e+08 ± 30%  perf-stat.i.cache-references
      3563           -42.6%       2043 ±  5%  perf-stat.i.context-switches
      5.51 ±  3%    +146.8%      13.60 ±  6%  perf-stat.i.cpi
    148.01            -6.4%     138.61        perf-stat.i.cpu-migrations
     35747 ± 29%     -94.1%       2107 ± 18%  perf-stat.i.cycles-between-cache-misses
 5.253e+10 ±  3%     -59.2%  2.144e+10 ±  6%  perf-stat.i.instructions
      0.18 ±  3%     -58.7%       0.08 ±  6%  perf-stat.i.ipc
      0.18 ± 39%   +3650.7%       6.72 ± 23%  perf-stat.overall.MPKI
      0.62            +0.0        0.64 ±  4%  perf-stat.overall.branch-miss-rate%
     11.42 ± 26%     +21.7       33.11 ±  7%  perf-stat.overall.cache-miss-rate%
      5.52 ±  3%    +145.9%      13.57 ±  5%  perf-stat.overall.cpi
     34676 ± 28%     -93.9%       2109 ± 18%  perf-stat.overall.cycles-between-cache-misses
      0.18 ±  3%     -59.2%       0.07 ±  6%  perf-stat.overall.ipc
   3496124           +31.6%    4602126 ±  6%  perf-stat.overall.path-length
  1.07e+10 ±  3%     -57.3%  4.571e+09 ±  5%  perf-stat.ps.branch-instructions
  65846459 ±  3%     -55.3%   29454957 ±  8%  perf-stat.ps.branch-misses
   9255594 ± 35%   +1455.7%   1.44e+08 ± 27%  perf-stat.ps.cache-misses
  79350913 ±  8%    +453.9%  4.395e+08 ± 30%  perf-stat.ps.cache-references
      3551           -42.6%       2038 ±  5%  perf-stat.ps.context-switches
    147.51            -6.6%     137.73        perf-stat.ps.cpu-migrations
 5.236e+10 ±  3%     -59.2%  2.135e+10 ±  6%  perf-stat.ps.instructions
      2724            -1.8%       2676        perf-stat.ps.minor-faults
      2724            -1.8%       2676        perf-stat.ps.page-faults
 1.584e+13 ±  3%     -59.3%   6.45e+12 ±  6%  perf-stat.total.instructions
      0.07 ±164%    +606.2%       0.52 ± 34%  perf-sched.sch_delay.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
      0.83 ± 11%    +182.4%       2.35 ±  6%  perf-sched.sch_delay.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
      0.06 ±179%   +2166.3%       1.29 ± 48%  perf-sched.sch_delay.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.part
      0.19 ± 20%    +154.9%       0.49 ± 44%  perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.02 ±223%   +8165.9%       1.25 ±138%  perf-sched.sch_delay.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.part
      0.47 ±  9%    +111.3%       0.99 ± 10%  perf-sched.sch_delay.avg.ms.__cond_resched.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate.do_syscall_64
      0.50 ±  6%     +92.8%       0.97 ±  8%  perf-sched.sch_delay.avg.ms.__cond_resched.shmem_undo_range.shmem_setattr.notify_change.do_truncate
      0.12 ± 13%    +189.9%       0.33 ± 27%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.19 ±104%    +505.3%       1.13 ± 47%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.28 ± 68%    +249.6%       0.97 ± 27%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
      0.19 ± 52%    +355.0%       0.86 ± 52%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      0.05 ± 57%    +557.6%       0.30 ±107%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.16 ± 14%     +80.9%       0.30 ±  7%  perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     17.11 ±108%     -99.9%       0.02 ±223%  perf-sched.sch_delay.avg.ms.schedule_timeout.khugepaged_wait_work.khugepaged.kthread
      0.02 ± 14%    +109.3%       0.04 ± 15%  perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      6.68 ± 62%     -49.1%       3.40 ± 41%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
      0.12 ±179%    +518.2%       0.76 ± 43%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
      3.87 ±  7%     +69.2%       6.54 ± 47%  perf-sched.sch_delay.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
      0.06 ±169%   +3507.7%       2.19 ± 49%  perf-sched.sch_delay.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.part
      0.02 ±223%   +9304.4%       1.43 ±119%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.walk_component.link_path_walk.part
      3.19 ± 48%     -78.0%       0.70 ±158%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate
      0.26 ±132%    +953.7%       2.71 ± 40%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.61 ± 64%    +559.5%       4.03 ± 35%  perf-sched.sch_delay.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
      0.18 ± 48%   +1582.7%       3.00 ±133%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
     17.11 ±108%     -99.9%       0.02 ±223%  perf-sched.sch_delay.max.ms.schedule_timeout.khugepaged_wait_work.khugepaged.kthread
      0.35 ±  6%     +87.0%       0.66 ±  7%  perf-sched.total_sch_delay.average.ms
     75.94 ±  3%     +41.9%     107.78 ±  6%  perf-sched.total_wait_and_delay.average.ms
     20535 ±  4%     -41.7%      11978 ±  9%  perf-sched.total_wait_and_delay.count.ms
     75.58 ±  3%     +41.7%     107.13 ±  6%  perf-sched.total_wait_time.average.ms
      0.94 ±  9%    +111.4%       1.98 ± 10%  perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate.do_syscall_64
      1.00 ±  6%     +92.7%       1.93 ±  8%  perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_undo_range.shmem_setattr.notify_change.do_truncate
     87.71 ± 50%    +365.7%     408.48 ± 14%  perf-sched.wait_and_delay.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.97 ± 30%     -76.5%       0.23 ±141%  perf-sched.wait_and_delay.avg.ms.__cond_resched.vfs_fallocate.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.03 ±147%    +782.6%      26.71 ± 21%  perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      2.05 ± 17%     +94.1%       3.98 ± 15%  perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
    175.25 ±  3%    +199.2%     524.30 ± 11%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3879 ±  5%     -47.8%       2024 ±  7%  perf-sched.wait_and_delay.count.__cond_resched.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate.do_syscall_64
    328.67 ± 16%    +181.5%     925.17 ± 29%  perf-sched.wait_and_delay.count.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
      4765 ±  3%     -52.5%       2262 ±  3%  perf-sched.wait_and_delay.count.__cond_resched.shmem_undo_range.shmem_setattr.notify_change.do_truncate
    154.83 ± 13%     -87.0%      20.17 ±141%  perf-sched.wait_and_delay.count.__cond_resched.vfs_fallocate.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe
    551.67 ± 10%     -29.9%     386.83 ± 11%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
      5573 ±  6%     -75.9%       1341 ± 22%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    297.16 ± 49%    +551.6%       1936 ± 28%  perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     34.48 ±114%     -91.8%       2.84 ±141%  perf-sched.wait_and_delay.max.ms.__cond_resched.vfs_fallocate.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe
    370.78 ±148%    +321.1%       1561 ± 48%  perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      1077 ±  3%    +230.9%       3566 ± 10%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.07 ±162%    +641.0%       0.50 ± 33%  perf-sched.wait_time.avg.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
      0.77 ± 11%    +569.6%       5.18 ±125%  perf-sched.wait_time.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
      0.05 ±190%   +2300.0%       1.29 ± 48%  perf-sched.wait_time.avg.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.part
      0.02 ±223%   +7438.5%       1.14 ±156%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.walk_component.link_path_walk.part
      0.47 ±  9%    +111.3%       0.99 ± 10%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate.do_syscall_64
      0.50 ±  6%     +92.8%       0.97 ±  8%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_undo_range.shmem_setattr.notify_change.do_truncate
     99.70 ± 24%    +309.7%     408.42 ± 14%  perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      3.03 ±147%    +782.6%      26.71 ± 21%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      1.93 ± 18%     +88.5%       3.64 ± 14%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.13 ± 82%    +634.0%       0.93 ± 68%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.16 ±129%    +455.8%       0.88 ± 31%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
      0.19 ± 52%    +355.0%       0.86 ± 52%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
      4.61 ± 13%     +90.0%       8.76 ± 23%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
     17.15 ±108%     -99.9%       0.02 ±223%  perf-sched.wait_time.avg.ms.schedule_timeout.khugepaged_wait_work.khugepaged.kthread
    175.21 ±  3%    +199.2%     524.18 ± 11%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      6.68 ± 62%     -49.1%       3.40 ± 41%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.shmem_alloc_folio
      0.12 ±179%    +518.2%       0.76 ± 43%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_mpol_noprof.vma_alloc_folio_noprof
      3.87 ±  7%   +4323.6%     171.00 ±216%  perf-sched.wait_time.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.exit_mmap.__mmput
      0.06 ±169%   +3507.7%       2.19 ± 49%  perf-sched.wait_time.max.ms.__cond_resched.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.part
      0.02 ±223%   +8481.3%       1.30 ±134%  perf-sched.wait_time.max.ms.__cond_resched.down_read.walk_component.link_path_walk.part
      3.19 ± 48%     -78.0%       0.70 ±158%  perf-sched.wait_time.max.ms.__cond_resched.down_write.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate
    329.79 ± 26%    +487.1%       1936 ± 28%  perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    370.78 ±148%    +321.1%       1561 ± 48%  perf-sched.wait_time.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
      0.19 ±108%   +1175.5%       2.48 ± 51%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown]
      0.36 ±121%   +1007.0%       4.03 ± 35%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_exc_page_fault.[unknown].[unknown]
     17.15 ±108%     -99.9%       0.02 ±223%  perf-sched.wait_time.max.ms.schedule_timeout.khugepaged_wait_work.khugepaged.kthread
      5.22           +11.9%       5.85 ±  4%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      1077 ±  3%    +230.9%       3566 ± 10%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     50.56            -6.0       44.54 ± 10%  perf-profile.calltrace.cycles-pp.ftruncate64
     50.54            -6.0       44.53 ± 10%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ftruncate64
     50.52            -6.0       44.53 ± 10%  perf-profile.calltrace.cycles-pp.do_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
     50.52            -6.0       44.53 ± 10%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
     50.52            -6.0       44.52 ± 10%  perf-profile.calltrace.cycles-pp.do_ftruncate.do_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
     50.52            -6.0       44.52 ± 10%  perf-profile.calltrace.cycles-pp.do_truncate.do_ftruncate.do_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe
     50.52            -6.0       44.52 ± 10%  perf-profile.calltrace.cycles-pp.notify_change.do_truncate.do_ftruncate.do_sys_ftruncate.do_syscall_64
     50.51            -6.0       44.52 ± 10%  perf-profile.calltrace.cycles-pp.shmem_setattr.notify_change.do_truncate.do_ftruncate.do_sys_ftruncate
     50.48            -6.0       44.51 ± 10%  perf-profile.calltrace.cycles-pp.shmem_undo_range.shmem_setattr.notify_change.do_truncate.do_ftruncate
     41.58            -5.3       36.26 ± 11%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.shmem_undo_range
     41.59            -5.3       36.26 ± 11%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.shmem_undo_range.shmem_setattr
     41.56            -5.3       36.25 ± 12%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
     43.12            -5.2       37.90 ± 11%  perf-profile.calltrace.cycles-pp.folios_put_refs.shmem_undo_range.shmem_setattr.notify_change.do_truncate
     42.26            -5.2       37.05 ± 11%  perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.shmem_undo_range.shmem_setattr.notify_change
      4.64 ±  2%      -0.5        4.13 ±  9%  perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.__folio_batch_release.shmem_undo_range.shmem_setattr.notify_change
      4.65 ±  2%      -0.5        4.13 ±  9%  perf-profile.calltrace.cycles-pp.__folio_batch_release.shmem_undo_range.shmem_setattr.notify_change.do_truncate
      4.64 ±  2%      -0.5        4.13 ±  9%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.__folio_batch_release.shmem_undo_range.shmem_setattr
      4.58 ±  2%      -0.5        4.08 ±  9%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.__folio_batch_release
      4.58 ±  2%      -0.5        4.08 ±  9%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.__folio_batch_release.shmem_undo_range
      4.57 ±  2%      -0.5        4.08 ±  9%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
      0.63 ±  3%      +0.2        0.82 ±  7%  perf-profile.calltrace.cycles-pp.lru_gen_add_folio.lru_add.folio_batch_move_lru.__folio_batch_add_and_move.shmem_alloc_and_add_folio
      0.00            +0.6        0.63 ± 19%  perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
      1.47 ±  2%      +0.8        2.23 ± 15%  perf-profile.calltrace.cycles-pp.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate.vfs_fallocate
      0.00            +0.8        0.82 ± 32%  perf-profile.calltrace.cycles-pp.propagate_protected_usage.page_counter_try_charge.try_charge_memcg.charge_memcg.__mem_cgroup_charge
      0.00            +0.9        0.92 ±  9%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__lruvec_stat_mod_folio.shmem_update_stats.shmem_add_to_page_cache.shmem_alloc_and_add_folio
      0.66 ± 10%      +1.0        1.69 ± 14%  perf-profile.calltrace.cycles-pp.filemap_unaccount_folio.__filemap_remove_folio.filemap_remove_folio.truncate_inode_folio.shmem_undo_range
      0.59 ± 12%      +1.1        1.68 ± 14%  perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.filemap_unaccount_folio.__filemap_remove_folio.filemap_remove_folio.truncate_inode_folio
      0.09 ±223%      +1.2        1.28 ±  7%  perf-profile.calltrace.cycles-pp.shmem_update_stats.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
      0.08 ±223%      +1.2        1.31 ± 18%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__lruvec_stat_mod_folio.filemap_unaccount_folio.__filemap_remove_folio.filemap_remove_folio
      0.00            +1.3        1.27 ±  7%  perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.shmem_update_stats.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp
     49.10            +5.9       55.01 ±  8%  perf-profile.calltrace.cycles-pp.fallocate64
     46.15            +7.7       53.86 ±  7%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fallocate64
     45.49            +8.1       53.55 ±  7%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fallocate64
      0.00            +8.1        8.09 ± 56%  perf-profile.calltrace.cycles-pp.page_counter_try_charge.try_charge_memcg.charge_memcg.__mem_cgroup_charge.shmem_alloc_and_add_folio
     45.32            +8.2       53.49 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe.fallocate64
     45.09            +8.3       53.41 ±  7%  perf-profile.calltrace.cycles-pp.vfs_fallocate.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe.fallocate64
     44.92            +8.4       53.36 ±  7%  perf-profile.calltrace.cycles-pp.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe
     43.94            +9.0       52.96 ±  7%  perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate.do_syscall_64
     43.39            +9.4       52.79 ±  7%  perf-profile.calltrace.cycles-pp.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate
      0.00           +11.7       11.68 ± 57%  perf-profile.calltrace.cycles-pp.try_charge_memcg.charge_memcg.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp
      0.00           +12.7       12.68 ± 50%  perf-profile.calltrace.cycles-pp.charge_memcg.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
      0.34 ±103%     +14.2       14.51 ± 39%  perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate.vfs_fallocate
     83.55            -9.1       74.42 ±  9%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     83.55            -9.1       74.44 ±  9%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     83.52            -9.0       74.51 ±  9%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     50.56            -6.0       44.54 ± 10%  perf-profile.children.cycles-pp.ftruncate64
     50.52            -6.0       44.52 ± 10%  perf-profile.children.cycles-pp.do_ftruncate
     50.52            -6.0       44.53 ± 10%  perf-profile.children.cycles-pp.do_sys_ftruncate
     50.52            -6.0       44.52 ± 10%  perf-profile.children.cycles-pp.do_truncate
     50.52            -6.0       44.52 ± 10%  perf-profile.children.cycles-pp.notify_change
     50.51            -6.0       44.52 ± 10%  perf-profile.children.cycles-pp.shmem_setattr
     50.50            -6.0       44.52 ± 10%  perf-profile.children.cycles-pp.shmem_undo_range
     43.26            -5.2       38.02 ± 11%  perf-profile.children.cycles-pp.folios_put_refs
     42.30            -5.2       37.12 ± 11%  perf-profile.children.cycles-pp.__page_cache_release
      1.22 ±  6%      -0.8        0.39 ± 18%  perf-profile.children.cycles-pp.shmem_inode_acct_blocks
      1.16 ±  5%      -0.8        0.38 ± 16%  perf-profile.children.cycles-pp.shmem_alloc_folio
      1.20 ±  6%      -0.8        0.44 ± 23%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.22 ±  5%      -0.7        0.54 ± 34%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.99 ±  6%      -0.7        0.32 ± 15%  perf-profile.children.cycles-pp.folio_alloc_mpol_noprof
      0.96 ±  6%      -0.6        0.32 ± 16%  perf-profile.children.cycles-pp.alloc_pages_mpol
      0.80 ±  4%      -0.5        0.27 ± 19%  perf-profile.children.cycles-pp.xas_store
      0.78 ±  6%      -0.5        0.26 ± 18%  perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
      4.65 ±  2%      -0.5        4.13 ±  9%  perf-profile.children.cycles-pp.__folio_batch_release
      4.64 ±  2%      -0.5        4.13 ±  9%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      0.54 ±  6%      -0.4        0.16 ± 10%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
      0.50 ±  6%      -0.3        0.16 ± 13%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.48 ±  5%      -0.3        0.17 ± 18%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.36            -0.3        0.11 ± 16%  perf-profile.children.cycles-pp.free_unref_folios
      0.32 ±  4%      -0.2        0.10 ± 15%  perf-profile.children.cycles-pp.xas_load
      0.38 ±  2%      -0.2        0.16 ± 28%  perf-profile.children.cycles-pp.find_lock_entries
      0.31 ±  6%      -0.2        0.11 ± 20%  perf-profile.children.cycles-pp.rmqueue
      0.24 ±  3%      -0.2        0.07 ± 16%  perf-profile.children.cycles-pp.xas_clear_mark
      0.23 ±  4%      -0.2        0.07 ± 16%  perf-profile.children.cycles-pp.filemap_get_entry
      0.21 ±  4%      -0.1        0.07 ± 14%  perf-profile.children.cycles-pp.xas_init_marks
      0.16 ±  4%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.19 ±  7%      -0.1        0.06 ± 14%  perf-profile.children.cycles-pp.__cond_resched
      0.20 ±  4%      -0.1        0.08 ± 41%  perf-profile.children.cycles-pp.__dquot_alloc_space
      0.18 ±  3%      -0.1        0.06 ± 23%  perf-profile.children.cycles-pp.__folio_cancel_dirty
      0.15 ±  7%      -0.1        0.04 ± 45%  perf-profile.children.cycles-pp.fdget
      0.16 ±  8%      -0.1        0.06 ± 13%  perf-profile.children.cycles-pp.file_modified
      0.15 ±  6%      -0.1        0.05 ±111%  perf-profile.children.cycles-pp.noop_dirty_folio
      0.17 ±  4%      -0.1        0.08 ±  4%  perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.04 ± 44%      +0.1        0.14 ± 16%  perf-profile.children.cycles-pp.handle_internal_command
      0.04 ± 44%      +0.1        0.14 ± 16%  perf-profile.children.cycles-pp.main
      0.04 ± 44%      +0.1        0.14 ± 16%  perf-profile.children.cycles-pp.run_builtin
      0.08 ±  5%      +0.1        0.20 ± 62%  perf-profile.children.cycles-pp.page_counter_cancel
      0.00            +0.1        0.12 ± 20%  perf-profile.children.cycles-pp.shmem_write_begin
      0.00            +0.1        0.13 ± 19%  perf-profile.children.cycles-pp.generic_perform_write
      0.00            +0.1        0.13 ± 18%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.00            +0.1        0.13 ± 18%  perf-profile.children.cycles-pp.vfs_write
      0.01 ±223%      +0.1        0.14 ± 16%  perf-profile.children.cycles-pp.__cmd_record
      0.01 ±223%      +0.1        0.14 ± 16%  perf-profile.children.cycles-pp.cmd_record
      0.00            +0.1        0.13 ± 17%  perf-profile.children.cycles-pp.ksys_write
      0.00            +0.1        0.14 ± 15%  perf-profile.children.cycles-pp.record__pushfn
      0.00            +0.1        0.14 ± 15%  perf-profile.children.cycles-pp.writen
      0.00            +0.1        0.14 ± 18%  perf-profile.children.cycles-pp.perf_mmap__push
      0.00            +0.1        0.14 ± 18%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.00            +0.1        0.14 ± 18%  perf-profile.children.cycles-pp.write
      0.09 ±  8%      +0.1        0.23 ± 55%  perf-profile.children.cycles-pp.page_counter_uncharge
      0.10 ±  9%      +0.2        0.26 ± 46%  perf-profile.children.cycles-pp.uncharge_batch
      0.66 ±  3%      +0.2        0.86 ±  7%  perf-profile.children.cycles-pp.lru_gen_add_folio
      0.23 ±  8%      +0.4        0.58 ± 19%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
      1.50 ±  2%      +0.7        2.24 ± 15%  perf-profile.children.cycles-pp.shmem_add_to_page_cache
      0.46 ±  6%      +0.8        1.28 ±  7%  perf-profile.children.cycles-pp.shmem_update_stats
      0.34 ± 36%      +0.8        1.18 ± 39%  perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
      0.00            +0.9        0.85 ± 32%  perf-profile.children.cycles-pp.propagate_protected_usage
      0.67 ± 10%      +1.0        1.69 ± 14%  perf-profile.children.cycles-pp.filemap_unaccount_folio
     96.78            +1.8       98.62        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     96.10            +2.2       98.31        perf-profile.children.cycles-pp.do_syscall_64
      1.31 ±  8%      +2.3        3.59 ± 11%  perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
      1.14 ± 15%      +2.8        3.97 ± 14%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
     49.17            +5.9       55.03 ±  8%  perf-profile.children.cycles-pp.fallocate64
      0.00            +8.1        8.14 ± 56%  perf-profile.children.cycles-pp.page_counter_try_charge
     45.32            +8.2       53.49 ±  7%  perf-profile.children.cycles-pp.__x64_sys_fallocate
     45.09            +8.3       53.41 ±  7%  perf-profile.children.cycles-pp.vfs_fallocate
     44.94            +8.4       53.36 ±  7%  perf-profile.children.cycles-pp.shmem_fallocate
     43.99            +9.1       53.08 ±  7%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
     43.48            +9.5       52.93 ±  7%  perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
      0.14 ±  6%     +11.6       11.74 ± 57%  perf-profile.children.cycles-pp.try_charge_memcg
      0.20 ±  9%     +12.5       12.72 ± 50%  perf-profile.children.cycles-pp.charge_memcg
      0.58 ± 24%     +14.0       14.54 ± 39%  perf-profile.children.cycles-pp.__mem_cgroup_charge
     83.52            -9.0       74.51 ±  9%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      1.19 ±  6%      -0.8        0.44 ± 23%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      1.21 ±  5%      -0.7        0.54 ± 35%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.68 ±  6%      -0.4        0.32 ± 38%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.46 ±  5%      -0.3        0.13 ± 16%  perf-profile.self.cycles-pp.xas_store
      0.44 ±  6%      -0.3        0.14 ± 13%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.40 ±  8%      -0.2        0.15 ± 47%  perf-profile.self.cycles-pp.lru_gen_add_folio
      0.35 ±  2%      -0.2        0.14 ± 34%  perf-profile.self.cycles-pp.lru_gen_del_folio
      0.31 ±  6%      -0.2        0.10 ± 22%  perf-profile.self.cycles-pp.shmem_fallocate
      0.28 ±  5%      -0.2        0.09 ± 13%  perf-profile.self.cycles-pp.security_vm_enough_memory_mm
      0.31 ±  4%      -0.2        0.14 ± 29%  perf-profile.self.cycles-pp.find_lock_entries
      0.30 ±  7%      -0.2        0.12 ± 63%  perf-profile.self.cycles-pp.shmem_add_to_page_cache
      0.24 ±  7%      -0.2        0.08 ± 12%  perf-profile.self.cycles-pp.__alloc_frozen_pages_noprof
      0.22 ±  5%      -0.2        0.07 ± 14%  perf-profile.self.cycles-pp.xas_load
      0.22 ±  3%      -0.2        0.07 ± 17%  perf-profile.self.cycles-pp.xas_clear_mark
      0.26 ±  9%      -0.1        0.11 ± 49%  perf-profile.self.cycles-pp.lru_add
      0.26            -0.1        0.11 ± 30%  perf-profile.self.cycles-pp.folios_put_refs
      0.19 ±  7%      -0.1        0.06 ± 13%  perf-profile.self.cycles-pp.shmem_alloc_and_add_folio
      0.17 ±  6%      -0.1        0.04 ± 71%  perf-profile.self.cycles-pp.shmem_get_folio_gfp
      0.18 ±  2%      -0.1        0.05 ± 47%  perf-profile.self.cycles-pp.free_unref_folios
      0.18 ±  7%      -0.1        0.06 ± 16%  perf-profile.self.cycles-pp.shmem_inode_acct_blocks
      0.15 ±  7%      -0.1        0.04 ± 71%  perf-profile.self.cycles-pp.fdget
      0.15 ±  5%      -0.1        0.05 ± 47%  perf-profile.self.cycles-pp.fallocate64
      0.16 ±  3%      -0.1        0.06 ±  6%  perf-profile.self.cycles-pp.cgroup_rstat_updated
      0.08 ±  5%      +0.1        0.20 ± 61%  perf-profile.self.cycles-pp.page_counter_cancel
      0.40 ±  3%      +0.5        0.87 ± 53%  perf-profile.self.cycles-pp.__lruvec_stat_mod_folio
      0.04 ± 44%      +0.6        0.65 ± 60%  perf-profile.self.cycles-pp.__mem_cgroup_charge
      0.00            +0.8        0.84 ± 32%  perf-profile.self.cycles-pp.propagate_protected_usage
      0.33 ± 37%      +0.8        1.18 ± 39%  perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
      0.03 ±141%      +1.0        1.01 ± 44%  perf-profile.self.cycles-pp.charge_memcg
      0.98 ± 17%      +2.9        3.90 ± 15%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      0.10 ±  8%      +3.5        3.60 ± 60%  perf-profile.self.cycles-pp.try_charge_memcg
      0.00            +7.3        7.30 ± 59%  perf-profile.self.cycles-pp.page_counter_try_charge



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [linux-next:master] [memcg] 01d37228d3: netperf.Throughput_Mbps 37.9% regression
  2025-03-10  5:50 [linux-next:master] [memcg] 01d37228d3: netperf.Throughput_Mbps 37.9% regression kernel test robot
@ 2025-03-10  9:55 ` Vlastimil Babka
  2025-03-10 10:18   ` Alexei Starovoitov
  0 siblings, 1 reply; 6+ messages in thread
From: Vlastimil Babka @ 2025-03-10  9:55 UTC (permalink / raw)
  To: kernel test robot, Alexei Starovoitov
  Cc: oe-lkp, lkp, Michal Hocko, Shakeel Butt, cgroups, linux-mm

On 3/10/25 06:50, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed a 37.9% regression of netperf.Throughput_Mbps on:

I assume this is some network receive context where gfpflags do not allow
blocking.
 
> commit: 01d37228d331047a0bbbd1026cec2ccabef6d88d ("memcg: Use trylock to access memcg stock_lock.")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> [test failed on linux-next/master 7ec162622e66a4ff886f8f28712ea1b13069e1aa]
> 
> testcase: netperf
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> parameters:
> 
> 	ip: ipv4
> 	runtime: 300s
> 	nr_threads: 50%
> 	cluster: cs-localhost
> 	test: TCP_MAERTS
> 	cpufreq_governor: performance
> 
> 
> In addition to that, the commit also has significant impact on the following tests:
> 
> +------------------+----------------------------------------------------------------------------------------------------+
> | testcase: change | stress-ng: stress-ng.mmapfork.ops_per_sec  63.5% regression                                        |

Hm interesting, this one at least from the name would be a GFP_KERNEL context?

> | test machine     | 256 threads 4 sockets INTEL(R) XEON(R) PLATINUM 8592+ (Emerald Rapids) with 256G memory            |
> | test parameters  | cpufreq_governor=performance                                                                       |
> |                  | nr_threads=100%                                                                                    |
> |                  | test=mmapfork                                                                                      |
> |                  | testtime=60s                                                                                       |
> +------------------+----------------------------------------------------------------------------------------------------+
> | testcase: change | hackbench: hackbench.throughput  26.6% regression                                                  |
> | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory         |
> | test parameters  | cpufreq_governor=performance                                                                       |
> |                  | ipc=socket                                                                                         |
> |                  | iterations=4                                                                                       |
> |                  | mode=threads                                                                                       |
> |                  | nr_threads=100%                                                                                    |
> +------------------+----------------------------------------------------------------------------------------------------+
> | testcase: change | lmbench3: lmbench3.TCP.socket.bandwidth.64B.MB/sec  33.0% regression                               |
> | test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory        |
> | test parameters  | cpufreq_governor=performance                                                                       |
> |                  | mode=development                                                                                   |
> |                  | nr_threads=100%                                                                                    |
> |                  | test=TCP                                                                                           |
> |                  | test_memory_size=50%                                                                               |
> +------------------+----------------------------------------------------------------------------------------------------+
> | testcase: change | vm-scalability: vm-scalability.throughput  86.8% regression                                        |
> | test machine     | 256 threads 4 sockets INTEL(R) XEON(R) PLATINUM 8592+ (Emerald Rapids) with 256G memory            |
> | test parameters  | cpufreq_governor=performance                                                                       |
> |                  | runtime=300s                                                                                       |
> |                  | size=1T                                                                                            |
> |                  | test=lru-shm                                                                                       |
> +------------------+----------------------------------------------------------------------------------------------------+
> | testcase: change | netperf: netperf.Throughput_Mbps 39.9% improvement                                                 |

An improvement? Weird.

> | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory         |
> | test parameters  | cluster=cs-localhost                                                                               |
> |                  | cpufreq_governor=performance                                                                       |
> |                  | ip=ipv4                                                                                            |
> |                  | nr_threads=200%                                                                                    |
> |                  | runtime=300s                                                                                       |
> |                  | test=TCP_MAERTS                                                                                    |
> +------------------+----------------------------------------------------------------------------------------------------+
> | testcase: change | will-it-scale: will-it-scale.per_thread_ops  68.8% regression                                      |
> | test machine     | 104 threads 2 sockets (Skylake) with 192G memory                                                   |
> | test parameters  | cpufreq_governor=performance                                                                       |
> |                  | mode=thread                                                                                        |
> |                  | nr_task=100%                                                                                       |
> |                  | test=fallocate1                                                                                    |
> +------------------+----------------------------------------------------------------------------------------------------+

Some of those as well.

Anyway we should not be expecting the localtry_trylock_irqsave() itself to be
failing and resulting in a slow path, as that would require an allocation
attempt from an NMI. So what else does the commit do?

>       0.10 ±  4%     +11.3       11.43 ±  3%  perf-profile.self.cycles-pp.try_charge_memcg
>       0.00           +13.7       13.72 ±  2%  perf-profile.self.cycles-pp.page_counter_try_charge

This does suggest more time spent in try_charge_memcg() because consume_stock() has failed.

And I suspect this:

+       if (!gfpflags_allow_spinning(gfp_mask))
+               /* Avoid the refill and flush of the older stock */
+               batch = nr_pages;

because this will affect the refill even if consume_stock() fails not due to
a trylock failure (which should not be happening), but also just because the
stock was of a wrong memcg or depleted. So in the nowait context we deny the
refill even if we have the memory. Attached patch could be used to see if it
fixes things. I'm not sure about the testcases where it doesn't look like
nowait context would be used though, let's see.
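
For reference, a simplified sketch of how that hunk interacts with the
refill (paraphrased from try_charge_memcg(), not the literal upstream code):

	unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages);

retry:
	if (consume_stock(memcg, nr_pages, gfp_mask))
		return 0;

	if (!gfpflags_allow_spinning(gfp_mask))
		/* the hunk quoted above */
		batch = nr_pages;

	/* ... charge 'batch' pages via page_counter_try_charge() ... */

done_restock:
	if (batch > nr_pages)
		refill_stock(memcg, batch - nr_pages);
	/* with batch == nr_pages nothing is ever put back into the stock */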

I've also found this:
https://lore.kernel.org/all/7s6fbpwsynadnzybhdqg3jwhls4pq2sptyxuyghxpaufhissj5@iadb6ibzscjj/

> 
> BTW after the done_restock tag in try_charge_memcg(), we will another
> gfpflags_allow_spinning() check to avoid schedule_work() and
> mem_cgroup_handle_over_high(). Maybe simply return early for
> gfpflags_allow_spinning() without checking high marks.

looks like a small possible optimization that was forgotten?
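
A minimal sketch of what that early return might look like (hypothetical,
just following the quoted suggestion; the patch below addresses the refill
issue instead):

done_restock:
	if (batch > nr_pages)
		refill_stock(memcg, batch - nr_pages);

	if (!gfpflags_allow_spinning(gfp_mask))
		/* skip the high-watermark walk below */
		return 0;

	/* ... existing loop checking memory.high / swap.high, which may
	 * schedule_work() and call mem_cgroup_handle_over_high() ... */
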
 ----8<----
From 29e7d18645577ce13d8a0140c0df050ce1ce0f95 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Mon, 10 Mar 2025 10:32:14 +0100
Subject: [PATCH] memcg: Avoid stock refill only if stock_lock can't be
 acquired

Since commit 01d37228d331 ("memcg: Use trylock to access memcg
stock_lock.") consume_stock() can fail if it can't obtain
memcg_stock.stock_lock. In that case try_charge_memcg() also avoids
refilling or flushing the stock when gfp flags indicate we are in the
context where obtaining the lock could fail.

However consume_stock() can also fail because the stock was depleted, or
belonged to a different memcg. Avoiding the stock refill then reduces
the caching efficiency, as the refill could still succeed with memory
available. This has caused various regressions to be reported by the
kernel test robot.

To fix this, make the decision to avoid stock refill more precise by
making consume_stock() return -EBUSY when it fails to obtain stock_lock,
and using that for the no-refill decision.

Fixes: 01d37228d331 ("memcg: Use trylock to access memcg stock_lock.")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202503101254.cfd454df-lkp@intel.com

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/memcontrol.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 092cab99dec7..a8371a22c7f4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1772,22 +1772,23 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
  * stock, and at least @nr_pages are available in that stock.  Failure to
  * service an allocation will refill the stock.
  *
- * returns true if successful, false otherwise.
+ * returns 0 if successful, -EBUSY if lock cannot be acquired, or -ENOMEM
+ * if the memcg does not match or there are not enough pages
  */
-static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
+static int consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
 			  gfp_t gfp_mask)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned int stock_pages;
 	unsigned long flags;
-	bool ret = false;
+	int ret = -ENOMEM;
 
 	if (nr_pages > MEMCG_CHARGE_BATCH)
 		return ret;
 
 	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
 		if (!gfpflags_allow_spinning(gfp_mask))
-			return ret;
+			return -EBUSY;
 		localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
 	}
 
@@ -1795,7 +1796,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
 	stock_pages = READ_ONCE(stock->nr_pages);
 	if (memcg == READ_ONCE(stock->cached) && stock_pages >= nr_pages) {
 		WRITE_ONCE(stock->nr_pages, stock_pages - nr_pages);
-		ret = true;
+		ret = 0;
 	}
 
 	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
@@ -2228,13 +2229,18 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	bool drained = false;
 	bool raised_max_event = false;
 	unsigned long pflags;
+	int consume_ret;
 
 retry:
-	if (consume_stock(memcg, nr_pages, gfp_mask))
+	consume_ret = consume_stock(memcg, nr_pages, gfp_mask);
+	if (!consume_ret)
 		return 0;
 
-	if (!gfpflags_allow_spinning(gfp_mask))
-		/* Avoid the refill and flush of the older stock */
+	/*
+	 * Avoid the refill and flush of the older stock if we failed to acquire
+	 * the stock_lock
+	 */
+	if (consume_ret == -EBUSY)
 		batch = nr_pages;
 
 	if (!do_memsw_account() ||
-- 
2.48.1




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [linux-next:master] [memcg] 01d37228d3: netperf.Throughput_Mbps 37.9% regression
  2025-03-10  9:55 ` Vlastimil Babka
@ 2025-03-10 10:18   ` Alexei Starovoitov
  2025-03-10 10:34     ` Vlastimil Babka
  0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2025-03-10 10:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: kernel test robot, Alexei Starovoitov, oe-lkp, kbuild test robot,
	Michal Hocko, Shakeel Butt, open list:CONTROL GROUP (CGROUP),
	linux-mm

On Mon, Mar 10, 2025 at 10:55 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 3/10/25 06:50, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed a 37.9% regression of netperf.Throughput_Mbps on:
>
> I assume this is some network receive context where gfpflags do not allow
> blocking.

gfpflags_allow_spinning() should be true for all current callers
including networking.

> > commit: 01d37228d331047a0bbbd1026cec2ccabef6d88d ("memcg: Use trylock to access memcg stock_lock.")
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >
> > [test failed on linux-next/master 7ec162622e66a4ff886f8f28712ea1b13069e1aa]
> >
> > testcase: netperf
> > config: x86_64-rhel-9.4
> > compiler: gcc-12
> > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > parameters:
> >
> >       ip: ipv4
> >       runtime: 300s
> >       nr_threads: 50%
> >       cluster: cs-localhost
> >       test: TCP_MAERTS
> >       cpufreq_governor: performance
> >
> >
> > In addition to that, the commit also has significant impact on the following tests:
> >
> > +------------------+----------------------------------------------------------------------------------------------------+
> > | testcase: change | stress-ng: stress-ng.mmapfork.ops_per_sec  63.5% regression                                        |
>
> Hm interesting, this one at least from the name would be a GFP_KERNEL context?

weird indeed.

> > | test machine     | 256 threads 4 sockets INTEL(R) XEON(R) PLATINUM 8592+ (Emerald Rapids) with 256G memory            |
> > | test parameters  | cpufreq_governor=performance                                                                       |
> > |                  | nr_threads=100%                                                                                    |
> > |                  | test=mmapfork                                                                                      |
> > |                  | testtime=60s                                                                                       |
> > +------------------+----------------------------------------------------------------------------------------------------+
> > | testcase: change | hackbench: hackbench.throughput  26.6% regression                                                  |
> > | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory         |
> > | test parameters  | cpufreq_governor=performance                                                                       |
> > |                  | ipc=socket                                                                                         |
> > |                  | iterations=4                                                                                       |
> > |                  | mode=threads                                                                                       |
> > |                  | nr_threads=100%                                                                                    |
> > +------------------+----------------------------------------------------------------------------------------------------+
> > | testcase: change | lmbench3: lmbench3.TCP.socket.bandwidth.64B.MB/sec  33.0% regression                               |
> > | test machine     | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory        |
> > | test parameters  | cpufreq_governor=performance                                                                       |
> > |                  | mode=development                                                                                   |
> > |                  | nr_threads=100%                                                                                    |
> > |                  | test=TCP                                                                                           |
> > |                  | test_memory_size=50%                                                                               |
> > +------------------+----------------------------------------------------------------------------------------------------+
> > | testcase: change | vm-scalability: vm-scalability.throughput  86.8% regression                                        |
> > | test machine     | 256 threads 4 sockets INTEL(R) XEON(R) PLATINUM 8592+ (Emerald Rapids) with 256G memory            |
> > | test parameters  | cpufreq_governor=performance                                                                       |
> > |                  | runtime=300s                                                                                       |
> > |                  | size=1T                                                                                            |
> > |                  | test=lru-shm                                                                                       |
> > +------------------+----------------------------------------------------------------------------------------------------+
> > | testcase: change | netperf: netperf.Throughput_Mbps 39.9% improvement                                                 |
>
> An improvement? Weird.

Even more weird and makes no sense to me so far.

>
> > | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory         |
> > | test parameters  | cluster=cs-localhost                                                                               |
> > |                  | cpufreq_governor=performance                                                                       |
> > |                  | ip=ipv4                                                                                            |
> > |                  | nr_threads=200%                                                                                    |
> > |                  | runtime=300s                                                                                       |
> > |                  | test=TCP_MAERTS                                                                                    |
> > +------------------+----------------------------------------------------------------------------------------------------+
> > | testcase: change | will-it-scale: will-it-scale.per_thread_ops  68.8% regression                                      |
> > | test machine     | 104 threads 2 sockets (Skylake) with 192G memory                                                   |
> > | test parameters  | cpufreq_governor=performance                                                                       |
> > |                  | mode=thread                                                                                        |
> > |                  | nr_task=100%                                                                                       |
> > |                  | test=fallocate1                                                                                    |
> > +------------------+----------------------------------------------------------------------------------------------------+
>
> Some of those as well.
>
> Anyway we should not be expecting the localtry_trylock_irqsave() itself be
> failing and resulting in a slow path, as that woulre require an allocation
> attempt from a nmi. So what else the commit does?
>
> >       0.10 ±  4%     +11.3       11.43 ±  3%  perf-profile.self.cycles-pp.try_charge_memcg
> >       0.00           +13.7       13.72 ±  2%  perf-profile.self.cycles-pp.page_counter_try_charge
>
> This does suggest more time spent in try_charge_memcg() because consume_stock() has failed.
>
> And I suspect this:
>
> +       if (!gfpflags_allow_spinning(gfp_mask))
> +               /* Avoid the refill and flush of the older stock */
> +               batch = nr_pages;
>
> because this will affect the refill even if consume_stock() fails not due to
> a trylock failure (which should not be happening), but also just because the
> stock was of a wrong memcg or depleted. So in the nowait context we deny the
> refill even if we have the memory. Attached patch could be used to see if it
> if fixes things. I'm not sure about the testcases where it doesn't look like
> nowait context would be used though, let's see.

Not quite.
GFP_NOWAIT includes __GFP_KSWAPD_RECLAIM,
so gfpflags_allow_spinning() will return true.

So 'batch' won't change.
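
Roughly, paraphrasing the header definitions at the time (not quoting them):

	#define __GFP_RECLAIM	(__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
	#define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)

so (GFP_NOWAIT & __GFP_RECLAIM) is non-zero and a correctly working
gfpflags_allow_spinning() is expected to return true for it.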

> I've also found this:
> https://lore.kernel.org/all/7s6fbpwsynadnzybhdqg3jwhls4pq2sptyxuyghxpaufhissj5@iadb6ibzscjj/

Right. And notice Shakeel's suggestion doesn't include '!' in
the condition.
I assumed it's a typo.
Hence added it as "if (!gfpflags_allow_spinning(gfp_mask))"

> >
> > BTW after the done_restock tag in try_charge_memcg(), we will another
> > gfpflags_allow_spinning() check to avoid schedule_work() and
> > mem_cgroup_handle_over_high(). Maybe simply return early for
> > gfpflags_allow_spinning() without checking high marks.
>
> looks like a small possible optimization that was forgotten?
>  ----8<----
> From 29e7d18645577ce13d8a0140c0df050ce1ce0f95 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Mon, 10 Mar 2025 10:32:14 +0100
> Subject: [PATCH] memcg: Avoid stock refill only if stock_lock can't be
>  acquired
>
> Since commit 01d37228d331 ("memcg: Use trylock to access memcg
> stock_lock.") consume_stock() can fail if it can't obtain
> memcg_stock.stock_lock. In that case try_charge_memcg() also avoids
> refilling or flushing the stock when gfp flags indicate we are in the
> context where obtaining the lock could fail.
>
> However consume_stock() can also fail because the stock was depleted, or
> belonged to a different memcg. Avoiding the stock refill then reduces
> the caching efficiency, as the refill could still succeed with memory
> available. This has caused various regressions to be reported by the
> kernel test robot.
>
> To fix this, make the decision to avoid stock refill more precise by
> making consume_stock() return -EBUSY when it fails to obtain stock_lock,
> and using that for the no-refill decision.
>
> Fixes: 01d37228d331 ("memcg: Use trylock to access memcg stock_lock.")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202503101254.cfd454df-lkp@intel.com
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/memcontrol.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 092cab99dec7..a8371a22c7f4 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1772,22 +1772,23 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
>   * stock, and at least @nr_pages are available in that stock.  Failure to
>   * service an allocation will refill the stock.
>   *
> - * returns true if successful, false otherwise.
> + * returns 0 if successful, -EBUSY if lock cannot be acquired, or -ENOMEM
> + * if the memcg does not match or there are not enough pages
>   */
> -static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> +static int consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
>                           gfp_t gfp_mask)
>  {
>         struct memcg_stock_pcp *stock;
>         unsigned int stock_pages;
>         unsigned long flags;
> -       bool ret = false;
> +       int ret = -ENOMEM;
>
>         if (nr_pages > MEMCG_CHARGE_BATCH)
>                 return ret;
>
>         if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
>                 if (!gfpflags_allow_spinning(gfp_mask))
> -                       return ret;
> +                       return -EBUSY;
>                 localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
>         }
>
> @@ -1795,7 +1796,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
>         stock_pages = READ_ONCE(stock->nr_pages);
>         if (memcg == READ_ONCE(stock->cached) && stock_pages >= nr_pages) {
>                 WRITE_ONCE(stock->nr_pages, stock_pages - nr_pages);
> -               ret = true;
> +               ret = 0;
>         }
>
>         localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
> @@ -2228,13 +2229,18 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
>         bool drained = false;
>         bool raised_max_event = false;
>         unsigned long pflags;
> +       int consume_ret;
>
>  retry:
> -       if (consume_stock(memcg, nr_pages, gfp_mask))
> +       consume_ret = consume_stock(memcg, nr_pages, gfp_mask);
> +       if (!consume_ret)
>                 return 0;
>
> -       if (!gfpflags_allow_spinning(gfp_mask))
> -               /* Avoid the refill and flush of the older stock */
> +       /*
> +        * Avoid the refill and flush of the older stock if we failed to acquire
> +        * the stock_lock
> +        */
> +       if (consume_ret == -EBUSY)
>                 batch = nr_pages;

Sure.
I think it's a good optimization, but I don't think it will make
any difference here.
Fixes tag is not appropriate and the commit log is off too.

I have a strong suspicion that this bot report is bogus.

I'll try to repro anyway.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [linux-next:master] [memcg] 01d37228d3: netperf.Throughput_Mbps 37.9% regression
  2025-03-10 10:18   ` Alexei Starovoitov
@ 2025-03-10 10:34     ` Vlastimil Babka
  2025-03-10 10:56       ` Alexei Starovoitov
  0 siblings, 1 reply; 6+ messages in thread
From: Vlastimil Babka @ 2025-03-10 10:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: kernel test robot, Alexei Starovoitov, oe-lkp, kbuild test robot,
	Michal Hocko, Shakeel Butt, open list:CONTROL GROUP (CGROUP),
	linux-mm

On 3/10/25 11:18, Alexei Starovoitov wrote:
>> because this will affect the refill even if consume_stock() fails not due to
>> a trylock failure (which should not be happening), but also just because the
>> stock was of a wrong memcg or depleted. So in the nowait context we deny the
>> refill even if we have the memory. Attached patch could be used to see if it
>> if fixes things. I'm not sure about the testcases where it doesn't look like
>> nowait context would be used though, let's see.
> 
> Not quite.
> GFP_NOWAIT includes __GFP_KSWAPD_RECLAIM,
> so gfpflags_allow_spinning() will return true.

Uh right, it's the new gfpflags_allow_spinning(), not the
gfpflags_allow_blocking() I'm used to and implicitly assumed, sorry.

But then it's very simple because it has a bug:
gfpflags_allow_spinning() does

return !(gfp_flags & __GFP_RECLAIM);

should be !!
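
A tiny standalone sketch of the flipped sense (the flag values here are made
up for illustration, not the kernel's):

	#include <stdio.h>
	#include <stdbool.h>

	#define __GFP_KSWAPD_RECLAIM	(1u << 0)	/* illustrative bits only */
	#define __GFP_DIRECT_RECLAIM	(1u << 1)
	#define __GFP_RECLAIM		(__GFP_KSWAPD_RECLAIM | __GFP_DIRECT_RECLAIM)
	#define GFP_NOWAIT		(__GFP_KSWAPD_RECLAIM)

	/* current (buggy): returns false for any mask with a reclaim bit set */
	static bool allow_spinning_buggy(unsigned int gfp)
	{
		return !(gfp & __GFP_RECLAIM);
	}

	/* intended: returns true for any mask with a reclaim bit set */
	static bool allow_spinning_fixed(unsigned int gfp)
	{
		return !!(gfp & __GFP_RECLAIM);
	}

	int main(void)
	{
		/* prints "GFP_NOWAIT: buggy=0 fixed=1": the buggy version makes
		 * try_charge_memcg() treat every normal allocation as a no-spin
		 * context and skip the stock refill */
		printf("GFP_NOWAIT: buggy=%d fixed=%d\n",
		       allow_spinning_buggy(GFP_NOWAIT),
		       allow_spinning_fixed(GFP_NOWAIT));
		return 0;
	}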



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [linux-next:master] [memcg] 01d37228d3: netperf.Throughput_Mbps 37.9% regression
  2025-03-10 10:34     ` Vlastimil Babka
@ 2025-03-10 10:56       ` Alexei Starovoitov
  2025-03-10 11:03         ` Vlastimil Babka
  0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2025-03-10 10:56 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: kernel test robot, Alexei Starovoitov, oe-lkp, kbuild test robot,
	Michal Hocko, Shakeel Butt, open list:CONTROL GROUP (CGROUP),
	linux-mm

On Mon, Mar 10, 2025 at 11:34 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 3/10/25 11:18, Alexei Starovoitov wrote:
> >> because this will affect the refill even if consume_stock() fails not due to
> >> a trylock failure (which should not be happening), but also just because the
> >> stock was of a wrong memcg or depleted. So in the nowait context we deny the
> >> refill even if we have the memory. Attached patch could be used to see if it
> >> if fixes things. I'm not sure about the testcases where it doesn't look like
> >> nowait context would be used though, let's see.
> >
> > Not quite.
> > GFP_NOWAIT includes __GFP_KSWAPD_RECLAIM,
> > so gfpflags_allow_spinning() will return true.
>
> Uh right, it's the new gfpflags_allow_spinning(), not the
> gfpflags_allow_blocking() I'm used to and implicitly assumed, sorry.
>
> But then it's very simple because it has a bug:
> gfpflags_allow_spinning() does
>
> return !(gfp_flags & __GFP_RECLAIM);
>
> should be !!

Ouch.
So I accidentally exposed the whole linux-next to this stress testing
of new trylock facilities :(
But the silver lining is that this is the only thing that blew up :)
Could you send a patch or I will do it later today.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [linux-next:master] [memcg] 01d37228d3: netperf.Throughput_Mbps 37.9% regression
  2025-03-10 10:56       ` Alexei Starovoitov
@ 2025-03-10 11:03         ` Vlastimil Babka
  0 siblings, 0 replies; 6+ messages in thread
From: Vlastimil Babka @ 2025-03-10 11:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: kernel test robot, Alexei Starovoitov, oe-lkp, kbuild test robot,
	Michal Hocko, Shakeel Butt, open list:CONTROL GROUP (CGROUP),
	linux-mm

On 3/10/25 11:56, Alexei Starovoitov wrote:
> On Mon, Mar 10, 2025 at 11:34 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 3/10/25 11:18, Alexei Starovoitov wrote:
>> >> because this will affect the refill even if consume_stock() fails not due to
>> >> a trylock failure (which should not be happening), but also just because the
>> >> stock was of a wrong memcg or depleted. So in the nowait context we deny the
>> >> refill even if we have the memory. Attached patch could be used to see if it
>> >> if fixes things. I'm not sure about the testcases where it doesn't look like
>> >> nowait context would be used though, let's see.
>> >
>> > Not quite.
>> > GFP_NOWAIT includes __GFP_KSWAPD_RECLAIM,
>> > so gfpflags_allow_spinning() will return true.
>>
>> Uh right, it's the new gfpflags_allow_spinning(), not the
>> gfpflags_allow_blocking() I'm used to and implicitly assumed, sorry.
>>
>> But then it's very simple because it has a bug:
>> gfpflags_allow_spinning() does
>>
>> return !(gfp_flags & __GFP_RECLAIM);
>>
>> should be !!
> 
> Ouch.
> So I accidentally exposed the whole linux-next to this stress testing
> of new trylock facilities :(
> But the silver lining is that this is the only thing that blew up :)
> Could you send a patch or I will do it later today.

OK
----8<----
From 69b3d1631645c82d9d88f17fb01184d24034df2b Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Mon, 10 Mar 2025 11:57:52 +0100
Subject: [PATCH] mm: Fix the flipped condition in gfpflags_allow_spinning()

The function gfpflags_allow_spinning() has a bug that makes it return
the opposite of the intended result. This could contribute to deadlocks as
its usage proliferates; for now it was noticed as a performance regression
due to try_charge_memcg() not refilling the memcg stock when it could. Fix
the flipped condition.

Fixes: 97769a53f117 ("mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202503101254.cfd454df-lkp@intel.com

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/gfp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index ceb226c2e25c..c9fa6309c903 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -55,7 +55,7 @@ static inline bool gfpflags_allow_spinning(const gfp_t gfp_flags)
 	 * regular page allocator doesn't fully support this
 	 * allocation mode.
 	 */
-	return !(gfp_flags & __GFP_RECLAIM);
+	return !!(gfp_flags & __GFP_RECLAIM);
 }
 
 #ifdef CONFIG_HIGHMEM
-- 
2.48.1




^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-03-10 11:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-10  5:50 [linux-next:master] [memcg] 01d37228d3: netperf.Throughput_Mbps 37.9% regression kernel test robot
2025-03-10  9:55 ` Vlastimil Babka
2025-03-10 10:18   ` Alexei Starovoitov
2025-03-10 10:34     ` Vlastimil Babka
2025-03-10 10:56       ` Alexei Starovoitov
2025-03-10 11:03         ` Vlastimil Babka
