* [linux-next:master] [memcg]  70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: kernel test robot @ 2024-05-17  5:56 UTC
  To: Shakeel Butt
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin, oliver.sang



Hello,

kernel test robot noticed a -11.9% regression of will-it-scale.per_process_ops on:


commit: 70a64b7919cbd6c12306051ff2825839a9d65605 ("memcg: dynamically allocate lruvec_stats")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
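
(For context, this commit moves struct lruvec_stats out of the per-node
memcg structure and allocates it separately when the memcg is created. A
hedged sketch of the shape of the change, unrelated fields omitted -- see
the commit itself for the real code:

	struct mem_cgroup_per_node {
		struct lruvec		lruvec;
		/* before: struct lruvec_stats lruvec_stats; (embedded) */
		struct lruvec_stats	*lruvec_stats;	/* after: allocated at init */
		/* ... */
	};

Besides the extra pointer chase on stat reads and flushes, this also
changes the layout of struct mem_cgroup_per_node; in the profile below
the lost cycles show up mainly as added contention on the lruvec spinlock,
folio_lruvec_lock_irqsave -> native_queued_spin_lock_slowpath, +3.3
points.)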

testcase: will-it-scale
test machine: 104 threads 2 sockets (Skylake) with 192G memory
parameters:

	nr_task: 100%
	mode: process
	test: page_fault2
	cpufreq_governor: performance
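
For reference, the page_fault2 testcase that each process runs is roughly
the loop below (a simplified sketch of will-it-scale's page_fault2.c, not
the exact source; the file name is illustrative -- the real testcase uses
an unlinked temporary file, which with /tmp on tmpfs makes these shmem
faults):

	#include <fcntl.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#define MEMSIZE	(128UL * 1024 * 1024)
	#define PGSIZE	4096UL

	static void testcase(unsigned long *iterations)
	{
		int fd = open("/tmp/willitscale-pf2", O_CREAT | O_RDWR, 0600);

		ftruncate(fd, MEMSIZE);

		for (;;) {
			/* map a shared, file-backed region */
			char *c = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
				       MAP_SHARED, fd, 0);

			/* take one minor fault per page */
			for (unsigned long i = 0; i < MEMSIZE; i += PGSIZE) {
				c[i] = 0;
				(*iterations)++;
			}

			/* unmap, exercising the zap/munmap path */
			munmap(c, MEMSIZE);
		}
	}

per_process_ops is roughly the per-process rate of these inner-loop
iterations, i.e. pages faulted in per unit time.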




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202405171353.b56b845-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240517/202405171353.b56b845-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/page_fault2/will-it-scale

commit: 
  59142d87ab ("memcg: reduce memory size of mem_cgroup_events_index")
  70a64b7919 ("memcg: dynamically allocate lruvec_stats")

59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      7.14            -0.8        6.32        mpstat.cpu.all.usr%
    245257 ±  7%     -13.8%     211354 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.stddev
    245258 ±  7%     -13.8%     211353 ±  4%  sched_debug.cfs_rq:/.min_vruntime.stddev
     21099 ±  5%     -14.9%      17946 ±  5%  perf-c2c.DRAM.local
      4025 ±  2%     +29.1%       5197 ±  3%  perf-c2c.HITM.local
    105.17 ±  8%     -12.7%      91.83 ±  6%  perf-c2c.HITM.remote
   9538291           -11.9%    8402170        will-it-scale.104.processes
     91713           -11.9%      80789        will-it-scale.per_process_ops
   9538291           -11.9%    8402170        will-it-scale.workload
 1.438e+09           -11.2%  1.276e+09        numa-numastat.node0.local_node
  1.44e+09           -11.3%  1.278e+09        numa-numastat.node0.numa_hit
     83001 ± 15%     -68.9%      25774 ± 34%  numa-numastat.node0.other_node
 1.453e+09           -12.5%  1.271e+09        numa-numastat.node1.local_node
 1.454e+09           -12.5%  1.272e+09        numa-numastat.node1.numa_hit
     24752 ± 51%    +230.9%      81910 ± 10%  numa-numastat.node1.other_node
  1.44e+09           -11.3%  1.278e+09        numa-vmstat.node0.numa_hit
 1.438e+09           -11.3%  1.276e+09        numa-vmstat.node0.numa_local
     83001 ± 15%     -68.9%      25774 ± 34%  numa-vmstat.node0.numa_other
 1.454e+09           -12.5%  1.272e+09        numa-vmstat.node1.numa_hit
 1.453e+09           -12.5%  1.271e+09        numa-vmstat.node1.numa_local
     24752 ± 51%    +230.9%      81910 ± 10%  numa-vmstat.node1.numa_other
     14952            -3.2%      14468        proc-vmstat.nr_mapped
 2.894e+09           -11.9%   2.55e+09        proc-vmstat.numa_hit
 2.891e+09           -11.9%  2.548e+09        proc-vmstat.numa_local
  2.88e+09           -11.8%  2.539e+09        proc-vmstat.pgalloc_normal
 2.869e+09           -11.9%  2.529e+09        proc-vmstat.pgfault
  2.88e+09           -11.8%  2.539e+09        proc-vmstat.pgfree
     17.51            -2.6%      17.05        perf-stat.i.MPKI
 9.457e+09            -9.2%  8.585e+09        perf-stat.i.branch-instructions
  45022022            -8.2%   41340795        perf-stat.i.branch-misses
     84.38            -4.9       79.51        perf-stat.i.cache-miss-rate%
 8.353e+08           -12.1%  7.345e+08        perf-stat.i.cache-misses
 9.877e+08            -6.7%  9.216e+08        perf-stat.i.cache-references
      6.06           +10.8%       6.72        perf-stat.i.cpi
    136.25            -1.2%     134.59        perf-stat.i.cpu-migrations
    348.56           +13.9%     396.93        perf-stat.i.cycles-between-cache-misses
 4.763e+10            -9.7%  4.302e+10        perf-stat.i.instructions
      0.17            -9.6%       0.15        perf-stat.i.ipc
    182.56           -11.9%     160.88        perf-stat.i.metric.K/sec
   9494393           -11.9%    8368012        perf-stat.i.minor-faults
   9494393           -11.9%    8368012        perf-stat.i.page-faults
     17.54            -2.6%      17.08        perf-stat.overall.MPKI
      0.47            +0.0        0.48        perf-stat.overall.branch-miss-rate%
     84.57            -4.9       79.71        perf-stat.overall.cache-miss-rate%
      6.07           +10.8%       6.73        perf-stat.overall.cpi
    346.33           +13.8%     393.97        perf-stat.overall.cycles-between-cache-misses
      0.16            -9.7%       0.15        perf-stat.overall.ipc
   1503802            +2.6%    1542599        perf-stat.overall.path-length
 9.424e+09            -9.2%  8.553e+09        perf-stat.ps.branch-instructions
  44739120            -8.3%   41034189        perf-stat.ps.branch-misses
 8.326e+08           -12.1%  7.321e+08        perf-stat.ps.cache-misses
 9.846e+08            -6.7%  9.185e+08        perf-stat.ps.cache-references
    134.98            -1.3%     133.26        perf-stat.ps.cpu-migrations
 4.747e+10            -9.7%  4.286e+10        perf-stat.ps.instructions
   9463902           -11.9%    8339836        perf-stat.ps.minor-faults
   9463902           -11.9%    8339836        perf-stat.ps.page-faults
 1.434e+13            -9.6%  1.296e+13        perf-stat.total.instructions
     64.15            -2.4       61.72        perf-profile.calltrace.cycles-pp.testcase
     58.30            -1.9       56.41        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
     52.64            -1.4       51.28        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
     52.50            -1.3       51.16        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     50.81            -1.0       49.86        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     49.86            -0.8       49.02        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      9.27            -0.8        8.45 ±  3%  perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     49.21            -0.8       48.43        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      5.15            -0.5        4.68        perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase
      3.24            -0.5        2.77        perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.82            -0.3        0.51        perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      1.68            -0.3        1.42        perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault
      2.52            -0.2        2.28        perf-profile.calltrace.cycles-pp.error_entry.testcase
      1.50 ±  2%      -0.2        1.30        perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault
      1.85            -0.1        1.70 ±  3%  perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
      0.68            -0.1        0.55 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
      1.55            -0.1        1.44 ±  3%  perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault
      0.55            -0.1        0.43 ± 44%  perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc
      1.07            -0.1        0.98        perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault
      0.90            -0.1        0.81        perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase
      0.89            -0.0        0.86        perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault
      1.00            +0.1        1.05        perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      3.85            +0.2        4.10        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      3.85            +0.2        4.10        perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      3.85            +0.2        4.10        perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
      3.82            +0.3        4.07        perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region
      3.68            +0.3        3.94        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
      0.83            +0.3        1.10 ±  2%  perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.set_pte_range.finish_fault.do_fault.__handle_mm_fault
      0.00            +0.5        0.54        perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range
      0.00            +0.7        0.66        perf-profile.calltrace.cycles-pp.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range
     32.87            +0.7       33.62        perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
     29.54            +2.3       31.80        perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range
     29.54            +2.3       31.80        perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
     29.53            +2.3       31.80        perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
     30.66            +2.3       32.93        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
     30.66            +2.3       32.93        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
     30.66            +2.3       32.93        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
     30.66            +2.3       32.93        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
     29.26            +2.3       31.60        perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range
     28.41            +2.4       30.78        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu
     34.56            +2.5       37.08        perf-profile.calltrace.cycles-pp.__munmap
     34.56            +2.5       37.08        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     34.56            +2.5       37.08        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     34.55            +2.5       37.07        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
     34.55            +2.5       37.08        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     34.55            +2.5       37.08        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     34.55            +2.5       37.08        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     34.55            +2.5       37.08        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     31.41            +2.8       34.20        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
     31.42            +2.8       34.23        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
     31.38            +2.8       34.19        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
     65.26            -2.5       62.73        perf-profile.children.cycles-pp.testcase
     56.09            -1.7       54.41        perf-profile.children.cycles-pp.asm_exc_page_fault
     52.66            -1.4       51.30        perf-profile.children.cycles-pp.exc_page_fault
     52.52            -1.3       51.18        perf-profile.children.cycles-pp.do_user_addr_fault
     50.83            -1.0       49.88        perf-profile.children.cycles-pp.handle_mm_fault
     49.87            -0.8       49.02        perf-profile.children.cycles-pp.__handle_mm_fault
      9.35            -0.8        8.53 ±  3%  perf-profile.children.cycles-pp.copy_page
     49.23            -0.8       48.45        perf-profile.children.cycles-pp.do_fault
      5.15            -0.5        4.68        perf-profile.children.cycles-pp.__irqentry_text_end
      3.27            -0.5        2.80        perf-profile.children.cycles-pp.folio_prealloc
      0.82            -0.3        0.52        perf-profile.children.cycles-pp.lock_vma_under_rcu
      0.57            -0.3        0.32        perf-profile.children.cycles-pp.mas_walk
      1.69            -0.3        1.43        perf-profile.children.cycles-pp.vma_alloc_folio_noprof
      2.54            -0.2        2.30        perf-profile.children.cycles-pp.error_entry
      1.52 ±  2%      -0.2        1.31        perf-profile.children.cycles-pp.__mem_cgroup_charge
      0.95            -0.2        0.79 ±  4%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      1.87            -0.2        1.72 ±  3%  perf-profile.children.cycles-pp.__pte_offset_map_lock
      0.60 ±  4%      -0.1        0.46 ±  6%  perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
      0.70            -0.1        0.56 ±  2%  perf-profile.children.cycles-pp.lru_add_fn
      1.57            -0.1        1.45 ±  3%  perf-profile.children.cycles-pp._raw_spin_lock
      1.16            -0.1        1.04        perf-profile.children.cycles-pp.native_irq_return_iret
      1.12            -0.1        1.01        perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
      0.44            -0.1        0.35        perf-profile.children.cycles-pp.get_vma_policy
      0.94            -0.1        0.85        perf-profile.children.cycles-pp.sync_regs
      0.96            -0.1        0.87        perf-profile.children.cycles-pp.__perf_sw_event
      0.43            -0.1        0.34 ±  2%  perf-profile.children.cycles-pp.free_unref_folios
      0.21 ±  3%      -0.1        0.13 ±  3%  perf-profile.children.cycles-pp._compound_head
      0.75            -0.1        0.68        perf-profile.children.cycles-pp.___perf_sw_event
      0.31            -0.1        0.25        perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
      0.94            -0.0        0.90        perf-profile.children.cycles-pp.__alloc_pages_noprof
      0.41 ±  4%      -0.0        0.37 ±  4%  perf-profile.children.cycles-pp.mem_cgroup_commit_charge
      0.44 ±  5%      -0.0        0.40 ±  5%  perf-profile.children.cycles-pp.__count_memcg_events
      0.17 ±  2%      -0.0        0.13 ±  4%  perf-profile.children.cycles-pp.uncharge_batch
      0.57            -0.0        0.53 ±  2%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.13 ±  2%      -0.0        0.09 ±  5%  perf-profile.children.cycles-pp.__mod_zone_page_state
      0.19 ±  3%      -0.0        0.16 ±  6%  perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.15 ±  2%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.free_unref_page_commit
      0.10 ±  3%      -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
      0.08            -0.0        0.05        perf-profile.children.cycles-pp.policy_nodemask
      0.13 ±  3%      -0.0        0.10 ±  3%  perf-profile.children.cycles-pp.page_counter_uncharge
      0.32 ±  3%      -0.0        0.30 ±  2%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.17 ±  2%      -0.0        0.15 ±  3%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.16 ±  2%      -0.0        0.14 ±  2%  perf-profile.children.cycles-pp.shmem_get_policy
      0.16            -0.0        0.14 ±  2%  perf-profile.children.cycles-pp.handle_pte_fault
      0.16 ±  4%      -0.0        0.14 ±  4%  perf-profile.children.cycles-pp.__pte_offset_map
      0.09            -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.get_pfnblock_flags_mask
      0.12 ±  3%      -0.0        0.10 ±  4%  perf-profile.children.cycles-pp.uncharge_folio
      0.36            -0.0        0.34        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.10 ±  3%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.pte_offset_map_nolock
      0.30            -0.0        0.28 ±  2%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.09 ±  4%      -0.0        0.08        perf-profile.children.cycles-pp.down_read_trylock
      0.08            -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.folio_unlock
      0.40            +0.0        0.43        perf-profile.children.cycles-pp.__mod_lruvec_state
      1.02            +0.0        1.06        perf-profile.children.cycles-pp.zap_present_ptes
      0.47            +0.2        0.67        perf-profile.children.cycles-pp.folio_remove_rmap_ptes
      3.87            +0.3        4.12        perf-profile.children.cycles-pp.tlb_finish_mmu
      1.17            +0.5        1.71 ±  2%  perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
     32.88            +0.8       33.63        perf-profile.children.cycles-pp.set_pte_range
     29.54            +2.3       31.80        perf-profile.children.cycles-pp.tlb_flush_mmu
     30.66            +2.3       32.93        perf-profile.children.cycles-pp.zap_pte_range
     30.66            +2.3       32.94        perf-profile.children.cycles-pp.unmap_page_range
     30.66            +2.3       32.94        perf-profile.children.cycles-pp.zap_pmd_range
     30.66            +2.3       32.94        perf-profile.children.cycles-pp.unmap_vmas
     33.41            +2.5       35.92        perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
     33.40            +2.5       35.92        perf-profile.children.cycles-pp.free_pages_and_swap_cache
     34.56            +2.5       37.08        perf-profile.children.cycles-pp.__munmap
     34.56            +2.5       37.08        perf-profile.children.cycles-pp.__vm_munmap
     34.56            +2.5       37.08        perf-profile.children.cycles-pp.__x64_sys_munmap
     34.56            +2.5       37.09        perf-profile.children.cycles-pp.do_vmi_munmap
     34.56            +2.5       37.09        perf-profile.children.cycles-pp.do_vmi_align_munmap
     34.67            +2.5       37.20        perf-profile.children.cycles-pp.do_syscall_64
     34.67            +2.5       37.20        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     34.56            +2.5       37.09        perf-profile.children.cycles-pp.unmap_region
     33.22            +2.6       35.80        perf-profile.children.cycles-pp.folios_put_refs
     32.12            +2.6       34.75        perf-profile.children.cycles-pp.__page_cache_release
     61.97            +3.3       65.27        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     61.94            +3.3       65.26        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     61.98            +3.3       65.30        perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      9.32            -0.8        8.49 ±  3%  perf-profile.self.cycles-pp.copy_page
      5.15            -0.5        4.68        perf-profile.self.cycles-pp.__irqentry_text_end
      0.56            -0.3        0.31        perf-profile.self.cycles-pp.mas_walk
      2.58            -0.2        2.33        perf-profile.self.cycles-pp.testcase
      2.53            -0.2        2.30        perf-profile.self.cycles-pp.error_entry
      0.60 ±  4%      -0.2        0.44 ±  6%  perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
      0.85            -0.1        0.71 ±  4%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      1.54            -0.1        1.43 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock
      1.15            -0.1        1.04        perf-profile.self.cycles-pp.native_irq_return_iret
      0.94            -0.1        0.85        perf-profile.self.cycles-pp.sync_regs
      0.20 ±  3%      -0.1        0.13 ±  3%  perf-profile.self.cycles-pp._compound_head
      0.27 ±  3%      -0.1        0.20 ±  3%  perf-profile.self.cycles-pp.free_pages_and_swap_cache
      0.26            -0.1        0.18 ±  2%  perf-profile.self.cycles-pp.get_vma_policy
      0.26            -0.1        0.19 ±  2%  perf-profile.self.cycles-pp.__page_cache_release
      0.16            -0.1        0.09 ±  5%  perf-profile.self.cycles-pp.vma_alloc_folio_noprof
      0.28 ±  2%      -0.1        0.22 ±  3%  perf-profile.self.cycles-pp.zap_present_ptes
      0.66            -0.1        0.60        perf-profile.self.cycles-pp.___perf_sw_event
      0.32            -0.1        0.27 ±  5%  perf-profile.self.cycles-pp.lru_add_fn
      0.47            -0.0        0.43 ±  2%  perf-profile.self.cycles-pp.__handle_mm_fault
      0.16 ±  4%      -0.0        0.12        perf-profile.self.cycles-pp.lock_vma_under_rcu
      0.20            -0.0        0.16 ±  4%  perf-profile.self.cycles-pp.free_unref_folios
      0.30            -0.0        0.26        perf-profile.self.cycles-pp.handle_mm_fault
      0.10 ±  4%      -0.0        0.07        perf-profile.self.cycles-pp.zap_pte_range
      0.09 ±  5%      -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
      0.14 ±  2%      -0.0        0.11 ±  4%  perf-profile.self.cycles-pp.mem_cgroup_commit_charge
      0.14 ±  3%      -0.0        0.12 ±  4%  perf-profile.self.cycles-pp.folio_remove_rmap_ptes
      0.12 ±  4%      -0.0        0.09 ±  7%  perf-profile.self.cycles-pp.__mod_zone_page_state
      0.10 ±  4%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.alloc_pages_mpol_noprof
      0.11            -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.free_unref_page_commit
      0.22 ±  2%      -0.0        0.19        perf-profile.self.cycles-pp.__pte_offset_map_lock
      0.21            -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.__perf_sw_event
      0.21            -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.do_user_addr_fault
      0.31 ±  2%      -0.0        0.29        perf-profile.self.cycles-pp.__mod_node_page_state
      0.16 ±  2%      -0.0        0.14 ±  5%  perf-profile.self.cycles-pp.cgroup_rstat_updated
      0.17 ±  2%      -0.0        0.15 ±  2%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.11            -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.page_counter_uncharge
      0.09            -0.0        0.07        perf-profile.self.cycles-pp.get_pfnblock_flags_mask
      0.28 ±  2%      -0.0        0.26 ±  2%  perf-profile.self.cycles-pp.xas_load
      0.16 ±  2%      -0.0        0.14 ±  3%  perf-profile.self.cycles-pp.get_page_from_freelist
      0.12            -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.uncharge_folio
      0.16 ±  4%      -0.0        0.14 ±  3%  perf-profile.self.cycles-pp.__pte_offset_map
      0.20 ±  2%      -0.0        0.19 ±  2%  perf-profile.self.cycles-pp.shmem_get_folio_gfp
      0.16 ±  3%      -0.0        0.14 ±  3%  perf-profile.self.cycles-pp.shmem_get_policy
      0.14 ±  3%      -0.0        0.12 ±  4%  perf-profile.self.cycles-pp.do_fault
      0.08            -0.0        0.07 ±  7%  perf-profile.self.cycles-pp.folio_unlock
      0.12 ±  3%      -0.0        0.11        perf-profile.self.cycles-pp.folio_add_new_anon_rmap
      0.09            -0.0        0.08        perf-profile.self.cycles-pp.down_read_trylock
      0.07            -0.0        0.06        perf-profile.self.cycles-pp.folio_prealloc
      0.38 ±  2%      +0.0        0.42 ±  3%  perf-profile.self.cycles-pp.filemap_get_entry
      0.26            +0.1        0.36        perf-profile.self.cycles-pp.folios_put_refs
      0.33            +0.1        0.44 ±  3%  perf-profile.self.cycles-pp.folio_batch_move_lru
      0.40 ±  5%      +0.6        0.98        perf-profile.self.cycles-pp.__lruvec_stat_mod_folio
     61.94            +3.3       65.26        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




* Re: [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: Yosry Ahmed @ 2024-05-17 23:38 UTC
  To: kernel test robot
  Cc: Shakeel Butt, oe-lkp, lkp, Linux Memory Management List,
	Andrew Morton, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin

On Thu, May 16, 2024 at 10:56 PM kernel test robot
<oliver.sang@intel.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a -11.9% regression of will-it-scale.per_process_ops on:
>
>
> commit: 70a64b7919cbd6c12306051ff2825839a9d65605 ("memcg: dynamically allocate lruvec_stats")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

I think we may want to go back to the approach of reordering the
indices to separate memcg and non-memcg stats. If we really want to
preserve the order in which the stats are exported to userspace, we
can use a translation table on the read path instead of the update
path.
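
A hypothetical illustration of that idea (names and indices invented for
the example, not an actual patch): the stats would be stored in a packed,
update-friendly order, and only the comparatively cold read side would
pay for remapping to the stable userspace order:

	/* packed storage order, chosen to keep the update path simple */
	enum packed_idx { PACKED_ANON, PACKED_FILE, PACKED_SHMEM, NR_PACKED };

	/* userspace-visible position -> packed storage index */
	static const int read_order[NR_PACKED] = {
		[0] = PACKED_ANON,
		[1] = PACKED_SHMEM,
		[2] = PACKED_FILE,
	};

	static long read_stat(const long *packed, int user_idx)
	{
		/* only the read path (e.g. memory.stat) pays the extra hop */
		return packed[read_order[user_idx]];
	}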



* Re: [linux-next:master] [memcg]  70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: Shakeel Butt @ 2024-05-18  6:28 UTC
  To: kernel test robot
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin

On Fri, May 17, 2024 at 01:56:30PM +0800, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed a -11.9% regression of will-it-scale.per_process_ops on:
> 
> 
> commit: 70a64b7919cbd6c12306051ff2825839a9d65605 ("memcg: dynamically allocate lruvec_stats")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 

Thanks for the report. Can you please run the same benchmark but with
the full series (of 8 patches), or at least include ff48c71c26aa
("memcg: reduce memory for the lruvec and memcg stats")?

thanks,
Shakeel




* Re: [linux-next:master] [memcg]  70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: Oliver Sang @ 2024-05-19  9:14 UTC
  To: Shakeel Butt
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin, oliver.sang

hi, Shakeel,

On Fri, May 17, 2024 at 11:28:10PM -0700, Shakeel Butt wrote:
> On Fri, May 17, 2024 at 01:56:30PM +0800, kernel test robot wrote:
> > 
> > 
> > Hello,
> > 
> > kernel test robot noticed a -11.9% regression of will-it-scale.per_process_ops on:
> > 
> > 
> > commit: 70a64b7919cbd6c12306051ff2825839a9d65605 ("memcg: dynamically allocate lruvec_stats")
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > 
> 
> Thanks for the report. Can you please run the same benchmark but with
> the full series (of 8 patches), or at least include ff48c71c26aa
> ("memcg: reduce memory for the lruvec and memcg stats")?

during this bisect, ff48c71c26aa was already checked. It has similar data to
70a64b7919 (a little worse, actually).

59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     91713           -11.9%      80789           -13.2%      79612        will-it-scale.per_process_ops


OK, we will run tests on the tip of the series, which should be the commit
below, if I understand it correctly.

* a94032b35e5f9 memcg: use proper type for mod_memcg_state


> 
> thanks,
> Shakeel
> 



* Re: [linux-next:master] [memcg]  70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: Shakeel Butt @ 2024-05-19 17:20 UTC
  To: Oliver Sang
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin

On Sun, May 19, 2024 at 05:14:39PM +0800, Oliver Sang wrote:
> hi, Shakeel,
> 
> On Fri, May 17, 2024 at 11:28:10PM -0700, Shakeel Butt wrote:
> > On Fri, May 17, 2024 at 01:56:30PM +0800, kernel test robot wrote:
> > > 
> > > 
> > > Hello,
> > > 
> > > kernel test robot noticed a -11.9% regression of will-it-scale.per_process_ops on:
> > > 
> > > 
> > > commit: 70a64b7919cbd6c12306051ff2825839a9d65605 ("memcg: dynamically allocate lruvec_stats")
> > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > 
> > 
> > Thanks for the report. Can you please run the same benchmark but with
> > the full series (of 8 patches), or at least include ff48c71c26aa
> > ("memcg: reduce memory for the lruvec and memcg stats")?
> 
> during this bisect, ff48c71c26aa was already checked. It has similar data to
> 70a64b7919 (a little worse, actually).
> 
> 59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803
> ---------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \
>      91713           -11.9%      80789           -13.2%      79612        will-it-scale.per_process_ops
> 
> 
> OK, we will run tests on the tip of the series, which should be the commit
> below, if I understand it correctly.
> 
> * a94032b35e5f9 memcg: use proper type for mod_memcg_state
> 
> 

Thanks a lot Oliver. One question: what is the filesystem mounted at
/tmp on your test machine? I just wanted to make sure I run the test
with minimal changes from your setup.

> > 
> > thanks,
> > Shakeel
> > 



* Re: [linux-next:master] [memcg]  70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: Oliver Sang @ 2024-05-20  2:43 UTC
  To: Shakeel Butt
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin, oliver.sang

hi, Shakeel,

On Sun, May 19, 2024 at 10:20:28AM -0700, Shakeel Butt wrote:
> On Sun, May 19, 2024 at 05:14:39PM +0800, Oliver Sang wrote:
> > hi, Shakeel,
> > 
> > On Fri, May 17, 2024 at 11:28:10PM -0700, Shakeel Butt wrote:
> > > On Fri, May 17, 2024 at 01:56:30PM +0800, kernel test robot wrote:
> > > > 
> > > > 
> > > > Hello,
> > > > 
> > > > kernel test robot noticed a -11.9% regression of will-it-scale.per_process_ops on:
> > > > 
> > > > 
> > > > commit: 70a64b7919cbd6c12306051ff2825839a9d65605 ("memcg: dynamically allocate lruvec_stats")
> > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > > 
> > > 
> > > Thanks for the report. Can you please run the same benchmark but with
> > > the full series (of 8 patches), or at least include ff48c71c26aa
> > > ("memcg: reduce memory for the lruvec and memcg stats")?
> > 
> > during this bisect, ff48c71c26aa was already checked. It has similar data to
> > 70a64b7919 (a little worse, actually).
> > 
> > 59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803
> > ---------------- --------------------------- ---------------------------
> >          %stddev     %change         %stddev     %change         %stddev
> >              \          |                \          |                \
> >      91713           -11.9%      80789           -13.2%      79612        will-it-scale.per_process_ops
> > 
> > 
> > OK, we will run tests on the tip of the series, which should be the commit
> > below, if I understand it correctly.
> > 
> > * a94032b35e5f9 memcg: use proper type for mod_memcg_state
> > 
> > 
> 
> Thanks a lot Oliver. One question: what is the filesystem mounted at
> /tmp on your test machine? I just wanted to make sure I run the test
> with minimal changes from your setup.

we don't have a specific partition for /tmp; we just use tmpfs

tmp on /tmp type tmpfs (rw,relatime)


BTW, the test on a94032b35e5f9 has finished; it still has a score similar to 70a64b7919

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/page_fault2/will-it-scale

59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803 a94032b35e5f97dc1023030d929
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
     91713           -11.9%      80789           -13.2%      79612           -13.0%      79833        will-it-scale.per_process_ops



> 
> > > 
> > > thanks,
> > > Shakeel
> > > 



* Re: [linux-next:master] [memcg]  70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: Shakeel Butt @ 2024-05-20  3:49 UTC
  To: Oliver Sang
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin

On Mon, May 20, 2024 at 10:43:35AM +0800, Oliver Sang wrote:
> hi, Shakeel,
> 
> On Sun, May 19, 2024 at 10:20:28AM -0700, Shakeel Butt wrote:
> > On Sun, May 19, 2024 at 05:14:39PM +0800, Oliver Sang wrote:
> > > hi, Shakeel,
> > > 
> > > On Fri, May 17, 2024 at 11:28:10PM -0700, Shakeel Butt wrote:
> > > > On Fri, May 17, 2024 at 01:56:30PM +0800, kernel test robot wrote:
> > > > > 
> > > > > 
> > > > > Hello,
> > > > > 
> > > > > kernel test robot noticed a -11.9% regression of will-it-scale.per_process_ops on:
> > > > > 
> > > > > 
> > > > > commit: 70a64b7919cbd6c12306051ff2825839a9d65605 ("memcg: dynamically allocate lruvec_stats")
> > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > > > 
> > > > 
> > > > Thanks for the report. Can you please run the same benchmark but with
> > > > the full series (of 8 patches), or at least include ff48c71c26aa
> > > > ("memcg: reduce memory for the lruvec and memcg stats")?
> > > 
> > > during this bisect, ff48c71c26aa was already checked. It has similar data to
> > > 70a64b7919 (a little worse, actually).
> > > 
> > > 59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803
> > > ---------------- --------------------------- ---------------------------
> > >          %stddev     %change         %stddev     %change         %stddev
> > >              \          |                \          |                \
> > >      91713           -11.9%      80789           -13.2%      79612        will-it-scale.per_process_ops
> > > 
> > > 
> > > OK, we will run tests on the tip of the series, which should be the commit
> > > below, if I understand it correctly.
> > > 
> > > * a94032b35e5f9 memcg: use proper type for mod_memcg_state
> > > 
> > > 
> > 
> > Thanks a lot Oliver. One question: what is the filesystem mounted at
> > /tmp on your test machine? I just wanted to make sure I run the test
> > with minimal changes from your setup.
> 
> we don't have a specific partition for /tmp; we just use tmpfs
> 
> tmp on /tmp type tmpfs (rw,relatime)
> 
> 
> BTW, the test on a94032b35e5f9 has finished; it still has a score similar to 70a64b7919
> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/page_fault2/will-it-scale
> 
> 59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803 a94032b35e5f97dc1023030d929
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>      91713           -11.9%      80789           -13.2%      79612           -13.0%      79833        will-it-scale.per_process_ops
> 

Thanks again. I am not sure if you have a single-node machine, but if you
do, can you try to repro this issue on such a machine? At the moment, I
don't have access to one, but I will try to repro myself as well.

Shakeel



* Re: [linux-next:master] [memcg]  70a64b7919: will-it-scale.per_process_ops -11.9% regression
From: Oliver Sang @ 2024-05-21  2:43 UTC
  To: Shakeel Butt
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin, oliver.sang

hi, Shakeel,

On Sun, May 19, 2024 at 08:49:33PM -0700, Shakeel Butt wrote:
> On Mon, May 20, 2024 at 10:43:35AM +0800, Oliver Sang wrote:
> > hi, Shakeel,
> > 
> > On Sun, May 19, 2024 at 10:20:28AM -0700, Shakeel Butt wrote:
> > > On Sun, May 19, 2024 at 05:14:39PM +0800, Oliver Sang wrote:
> > > > hi, Shakeel,
> > > > 
> > > > On Fri, May 17, 2024 at 11:28:10PM -0700, Shakeel Butt wrote:
> > > > > On Fri, May 17, 2024 at 01:56:30PM +0800, kernel test robot wrote:
> > > > > > 
> > > > > > 
> > > > > > Hello,
> > > > > > 
> > > > > > kernel test robot noticed a -11.9% regression of will-it-scale.per_process_ops on:
> > > > > > 
> > > > > > 
> > > > > > commit: 70a64b7919cbd6c12306051ff2825839a9d65605 ("memcg: dynamically allocate lruvec_stats")
> > > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > > > > 
> > > > > 
> > > > > Thanks for the report. Can you please run the same benchmark but with
> > > > > the full series (of 8 patches), or at least include ff48c71c26aa
> > > > > ("memcg: reduce memory for the lruvec and memcg stats")?
> > > > 
> > > > during this bisect, ff48c71c26aa was already checked. It has similar data to
> > > > 70a64b7919 (a little worse, actually).
> > > > 
> > > > 59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803
> > > > ---------------- --------------------------- ---------------------------
> > > >          %stddev     %change         %stddev     %change         %stddev
> > > >              \          |                \          |                \
> > > >      91713           -11.9%      80789           -13.2%      79612        will-it-scale.per_process_ops
> > > > 
> > > > 
> > > > OK, we will run tests on the tip of the series, which should be the commit
> > > > below, if I understand it correctly.
> > > > 
> > > > * a94032b35e5f9 memcg: use proper type for mod_memcg_state
> > > > 
> > > > 
> > > 
> > > Thanks a lot Oliver. One question: what is the filesystem mounted at
> > > /tmp on your test machine? I just wanted to make sure I run the test
> > > with minimal changes from your setup.
> > 
> > we don't have a specific partition for /tmp; we just use tmpfs
> > 
> > tmp on /tmp type tmpfs (rw,relatime)
> > 
> > 
> > BTW, the test on a94032b35e5f9 has finished; it still has a score similar to 70a64b7919
> > 
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> >   gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/page_fault2/will-it-scale
> > 
> > 59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803 a94032b35e5f97dc1023030d929
> > ---------------- --------------------------- --------------------------- ---------------------------
> >          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
> >              \          |                \          |                \          |                \
> >      91713           -11.9%      80789           -13.2%      79612           -13.0%      79833        will-it-scale.per_process_ops
> > 
> 
> Thanks again. I am not sure if you have a single-node machine, but if you
> do, can you try to repro this issue on such a machine? At the moment, I
> don't have access to one, but I will try to repro myself as well.

we reported the regression on a 2-node Skylake server, so I found a 1-node
Skylake desktop (we don't have a 1-node server) to check.

model: Skylake
nr_node: 1
nr_cpu: 36
memory: 32G
brand: Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz

but we cannot reproduce this regression there:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-d08/page_fault2/will-it-scale

59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803 a94032b35e5f97dc1023030d929
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
    136040            -0.2%     135718            -0.2%     135829            -0.1%     135881        will-it-scale.per_process_ops


then I tried 2-node servers with other models

for
model: Ice Lake
nr_node: 2
nr_cpu: 64
memory: 256G
brand: Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz

a similar regression appears:
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/page_fault2/will-it-scale

59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803 a94032b35e5f97dc1023030d929
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
    240373           -14.4%     205702           -14.1%     206368           -12.9%     209394        will-it-scale.per_process_ops

full data is as below [1]


for
model: Sapphire Rapids
nr_node: 2
nr_cpu: 224
memory: 512G
brand: Intel(R) Xeon(R) Platinum 8480CTDX

the regression is smaller but still exists.

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/page_fault2/will-it-scale

59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803 a94032b35e5f97dc1023030d929
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
     78072            -3.4%      75386            -6.0%      73363            -5.6%      73683        will-it-scale.per_process_ops


full data is as below [2]

hope these data are useful.



[1]
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/page_fault2/will-it-scale

59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803 a94032b35e5f97dc1023030d929
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
      0.27 ±  3%      -0.0        0.24 ±  3%      -0.0        0.23 ±  3%      -0.0        0.24 ±  2%  mpstat.cpu.all.irq%
      3.83            -0.7        3.17 ±  2%      -0.6        3.23 ±  3%      -0.6        3.21        mpstat.cpu.all.usr%
     62547           -10.1%      56227           -10.8%      55807            -8.9%      56984        perf-c2c.DRAM.local
    194.40 ±  9%     -11.5%     172.00 ±  4%     -11.5%     172.00 ±  5%     -13.9%     167.40 ±  2%  perf-c2c.HITM.remote
  15383898           -14.4%   13164951           -14.1%   13207631           -12.9%   13401271        will-it-scale.64.processes
    240373           -14.4%     205702           -14.1%     206368           -12.9%     209394        will-it-scale.per_process_ops
  15383898           -14.4%   13164951           -14.1%   13207631           -12.9%   13401271        will-it-scale.workload
 2.359e+09           -12.9%  2.055e+09           -14.2%  2.023e+09           -12.8%  2.057e+09        numa-numastat.node0.local_node
 2.359e+09           -12.9%  2.055e+09           -14.2%  2.023e+09           -12.8%  2.057e+09        numa-numastat.node0.numa_hit
 2.346e+09           -16.1%  1.967e+09           -14.2%  2.013e+09           -13.2%  2.035e+09 ±  2%  numa-numastat.node1.local_node
 2.345e+09           -16.1%  1.967e+09           -14.2%  2.013e+09           -13.2%  2.036e+09 ±  2%  numa-numastat.node1.numa_hit
    567382 ±  8%      +2.1%     579061 ± 10%      -9.5%     513215 ±  5%      +1.2%     574201 ±  9%  numa-vmstat.node0.nr_anon_pages
  2.36e+09           -12.9%  2.055e+09           -14.3%  2.023e+09           -12.9%  2.056e+09        numa-vmstat.node0.numa_hit
  2.36e+09           -12.9%  2.055e+09           -14.3%  2.023e+09           -12.9%  2.056e+09        numa-vmstat.node0.numa_local
 2.346e+09           -16.2%  1.966e+09           -14.2%  2.012e+09           -13.3%  2.035e+09 ±  2%  numa-vmstat.node1.numa_hit
 2.347e+09           -16.2%  1.967e+09           -14.2%  2.013e+09           -13.3%  2.034e+09 ±  2%  numa-vmstat.node1.numa_local
   1137116            -1.9%    1115597            -1.5%    1119624            -1.8%    1116759        proc-vmstat.nr_anon_pages
      4575            +2.1%       4673            +2.1%       4671            +1.7%       4654        proc-vmstat.nr_page_table_pages
 4.705e+09           -14.5%  4.022e+09           -14.2%  4.036e+09           -13.0%  4.093e+09        proc-vmstat.numa_hit
 4.706e+09           -14.5%  4.023e+09           -14.2%  4.037e+09           -13.0%  4.092e+09        proc-vmstat.numa_local
 4.645e+09           -14.3%  3.979e+09           -14.1%  3.991e+09           -12.8%   4.05e+09        proc-vmstat.pgalloc_normal
 4.631e+09           -14.3%  3.967e+09           -14.1%  3.979e+09           -12.8%  4.038e+09        proc-vmstat.pgfault
 4.643e+09           -14.3%  3.978e+09           -14.1%   3.99e+09           -12.8%  4.049e+09        proc-vmstat.pgfree
     29780 ± 54%     -49.0%      15173 ± 50%     -87.2%       3818 ±199%     -33.2%      19878 ±112%  sched_debug.cfs_rq:/.left_deadline.avg
   1905931 ± 54%     -49.1%     971033 ± 50%     -87.2%     244356 ±199%     -33.2%    1272254 ±112%  sched_debug.cfs_rq:/.left_deadline.max
    236372 ± 54%     -49.1%     120428 ± 50%     -87.2%      30306 ±199%     -33.2%     157784 ±112%  sched_debug.cfs_rq:/.left_deadline.stddev
     29779 ± 54%     -49.0%      15172 ± 50%     -87.2%       3818 ±199%     -33.2%      19878 ±112%  sched_debug.cfs_rq:/.left_vruntime.avg
   1905916 ± 54%     -49.1%     971025 ± 50%     -87.2%     244349 ±199%     -33.2%    1272236 ±112%  sched_debug.cfs_rq:/.left_vruntime.max
    236371 ± 54%     -49.1%     120427 ± 50%     -87.2%      30304 ±199%     -33.2%     157782 ±112%  sched_debug.cfs_rq:/.left_vruntime.stddev
     12745 ±  8%      +2.4%      13045            -9.7%      11510 ± 11%      -6.0%      11984 ± 10%  sched_debug.cfs_rq:/.load.min
    253.83 ± 24%     +56.9%     398.30 ± 27%     +58.4%     402.13 ± 56%     +23.8%     314.20 ± 23%  sched_debug.cfs_rq:/.load_avg.max
     22.93 ±  4%     -12.2%      20.14 ± 17%     -12.0%      20.17 ± 17%     -18.5%      18.68 ± 15%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
     22.93 ±  4%     -13.0%      19.94 ± 16%     -12.1%      20.16 ± 17%     -19.9%      18.35 ± 14%  sched_debug.cfs_rq:/.removed.util_avg.stddev
     29779 ± 54%     -49.0%      15172 ± 50%     -87.2%       3818 ±199%     -33.2%      19878 ±112%  sched_debug.cfs_rq:/.right_vruntime.avg
   1905916 ± 54%     -49.1%     971025 ± 50%     -87.2%     244349 ±199%     -33.2%    1272236 ±112%  sched_debug.cfs_rq:/.right_vruntime.max
    236371 ± 54%     -49.1%     120427 ± 50%     -87.2%      30304 ±199%     -33.2%     157782 ±112%  sched_debug.cfs_rq:/.right_vruntime.stddev
    149.50 ± 33%     -81.3%      28.00 ±180%     -71.2%      43.03 ±120%     -70.9%      43.57 ±125%  sched_debug.cfs_rq:/.util_est.min
      1930 ±  4%     -15.5%       1631 ±  7%     -18.1%       1581 ±  5%     -10.5%       1729 ± 16%  sched_debug.cpu.nr_switches.min
      0.79 ± 98%     +89.1%       1.49 ± 48%    +147.8%       1.96 ± 16%     -12.4%       0.69 ± 91%  sched_debug.rt_rq:.rt_time.avg
     50.52 ± 98%     +89.2%      95.60 ± 48%    +147.8%     125.19 ± 17%     -12.3%      44.29 ± 91%  sched_debug.rt_rq:.rt_time.max
      6.27 ± 98%     +89.2%      11.86 ± 48%    +147.8%      15.53 ± 17%     -12.3%       5.49 ± 91%  sched_debug.rt_rq:.rt_time.stddev
     21.14           -10.1%      19.00           -10.1%      19.01 ±  2%      -9.9%      19.05        perf-stat.i.MPKI
 1.468e+10            -9.4%   1.33e+10            -9.0%  1.336e+10            -7.9%  1.351e+10        perf-stat.i.branch-instructions
  14349180            -7.8%   13236560            -6.6%   13407521            -6.2%   13464962        perf-stat.i.branch-misses
     69.58            -5.1       64.51            -4.8       64.81            -4.6       64.96        perf-stat.i.cache-miss-rate%
  1.57e+09           -19.5%  1.263e+09 ±  2%     -18.9%  1.273e+09 ±  3%     -17.8%  1.291e+09        perf-stat.i.cache-misses
 2.252e+09           -13.2%  1.955e+09           -12.9%  1.961e+09           -11.9%  1.985e+09        perf-stat.i.cache-references
      3.00           +12.8%       3.39           +12.0%       3.36           +10.6%       3.32        perf-stat.i.cpi
     99.00            -0.9%      98.11            -1.1%      97.90            -0.9%      98.13        perf-stat.i.cpu-migrations
    143.06           +25.2%     179.10 ±  2%     +24.5%     178.15 ±  3%     +22.4%     175.18        perf-stat.i.cycles-between-cache-misses
 7.403e+10           -10.4%  6.634e+10            -9.8%  6.679e+10            -8.7%   6.76e+10        perf-stat.i.instructions
      0.34           -11.4%       0.30           -10.7%       0.30            -9.7%       0.30        perf-stat.i.ipc
    478.41           -14.3%     410.14           -14.0%     411.31           -12.7%     417.50        perf-stat.i.metric.K/sec
  15310132           -14.3%   13125768           -14.0%   13162999           -12.7%   13361235        perf-stat.i.minor-faults
  15310132           -14.3%   13125768           -14.0%   13163000           -12.7%   13361235        perf-stat.i.page-faults
     21.21           -28.4%      15.17 ± 50%     -10.2%      19.05 ±  2%     -28.3%      15.20 ± 50%  perf-stat.overall.MPKI
      0.10            -0.0        0.08 ± 50%      +0.0        0.10            -0.0        0.08 ± 50%  perf-stat.overall.branch-miss-rate%
     69.71           -18.2       51.52 ± 50%      -4.8       64.89           -17.9       51.83 ± 50%  perf-stat.overall.cache-miss-rate%
      3.01            -9.7%       2.72 ± 50%     +11.9%       3.37           -11.4%       2.67 ± 50%  perf-stat.overall.cpi
    141.98            +1.0%     143.41 ± 50%     +24.6%     176.94 ±  3%      -1.2%     140.33 ± 50%  perf-stat.overall.cycles-between-cache-misses
      0.33           -29.1%       0.24 ± 50%     -10.6%       0.30           -27.7%       0.24 ± 50%  perf-stat.overall.ipc
   1453908           -16.2%    1217875 ± 50%      +4.9%    1524841           -16.2%    1218410 ± 50%  perf-stat.overall.path-length
 1.463e+10           -27.6%  1.059e+10 ± 50%      -9.0%  1.332e+10           -26.4%  1.077e+10 ± 50%  perf-stat.ps.branch-instructions
  14253731           -25.8%   10569701 ± 50%      -6.6%   13307817           -25.1%   10681742 ± 50%  perf-stat.ps.branch-misses
 1.565e+09           -36.0%  1.002e+09 ± 50%     -18.9%  1.269e+09 ±  3%     -34.6%  1.023e+09 ± 50%  perf-stat.ps.cache-misses
 2.245e+09           -30.7%  1.556e+09 ± 50%     -12.9%  1.954e+09           -29.6%  1.579e+09 ± 50%  perf-stat.ps.cache-references
     98.42           -20.7%      78.08 ± 50%      -1.0%      97.40           -20.6%      78.12 ± 50%  perf-stat.ps.cpu-migrations
 7.378e+10           -28.4%  5.281e+10 ± 50%      -9.8%  6.656e+10           -27.0%  5.385e+10 ± 50%  perf-stat.ps.instructions
  15260342           -31.6%   10437993 ± 50%     -14.0%   13119215           -30.3%   10633461 ± 50%  perf-stat.ps.minor-faults
  15260342           -31.6%   10437993 ± 50%     -14.0%   13119215           -30.3%   10633461 ± 50%  perf-stat.ps.page-faults
 2.237e+13           -28.5%  1.599e+13 ± 50%     -10.0%  2.014e+13           -27.2%  1.629e+13 ± 50%  perf-stat.total.instructions
     75.68            -6.2       69.50            -6.1       69.63            -5.4       70.26        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
     72.31            -5.8       66.56            -5.6       66.68            -5.1       67.25        perf-profile.calltrace.cycles-pp.testcase
     63.50            -4.4       59.13            -4.4       59.13            -3.9       59.64        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
     63.32            -4.4       58.97            -4.4       58.97            -3.8       59.48        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     61.04            -4.1       56.99            -4.1       56.98            -3.6       57.49        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     21.29            -3.9       17.43 ±  3%      -3.6       17.67 ±  3%      -3.5       17.77 ±  2%  perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     59.53            -3.8       55.69            -3.9       55.68            -3.3       56.21        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     58.35            -3.7       54.65            -3.7       54.65            -3.2       55.17        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      5.31            -0.9        4.40 ±  2%      -0.9        4.44 ±  2%      -0.8        4.50        perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
      4.97            -0.8        4.13 ±  2%      -0.8        4.15 ±  2%      -0.8        4.21        perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault
      4.40            -0.7        3.72 ±  3%      -0.6        3.79 ±  3%      -0.6        3.78 ±  2%  perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      2.63            -0.4        2.23 ±  2%      -0.4        2.26 ±  2%      -0.3        2.29        perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase
      1.82            -0.4        1.44 ±  2%      -0.4        1.47 ±  2%      -0.3        1.49        perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      2.21            -0.3        1.89 ±  2%      -0.3        1.88 ±  2%      -0.3        1.90        perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault
      2.01            -0.3        1.69 ±  4%      -0.2        1.76 ±  5%      -0.3        1.73 ±  2%  perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault
      1.80            -0.3        1.52 ±  2%      -0.3        1.52 ±  2%      -0.3        1.54        perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault
      1.74            -0.2        1.50 ±  3%      -0.2        1.51 ±  3%      -0.2        1.52 ±  2%  perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      1.55            -0.2        1.31 ±  2%      -0.2        1.30 ±  2%      -0.2        1.33        perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault
      1.60            -0.2        1.37 ±  3%      -0.2        1.39 ±  3%      -0.2        1.39 ±  2%  perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault
      1.29            -0.2        1.08 ±  3%      -0.2        1.14 ±  4%      -0.2        1.11 ±  3%  perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault
      1.42            -0.2        1.21 ±  3%      -0.2        1.23 ±  3%      -0.2        1.24 ±  2%  perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault
      1.50            -0.2        1.31 ±  2%      -0.1        1.41 ±  2%      -0.1        1.36 ±  3%  perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.set_pte_range.finish_fault.do_fault.__handle_mm_fault
      1.12            -0.2        0.93 ±  3%      -0.2        0.93 ±  2%      -0.2        0.95 ±  2%  perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc
      0.92            -0.1        0.78 ±  4%      -0.1        0.80 ±  3%      -0.1        0.81 ±  3%  perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
      0.74            -0.1        0.61 ±  2%      -0.1        0.65 ±  2%      -0.1        0.64        perf-profile.calltrace.cycles-pp.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range
      0.98            -0.1        0.86 ±  2%      -0.1        0.87 ±  2%      -0.1        0.87 ±  2%  perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.72 ±  2%      -0.1        0.61 ±  2%      -0.1        0.61 ±  2%      -0.1        0.60 ±  3%  perf-profile.calltrace.cycles-pp.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.63 ±  2%      -0.1        0.53            -0.1        0.53 ±  2%      -0.2        0.41 ± 50%  perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault
      1.15            -0.1        1.05            -0.1        1.08            -0.1        1.07        perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
      0.66            -0.1        0.56 ±  2%      -0.1        0.56 ±  2%      -0.1        0.56 ±  2%  perf-profile.calltrace.cycles-pp.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.64            -0.1        0.55 ±  4%      -0.1        0.54 ±  3%      -0.1        0.56 ±  2%  perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof
      0.66            -0.1        0.58 ±  2%      -0.1        0.59 ±  3%      -0.1        0.58 ±  2%  perf-profile.calltrace.cycles-pp.mas_walk.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      2.71            +0.7        3.39            +0.7        3.36            +0.6        3.31 ±  2%  perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      2.71            +0.7        3.39            +0.7        3.36            +0.6        3.31 ±  2%  perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
      2.71            +0.7        3.39            +0.7        3.37            +0.6        3.31 ±  2%  perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      2.65            +0.7        3.34            +0.7        3.32            +0.6        3.26 ±  2%  perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region
      2.44            +0.7        3.15            +0.7        3.13            +0.6        3.07 ±  2%  perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
     24.39            +2.2       26.56 ±  5%      +1.8       26.19 ±  4%      +2.1       26.54 ±  3%  perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
     22.46            +2.4       24.88 ±  5%      +2.0       24.41 ±  5%      +2.3       24.81 ±  4%  perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_fault.__handle_mm_fault
     22.25            +2.5       24.70 ±  5%      +2.0       24.24 ±  5%      +2.4       24.63 ±  4%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_fault
     20.38            +2.5       22.90 ±  6%      +2.0       22.42 ±  5%      +2.5       22.84 ±  4%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
     20.37            +2.5       22.89 ±  6%      +2.0       22.41 ±  5%      +2.5       22.83 ±  4%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range
     20.30            +2.5       22.83 ±  6%      +2.0       22.35 ±  5%      +2.5       22.77 ±  4%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma
     22.59            +5.3       27.93            +5.3       27.84            +4.7       27.29 ±  2%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
     22.59            +5.3       27.93            +5.3       27.84            +4.7       27.29 ±  2%  perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
     22.59            +5.3       27.93            +5.3       27.84            +4.7       27.29 ±  2%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
     22.58            +5.3       27.92            +5.3       27.83            +4.7       27.28 ±  2%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
     20.59            +5.8       26.34            +5.6       26.22            +5.1       25.64 ±  2%  perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
     20.59            +5.8       26.34            +5.6       26.22            +5.1       25.64 ±  2%  perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range
     20.56            +5.8       26.32            +5.6       26.20            +5.1       25.62 ±  2%  perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
     20.07            +5.9       25.95            +5.8       25.83            +5.2       25.23 ±  3%  perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range
     18.73            +6.0       24.73            +5.9       24.63            +5.3       24.01 ±  3%  perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu
     25.34            +6.0       31.36            +5.9       31.24            +5.3       30.64 ±  2%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     25.34            +6.0       31.36            +5.9       31.24            +5.3       30.64 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     25.34            +6.0       31.36            +5.9       31.24            +5.3       30.64 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     25.34            +6.0       31.36            +5.9       31.24            +5.3       30.64 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     25.34            +6.0       31.36            +5.9       31.24            +5.3       30.64 ±  2%  perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     25.33            +6.0       31.36            +5.9       31.24            +5.3       30.64 ±  2%  perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
     25.34            +6.0       31.37            +5.9       31.25            +5.3       30.65 ±  2%  perf-profile.calltrace.cycles-pp.__munmap
     25.33            +6.0       31.36            +5.9       31.24            +5.3       30.64 ±  2%  perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     20.35            +6.7       27.09            +6.6       26.96            +5.9       26.29 ±  3%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
     20.36            +6.7       27.11            +6.6       26.98            +5.9       26.30 ±  3%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
     20.28            +6.8       27.04            +6.6       26.91            +6.0       26.24 ±  3%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
     74.49            -6.0       68.46            -5.9       68.59            -5.3       69.18        perf-profile.children.cycles-pp.testcase
     71.15            -5.5       65.63            -5.4       65.72            -4.8       66.30        perf-profile.children.cycles-pp.asm_exc_page_fault
     63.55            -4.4       59.16            -4.4       59.17            -3.9       59.68        perf-profile.children.cycles-pp.exc_page_fault
     63.38            -4.4       59.03            -4.4       59.03            -3.8       59.54        perf-profile.children.cycles-pp.do_user_addr_fault
     61.10            -4.1       57.04            -4.1       57.03            -3.6       57.54        perf-profile.children.cycles-pp.handle_mm_fault
     21.32            -3.9       17.45 ±  3%      -3.6       17.70 ±  3%      -3.5       17.80 ±  2%  perf-profile.children.cycles-pp.copy_page
     59.57            -3.9       55.72            -3.9       55.72            -3.3       56.24        perf-profile.children.cycles-pp.__handle_mm_fault
     58.44            -3.7       54.74            -3.7       54.74            -3.2       55.25        perf-profile.children.cycles-pp.do_fault
      5.36            -0.9        4.44 ±  2%      -0.9        4.48 ±  2%      -0.8        4.54        perf-profile.children.cycles-pp.__pte_offset_map_lock
      5.02            -0.9        4.16 ±  2%      -0.8        4.19 ±  2%      -0.8        4.25        perf-profile.children.cycles-pp._raw_spin_lock
      4.45            -0.7        3.76 ±  3%      -0.6        3.83 ±  3%      -0.6        3.82 ±  2%  perf-profile.children.cycles-pp.folio_prealloc
      2.64            -0.4        2.24 ±  2%      -0.4        2.27 ±  2%      -0.3        2.30        perf-profile.children.cycles-pp.sync_regs
      1.89            -0.4        1.49 ±  2%      -0.4        1.52 ±  2%      -0.3        1.55        perf-profile.children.cycles-pp.zap_present_ptes
      2.42            -0.4        2.04 ±  2%      -0.3        2.08 ±  3%      -0.3        2.09 ±  2%  perf-profile.children.cycles-pp.native_irq_return_iret
      2.24            -0.3        1.91 ±  2%      -0.3        1.91 ±  2%      -0.3        1.93        perf-profile.children.cycles-pp.vma_alloc_folio_noprof
      2.07            -0.3        1.74 ±  3%      -0.3        1.80 ±  5%      -0.3        1.77 ±  2%  perf-profile.children.cycles-pp.__mem_cgroup_charge
      1.89            -0.3        1.61 ±  2%      -0.3        1.60 ±  2%      -0.3        1.62        perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
      2.04            -0.3        1.77 ±  2%      -0.1        1.90 ±  2%      -0.2        1.83 ±  3%  perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
      1.64            -0.3        1.39 ±  2%      -0.3        1.39 ±  2%      -0.2        1.41        perf-profile.children.cycles-pp.__alloc_pages_noprof
      1.77            -0.2        1.52 ±  3%      -0.2        1.53 ±  3%      -0.2        1.54 ±  2%  perf-profile.children.cycles-pp.__do_fault
      1.62            -0.2        1.39 ±  3%      -0.2        1.41 ±  3%      -0.2        1.41 ±  2%  perf-profile.children.cycles-pp.shmem_fault
      1.32            -0.2        1.10 ±  3%      -0.2        1.16 ±  4%      -0.2        1.13 ±  2%  perf-profile.children.cycles-pp.mem_cgroup_commit_charge
      1.42            -0.2        1.21 ±  2%      -0.2        1.20 ±  2%      -0.2        1.19 ±  2%  perf-profile.children.cycles-pp.__perf_sw_event
      1.47            -0.2        1.27 ±  3%      -0.2        1.28 ±  3%      -0.2        1.29 ±  2%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
      1.13 ±  2%      -0.2        0.93 ±  4%      -0.1        1.06 ±  2%      -0.1        1.03 ±  3%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      1.17            -0.2        0.98 ±  2%      -0.2        0.98 ±  2%      -0.2        1.00 ±  2%  perf-profile.children.cycles-pp.get_page_from_freelist
      1.25            -0.2        1.06 ±  2%      -0.2        1.06 ±  2%      -0.2        1.05 ±  2%  perf-profile.children.cycles-pp.___perf_sw_event
      0.84            -0.2        0.67 ±  3%      -0.2        0.68 ±  4%      -0.2        0.69 ±  2%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.61            -0.2        0.44 ±  3%      -0.2        0.43 ±  3%      -0.2        0.46 ±  2%  perf-profile.children.cycles-pp._compound_head
      0.65            -0.1        0.51 ±  2%      -0.1        0.53 ±  4%      -0.1        0.53 ±  2%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.94            -0.1        0.80 ±  4%      -0.1        0.82 ±  4%      -0.1        0.82 ±  3%  perf-profile.children.cycles-pp.filemap_get_entry
      1.02            -0.1        0.89 ±  2%      -0.1        0.90 ±  3%      -0.1        0.90 ±  2%  perf-profile.children.cycles-pp.lock_vma_under_rcu
      0.76            -0.1        0.63 ±  2%      -0.1        0.67 ±  2%      -0.1        0.66        perf-profile.children.cycles-pp.folio_remove_rmap_ptes
      1.20            -0.1        1.10            -0.1        1.13            -0.1        1.11        perf-profile.children.cycles-pp.lru_add_fn
      0.69            -0.1        0.59 ±  4%      -0.1        0.58 ±  2%      -0.1        0.60 ±  2%  perf-profile.children.cycles-pp.rmqueue
      0.47            -0.1        0.38 ±  2%      -0.1        0.37 ±  2%      -0.1        0.38        perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
      0.59            -0.1        0.49 ±  2%      -0.1        0.49            -0.1        0.50        perf-profile.children.cycles-pp.free_unref_folios
      0.54            -0.1        0.45 ±  4%      -0.1        0.46 ±  3%      -0.1        0.47 ±  3%  perf-profile.children.cycles-pp.xas_load
      0.67            -0.1        0.58 ±  3%      -0.1        0.60 ±  3%      -0.1        0.59 ±  2%  perf-profile.children.cycles-pp.mas_walk
      0.63 ±  3%      -0.1        0.55 ±  3%      -0.0        0.61 ±  4%      -0.1        0.55 ±  3%  perf-profile.children.cycles-pp.__count_memcg_events
      0.27 ±  3%      -0.1        0.21 ±  3%      -0.1        0.21 ±  3%      -0.1        0.21        perf-profile.children.cycles-pp.uncharge_batch
      0.38            -0.1        0.32 ±  5%      -0.0        0.33            -0.0        0.33        perf-profile.children.cycles-pp.try_charge_memcg
      0.22 ±  3%      -0.1        0.17 ±  4%      -0.1        0.17 ±  4%      -0.1        0.17 ±  2%  perf-profile.children.cycles-pp.page_counter_uncharge
      0.32            -0.1        0.27            -0.0        0.28            -0.1        0.26        perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.26 ±  3%      -0.0        0.21 ±  4%      -0.0        0.22 ±  2%      -0.0        0.23 ±  5%  perf-profile.children.cycles-pp.__pte_offset_map
      0.30            -0.0        0.26 ±  2%      -0.0        0.26            -0.0        0.26 ±  3%  perf-profile.children.cycles-pp.handle_pte_fault
      0.28            -0.0        0.24 ±  2%      -0.0        0.25 ±  3%      -0.0        0.25        perf-profile.children.cycles-pp.error_entry
      0.31            -0.0        0.27            -0.0        0.26 ±  5%      -0.0        0.26        perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.31 ±  2%      -0.0        0.27 ±  6%      -0.0        0.27 ±  4%      -0.0        0.27 ±  3%  perf-profile.children.cycles-pp.get_vma_policy
      0.22            -0.0        0.19 ±  2%      -0.0        0.19 ±  2%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.free_unref_page_commit
      0.22 ±  2%      -0.0        0.19 ±  3%      -0.0        0.19            -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.folio_add_new_anon_rmap
      0.26 ±  2%      -0.0        0.22 ±  9%      -0.0        0.22 ±  4%      -0.0        0.23 ±  5%  perf-profile.children.cycles-pp._raw_spin_trylock
      0.28 ±  2%      -0.0        0.25 ±  3%      -0.0        0.25            -0.0        0.25 ±  2%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.32 ±  2%      -0.0        0.29 ±  4%      -0.0        0.28 ±  2%      -0.0        0.29 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.26 ±  3%      -0.0        0.22 ±  4%      -0.0        0.22 ±  3%      -0.0        0.23        perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.25 ±  3%      -0.0        0.21 ±  4%      -0.0        0.21 ±  2%      -0.0        0.22        perf-profile.children.cycles-pp.hrtimer_interrupt
      0.22 ±  2%      -0.0        0.19 ±  3%      -0.0        0.19 ±  2%      -0.0        0.19 ±  3%  perf-profile.children.cycles-pp.pte_offset_map_nolock
      0.17 ±  2%      -0.0        0.14 ±  4%      -0.0        0.14 ±  5%      -0.0        0.15 ±  3%  perf-profile.children.cycles-pp.folio_unlock
      0.14 ±  2%      -0.0        0.11            -0.0        0.11 ±  3%      -0.0        0.11        perf-profile.children.cycles-pp.__mod_zone_page_state
      0.19 ±  2%      -0.0        0.16 ±  2%      -0.0        0.17 ±  2%      -0.0        0.17 ±  3%  perf-profile.children.cycles-pp.down_read_trylock
      0.18            -0.0        0.15 ±  3%      -0.0        0.15 ±  4%      -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.__rmqueue_pcplist
      0.14 ±  2%      -0.0        0.11 ±  6%      -0.0        0.11 ±  8%      -0.0        0.12 ±  6%  perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
      0.14 ±  3%      -0.0        0.12 ±  3%      -0.0        0.12 ±  5%      -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.perf_exclude_event
      0.19 ±  2%      -0.0        0.17 ±  4%      -0.0        0.17 ±  2%      -0.0        0.17 ±  4%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.16 ±  2%      -0.0        0.14            -0.0        0.13 ±  3%      -0.0        0.14 ±  2%  perf-profile.children.cycles-pp.uncharge_folio
      0.12 ±  3%      -0.0        0.10 ±  5%      -0.0        0.09 ±  5%      -0.0        0.10 ±  4%  perf-profile.children.cycles-pp.get_pfnblock_flags_mask
      0.18 ±  3%      -0.0        0.16 ±  4%      -0.0        0.16 ±  3%      -0.0        0.16 ±  3%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.13 ±  3%      -0.0        0.10 ±  4%      -0.0        0.10 ±  4%      -0.0        0.11 ±  4%  perf-profile.children.cycles-pp.page_counter_try_charge
      0.16            -0.0        0.14 ±  2%      -0.0        0.14 ±  4%      -0.0        0.14 ±  2%  perf-profile.children.cycles-pp.folio_put
      0.18 ±  2%      -0.0        0.16 ±  3%      -0.0        0.16 ±  3%      -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.__cond_resched
      0.18 ±  2%      -0.0        0.16 ±  5%      -0.0        0.16 ±  4%      -0.0        0.16 ±  2%  perf-profile.children.cycles-pp.up_read
      0.14            -0.0        0.12            -0.0        0.12            -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.policy_nodemask
      0.16 ±  2%      -0.0        0.14 ±  3%      -0.0        0.14 ±  2%      -0.0        0.14 ±  3%  perf-profile.children.cycles-pp.update_process_times
      0.11 ±  3%      -0.0        0.09 ±  8%      -0.0        0.09 ±  4%      -0.0        0.09 ±  4%  perf-profile.children.cycles-pp.xas_start
      0.13 ±  3%      -0.0        0.11            -0.0        0.11 ±  3%      -0.0        0.11        perf-profile.children.cycles-pp.access_error
      0.09 ±  4%      -0.0        0.08 ±  5%      -0.0        0.08 ±  5%      -0.0        0.08        perf-profile.children.cycles-pp.__irqentry_text_end
      0.07 ±  5%      -0.0        0.05 ±  9%      -0.0        0.06 ±  6%      -0.0        0.06        perf-profile.children.cycles-pp.vm_normal_page
      0.06 ±  7%      -0.0        0.05 ±  7%      -0.0        0.05            -0.0        0.05 ±  7%  perf-profile.children.cycles-pp.__tlb_remove_folio_pages_size
      0.08            -0.0        0.07 ±  5%      -0.0        0.07 ±  5%      -0.0        0.06 ±  6%  perf-profile.children.cycles-pp.memcg_check_events
      0.12 ±  3%      -0.0        0.11 ±  6%      -0.0        0.11 ±  4%      -0.0        0.11 ±  3%  perf-profile.children.cycles-pp.perf_swevent_event
      0.06            -0.0        0.05 ±  7%      -0.0        0.05            -0.0        0.05        perf-profile.children.cycles-pp.pte_alloc_one
      0.06            -0.0        0.05 ±  7%      -0.0        0.05            -0.0        0.05 ±  7%  perf-profile.children.cycles-pp.irqentry_enter
      0.06            -0.0        0.05 ±  7%      -0.0        0.05            -0.0        0.05 ±  7%  perf-profile.children.cycles-pp.vmf_anon_prepare
      0.05            +0.0        0.06 ±  8%      +0.0        0.06            +0.0        0.06 ±  8%  perf-profile.children.cycles-pp.write
      0.05            +0.0        0.06            +0.0        0.06            +0.0        0.06        perf-profile.children.cycles-pp.perf_mmap__push
      0.19 ±  2%      +0.2        0.40 ±  6%      +0.2        0.37 ±  7%      +0.2        0.35 ±  4%  perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
      2.72            +0.7        3.40            +0.7        3.38            +0.6        3.32 ±  2%  perf-profile.children.cycles-pp.tlb_finish_mmu
     24.44            +2.2       26.60 ±  5%      +1.8       26.23 ±  4%      +2.1       26.58 ±  3%  perf-profile.children.cycles-pp.set_pte_range
     22.47            +2.4       24.89 ±  5%      +2.0       24.42 ±  5%      +2.3       24.81 ±  4%  perf-profile.children.cycles-pp.folio_add_lru_vma
     22.31            +2.5       24.77 ±  5%      +2.0       24.30 ±  5%      +2.4       24.70 ±  4%  perf-profile.children.cycles-pp.folio_batch_move_lru
     22.59            +5.3       27.93            +5.2       27.84            +4.7       27.29 ±  2%  perf-profile.children.cycles-pp.zap_pmd_range
     22.59            +5.3       27.93            +5.3       27.84            +4.7       27.29 ±  2%  perf-profile.children.cycles-pp.unmap_page_range
     22.59            +5.3       27.93            +5.3       27.84            +4.7       27.29 ±  2%  perf-profile.children.cycles-pp.zap_pte_range
     22.59            +5.3       27.93            +5.3       27.84            +4.7       27.29 ±  2%  perf-profile.children.cycles-pp.unmap_vmas
     20.59            +5.8       26.34            +5.6       26.22            +5.1       25.64 ±  2%  perf-profile.children.cycles-pp.tlb_flush_mmu
     25.34            +6.0       31.36            +5.9       31.24            +5.3       30.64 ±  2%  perf-profile.children.cycles-pp.__x64_sys_munmap
     25.34            +6.0       31.36            +5.9       31.24            +5.3       30.64 ±  2%  perf-profile.children.cycles-pp.__vm_munmap
     25.34            +6.0       31.37            +5.9       31.25            +5.3       30.65 ±  2%  perf-profile.children.cycles-pp.__munmap
     25.33            +6.0       31.36            +5.9       31.24            +5.3       30.64 ±  2%  perf-profile.children.cycles-pp.unmap_region
     25.34            +6.0       31.37            +5.9       31.25            +5.3       30.65 ±  2%  perf-profile.children.cycles-pp.do_vmi_align_munmap
     25.34            +6.0       31.37            +5.9       31.25            +5.3       30.65 ±  2%  perf-profile.children.cycles-pp.do_vmi_munmap
     25.46            +6.0       31.49            +5.9       31.37            +5.3       30.77 ±  2%  perf-profile.children.cycles-pp.do_syscall_64
     25.46            +6.0       31.49            +5.9       31.37            +5.3       30.77 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     23.30            +6.4       29.74            +6.3       29.59            +5.7       28.96 ±  2%  perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
     23.29            +6.4       29.73            +6.3       29.58            +5.7       28.95 ±  2%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
     23.00            +6.5       29.52            +6.4       29.38            +5.7       28.73 ±  2%  perf-profile.children.cycles-pp.folios_put_refs
     21.22            +6.7       27.93            +6.6       27.81            +5.9       27.13 ±  3%  perf-profile.children.cycles-pp.__page_cache_release
     40.79            +9.3       50.07 ±  2%      +8.7       49.46 ±  2%      +8.4       49.20        perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     40.78            +9.3       50.06 ±  2%      +8.7       49.44 ±  2%      +8.4       49.19        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     40.64            +9.3       49.96 ±  2%      +8.7       49.34 ±  2%      +8.4       49.09        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     21.23            -3.9       17.38 ±  3%      -3.6       17.63 ±  3%      -3.5       17.73 ±  2%  perf-profile.self.cycles-pp.copy_page
      4.99            -0.8        4.14 ±  2%      -0.8        4.17 ±  2%      -0.8        4.22        perf-profile.self.cycles-pp._raw_spin_lock
      5.21            -0.8        4.45 ±  2%      -0.7        4.49 ±  2%      -0.7        4.53        perf-profile.self.cycles-pp.testcase
      2.63            -0.4        2.24 ±  2%      -0.4        2.26 ±  2%      -0.3        2.29        perf-profile.self.cycles-pp.sync_regs
      2.42            -0.4        2.04 ±  2%      -0.3        2.08 ±  3%      -0.3        2.09 ±  2%  perf-profile.self.cycles-pp.native_irq_return_iret
      0.58 ±  2%      -0.2        0.42 ±  3%      -0.2        0.40 ±  2%      -0.1        0.43 ±  3%  perf-profile.self.cycles-pp._compound_head
      0.93 ±  2%      -0.2        0.77 ±  5%      -0.0        0.89 ±  2%      -0.1        0.86 ±  3%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      1.00            -0.1        0.85            -0.2        0.85 ±  2%      -0.2        0.83 ±  2%  perf-profile.self.cycles-pp.___perf_sw_event
      0.93 ±  2%      -0.1        0.78 ±  3%      -0.1        0.79 ±  4%      -0.1        0.80 ±  3%  perf-profile.self.cycles-pp.mem_cgroup_commit_charge
      0.61            -0.1        0.48 ±  3%      -0.1        0.50 ±  4%      -0.1        0.50 ±  3%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.51            -0.1        0.38            -0.1        0.38 ±  2%      -0.1        0.40 ±  2%  perf-profile.self.cycles-pp.free_pages_and_swap_cache
      0.80            -0.1        0.70 ±  2%      -0.1        0.69 ±  3%      -0.1        0.70 ±  2%  perf-profile.self.cycles-pp.__handle_mm_fault
      0.61 ±  2%      -0.1        0.51            -0.1        0.51 ±  2%      -0.1        0.51        perf-profile.self.cycles-pp.lru_add_fn
      0.47            -0.1        0.38            -0.1        0.38            -0.1        0.39 ±  2%  perf-profile.self.cycles-pp.get_page_from_freelist
      0.45            -0.1        0.37 ±  2%      -0.1        0.37 ±  2%      -0.1        0.38        perf-profile.self.cycles-pp.zap_present_ptes
      0.44            -0.1        0.36 ±  4%      -0.1        0.37 ±  4%      -0.1        0.38 ±  3%  perf-profile.self.cycles-pp.xas_load
      0.65            -0.1        0.57 ±  2%      -0.1        0.58 ±  2%      -0.1        0.58 ±  2%  perf-profile.self.cycles-pp.mas_walk
      0.46            -0.1        0.39 ±  2%      -0.1        0.40 ±  2%      -0.1        0.41 ±  3%  perf-profile.self.cycles-pp.handle_mm_fault
      0.44            -0.1        0.38 ±  2%      -0.1        0.38 ±  2%      -0.1        0.39        perf-profile.self.cycles-pp.shmem_get_folio_gfp
      0.52 ±  3%      -0.1        0.46 ±  3%      -0.0        0.51 ±  6%      -0.1        0.46 ±  5%  perf-profile.self.cycles-pp.__count_memcg_events
      0.89 ±  2%      -0.1        0.84            -0.0        0.88 ±  3%      -0.1        0.83 ±  3%  perf-profile.self.cycles-pp.__lruvec_stat_mod_folio
      0.32            -0.1        0.26            -0.1        0.26            -0.0        0.27        perf-profile.self.cycles-pp.__page_cache_release
      0.39            -0.1        0.34 ±  4%      -0.0        0.35 ±  4%      -0.0        0.35 ±  3%  perf-profile.self.cycles-pp.filemap_get_entry
      0.20 ±  4%      -0.1        0.15 ±  5%      -0.1        0.15 ±  3%      -0.0        0.15 ±  2%  perf-profile.self.cycles-pp.page_counter_uncharge
      0.24            -0.0        0.19            -0.0        0.20 ±  2%      -0.0        0.20        perf-profile.self.cycles-pp.folio_remove_rmap_ptes
      0.34 ±  3%      -0.0        0.29 ±  2%      -0.0        0.29 ±  2%      -0.0        0.29 ±  3%  perf-profile.self.cycles-pp.__alloc_pages_noprof
      0.27            -0.0        0.23 ±  3%      -0.0        0.23 ±  3%      -0.0        0.23 ±  2%  perf-profile.self.cycles-pp.free_unref_folios
      0.27 ±  3%      -0.0        0.23 ±  2%      -0.0        0.23 ±  2%      -0.0        0.22 ±  2%  perf-profile.self.cycles-pp.rmqueue
      0.30            -0.0        0.26            -0.0        0.26            -0.0        0.26        perf-profile.self.cycles-pp.do_user_addr_fault
      0.26            -0.0        0.22 ±  2%      -0.0        0.22 ±  2%      -0.0        0.22 ±  4%  perf-profile.self.cycles-pp.cgroup_rstat_updated
      0.23 ±  3%      -0.0        0.19 ±  4%      -0.0        0.20 ±  5%      -0.0        0.20 ±  3%  perf-profile.self.cycles-pp.__pte_offset_map_lock
      0.22 ±  3%      -0.0        0.19 ±  2%      -0.0        0.19 ±  3%      -0.0        0.20 ±  4%  perf-profile.self.cycles-pp.__pte_offset_map
      0.29            -0.0        0.26            -0.0        0.25 ±  5%      -0.0        0.25        perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.19 ±  2%      -0.0        0.16 ±  2%      -0.0        0.16 ±  4%      -0.0        0.16 ±  4%  perf-profile.self.cycles-pp.__mod_lruvec_state
      0.21 ±  3%      -0.0        0.17 ±  2%      -0.0        0.18 ±  2%      -0.0        0.19 ±  4%  perf-profile.self.cycles-pp.finish_fault
      0.25            -0.0        0.21            -0.0        0.21 ±  3%      -0.0        0.22 ±  2%  perf-profile.self.cycles-pp.error_entry
      0.24            -0.0        0.21 ±  3%      -0.0        0.22            -0.0        0.22        perf-profile.self.cycles-pp.try_charge_memcg
      0.21 ±  2%      -0.0        0.18 ±  4%      -0.0        0.18 ±  2%      -0.0        0.19 ±  2%  perf-profile.self.cycles-pp.folio_add_new_anon_rmap
      0.22            -0.0        0.19 ±  2%      -0.0        0.19 ±  2%      -0.0        0.19 ±  2%  perf-profile.self.cycles-pp.set_pte_range
      0.24 ±  3%      -0.0        0.21 ±  7%      -0.0        0.20 ±  4%      -0.0        0.21 ±  6%  perf-profile.self.cycles-pp._raw_spin_trylock
      0.06            -0.0        0.03 ± 81%      -0.0        0.04 ± 50%      -0.0        0.05        perf-profile.self.cycles-pp.vm_normal_page
      0.23 ±  2%      -0.0        0.20 ±  2%      -0.0        0.20 ±  2%      -0.0        0.20 ±  2%  perf-profile.self.cycles-pp.do_fault
      0.18            -0.0        0.15 ±  2%      -0.0        0.15 ±  2%      -0.0        0.15 ±  2%  perf-profile.self.cycles-pp.free_unref_page_commit
      0.15 ±  2%      -0.0        0.12            -0.0        0.12 ±  6%      -0.0        0.13 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.13 ±  3%      -0.0        0.10 ±  4%      -0.0        0.11 ±  4%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.__mem_cgroup_charge
      0.18            -0.0        0.15 ±  2%      -0.0        0.16 ±  3%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.down_read_trylock
      0.11 ±  3%      -0.0        0.08 ±  4%      -0.0        0.09            -0.0        0.09 ±  5%  perf-profile.self.cycles-pp.__mod_zone_page_state
      0.19 ±  2%      -0.0        0.17 ±  2%      -0.0        0.16 ±  2%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.folio_add_lru_vma
      0.19 ±  2%      -0.0        0.17 ±  8%      -0.0        0.17 ±  3%      -0.0        0.17 ±  3%  perf-profile.self.cycles-pp.get_vma_policy
      0.16 ±  2%      -0.0        0.13 ±  3%      -0.0        0.13 ±  5%      -0.0        0.14 ±  2%  perf-profile.self.cycles-pp.folio_unlock
      0.12 ±  3%      -0.0        0.10 ±  6%      -0.0        0.10 ±  6%      -0.0        0.10        perf-profile.self.cycles-pp.perf_exclude_event
      0.19 ±  2%      -0.0        0.17            -0.0        0.17 ±  2%      -0.0        0.17 ±  2%  perf-profile.self.cycles-pp.asm_exc_page_fault
      0.15 ±  2%      -0.0        0.13 ±  3%      -0.0        0.13 ±  3%      -0.0        0.13        perf-profile.self.cycles-pp.folio_put
      0.14 ±  2%      -0.0        0.12            -0.0        0.12 ±  3%      -0.0        0.12        perf-profile.self.cycles-pp.__rmqueue_pcplist
      0.17 ±  2%      -0.0        0.14 ±  5%      -0.0        0.14 ±  2%      -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.__perf_sw_event
      0.10 ±  3%      -0.0        0.08 ±  7%      -0.0        0.08 ± 11%      -0.0        0.09 ±  5%  perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
      0.15 ±  2%      -0.0        0.13            -0.0        0.13 ±  3%      -0.0        0.13 ±  3%  perf-profile.self.cycles-pp.uncharge_folio
      0.12 ±  3%      -0.0        0.10            -0.0        0.10 ±  3%      -0.0        0.11 ±  4%  perf-profile.self.cycles-pp.alloc_pages_mpol_noprof
      0.11 ±  3%      -0.0        0.09 ±  8%      -0.0        0.09 ±  4%      -0.0        0.09        perf-profile.self.cycles-pp.page_counter_try_charge
      0.17 ±  4%      -0.0        0.15 ±  4%      -0.0        0.15 ±  2%      -0.0        0.15        perf-profile.self.cycles-pp.lock_vma_under_rcu
      0.17 ±  2%      -0.0        0.15 ±  3%      -0.0        0.16 ±  3%      -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.up_read
      0.11            -0.0        0.09 ±  4%      -0.0        0.09 ±  5%      -0.0        0.09 ±  5%  perf-profile.self.cycles-pp.zap_pte_range
      0.10            -0.0        0.08 ±  4%      -0.0        0.08 ±  5%      -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.get_pfnblock_flags_mask
      0.16 ±  2%      -0.0        0.15 ±  5%      -0.0        0.15 ±  3%      -0.0        0.15 ±  5%  perf-profile.self.cycles-pp.shmem_fault
      0.10 ±  4%      -0.0        0.08 ±  4%      -0.0        0.08 ±  4%      -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.__do_fault
      0.12 ±  3%      -0.0        0.10 ±  7%      -0.0        0.10 ±  7%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.exc_page_fault
      0.12 ±  3%      -0.0        0.10 ±  3%      -0.0        0.10 ±  3%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.access_error
      0.12 ±  4%      -0.0        0.10            -0.0        0.10            -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.vma_alloc_folio_noprof
      0.11            -0.0        0.10 ±  5%      -0.0        0.09 ±  4%      -0.0        0.09 ±  5%  perf-profile.self.cycles-pp.perf_swevent_event
      0.09 ±  5%      -0.0        0.08            -0.0        0.08            -0.0        0.08        perf-profile.self.cycles-pp.policy_nodemask
      0.09            -0.0        0.08 ± 13%      -0.0        0.08 ±  5%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.xas_start
      0.10 ±  4%      -0.0        0.09 ±  4%      -0.0        0.09            -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.pte_offset_map_nolock
      0.08 ±  4%      -0.0        0.07            -0.0        0.07 ±  5%      -0.0        0.07 ±  5%  perf-profile.self.cycles-pp.__irqentry_text_end
      0.10            -0.0        0.09            -0.0        0.09 ±  5%      -0.0        0.09        perf-profile.self.cycles-pp.folio_prealloc
      0.09            -0.0        0.08            -0.0        0.08            -0.0        0.08        perf-profile.self.cycles-pp.__cond_resched
      0.38 ±  2%      +0.1        0.47 ±  2%      +0.1        0.46            +0.1        0.44        perf-profile.self.cycles-pp.folio_batch_move_lru
      0.18 ±  2%      +0.2        0.38 ±  6%      +0.2        0.35 ±  7%      +0.2        0.34 ±  4%  perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
     40.64            +9.3       49.96 ±  2%      +8.7       49.34 ±  2%      +8.4       49.08        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath


[2]
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/page_fault2/will-it-scale

59142d87ab03b8ff 70a64b7919cbd6c12306051ff28 ff48c71c26aaefb090c108d8803 a94032b35e5f97dc1023030d929
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
  17488267            -3.4%   16886777            -6.0%   16433590            -5.6%   16505101        will-it-scale.224.processes
     78072            -3.4%      75386            -6.0%      73363            -5.6%      73683        will-it-scale.per_process_ops
  17488267            -3.4%   16886777            -6.0%   16433590            -5.6%   16505101        will-it-scale.workload
 5.296e+09            -3.4%  5.116e+09            -6.0%  4.977e+09            -5.6%  4.998e+09        proc-vmstat.numa_hit
 5.291e+09            -3.4%  5.111e+09            -6.0%  4.973e+09            -5.6%  4.995e+09        proc-vmstat.numa_local
 5.285e+09            -3.4%  5.105e+09            -6.0%  4.968e+09            -5.6%  4.989e+09        proc-vmstat.pgalloc_normal
 5.264e+09            -3.4%  5.084e+09            -6.0%  4.947e+09            -5.6%  4.969e+09        proc-vmstat.pgfault
 5.283e+09            -3.4%  5.104e+09            -6.0%  4.967e+09            -5.6%  4.989e+09        proc-vmstat.pgfree
      3067           +20.1%       3685 ±  8%     +19.5%       3665 ±  8%      -0.4%       3056        sched_debug.cfs_rq:/.load.min
      0.07 ± 19%     -12.8%       0.06 ± 14%     -31.1%       0.05 ± 14%      -8.8%       0.06 ± 14%  sched_debug.cfs_rq:/.nr_running.stddev
   1727628 ± 22%      +2.3%    1767491 ± 32%      +8.6%    1876362 ± 25%     -24.1%    1310525 ±  7%  sched_debug.cpu.avg_idle.max
      6058 ± 41%     +71.5%      10389 ±118%     +96.1%      11878 ± 66%     -47.9%       3156 ± 43%  sched_debug.cpu.max_idle_balance_cost.stddev
     17928 ± 11%    +133.0%      41768 ± 36%     +39.4%      24992 ± 57%      +6.3%      19052 ± 15%  sched_debug.cpu.nr_switches.max
      2270 ±  6%     +70.6%       3874 ± 28%     +21.4%       2756 ± 37%      +0.5%       2282 ±  4%  sched_debug.cpu.nr_switches.stddev
   4369255            -9.9%    3934784 ±  8%      -3.0%    4238563 ±  6%      -3.0%    4239325 ±  7%  numa-vmstat.node0.nr_file_pages
     20526 ±  3%     -25.8%      15236 ± 22%     -11.5%      18161 ± 16%      -6.4%      19205 ± 16%  numa-vmstat.node0.nr_mapped
     35617 ±  5%     -27.8%      25727 ± 20%     -12.1%      31303 ± 13%      -9.1%      32375 ± 21%  numa-vmstat.node0.nr_slab_reclaimable
     65089 ± 16%      -8.1%      59820 ± 19%     -19.8%      52215 ±  3%     -18.3%      53200 ±  3%  numa-vmstat.node0.nr_slab_unreclaimable
    738801 ±  3%     -59.2%     301176 ±113%     -17.7%     608173 ± 48%     -18.0%     605778 ± 49%  numa-vmstat.node0.nr_unevictable
    738801 ±  3%     -59.2%     301176 ±113%     -17.7%     608173 ± 48%     -18.0%     605778 ± 49%  numa-vmstat.node0.nr_zone_unevictable
   4024866           +10.9%    4465333 ±  7%      +3.2%    4152344 ±  7%      +3.4%    4163009 ±  7%  numa-vmstat.node1.nr_file_pages
     19132 ± 10%     +51.8%      29044 ± 18%     +22.2%      23371 ± 18%     +17.3%      22446 ± 30%  numa-vmstat.node1.nr_slab_reclaimable
     45845 ± 24%     +12.0%      51337 ± 23%     +28.7%      58982 ±  2%     +26.8%      58122 ±  3%  numa-vmstat.node1.nr_slab_unreclaimable
     30816 ± 81%   +1420.1%     468441 ± 72%    +423.9%     161444 ±184%    +431.7%     163839 ±184%  numa-vmstat.node1.nr_unevictable
     30816 ± 81%   +1420.1%     468441 ± 72%    +423.9%     161444 ±184%    +431.7%     163839 ±184%  numa-vmstat.node1.nr_zone_unevictable
    142458 ±  5%     -27.7%     102968 ± 20%     -12.1%     125181 ± 13%      -9.1%     129506 ± 21%  numa-meminfo.node0.KReclaimable
     81201 ±  3%     -25.4%      60607 ± 21%     -11.8%      71622 ± 16%      -6.6%      75868 ± 16%  numa-meminfo.node0.Mapped
    142458 ±  5%     -27.7%     102968 ± 20%     -12.1%     125181 ± 13%      -9.1%     129506 ± 21%  numa-meminfo.node0.SReclaimable
    260359 ± 16%      -8.1%     239286 ± 19%     -19.8%     208866 ±  3%     -18.3%     212806 ±  3%  numa-meminfo.node0.SUnreclaim
    402817 ± 12%     -15.0%     342254 ± 18%     -17.1%     334047 ±  6%     -15.0%     342313 ±  9%  numa-meminfo.node0.Slab
   2955204 ±  3%     -59.2%    1204704 ±113%     -17.7%    2432692 ± 48%     -18.0%    2423114 ± 49%  numa-meminfo.node0.Unevictable
  16107004           +11.0%   17872044 ±  7%      +3.0%   16587232 ±  7%      +3.3%   16635393 ±  7%  numa-meminfo.node1.FilePages
     76509 ± 10%     +51.9%     116237 ± 18%     +22.1%      93450 ± 18%     +17.4%      89791 ± 30%  numa-meminfo.node1.KReclaimable
     76509 ± 10%     +51.9%     116237 ± 18%     +22.1%      93450 ± 18%     +17.4%      89791 ± 30%  numa-meminfo.node1.SReclaimable
    183385 ± 24%     +12.0%     205353 ± 23%     +28.7%     235933 ±  2%     +26.8%     232488 ±  3%  numa-meminfo.node1.SUnreclaim
    259894 ± 20%     +23.7%     321590 ± 19%     +26.7%     329384 ±  6%     +24.0%     322280 ± 10%  numa-meminfo.node1.Slab
    123266 ± 81%   +1420.1%    1873767 ± 72%    +423.9%     645778 ±184%    +431.7%     655357 ±184%  numa-meminfo.node1.Unevictable
     20.16            -1.4%      19.89            -2.9%      19.57            -2.9%      19.58        perf-stat.i.MPKI
 2.501e+10            -1.7%   2.46e+10            -2.6%  2.436e+10            -2.4%   2.44e+10        perf-stat.i.branch-instructions
  18042153            -0.3%   17981852            -1.9%   17692517            -2.8%   17539874        perf-stat.i.branch-misses
 2.382e+09            -3.3%  2.304e+09            -5.8%  2.244e+09            -5.6%  2.249e+09        perf-stat.i.cache-misses
 2.561e+09            -3.2%  2.479e+09            -5.5%   2.42e+09            -5.3%  2.424e+09        perf-stat.i.cache-references
      5.49            +1.9%       5.59            +3.1%       5.66            +2.8%       5.64        perf-stat.i.cpi
    274.25            +2.9%     282.07            +5.7%     289.98            +5.4%     289.07        perf-stat.i.cycles-between-cache-misses
 1.177e+11            -1.9%  1.155e+11            -2.9%  1.143e+11            -2.7%  1.145e+11        perf-stat.i.instructions
      0.19            -1.9%       0.18            -3.0%       0.18            -2.7%       0.18        perf-stat.i.ipc
    155.11            -3.3%     150.03            -5.9%     145.89            -5.5%     146.59        perf-stat.i.metric.K/sec
  17405977            -3.4%   16819060            -5.9%   16378605            -5.5%   16441964        perf-stat.i.minor-faults
  17405978            -3.4%   16819060            -5.9%   16378606            -5.5%   16441964        perf-stat.i.page-faults
      4.41 ± 50%     +27.3%       5.61            +3.1%       4.54 ± 50%     +28.5%       5.66        perf-stat.overall.cpi
    217.50 ± 50%     +29.2%     280.93            +6.3%     231.09 ± 50%     +32.4%     287.87        perf-stat.overall.cycles-between-cache-misses
   1623235 ± 50%     +26.9%    2060668            +3.4%    1677714 ± 50%     +29.0%    2093187        perf-stat.overall.path-length
      5.48            -0.3        5.15            -0.4        5.10            -0.4        5.11        perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     57.55            -0.3       57.24            -0.4       57.15            -0.3       57.20        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
     56.14            -0.2       55.94            -0.3       55.86            -0.2       55.90        perf-profile.calltrace.cycles-pp.testcase
      1.86            -0.1        1.73 ±  2%      -0.1        1.72            -0.2        1.71        perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
      1.77            -0.1        1.64 ±  2%      -0.1        1.63            -0.1        1.63        perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault
      1.17            -0.1        1.10            -0.1        1.09            -0.1        1.10        perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     52.55            -0.1       52.49            -0.1       52.42            -0.1       52.47        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     52.62            -0.1       52.56            -0.1       52.48            -0.1       52.54        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
      0.96            -0.0        0.91            -0.0        0.91            -0.0        0.91        perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase
      0.71            -0.0        0.68            -0.0        0.67            -0.0        0.67        perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault
     51.87            -0.0       51.84            -0.1       51.76            -0.0       51.82        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.60            -0.0        0.57            -0.0        0.56            -0.0        0.57        perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault
      4.87            +0.0        4.90            +0.0        4.91            +0.0        4.91        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      4.85            +0.0        4.88            +0.0        4.90            +0.0        4.90        perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region
      4.86            +0.0        4.90            +0.0        4.91            +0.0        4.91        perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      4.86            +0.0        4.89            +0.1        4.91            +0.0        4.91        perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
      4.77            +0.0        4.80            +0.1        4.83            +0.1        4.82        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
     37.74            +0.2       37.98            +0.3       38.04            +0.3       38.01        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
     37.74            +0.2       37.98            +0.3       38.04            +0.3       38.01        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
     37.74            +0.2       37.98            +0.3       38.04            +0.3       38.01        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
     37.73            +0.2       37.97            +0.3       38.04            +0.3       38.01        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
     37.27            +0.3       37.53            +0.3       37.60            +0.3       37.57        perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
     37.28            +0.3       37.54            +0.3       37.61            +0.3       37.58        perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
     37.28            +0.3       37.54            +0.3       37.61            +0.3       37.58        perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range
     36.72            +0.3       36.98            +0.4       37.08            +0.3       37.04        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu
     37.15            +0.3       37.41            +0.3       37.49            +0.3       37.46        perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range
     42.65            +0.3       42.92            +0.4       43.00            +0.3       42.97        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
     42.65            +0.3       42.92            +0.4       43.00            +0.3       42.97        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     42.65            +0.3       42.92            +0.4       43.00            +0.3       42.97        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     42.65            +0.3       42.92            +0.4       43.00            +0.3       42.97        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     42.65            +0.3       42.92            +0.4       43.00            +0.3       42.97        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     42.65            +0.3       42.92            +0.4       43.00            +0.3       42.97        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     42.65            +0.3       42.92            +0.4       43.00            +0.3       42.97        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     42.65            +0.3       42.92            +0.4       43.00            +0.3       42.97        perf-profile.calltrace.cycles-pp.__munmap
     41.26            +0.3       41.56            +0.4       41.68            +0.4       41.64        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
     41.26            +0.3       41.56            +0.4       41.68            +0.4       41.63        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
     41.23            +0.3       41.53            +0.4       41.66            +0.4       41.61        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
     43.64            +0.5       44.09            +0.4       44.05            +0.5       44.12        perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     41.57            +0.6       42.17            +0.6       42.14            +0.6       42.22        perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
     40.93            +0.6       41.56            +0.6       41.53            +0.7       41.59        perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_fault.__handle_mm_fault
     40.84            +0.6       41.48            +0.6       41.44            +0.7       41.50        perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_fault
     40.16            +0.7       40.83            +0.6       40.80            +0.7       40.87        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma
     40.19            +0.7       40.85            +0.6       40.83            +0.7       40.89        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range
     40.19            +0.7       40.85            +0.6       40.83            +0.7       40.89        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
      5.49            -0.3        5.16            -0.4        5.12            -0.4        5.12        perf-profile.children.cycles-pp.copy_page
     57.05            -0.3       56.79            -0.4       56.70            -0.3       56.75        perf-profile.children.cycles-pp.testcase
     55.66            -0.2       55.44            -0.3       55.36            -0.2       55.41        perf-profile.children.cycles-pp.asm_exc_page_fault
      1.88            -0.1        1.75 ±  2%      -0.1        1.74            -0.2        1.73        perf-profile.children.cycles-pp.__pte_offset_map_lock
      1.79            -0.1        1.66 ±  2%      -0.1        1.64            -0.1        1.64        perf-profile.children.cycles-pp._raw_spin_lock
      1.19            -0.1        1.11            -0.1        1.11            -0.1        1.11        perf-profile.children.cycles-pp.folio_prealloc
     52.64            -0.1       52.57            -0.1       52.49            -0.1       52.55        perf-profile.children.cycles-pp.exc_page_fault
      0.96            -0.1        0.91            -0.1        0.91            -0.1        0.91        perf-profile.children.cycles-pp.sync_regs
     52.57            -0.0       52.52            -0.1       52.44            -0.1       52.50        perf-profile.children.cycles-pp.do_user_addr_fault
      0.73            -0.0        0.69            -0.0        0.68            -0.0        0.68        perf-profile.children.cycles-pp.vma_alloc_folio_noprof
      0.63            -0.0        0.60            -0.0        0.59            -0.0        0.59        perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
      0.55            -0.0        0.52            -0.0        0.51            -0.0        0.51        perf-profile.children.cycles-pp.__alloc_pages_noprof
     51.89            -0.0       51.86            -0.1       51.78            -0.0       51.84        perf-profile.children.cycles-pp.handle_mm_fault
      1.02            -0.0        0.99            -0.0        0.99            -0.0        0.98        perf-profile.children.cycles-pp.native_irq_return_iret
      0.46            -0.0        0.43 ±  2%      -0.0        0.44            -0.0        0.43        perf-profile.children.cycles-pp.shmem_fault
      0.39            -0.0        0.36 ±  2%      -0.0        0.36            -0.0        0.38        perf-profile.children.cycles-pp.__mem_cgroup_charge
      0.51            -0.0        0.48 ±  2%      -0.0        0.49            -0.0        0.48        perf-profile.children.cycles-pp.__do_fault
      0.38            -0.0        0.36            -0.0        0.35            -0.0        0.36        perf-profile.children.cycles-pp.lru_add_fn
      0.51            -0.0        0.49            -0.0        0.50            -0.0        0.48        perf-profile.children.cycles-pp.shmem_get_folio_gfp
      0.36            -0.0        0.34            -0.0        0.34            -0.0        0.34        perf-profile.children.cycles-pp.___perf_sw_event
      0.42            -0.0        0.40 ±  2%      -0.0        0.40            -0.0        0.39        perf-profile.children.cycles-pp.__perf_sw_event
      0.41            -0.0        0.39            -0.0        0.39            -0.0        0.39        perf-profile.children.cycles-pp.get_page_from_freelist
      0.25 ±  2%      -0.0        0.23            -0.0        0.24 ±  2%      -0.0        0.23        perf-profile.children.cycles-pp.filemap_get_entry
      0.42            -0.0        0.41            -0.0        0.40            -0.0        0.40        perf-profile.children.cycles-pp.zap_present_ptes
      0.14 ±  2%      -0.0        0.12 ±  3%      -0.0        0.12 ±  3%      -0.0        0.13        perf-profile.children.cycles-pp.xas_load
      0.21 ±  2%      -0.0        0.20            -0.0        0.19 ±  2%      -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.26            -0.0        0.25            -0.0        0.24            -0.0        0.24        perf-profile.children.cycles-pp.__mod_lruvec_state
      0.27            -0.0        0.26 ±  2%      -0.0        0.26            -0.0        0.26        perf-profile.children.cycles-pp.lock_vma_under_rcu
      0.11            -0.0        0.10            -0.0        0.09 ±  5%      -0.0        0.10        perf-profile.children.cycles-pp._compound_head
      0.23 ±  2%      -0.0        0.22 ±  2%      -0.0        0.22            -0.0        0.21        perf-profile.children.cycles-pp.rmqueue
      0.09            -0.0        0.08            -0.0        0.08            -0.0        0.08        perf-profile.children.cycles-pp.scheduler_tick
      0.12            -0.0        0.11            -0.0        0.11            -0.0        0.11        perf-profile.children.cycles-pp.tick_nohz_handler
      0.21            -0.0        0.20            -0.0        0.20            -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.16            -0.0        0.15 ±  2%      -0.0        0.15            -0.0        0.15        perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.11            -0.0        0.10 ±  3%      -0.0        0.10            -0.0        0.10        perf-profile.children.cycles-pp.update_process_times
      0.14 ±  3%      -0.0        0.14 ±  3%      -0.0        0.13            -0.0        0.13 ±  3%  perf-profile.children.cycles-pp.try_charge_memcg
      0.15            -0.0        0.14 ±  2%      -0.0        0.14 ±  2%      -0.0        0.14        perf-profile.children.cycles-pp.hrtimer_interrupt
      0.06            -0.0        0.06 ±  8%      -0.0        0.05 ±  7%      -0.0        0.05        perf-profile.children.cycles-pp.task_tick_fair
      0.16 ±  2%      -0.0        0.16            -0.0        0.15            -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
      0.07            +0.0        0.08 ±  6%      +0.0        0.08            +0.0        0.08        perf-profile.children.cycles-pp.folio_add_lru
      4.88            +0.0        4.91            +0.0        4.93            +0.0        4.93        perf-profile.children.cycles-pp.tlb_finish_mmu
     37.74            +0.2       37.98            +0.3       38.04            +0.3       38.01        perf-profile.children.cycles-pp.unmap_page_range
     37.74            +0.2       37.98            +0.3       38.04            +0.3       38.01        perf-profile.children.cycles-pp.unmap_vmas
     37.74            +0.2       37.98            +0.3       38.04            +0.3       38.01        perf-profile.children.cycles-pp.zap_pmd_range
     37.74            +0.2       37.98            +0.3       38.04            +0.3       38.01        perf-profile.children.cycles-pp.zap_pte_range
     37.28            +0.3       37.54            +0.3       37.61            +0.3       37.58        perf-profile.children.cycles-pp.tlb_flush_mmu
     42.65            +0.3       42.92            +0.3       43.00            +0.3       42.97        perf-profile.children.cycles-pp.__vm_munmap
     42.65            +0.3       42.92            +0.3       43.00            +0.3       42.97        perf-profile.children.cycles-pp.__x64_sys_munmap
     42.65            +0.3       42.92            +0.4       43.00            +0.3       42.97        perf-profile.children.cycles-pp.__munmap
     42.65            +0.3       42.92            +0.4       43.00            +0.3       42.97        perf-profile.children.cycles-pp.unmap_region
     42.65            +0.3       42.93            +0.4       43.01            +0.3       42.98        perf-profile.children.cycles-pp.do_vmi_align_munmap
     42.65            +0.3       42.93            +0.4       43.01            +0.3       42.98        perf-profile.children.cycles-pp.do_vmi_munmap
     42.86            +0.3       43.14            +0.4       43.22            +0.3       43.18        perf-profile.children.cycles-pp.do_syscall_64
     42.86            +0.3       43.14            +0.4       43.22            +0.3       43.19        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     42.15            +0.3       42.44            +0.4       42.54            +0.3       42.50        perf-profile.children.cycles-pp.free_pages_and_swap_cache
     42.12            +0.3       42.41            +0.4       42.50            +0.3       42.46        perf-profile.children.cycles-pp.folios_put_refs
     42.15            +0.3       42.45            +0.4       42.54            +0.3       42.50        perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
     41.51            +0.3       41.80            +0.4       41.93            +0.4       41.89        perf-profile.children.cycles-pp.__page_cache_release
     43.66            +0.5       44.12            +0.4       44.08            +0.5       44.15        perf-profile.children.cycles-pp.finish_fault
     41.59            +0.6       42.19            +0.6       42.16            +0.6       42.24        perf-profile.children.cycles-pp.set_pte_range
     40.94            +0.6       41.57            +0.6       41.53            +0.7       41.59        perf-profile.children.cycles-pp.folio_add_lru_vma
     40.99            +0.6       41.63            +0.6       41.60            +0.7       41.66        perf-profile.children.cycles-pp.folio_batch_move_lru
     81.57            +1.0       82.53            +1.1       82.62            +1.1       82.65        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     81.60            +1.0       82.56            +1.1       82.66            +1.1       82.68        perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     81.59            +1.0       82.56            +1.1       82.66            +1.1       82.68        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      5.47            -0.3        5.14            -0.4        5.10            -0.4        5.10        perf-profile.self.cycles-pp.copy_page
      1.77            -0.1        1.65 ±  2%      -0.1        1.63            -0.1        1.63        perf-profile.self.cycles-pp._raw_spin_lock
      2.19            -0.1        2.08            -0.1        2.08            -0.1        2.07        perf-profile.self.cycles-pp.testcase
      0.96            -0.0        0.91            -0.0        0.91            -0.0        0.91        perf-profile.self.cycles-pp.sync_regs
      1.02            -0.0        0.99            -0.0        0.99            -0.0        0.98        perf-profile.self.cycles-pp.native_irq_return_iret
      0.28 ±  2%      -0.0        0.26 ±  2%      +0.0        0.29 ±  2%      +0.0        0.30 ±  2%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      0.19 ±  2%      -0.0        0.17 ±  2%      -0.0        0.17 ±  2%      -0.0        0.17        perf-profile.self.cycles-pp.get_page_from_freelist
      0.20            -0.0        0.19            -0.0        0.18 ±  2%      -0.0        0.19 ±  2%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.28            -0.0        0.27 ±  3%      -0.0        0.27            -0.0        0.26        perf-profile.self.cycles-pp.___perf_sw_event
      0.16 ±  2%      -0.0        0.15 ±  2%      -0.0        0.15            -0.0        0.15 ±  2%  perf-profile.self.cycles-pp.handle_mm_fault
      0.06            -0.0        0.05            -0.0        0.05            -0.0        0.05        perf-profile.self.cycles-pp.down_read_trylock
      0.09            -0.0        0.08            -0.0        0.08            -0.0        0.08        perf-profile.self.cycles-pp.folio_add_new_anon_rmap
      0.11            -0.0        0.10 ±  3%      -0.0        0.10            -0.0        0.11 ±  3%  perf-profile.self.cycles-pp.xas_load
      0.16            -0.0        0.15 ±  2%      -0.0        0.15 ±  2%      -0.0        0.15        perf-profile.self.cycles-pp.mas_walk
      0.12 ±  4%      -0.0        0.11 ±  3%      +0.0        0.12            -0.0        0.10        perf-profile.self.cycles-pp.filemap_get_entry
      0.11 ±  3%      -0.0        0.11 ±  4%      -0.0        0.10 ±  4%      -0.0        0.10        perf-profile.self.cycles-pp.free_pages_and_swap_cache
      0.11            -0.0        0.11 ±  4%      -0.0        0.10            -0.0        0.10 ±  4%  perf-profile.self.cycles-pp.error_entry
      0.09 ±  4%      -0.0        0.09            -0.0        0.08            -0.0        0.09 ±  4%  perf-profile.self.cycles-pp._compound_head
      0.21            +0.0        0.21            -0.0        0.20            -0.0        0.20        perf-profile.self.cycles-pp.folios_put_refs
      0.12            +0.0        0.12            -0.0        0.11            +0.0        0.12        perf-profile.self.cycles-pp.do_fault
      0.00            +0.0        0.00            +0.1        0.05            +0.0        0.00        perf-profile.self.cycles-pp.folio_unlock
     81.57            +1.0       82.53            +1.1       82.62            +1.1       82.65        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath


> 
> Shakeel



* Re: [linux-next:master] [memcg]  70a64b7919: will-it-scale.per_process_ops -11.9% regression
  2024-05-21  2:43           ` Oliver Sang
@ 2024-05-22  4:18             ` Shakeel Butt
  2024-05-23  7:48               ` Oliver Sang
  0 siblings, 1 reply; 15+ messages in thread
From: Shakeel Butt @ 2024-05-22  4:18 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin

On Tue, May 21, 2024 at 10:43:16AM +0800, Oliver Sang wrote:
> hi, Shakeel,
> 
[...]
> 
> we reported a regression on a 2-node Skylake server, so I found a 1-node Skylake
> desktop (we don't have a 1-node server) to check.
> 

Please try the following patch on both the single-node and dual-node
machines:


From 00a84b489b9e18abd1b8ec575ea31afacaf0734b Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeel.butt@linux.dev>
Date: Tue, 21 May 2024 20:27:11 -0700
Subject: [PATCH] memcg: rearrange fields of mem_cgroup_per_node

At the moment, the fields of mem_cgroup_per_node which get read on the
performance-critical path share a cacheline with fields which may get
updated. This causes contention on that cacheline for concurrent
readers. Let's move all the read-only pointers to the start of the
struct, followed by the memcg-v1-only fields, and put the fields which
get updated often at the end.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 include/linux/memcontrol.h | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 030d34e9d117..16efd9737be9 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -96,23 +96,25 @@ struct mem_cgroup_reclaim_iter {
  * per-node information in memory controller.
  */
 struct mem_cgroup_per_node {
-	struct lruvec		lruvec;
+	/* Keep the read-only fields at the start */
+	struct mem_cgroup	*memcg;		/* Back pointer, we cannot */
+						/* use container_of	   */
 
 	struct lruvec_stats_percpu __percpu	*lruvec_stats_percpu;
 	struct lruvec_stats			*lruvec_stats;
-
-	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
-
-	struct mem_cgroup_reclaim_iter	iter;
-
 	struct shrinker_info __rcu	*shrinker_info;
 
+	/* memcg-v1 only stuff in middle */
+
 	struct rb_node		tree_node;	/* RB tree node */
 	unsigned long		usage_in_excess;/* Set to the value by which */
 						/* the soft limit is exceeded*/
 	bool			on_tree;
-	struct mem_cgroup	*memcg;		/* Back pointer, we cannot */
-						/* use container_of	   */
+
+	/* Fields which get updated often at the end. */
+	struct lruvec		lruvec;
+	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
+	struct mem_cgroup_reclaim_iter	iter;
 };
 
 struct mem_cgroup_threshold {
-- 
2.43.0
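
If it helps to see the effect in isolation, here is a minimal user-space
sketch of the same false-sharing pattern (a sketch only: it assumes
64-byte cachelines, the struct and field names are made up, and none of
this is kernel code). Timing it with and without the pad shows the
reader slowing down once the read-mostly field and the hot counter
share a cacheline:

/*
 * Minimal false-sharing demo: one thread repeatedly reads a
 * read-mostly field while another keeps updating a counter.  With
 * the pad, the two fields sit on different cachelines and the reads
 * scale; delete the pad and the writer's stores keep invalidating
 * the reader's cacheline.
 *
 * Build: gcc -O2 -pthread demo.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

enum { CACHELINE = 64 };
#define ITERS (100 * 1000 * 1000L)

struct layout {
	long read_mostly;                   /* readers only load this */
	char pad[CACHELINE - sizeof(long)]; /* delete to share a line */
	_Atomic long counter;               /* the writer bumps this  */
};

static _Alignas(CACHELINE) struct layout s = { .read_mostly = 42 };

static void *writer(void *arg)
{
	(void)arg;
	for (long i = 0; i < ITERS; i++)
		atomic_fetch_add_explicit(&s.counter, 1,
					  memory_order_relaxed);
	return NULL;
}

static void *reader(void *arg)
{
	/* volatile forces a real load of read_mostly on every pass */
	const volatile long *p = &s.read_mostly;
	volatile long sink = 0;

	(void)arg;
	for (long i = 0; i < ITERS; i++)
		sink += *p;
	return NULL;
}

int main(void)
{
	pthread_t w, r;

	pthread_create(&w, NULL, writer, NULL);
	pthread_create(&r, NULL, reader, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	printf("done; compare runtime with and without the pad\n");
	return 0;
}

Deleting the pad is the analogue of the old layout above, where the
read-mostly lruvec_stats pointers shared a cacheline with frequently
updated fields.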




* Re: [linux-next:master] [memcg]  70a64b7919: will-it-scale.per_process_ops -11.9% regression
  2024-05-22  4:18             ` Shakeel Butt
@ 2024-05-23  7:48               ` Oliver Sang
  2024-05-23 16:47                 ` Shakeel Butt
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Sang @ 2024-05-23  7:48 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin, oliver.sang

[-- Attachment #1: Type: text/plain, Size: 7554 bytes --]

hi, Shakeel,

On Tue, May 21, 2024 at 09:18:19PM -0700, Shakeel Butt wrote:
> On Tue, May 21, 2024 at 10:43:16AM +0800, Oliver Sang wrote:
> > hi, Shakeel,
> > 
> [...]
> > 
> > we reported a regression on a 2-node Skylake server, so I found a 1-node Skylake
> > desktop (we don't have a 1-node server) to check.
> > 
> 
> Please try the following patch on both the single-node and dual-node
> machines:


the regression is partially recovered by applying your patch
(but one case regresses even more, as shown below)

details:

since you mentioned the whole patch-set behavior last time, I applied the
patch on top of
  a94032b35e5f9 ("memcg: use proper type for mod_memcg_state")

below, fd2296741e2686ed6ecd05187e4 = a94032b35e5f9 + your patch


for the regression in our original report, the test machine is:

model: Skylake
nr_node: 2
nr_cpu: 104
memory: 192G

the regression is partially recovered:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     91713           -13.0%      79833            -4.5%      87614        will-it-scale.per_process_ops

detailed data is in part [1] of the attachment.


in later threads, we also reported similar regressions on other platforms.

on:
model: Ice Lake
nr_node: 2
nr_cpu: 64
memory: 256G
brand: Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz

the regression is partially recovered, but not as much as above:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    240373           -12.9%     209394           -10.1%     215996        will-it-scale.per_process_ops

detailed data is in part [2] of the attachment.


on:
model: Sapphire Rapids
nr_node: 2
nr_cpu: 224
memory: 512G
brand: Intel(R) Xeon(R) Platinum 8480CTDX


the regression is NOT recovered; it is even a little worse:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     78072            -5.6%      73683            -6.5%      72975        will-it-scale.per_process_ops

detailed data is in part [3] of the attachment.


for a single-node machine, we reported no regression last time on:

model: Skylake
nr_node: 1
nr_cpu: 36
memory: 32G
brand: Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz

we confirmed it's not impacted by this new patch, either:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-d08/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    136040            -0.1%     135881            -0.1%     135953        will-it-scale.per_process_ops

if you need detailed data for this comparison, please let us know.


BTW, after the last update, we found another single-node machine which can
reproduce the regression in our original report:

model: Cascade Lake
nr_node: 1
nr_cpu: 36
memory: 128G
brand: Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz

the regression is also partially recovered now:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-csl-d02/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    187483           -19.4%     151162           -12.1%     164714        will-it-scale.per_process_ops

detailed data is in part [4] of the attachment.
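
to double check where the reordered fields land, I find a small
user-space sketch like the one below handy (note it is only a
simplified stand-in for the patched mem_cgroup_per_node: the member
sizes are placeholder guesses, so the cacheline numbers it prints are
illustrative; pahole on the real vmlinux is the authoritative check):

/*
 * Print which 64-byte cacheline each field of a simplified stand-in
 * for the patched mem_cgroup_per_node lands on.  Member sizes are
 * placeholders, so treat the output as illustrative only.
 *
 * Build: gcc -O2 layout.c
 */
#include <stddef.h>
#include <stdio.h>

struct stand_in {
	/* read-mostly pointers first, as in the patch */
	void *memcg;
	void *lruvec_stats_percpu;
	void *lruvec_stats;
	void *shrinker_info;

	/* memcg-v1 only fields in the middle */
	char tree_node[24];                /* placeholder for rb_node */
	unsigned long usage_in_excess;
	_Bool on_tree;

	/* frequently updated fields at the end */
	char lruvec[1280];                 /* placeholder size        */
	unsigned long lru_zone_size[5][5]; /* placeholder dimensions  */
	char iter[32];                     /* placeholder size        */
};

#define LINE(f) (offsetof(struct stand_in, f) / 64)

int main(void)
{
	printf("memcg         -> cacheline %zu\n", LINE(memcg));
	printf("lruvec_stats  -> cacheline %zu\n", LINE(lruvec_stats));
	printf("tree_node     -> cacheline %zu\n", LINE(tree_node));
	printf("lruvec        -> cacheline %zu\n", LINE(lruvec));
	printf("lru_zone_size -> cacheline %zu\n", LINE(lru_zone_size));
	printf("iter          -> cacheline %zu\n", LINE(iter));
	return 0;
}

with the read-mostly pointers all on cacheline 0 and the hot fields on
later lines, concurrent readers at least no longer bounce on the line
the LRU updates keep dirtying.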

> 
> 
> From 00a84b489b9e18abd1b8ec575ea31afacaf0734b Mon Sep 17 00:00:00 2001
> From: Shakeel Butt <shakeel.butt@linux.dev>
> Date: Tue, 21 May 2024 20:27:11 -0700
> Subject: [PATCH] memcg: rearrange fields of mem_cgroup_per_node
> 
> At the moment, the fields of mem_cgroup_per_node which get read on the
> performance-critical path share a cacheline with fields which may get
> updated. This causes contention on that cacheline for concurrent
> readers. Let's move all the read-only pointers to the start of the
> struct, followed by the memcg-v1-only fields, and put the fields which
> get updated often at the end.
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
>  include/linux/memcontrol.h | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 030d34e9d117..16efd9737be9 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -96,23 +96,25 @@ struct mem_cgroup_reclaim_iter {
>   * per-node information in memory controller.
>   */
>  struct mem_cgroup_per_node {
> -	struct lruvec		lruvec;
> +	/* Keep the read-only fields at the start */
> +	struct mem_cgroup	*memcg;		/* Back pointer, we cannot */
> +						/* use container_of	   */
>  
>  	struct lruvec_stats_percpu __percpu	*lruvec_stats_percpu;
>  	struct lruvec_stats			*lruvec_stats;
> -
> -	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
> -
> -	struct mem_cgroup_reclaim_iter	iter;
> -
>  	struct shrinker_info __rcu	*shrinker_info;
>  
> +	/* memcg-v1 only stuff in middle */
> +
>  	struct rb_node		tree_node;	/* RB tree node */
>  	unsigned long		usage_in_excess;/* Set to the value by which */
>  						/* the soft limit is exceeded*/
>  	bool			on_tree;
> -	struct mem_cgroup	*memcg;		/* Back pointer, we cannot */
> -						/* use container_of	   */
> +
> +	/* Fields which get updated often at the end. */
> +	struct lruvec		lruvec;
> +	unsigned long		lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
> +	struct mem_cgroup_reclaim_iter	iter;
>  };
>  
>  struct mem_cgroup_threshold {
> -- 
> 2.43.0
> 
> 

[-- Attachment #2: detail-comparison --]
[-- Type: text/plain, Size: 136381 bytes --]

[1]

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
 1.646e+08            +7.6%  1.772e+08 ± 14%     +34.5%  2.215e+08 ± 20%  cpuidle..time
     41.99 ± 16%     -24.4%      31.73 ± 16%     -25.2%      31.39 ± 12%  sched_debug.cfs_rq:/.removed.load_avg.stddev
     34.17            -0.9%      33.87            -0.2%      34.12        boot-time.boot
      3182            -1.0%       3151            -0.2%       3176        boot-time.idle
     21099 ±  5%     -16.5%      17627 ±  2%      -7.4%      19540 ±  3%  perf-c2c.DRAM.local
      4025 ±  2%     +31.3%       5285 ±  4%     -14.7%       3432 ±  2%  perf-c2c.HITM.local
      0.44 ± 24%      +0.1        0.58            +0.2        0.65 ± 20%  mpstat.cpu.all.idle%
      0.01 ± 23%      +0.0        0.01 ±  9%      +0.0        0.02 ±  6%  mpstat.cpu.all.soft%
      7.14            -0.9        6.23            -0.3        6.79        mpstat.cpu.all.usr%
   9538291           -13.0%    8302761            -4.5%    9111939        will-it-scale.104.processes
     91713           -13.0%      79833            -4.5%      87614        will-it-scale.per_process_ops
   9538291           -13.0%    8302761            -4.5%    9111939        will-it-scale.workload
 1.438e+09           -12.9%  1.253e+09            -4.2%  1.378e+09        numa-numastat.node0.local_node
  1.44e+09           -12.9%  1.254e+09            -4.2%   1.38e+09        numa-numastat.node0.numa_hit
 1.453e+09           -13.1%  1.263e+09            -4.9%  1.382e+09        numa-numastat.node1.local_node
 1.454e+09           -12.9%  1.265e+09            -4.8%  1.384e+09        numa-numastat.node1.numa_hit
  1.44e+09           -12.9%  1.254e+09            -4.2%   1.38e+09        numa-vmstat.node0.numa_hit
 1.438e+09           -12.9%  1.253e+09            -4.2%  1.378e+09        numa-vmstat.node0.numa_local
 1.454e+09           -12.9%  1.265e+09            -4.8%  1.384e+09        numa-vmstat.node1.numa_hit
 1.453e+09           -13.1%  1.263e+09            -4.9%  1.382e+09        numa-vmstat.node1.numa_local
 2.894e+09           -12.9%   2.52e+09            -4.5%  2.764e+09        proc-vmstat.numa_hit
 2.891e+09           -13.0%  2.516e+09            -4.5%   2.76e+09        proc-vmstat.numa_local
  2.88e+09           -12.9%  2.509e+09            -4.5%  2.752e+09        proc-vmstat.pgalloc_normal
 2.869e+09           -12.9%  2.499e+09            -4.5%  2.741e+09        proc-vmstat.pgfault
  2.88e+09           -12.9%  2.509e+09            -4.5%  2.751e+09        proc-vmstat.pgfree
     17.51            -3.2%      16.95            -1.5%      17.23        perf-stat.i.MPKI
 9.457e+09            -9.7%  8.542e+09            -3.1%  9.165e+09        perf-stat.i.branch-instructions
  45022022            -9.0%   40951240            -2.6%   43850606        perf-stat.i.branch-misses
     84.38            -5.7       78.65            -3.2       81.15        perf-stat.i.cache-miss-rate%
 8.353e+08           -12.9%  7.271e+08            -4.6%  7.969e+08        perf-stat.i.cache-misses
 9.877e+08            -6.6%  9.224e+08            -0.8%  9.799e+08        perf-stat.i.cache-references
      6.06           +11.3%       6.75            +3.2%       6.26        perf-stat.i.cpi
    136.25            -1.1%     134.73            -0.1%     136.12        perf-stat.i.cpu-migrations
    348.56           +14.9%     400.65            +4.9%     365.77        perf-stat.i.cycles-between-cache-misses
 4.763e+10           -10.1%  4.285e+10            -3.1%  4.617e+10        perf-stat.i.instructions
      0.17            -9.9%       0.15            -3.2%       0.16        perf-stat.i.ipc
    182.56           -12.9%     158.99            -4.5%     174.33        perf-stat.i.metric.K/sec
   9494393           -12.9%    8270117            -4.5%    9066901        perf-stat.i.minor-faults
   9494393           -12.9%    8270117            -4.5%    9066902        perf-stat.i.page-faults
     17.54            -3.2%      16.98            -1.6%      17.27        perf-stat.overall.MPKI
     84.57            -5.7       78.84            -3.2       81.34        perf-stat.overall.cache-miss-rate%
      6.07           +11.2%       6.76            +3.2%       6.27        perf-stat.overall.cpi
    346.33           +14.9%     397.97            +4.8%     362.97        perf-stat.overall.cycles-between-cache-misses
      0.16           -10.1%       0.15            -3.1%       0.16        perf-stat.overall.ipc
   1503802            +3.5%    1555989            +1.7%    1528933        perf-stat.overall.path-length
 9.424e+09            -9.7%  8.509e+09            -3.1%  9.133e+09        perf-stat.ps.branch-instructions
  44739120            -9.2%   40645392            -2.6%   43568159        perf-stat.ps.branch-misses
 8.326e+08           -13.0%  7.247e+08            -4.6%  7.945e+08        perf-stat.ps.cache-misses
 9.846e+08            -6.6%  9.193e+08            -0.8%  9.768e+08        perf-stat.ps.cache-references
    134.98            -1.1%     133.49            -0.1%     134.89        perf-stat.ps.cpu-migrations
 4.747e+10           -10.1%  4.268e+10            -3.1%  4.601e+10        perf-stat.ps.instructions
   9463902           -12.9%    8241837            -4.5%    9037920        perf-stat.ps.minor-faults
   9463902           -12.9%    8241837            -4.5%    9037920        perf-stat.ps.page-faults
 1.434e+13            -9.9%  1.292e+13            -2.9%  1.393e+13        perf-stat.total.instructions
     64.15            -2.5       61.69            -0.9       63.21        perf-profile.calltrace.cycles-pp.testcase
     58.30            -1.9       56.36            -0.7       57.58        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
     52.64            -1.3       51.29            -0.5       52.17        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
     52.50            -1.3       51.18            -0.5       52.05        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     50.81            -1.0       49.86            -0.2       50.64        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      9.27            -0.9        8.36            -0.4        8.83 ±  2%  perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     49.86            -0.8       49.02            -0.1       49.76        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     49.21            -0.8       48.45            -0.1       49.14        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      0.60 ±  4%      -0.6        0.00            -0.2        0.35 ± 70%  perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault
      3.24            -0.5        2.73            -0.3        2.98        perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      5.15            -0.5        4.65            -0.2        4.94        perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase
      0.82            -0.3        0.53            -0.3        0.56        perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      1.68            -0.3        1.43            -0.2        1.51        perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault
      1.50 ±  2%      -0.2        1.26 ±  3%      -0.1        1.42        perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault
      2.52            -0.2        2.27            -0.1        2.40        perf-profile.calltrace.cycles-pp.error_entry.testcase
      1.85            -0.2        1.68            -0.1        1.78 ±  2%  perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
      1.55            -0.1        1.42            -0.1        1.49 ±  3%  perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault
      1.07            -0.1        0.95            -0.1        1.00        perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault
      0.68            -0.1        0.56 ±  2%      -0.1        0.61        perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
      0.55            -0.1        0.42 ± 44%      -0.0        0.53 ±  2%  perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc
      0.90            -0.1        0.80            -0.0        0.86        perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase
      0.89            -0.1        0.84            -0.0        0.88        perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault
      1.23            -0.0        1.21            +0.0        1.27        perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      1.15            -0.0        1.13            +0.0        1.19        perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault
      0.96            +0.0        0.96            +0.1        1.01        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault
      0.73 ±  2%      +0.0        0.75            +0.1        0.79        perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
      1.00            +0.1        1.06            +0.1        1.08        perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      3.85            +0.2        4.09            +0.1        3.95        perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      3.85            +0.2        4.09            +0.1        3.95        perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
      3.85            +0.2        4.09            +0.1        3.96        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      3.82            +0.2        4.07            +0.1        3.92        perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region
      3.68            +0.3        3.93            +0.1        3.80        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
      0.83            +0.3        1.12 ±  2%      +0.3        1.14        perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.set_pte_range.finish_fault.do_fault.__handle_mm_fault
      0.00            +0.6        0.56 ±  3%      +0.3        0.34 ± 70%  perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range
     31.81            +0.6       32.44            +0.4       32.22        perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_fault.__handle_mm_fault
     31.69            +0.6       32.33            +0.4       32.11        perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_fault
     30.47            +0.6       31.11            +0.4       30.90        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range
     30.48            +0.6       31.13            +0.4       30.91        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
     30.44            +0.7       31.09            +0.4       30.88        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma
      0.00            +0.7        0.68 ±  2%      +0.6        0.63        perf-profile.calltrace.cycles-pp.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range
     35.03            +0.7       35.76            +0.6       35.66        perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     32.87            +0.9       33.79            +0.7       33.58        perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
     29.54            +2.3       31.84            +0.9       30.39        perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
     29.54            +2.3       31.84            +0.9       30.39        perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range
     29.53            +2.3       31.83            +0.9       30.39        perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
     30.66            +2.3       32.98            +0.9       31.57        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
     30.66            +2.3       32.98            +0.9       31.57        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
     30.66            +2.3       32.98            +0.9       31.57        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
     30.66            +2.3       32.98            +0.9       31.57        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
     29.26            +2.4       31.64            +0.9       30.16        perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range
     28.41            +2.4       30.83            +1.0       29.39        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu
     34.56            +2.6       37.12            +1.0       35.57        perf-profile.calltrace.cycles-pp.__munmap
     34.55            +2.6       37.12            +1.0       35.57        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     34.55            +2.6       37.12            +1.0       35.57        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     34.55            +2.6       37.12            +1.0       35.57        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     34.55            +2.6       37.12            +1.0       35.57        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     34.56            +2.6       37.12            +1.0       35.57        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     34.56            +2.6       37.12            +1.0       35.57        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     34.55            +2.6       37.11            +1.0       35.56        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
     31.41            +2.8       34.25            +1.1       32.55        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
     31.38            +2.9       34.24            +1.1       32.53        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
     31.42            +2.9       34.28            +1.1       32.56        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
     65.26            -2.6       62.67            -1.0       64.26        perf-profile.children.cycles-pp.testcase
     56.09            -1.7       54.39            -0.6       55.47        perf-profile.children.cycles-pp.asm_exc_page_fault
     52.66            -1.3       51.31            -0.5       52.19        perf-profile.children.cycles-pp.exc_page_fault
     52.52            -1.3       51.20            -0.5       52.07        perf-profile.children.cycles-pp.do_user_addr_fault
     50.83            -1.0       49.88            -0.2       50.66        perf-profile.children.cycles-pp.handle_mm_fault
      9.35            -0.9        8.44            -0.4        8.91 ±  2%  perf-profile.children.cycles-pp.copy_page
     49.87            -0.8       49.03            -0.1       49.77        perf-profile.children.cycles-pp.__handle_mm_fault
     49.23            -0.8       48.47            -0.1       49.16        perf-profile.children.cycles-pp.do_fault
      3.27            -0.5        2.76            -0.3        3.01        perf-profile.children.cycles-pp.folio_prealloc
      5.15            -0.5        4.65            -0.2        4.94        perf-profile.children.cycles-pp.__irqentry_text_end
      0.82            -0.3        0.53            -0.3        0.57        perf-profile.children.cycles-pp.lock_vma_under_rcu
      1.52 ±  2%      -0.3        1.26 ±  3%      -0.1        1.43        perf-profile.children.cycles-pp.__mem_cgroup_charge
      1.69            -0.2        1.44            -0.2        1.52        perf-profile.children.cycles-pp.vma_alloc_folio_noprof
      2.54            -0.2        2.29            -0.1        2.43        perf-profile.children.cycles-pp.error_entry
      0.57            -0.2        0.33            -0.2        0.34        perf-profile.children.cycles-pp.mas_walk
      1.87            -0.2        1.70            -0.1        1.80 ±  2%  perf-profile.children.cycles-pp.__pte_offset_map_lock
      0.60 ±  4%      -0.2        0.44 ±  6%      -0.1        0.52 ±  5%  perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
      1.57            -0.1        1.43            -0.1        1.51 ±  3%  perf-profile.children.cycles-pp._raw_spin_lock
      1.12            -0.1        0.99            -0.1        1.04        perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
      0.70            -0.1        0.57 ±  2%      -0.1        0.62        perf-profile.children.cycles-pp.lru_add_fn
      0.95            -0.1        0.82 ±  5%      +0.3        1.22 ±  2%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      1.16            -0.1        1.04            -0.0        1.11        perf-profile.children.cycles-pp.native_irq_return_iret
      0.94            -0.1        0.84            -0.0        0.90        perf-profile.children.cycles-pp.sync_regs
      0.43            -0.1        0.34 ±  2%      -0.0        0.39        perf-profile.children.cycles-pp.free_unref_folios
      0.96            -0.1        0.87            -0.0        0.92        perf-profile.children.cycles-pp.__perf_sw_event
      0.44            -0.1        0.36            -0.1        0.39        perf-profile.children.cycles-pp.get_vma_policy
      0.21 ±  3%      -0.1        0.13 ±  2%      -0.0        0.16 ±  2%  perf-profile.children.cycles-pp._compound_head
      0.75            -0.1        0.68            -0.0        0.72        perf-profile.children.cycles-pp.___perf_sw_event
      0.94            -0.1        0.88            -0.0        0.92        perf-profile.children.cycles-pp.__alloc_pages_noprof
      0.44 ±  5%      -0.1        0.37 ±  7%      -0.0        0.42 ±  6%  perf-profile.children.cycles-pp.__count_memcg_events
      0.31            -0.1        0.24 ±  2%      -0.0        0.28 ±  3%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
      0.41 ±  4%      -0.1        0.35 ±  7%      -0.0        0.40 ±  5%  perf-profile.children.cycles-pp.mem_cgroup_commit_charge
      0.57            -0.0        0.52            -0.0        0.55 ±  2%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.17 ±  2%      -0.0        0.12 ±  4%      -0.0        0.15 ±  3%  perf-profile.children.cycles-pp.uncharge_batch
      0.19 ±  3%      -0.0        0.15 ±  8%      -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.15 ±  2%      -0.0        0.12 ±  4%      -0.0        0.13 ±  3%  perf-profile.children.cycles-pp.free_unref_page_commit
      0.32 ±  3%      -0.0        0.29 ±  2%      -0.0        0.30 ±  2%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.13 ±  3%      -0.0        0.10 ±  5%      -0.0        0.11 ±  3%  perf-profile.children.cycles-pp.page_counter_uncharge
      0.13 ±  2%      -0.0        0.10 ±  4%      -0.0        0.12 ±  6%  perf-profile.children.cycles-pp.__mod_zone_page_state
      0.10 ±  3%      -0.0        0.07 ±  5%      -0.0        0.09 ±  5%  perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
      0.08            -0.0        0.05            -0.0        0.05 ±  8%  perf-profile.children.cycles-pp.policy_nodemask
      1.24            -0.0        1.21            +0.0        1.28        perf-profile.children.cycles-pp.__do_fault
      0.36            -0.0        0.33            -0.0        0.34        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.39            -0.0        0.37            -0.0        0.38 ±  2%  perf-profile.children.cycles-pp.rmqueue
      0.17 ±  2%      -0.0        0.15            -0.0        0.16 ±  3%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.32            -0.0        0.30            -0.0        0.31        perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      1.15            -0.0        1.13            +0.0        1.19        perf-profile.children.cycles-pp.shmem_fault
      0.09            -0.0        0.07            -0.0        0.08        perf-profile.children.cycles-pp.get_pfnblock_flags_mask
      0.16            -0.0        0.14            -0.0        0.15 ±  3%  perf-profile.children.cycles-pp.handle_pte_fault
      0.12 ±  3%      -0.0        0.10 ±  3%      -0.0        0.11 ±  4%  perf-profile.children.cycles-pp.uncharge_folio
      0.16 ±  2%      -0.0        0.14 ±  2%      -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.shmem_get_policy
      0.29            -0.0        0.27            -0.0        0.28 ±  2%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.08            -0.0        0.06 ±  6%      -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.folio_unlock
      0.16 ±  4%      -0.0        0.14 ±  3%      -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.__pte_offset_map
      0.25            -0.0        0.24            -0.0        0.24        perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.30            -0.0        0.28 ±  2%      -0.0        0.28        perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.20 ±  2%      -0.0        0.18 ±  3%      -0.0        0.19 ±  2%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.09 ±  4%      -0.0        0.08            -0.0        0.09        perf-profile.children.cycles-pp.down_read_trylock
      0.12 ±  3%      -0.0        0.11            -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.folio_add_new_anon_rmap
      0.99            -0.0        0.99            +0.1        1.04 ±  2%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
      0.04 ± 44%      +0.0        0.06 ±  7%      -0.0        0.02 ±142%  perf-profile.children.cycles-pp.kthread
      0.04 ± 44%      +0.0        0.06 ±  7%      -0.0        0.02 ±142%  perf-profile.children.cycles-pp.ret_from_fork
      0.04 ± 44%      +0.0        0.06 ±  7%      -0.0        0.02 ±142%  perf-profile.children.cycles-pp.ret_from_fork_asm
      0.73            +0.0        0.75            +0.1        0.79        perf-profile.children.cycles-pp.filemap_get_entry
      0.00            +0.1        0.05            +0.0        0.01 ±223%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      1.02            +0.1        1.07            +0.1        1.10        perf-profile.children.cycles-pp.zap_present_ptes
      0.47            +0.2        0.68 ±  2%      +0.2        0.64        perf-profile.children.cycles-pp.folio_remove_rmap_ptes
      3.87            +0.2        4.11            +0.1        3.97        perf-profile.children.cycles-pp.tlb_finish_mmu
      1.17            +0.6        1.75 ±  2%      +0.5        1.67        perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
     31.81            +0.6       32.44            +0.4       32.22        perf-profile.children.cycles-pp.folio_add_lru_vma
     31.77            +0.6       32.42            +0.4       32.19        perf-profile.children.cycles-pp.folio_batch_move_lru
     35.04            +0.7       35.77            +0.6       35.67        perf-profile.children.cycles-pp.finish_fault
     32.88            +0.9       33.80            +0.7       33.59        perf-profile.children.cycles-pp.set_pte_range
     29.54            +2.3       31.84            +0.9       30.39        perf-profile.children.cycles-pp.tlb_flush_mmu
     30.66            +2.3       32.98            +0.9       31.57        perf-profile.children.cycles-pp.zap_pte_range
     30.66            +2.3       32.98            +0.9       31.58        perf-profile.children.cycles-pp.unmap_page_range
     30.66            +2.3       32.98            +0.9       31.58        perf-profile.children.cycles-pp.unmap_vmas
     30.66            +2.3       32.98            +0.9       31.58        perf-profile.children.cycles-pp.zap_pmd_range
     33.41            +2.5       35.95            +1.0       34.36        perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
     33.40            +2.5       35.94            +1.0       34.36        perf-profile.children.cycles-pp.free_pages_and_swap_cache
     34.56            +2.6       37.12            +1.0       35.57        perf-profile.children.cycles-pp.__x64_sys_munmap
     34.56            +2.6       37.12            +1.0       35.57        perf-profile.children.cycles-pp.__vm_munmap
     34.56            +2.6       37.12            +1.0       35.58        perf-profile.children.cycles-pp.do_vmi_munmap
     34.56            +2.6       37.12            +1.0       35.57        perf-profile.children.cycles-pp.__munmap
     34.56            +2.6       37.12            +1.0       35.58        perf-profile.children.cycles-pp.do_vmi_align_munmap
     34.56            +2.6       37.12            +1.0       35.58        perf-profile.children.cycles-pp.unmap_region
     34.67            +2.6       37.24            +1.0       35.68        perf-profile.children.cycles-pp.do_syscall_64
     34.67            +2.6       37.24            +1.0       35.69        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     33.22            +2.6       35.83            +1.0       34.21        perf-profile.children.cycles-pp.folios_put_refs
     32.12            +2.7       34.80            +1.1       33.22        perf-profile.children.cycles-pp.__page_cache_release
     61.97            +3.5       65.47            +1.6       63.54        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     61.98            +3.5       65.50            +1.6       63.56        perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     61.94            +3.5       65.48            +1.6       63.51        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      9.32            -0.9        8.41            -0.4        8.88 ±  2%  perf-profile.self.cycles-pp.copy_page
      5.15            -0.5        4.65            -0.2        4.94        perf-profile.self.cycles-pp.__irqentry_text_end
      2.58            -0.3        2.30            -0.1        2.46        perf-profile.self.cycles-pp.testcase
      2.53            -0.2        2.28            -0.1        2.42        perf-profile.self.cycles-pp.error_entry
      0.56            -0.2        0.32 ±  2%      -0.2        0.34        perf-profile.self.cycles-pp.mas_walk
      0.60 ±  4%      -0.2        0.43 ±  5%      -0.1        0.51 ±  5%  perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
      1.54            -0.1        1.42            -0.1        1.49 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock
      1.15            -0.1        1.04            -0.0        1.11        perf-profile.self.cycles-pp.native_irq_return_iret
      0.94            -0.1        0.84            -0.0        0.90        perf-profile.self.cycles-pp.sync_regs
      0.85            -0.1        0.75 ±  5%      +0.3        1.13 ±  2%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      0.20 ±  3%      -0.1        0.12 ±  3%      -0.1        0.15 ±  2%  perf-profile.self.cycles-pp._compound_head
      0.27 ±  3%      -0.1        0.19 ±  2%      -0.0        0.23 ±  3%  perf-profile.self.cycles-pp.free_pages_and_swap_cache
      0.26            -0.1        0.19 ±  3%      -0.0        0.25 ±  2%  perf-profile.self.cycles-pp.__page_cache_release
      0.66            -0.1        0.59            -0.0        0.63        perf-profile.self.cycles-pp.___perf_sw_event
      0.28 ±  2%      -0.1        0.22 ±  3%      -0.0        0.25        perf-profile.self.cycles-pp.zap_present_ptes
      0.32            -0.1        0.27 ±  4%      -0.0        0.28        perf-profile.self.cycles-pp.lru_add_fn
      0.37 ±  5%      -0.1        0.32 ±  6%      -0.0        0.36 ±  6%  perf-profile.self.cycles-pp.__count_memcg_events
      0.26            -0.1        0.20            -0.0        0.21        perf-profile.self.cycles-pp.get_vma_policy
      0.47            -0.1        0.42            -0.0        0.44 ±  2%  perf-profile.self.cycles-pp.__handle_mm_fault
      0.16            -0.0        0.12 ±  4%      -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.vma_alloc_folio_noprof
      0.20            -0.0        0.16 ±  3%      -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.free_unref_folios
      0.30            -0.0        0.25            -0.0        0.26        perf-profile.self.cycles-pp.handle_mm_fault
      0.16 ±  4%      -0.0        0.12 ±  3%      -0.0        0.13 ±  3%  perf-profile.self.cycles-pp.lock_vma_under_rcu
      0.14 ±  3%      -0.0        0.11 ±  3%      -0.0        0.13        perf-profile.self.cycles-pp.folio_remove_rmap_ptes
      0.10 ±  4%      -0.0        0.07            -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.zap_pte_range
      0.16 ±  2%      -0.0        0.12 ±  7%      -0.0        0.16 ±  3%  perf-profile.self.cycles-pp.cgroup_rstat_updated
      0.10 ±  4%      -0.0        0.07 ±  5%      -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.alloc_pages_mpol_noprof
      0.11            -0.0        0.08            -0.0        0.10 ±  4%  perf-profile.self.cycles-pp.free_unref_page_commit
      0.09 ±  5%      -0.0        0.06 ±  7%      -0.0        0.08        perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
      0.11            -0.0        0.08 ±  4%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.page_counter_uncharge
      0.12 ±  4%      -0.0        0.09            -0.0        0.11 ±  5%  perf-profile.self.cycles-pp.__mod_zone_page_state
      0.31 ±  2%      -0.0        0.29 ±  2%      -0.0        0.30 ±  2%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.14 ±  2%      -0.0        0.12 ±  4%      -0.0        0.14 ±  3%  perf-profile.self.cycles-pp.mem_cgroup_commit_charge
      0.21            -0.0        0.19 ±  2%      -0.0        0.20 ±  2%  perf-profile.self.cycles-pp.do_user_addr_fault
      0.09            -0.0        0.07 ±  5%      -0.0        0.08        perf-profile.self.cycles-pp.get_pfnblock_flags_mask
      0.21            -0.0        0.19 ±  2%      -0.0        0.20 ±  2%  perf-profile.self.cycles-pp.__perf_sw_event
      0.17 ±  2%      -0.0        0.15            -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.28            -0.0        0.26 ±  2%      -0.0        0.27        perf-profile.self.cycles-pp.__alloc_pages_noprof
      0.22 ±  2%      -0.0        0.19 ±  2%      -0.0        0.20 ±  2%  perf-profile.self.cycles-pp.__pte_offset_map_lock
      0.20 ±  2%      -0.0        0.18 ±  2%      -0.0        0.20 ±  3%  perf-profile.self.cycles-pp.shmem_get_folio_gfp
      0.12            -0.0        0.10            -0.0        0.11 ±  4%  perf-profile.self.cycles-pp.uncharge_folio
      0.11 ±  4%      -0.0        0.09 ±  4%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.__mem_cgroup_charge
      0.08            -0.0        0.06 ±  6%      -0.0        0.07 ±  5%  perf-profile.self.cycles-pp.folio_unlock
      0.14 ±  3%      -0.0        0.12 ±  3%      -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.do_fault
      0.16 ±  3%      -0.0        0.14 ±  2%      -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.shmem_get_policy
      0.10 ±  3%      -0.0        0.08 ±  5%      -0.0        0.09        perf-profile.self.cycles-pp.set_pte_range
      0.16 ±  2%      -0.0        0.15 ±  2%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.get_page_from_freelist
      0.10 ±  3%      -0.0        0.09            -0.0        0.10 ±  5%  perf-profile.self.cycles-pp.exc_page_fault
      0.12 ±  3%      -0.0        0.11            -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.folio_add_new_anon_rmap
      0.09            -0.0        0.08            +0.0        0.09        perf-profile.self.cycles-pp.down_read_trylock
      0.38 ±  2%      +0.0        0.42            +0.1        0.44 ±  2%  perf-profile.self.cycles-pp.filemap_get_entry
      0.26            +0.1        0.36            -0.0        0.23        perf-profile.self.cycles-pp.folios_put_refs
      0.33            +0.1        0.45 ±  4%      +0.1        0.40        perf-profile.self.cycles-pp.folio_batch_move_lru
      0.40 ±  5%      +0.6        0.99            +0.2        0.59        perf-profile.self.cycles-pp.__lruvec_stat_mod_folio
     61.94            +3.5       65.48            +1.6       63.51        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
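
A note on reading the table above: the regression is not in the stat
helpers themselves so much as in lruvec lock contention --
native_queued_spin_lock_slowpath gains +3.5 and __lruvec_stat_mod_folio
self time more than doubles (0.40 -> 0.99). One plausible mechanism,
assuming the commit does what its subject says and moves lruvec_stats
from an embedded struct to a separately allocated one, is an extra
dependent load on every stat update. A minimal userspace sketch of that
difference (all names invented for illustration; this is not the kernel
code):

#include <stdlib.h>

/* Illustrative only: an embedded stats array vs. one reached through a
 * pointer. The pointer variant adds a dependent load (and a potential
 * extra cache miss) to every update, consistent with the perf-c2c
 * HITM.local increase reported above. */
struct stats { long state[64]; };

struct node_embedded { struct stats s; };       /* stats inline */
struct node_indirect { struct stats *s; };      /* stats behind a pointer */

static void mod_embedded(struct node_embedded *n, int i, long d)
{
        n->s.state[i] += d;     /* address computed from n directly */
}

static void mod_indirect(struct node_indirect *n, int i, long d)
{
        n->s->state[i] += d;    /* must load n->s first, then the counter */
}

int main(void)
{
        struct node_embedded e = { { { 0 } } };
        struct node_indirect p = { calloc(1, sizeof(struct stats)) };

        mod_embedded(&e, 0, 1);
        mod_indirect(&p, 0, 1);
        free(p.s);
        return 0;
}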


[2]

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp9/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    194.40 ±  9%     -13.9%     167.40 ±  2%     -10.0%     175.00 ±  4%  perf-c2c.HITM.remote
      0.27 ±  3%      -0.0        0.24 ±  2%      -0.0        0.25 ±  2%  mpstat.cpu.all.irq%
      3.83            -0.6        3.21            -0.5        3.37 ±  2%  mpstat.cpu.all.usr%
  15383898           -12.9%   13401271           -10.1%   13823802        will-it-scale.64.processes
    240373           -12.9%     209394           -10.1%     215996        will-it-scale.per_process_ops
  15383898           -12.9%   13401271           -10.1%   13823802        will-it-scale.workload
 2.359e+09           -12.8%  2.057e+09           -10.2%  2.118e+09 ±  2%  numa-numastat.node0.local_node
 2.359e+09           -12.8%  2.057e+09           -10.2%  2.118e+09 ±  2%  numa-numastat.node0.numa_hit
 2.346e+09           -13.2%  2.035e+09 ±  2%     -10.3%  2.105e+09        numa-numastat.node1.local_node
 2.345e+09           -13.2%  2.036e+09 ±  2%     -10.2%  2.105e+09        numa-numastat.node1.numa_hit
  2.36e+09           -12.9%  2.056e+09           -10.2%  2.118e+09 ±  2%  numa-vmstat.node0.numa_hit
  2.36e+09           -12.9%  2.056e+09           -10.3%  2.118e+09 ±  2%  numa-vmstat.node0.numa_local
 2.346e+09           -13.3%  2.035e+09 ±  2%     -10.3%  2.105e+09        numa-vmstat.node1.numa_hit
 2.347e+09           -13.3%  2.034e+09 ±  2%     -10.3%  2.105e+09        numa-vmstat.node1.numa_local
      7.86 ±  5%     -29.5%       5.54 ± 34%     -37.0%       4.95 ± 30%  sched_debug.cfs_rq:/.removed.runnable_avg.avg
     22.93 ±  4%     -18.5%      18.68 ± 15%     -21.7%      17.96 ± 20%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
      7.86 ±  5%     -30.0%       5.50 ± 34%     -37.0%       4.95 ± 30%  sched_debug.cfs_rq:/.removed.util_avg.avg
     22.93 ±  4%     -19.9%      18.35 ± 14%     -21.7%      17.96 ± 20%  sched_debug.cfs_rq:/.removed.util_avg.stddev
    149.50 ± 33%     -70.9%      43.57 ±125%     -58.2%      62.42 ± 67%  sched_debug.cfs_rq:/.util_est.min
      1930 ±  4%     -10.5%       1729 ± 16%     -14.9%       1643 ±  6%  sched_debug.cpu.nr_switches.min
   1137116            -1.8%    1116759            -1.8%    1116590        proc-vmstat.nr_anon_pages
      4575            +1.7%       4654            +1.7%       4652        proc-vmstat.nr_page_table_pages
 4.705e+09           -13.0%  4.093e+09           -10.2%  4.224e+09        proc-vmstat.numa_hit
 4.706e+09           -13.0%  4.092e+09           -10.3%  4.223e+09        proc-vmstat.numa_local
 4.645e+09           -12.8%   4.05e+09           -10.1%  4.177e+09        proc-vmstat.pgalloc_normal
 4.631e+09           -12.8%  4.038e+09           -10.1%  4.164e+09        proc-vmstat.pgfault
 4.643e+09           -12.8%  4.049e+09           -10.1%  4.176e+09        proc-vmstat.pgfree
     21.14            -9.9%      19.05            -7.4%      19.58        perf-stat.i.MPKI
 1.468e+10            -7.9%  1.351e+10            -6.2%  1.378e+10        perf-stat.i.branch-instructions
  14349180            -6.2%   13464962            -5.2%   13596701        perf-stat.i.branch-misses
     69.58            -4.6       64.96            -3.2       66.40        perf-stat.i.cache-miss-rate%
  1.57e+09           -17.8%  1.291e+09           -13.6%  1.356e+09 ±  2%  perf-stat.i.cache-misses
 2.252e+09           -11.9%  1.985e+09            -9.4%  2.039e+09        perf-stat.i.cache-references
      3.00           +10.6%       3.32            +8.1%       3.25        perf-stat.i.cpi
     99.00            -0.9%      98.13            -1.1%      97.87        perf-stat.i.cpu-migrations
    143.06           +22.4%     175.18           +16.4%     166.58 ±  2%  perf-stat.i.cycles-between-cache-misses
 7.403e+10            -8.7%   6.76e+10            -6.7%   6.91e+10        perf-stat.i.instructions
      0.34            -9.7%       0.30            -7.6%       0.31        perf-stat.i.ipc
    478.41           -12.7%     417.50           -10.0%     430.74        perf-stat.i.metric.K/sec
  15310132           -12.7%   13361235           -10.0%   13784853        perf-stat.i.minor-faults
  15310132           -12.7%   13361235           -10.0%   13784853        perf-stat.i.page-faults
     21.21           -28.3%      15.20 ± 50%      -7.5%      19.62        perf-stat.overall.MPKI
      0.10            -0.0        0.08 ± 50%      +0.0        0.10        perf-stat.overall.branch-miss-rate%
     69.71           -17.9       51.83 ± 50%      -3.2       66.46        perf-stat.overall.cache-miss-rate%
      3.01           -11.4%       2.67 ± 50%      +8.0%       3.25        perf-stat.overall.cpi
    141.98            -1.2%     140.33 ± 50%     +16.8%     165.83 ±  2%  perf-stat.overall.cycles-between-cache-misses
      0.33           -27.7%       0.24 ± 50%      -7.4%       0.31        perf-stat.overall.ipc
   1453908           -16.2%    1218410 ± 50%      +3.6%    1506867        perf-stat.overall.path-length
 1.463e+10           -26.4%  1.077e+10 ± 50%      -6.2%  1.373e+10        perf-stat.ps.branch-instructions
  14253731           -25.1%   10681742 ± 50%      -5.2%   13506212        perf-stat.ps.branch-misses
 1.565e+09           -34.6%  1.023e+09 ± 50%     -13.6%  1.351e+09 ±  2%  perf-stat.ps.cache-misses
 2.245e+09           -29.6%  1.579e+09 ± 50%      -9.4%  2.032e+09        perf-stat.ps.cache-references
 7.378e+10           -27.0%  5.385e+10 ± 50%      -6.7%  6.886e+10        perf-stat.ps.instructions
  15260342           -30.3%   10633461 ± 50%     -10.0%   13738637        perf-stat.ps.minor-faults
  15260342           -30.3%   10633461 ± 50%     -10.0%   13738637        perf-stat.ps.page-faults
 2.237e+13           -27.2%  1.629e+13 ± 50%      -6.9%  2.083e+13        perf-stat.total.instructions
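
As a quick consistency check on the derived metrics in the base column
(the reported values line up with the usual definitions, MPKI =
cache-misses per thousand instructions and cycles-between-cache-misses
= cycles / cache-misses; the small residues are display rounding):

  MPKI   ~= 1.57e9 / (7.403e10 / 1000)            ~= 21.2   (reported: 21.14)
  cycles ~= cpi * instructions = 3.00 * 7.403e10  ~= 2.22e11
  cycles-between-cache-misses ~= 2.22e11 / 1.57e9 ~= 141    (reported: 143.06)
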
     75.68            -5.4       70.26            -5.0       70.73        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
     72.31            -5.1       67.25            -4.7       67.66        perf-profile.calltrace.cycles-pp.testcase
     63.50            -3.9       59.64            -3.7       59.78        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
     63.32            -3.8       59.48            -3.7       59.63        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     61.04            -3.6       57.49            -3.5       57.55        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     21.29            -3.5       17.77 ±  2%      -2.8       18.48 ±  2%  perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     59.53            -3.3       56.21            -3.3       56.24        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     58.35            -3.2       55.17            -3.2       55.16        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      5.31            -0.8        4.50            -0.7        4.64        perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
      4.97            -0.8        4.21            -0.6        4.35        perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault
      4.40            -0.6        3.78 ±  2%      -0.4        3.96 ±  3%  perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.57            -0.6        0.00            -0.3        0.26 ±100%  perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      2.63            -0.3        2.29            -0.3        2.36        perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase
      1.82            -0.3        1.49            -0.3        1.55        perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      2.21            -0.3        1.90            -0.2        1.97        perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault
      2.01            -0.3        1.73 ±  2%      -0.2        1.84 ±  5%  perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault
      1.80            -0.3        1.54            -0.2        1.59        perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault
      1.55            -0.2        1.33            -0.2        1.36        perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault
      1.74            -0.2        1.52 ±  2%      -0.2        1.57        perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.63 ±  2%      -0.2        0.41 ± 50%      -0.1        0.53 ±  2%  perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault
      1.60            -0.2        1.39 ±  2%      -0.2        1.44        perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault
      1.29            -0.2        1.11 ±  3%      -0.1        1.19 ±  6%  perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault
      1.42            -0.2        1.24 ±  2%      -0.1        1.28 ±  2%  perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault
      1.12            -0.2        0.95 ±  2%      -0.1        0.98        perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc
      1.50            -0.1        1.36 ±  3%      -0.2        1.33 ±  2%  perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.set_pte_range.finish_fault.do_fault.__handle_mm_fault
      0.72 ±  2%      -0.1        0.60 ±  3%      -0.1        0.62 ±  2%  perf-profile.calltrace.cycles-pp.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.98            -0.1        0.87 ±  2%      -0.1        0.90 ±  2%  perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.92            -0.1        0.81 ±  3%      -0.1        0.84 ±  3%  perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
      0.74            -0.1        0.64            -0.1        0.66        perf-profile.calltrace.cycles-pp.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range
      0.66            -0.1        0.56 ±  2%      -0.1        0.59        perf-profile.calltrace.cycles-pp.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.64            -0.1        0.56 ±  2%      -0.1        0.57 ±  2%  perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof
      1.15            -0.1        1.07            -0.1        1.08 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
      0.66            -0.1        0.58 ±  2%      -0.1        0.60 ±  2%  perf-profile.calltrace.cycles-pp.mas_walk.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      2.71            +0.6        3.31 ±  2%      +0.5        3.23        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      2.71            +0.6        3.31 ±  2%      +0.5        3.23        perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      2.71            +0.6        3.31 ±  2%      +0.5        3.22        perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
      2.65            +0.6        3.26 ±  2%      +0.5        3.17        perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region
      2.44            +0.6        3.07 ±  2%      +0.5        2.98        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
     24.39            +2.1       26.54 ±  3%      +1.0       25.41 ±  4%  perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
     22.46            +2.3       24.81 ±  4%      +1.2       23.70 ±  4%  perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_fault.__handle_mm_fault
     22.25            +2.4       24.63 ±  4%      +1.3       23.52 ±  5%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_fault
     20.38            +2.5       22.84 ±  4%      +1.3       21.71 ±  5%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
     20.37            +2.5       22.83 ±  4%      +1.3       21.70 ±  5%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range
     20.30            +2.5       22.77 ±  4%      +1.3       21.63 ±  5%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma
     22.59            +4.7       27.29 ±  2%      +4.3       26.92        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
     22.59            +4.7       27.29 ±  2%      +4.3       26.92        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
     22.59            +4.7       27.29 ±  2%      +4.3       26.92        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
     22.58            +4.7       27.28 ±  2%      +4.3       26.91        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
     20.59            +5.1       25.64 ±  2%      +4.6       25.21        perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
     20.59            +5.1       25.64 ±  2%      +4.6       25.20        perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range
     20.56            +5.1       25.62 ±  2%      +4.6       25.18        perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
     20.07            +5.2       25.23 ±  3%      +4.7       24.78        perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range
     18.73            +5.3       24.01 ±  3%      +4.8       23.55        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu
     25.34            +5.3       30.64 ±  2%      +4.8       30.19        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     25.34            +5.3       30.64 ±  2%      +4.8       30.19        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     25.34            +5.3       30.64 ±  2%      +4.8       30.19        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     25.34            +5.3       30.64 ±  2%      +4.8       30.19        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     25.34            +5.3       30.65 ±  2%      +4.9       30.19        perf-profile.calltrace.cycles-pp.__munmap
     25.34            +5.3       30.64 ±  2%      +4.9       30.19        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     25.33            +5.3       30.64 ±  2%      +4.9       30.18        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
     25.33            +5.3       30.64 ±  2%      +4.9       30.19        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     20.36            +5.9       26.30 ±  3%      +5.4       25.74        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
     20.35            +5.9       26.29 ±  3%      +5.4       25.73        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
     20.28            +6.0       26.24 ±  3%      +5.4       25.67        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
     74.49            -5.3       69.18            -4.9       69.64        perf-profile.children.cycles-pp.testcase
     71.15            -4.8       66.30            -4.5       66.66        perf-profile.children.cycles-pp.asm_exc_page_fault
     63.55            -3.9       59.68            -3.7       59.82        perf-profile.children.cycles-pp.exc_page_fault
     63.38            -3.8       59.54            -3.7       59.68        perf-profile.children.cycles-pp.do_user_addr_fault
     61.10            -3.6       57.54            -3.5       57.61        perf-profile.children.cycles-pp.handle_mm_fault
     21.32            -3.5       17.80 ±  2%      -2.8       18.51 ±  2%  perf-profile.children.cycles-pp.copy_page
     59.57            -3.3       56.24            -3.3       56.27        perf-profile.children.cycles-pp.__handle_mm_fault
     58.44            -3.2       55.25            -3.2       55.25        perf-profile.children.cycles-pp.do_fault
      5.36            -0.8        4.54            -0.7        4.69        perf-profile.children.cycles-pp.__pte_offset_map_lock
      5.02            -0.8        4.25            -0.6        4.38        perf-profile.children.cycles-pp._raw_spin_lock
      4.45            -0.6        3.82 ±  2%      -0.4        4.00 ±  3%  perf-profile.children.cycles-pp.folio_prealloc
      2.64            -0.3        2.30            -0.3        2.37        perf-profile.children.cycles-pp.sync_regs
      1.89            -0.3        1.55            -0.3        1.62        perf-profile.children.cycles-pp.zap_present_ptes
      2.42            -0.3        2.09 ±  2%      -0.3        2.16 ±  2%  perf-profile.children.cycles-pp.native_irq_return_iret
      2.24            -0.3        1.93            -0.2        2.00        perf-profile.children.cycles-pp.vma_alloc_folio_noprof
      2.07            -0.3        1.77 ±  2%      -0.2        1.88 ±  5%  perf-profile.children.cycles-pp.__mem_cgroup_charge
      1.89            -0.3        1.62            -0.2        1.67        perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
      1.64            -0.2        1.41            -0.2        1.45        perf-profile.children.cycles-pp.__alloc_pages_noprof
      1.42            -0.2        1.19 ±  2%      -0.2        1.23 ±  2%  perf-profile.children.cycles-pp.__perf_sw_event
      1.77            -0.2        1.54 ±  2%      -0.2        1.60        perf-profile.children.cycles-pp.__do_fault
      1.62            -0.2        1.41 ±  2%      -0.2        1.46 ±  2%  perf-profile.children.cycles-pp.shmem_fault
      1.25            -0.2        1.05 ±  2%      -0.2        1.08 ±  2%  perf-profile.children.cycles-pp.___perf_sw_event
      2.04            -0.2        1.83 ±  3%      -0.2        1.82 ±  2%  perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
      1.32            -0.2        1.13 ±  2%      -0.1        1.21 ±  6%  perf-profile.children.cycles-pp.mem_cgroup_commit_charge
      1.47            -0.2        1.29 ±  2%      -0.1        1.34 ±  2%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
      1.17            -0.2        1.00 ±  2%      -0.1        1.03        perf-profile.children.cycles-pp.get_page_from_freelist
      0.84            -0.2        0.69 ±  2%      -0.1        0.71 ±  3%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.61            -0.2        0.46 ±  2%      -0.1        0.48        perf-profile.children.cycles-pp._compound_head
      0.65            -0.1        0.53 ±  2%      -0.1        0.54 ±  3%  perf-profile.children.cycles-pp.__mod_node_page_state
      1.02            -0.1        0.90 ±  2%      -0.1        0.93 ±  2%  perf-profile.children.cycles-pp.lock_vma_under_rcu
      0.94            -0.1        0.82 ±  3%      -0.1        0.85 ±  2%  perf-profile.children.cycles-pp.filemap_get_entry
      1.13 ±  2%      -0.1        1.03 ±  3%      -0.1        1.02 ±  3%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      0.76            -0.1        0.66            -0.1        0.68        perf-profile.children.cycles-pp.folio_remove_rmap_ptes
      1.20            -0.1        1.11            -0.1        1.12        perf-profile.children.cycles-pp.lru_add_fn
      0.69            -0.1        0.60 ±  2%      -0.1        0.61 ±  2%  perf-profile.children.cycles-pp.rmqueue
      0.47            -0.1        0.38            -0.1        0.40        perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
      0.59            -0.1        0.50            -0.1        0.52        perf-profile.children.cycles-pp.free_unref_folios
      0.63 ±  3%      -0.1        0.55 ±  3%      -0.0        0.59 ±  7%  perf-profile.children.cycles-pp.__count_memcg_events
      0.67            -0.1        0.59 ±  2%      -0.1        0.61 ±  2%  perf-profile.children.cycles-pp.mas_walk
      0.54            -0.1        0.47 ±  3%      -0.1        0.49 ±  3%  perf-profile.children.cycles-pp.xas_load
      0.27 ±  3%      -0.1        0.21            -0.1        0.22 ±  3%  perf-profile.children.cycles-pp.uncharge_batch
      0.32            -0.1        0.26            -0.0        0.28 ±  3%  perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.22 ±  3%      -0.1        0.17 ±  2%      -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.page_counter_uncharge
      0.38            -0.0        0.33            -0.0        0.34 ±  2%  perf-profile.children.cycles-pp.try_charge_memcg
      0.31            -0.0        0.26            -0.0        0.28 ±  2%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.31 ±  2%      -0.0        0.27 ±  3%      -0.0        0.28 ±  5%  perf-profile.children.cycles-pp.get_vma_policy
      0.30            -0.0        0.26 ±  3%      -0.0        0.27        perf-profile.children.cycles-pp.handle_pte_fault
      0.28            -0.0        0.25            -0.0        0.26        perf-profile.children.cycles-pp.error_entry
      0.22            -0.0        0.19 ±  2%      -0.0        0.20        perf-profile.children.cycles-pp.free_unref_page_commit
      0.28 ±  2%      -0.0        0.25 ±  2%      -0.0        0.26 ±  2%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.32 ±  2%      -0.0        0.29 ±  2%      -0.0        0.29 ±  3%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.26 ±  2%      -0.0        0.23 ±  5%      -0.0        0.23 ±  5%  perf-profile.children.cycles-pp._raw_spin_trylock
      0.22 ±  2%      -0.0        0.20 ±  2%      -0.0        0.20 ±  3%  perf-profile.children.cycles-pp.folio_add_new_anon_rmap
      0.22 ±  2%      -0.0        0.19 ±  3%      -0.0        0.19        perf-profile.children.cycles-pp.pte_offset_map_nolock
      0.14 ±  2%      -0.0        0.11            -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.__mod_zone_page_state
      0.14 ±  3%      -0.0        0.12 ±  4%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.perf_exclude_event
      0.18            -0.0        0.15 ±  2%      -0.0        0.16        perf-profile.children.cycles-pp.__rmqueue_pcplist
      0.26 ±  3%      -0.0        0.23 ±  5%      -0.0        0.22 ±  3%  perf-profile.children.cycles-pp.__pte_offset_map
      0.26 ±  3%      -0.0        0.23            -0.0        0.23 ±  4%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.25 ±  3%      -0.0        0.22            -0.0        0.22 ±  4%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.18 ±  2%      -0.0        0.15 ±  2%      -0.0        0.16 ±  2%  perf-profile.children.cycles-pp.__cond_resched
      0.16 ±  2%      -0.0        0.14 ±  2%      -0.0        0.14        perf-profile.children.cycles-pp.uncharge_folio
      0.19 ±  2%      -0.0        0.17 ±  4%      -0.0        0.18 ±  4%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.17 ±  2%      -0.0        0.15 ±  3%      -0.0        0.15 ±  4%  perf-profile.children.cycles-pp.folio_unlock
      0.19 ±  2%      -0.0        0.17 ±  3%      -0.0        0.18 ±  2%  perf-profile.children.cycles-pp.down_read_trylock
      0.16            -0.0        0.14 ±  2%      -0.0        0.14 ±  3%  perf-profile.children.cycles-pp.folio_put
      0.14 ±  2%      -0.0        0.12 ±  6%      -0.0        0.12 ±  6%  perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
      0.11 ±  3%      -0.0        0.09 ±  4%      -0.0        0.09 ±  5%  perf-profile.children.cycles-pp.xas_start
      0.13 ±  3%      -0.0        0.11 ±  4%      -0.0        0.11 ±  4%  perf-profile.children.cycles-pp.page_counter_try_charge
      0.18 ±  3%      -0.0        0.16 ±  3%      -0.0        0.16 ±  5%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.12 ±  3%      -0.0        0.10 ±  4%      -0.0        0.10        perf-profile.children.cycles-pp.get_pfnblock_flags_mask
      0.18 ±  2%      -0.0        0.16 ±  2%      -0.0        0.17        perf-profile.children.cycles-pp.up_read
      0.16 ±  2%      -0.0        0.14 ±  3%      -0.0        0.14 ±  5%  perf-profile.children.cycles-pp.update_process_times
      0.14            -0.0        0.12 ±  3%      -0.0        0.13        perf-profile.children.cycles-pp.policy_nodemask
      0.08            -0.0        0.06 ±  6%      -0.0        0.07 ±  5%  perf-profile.children.cycles-pp.memcg_check_events
      0.13 ±  3%      -0.0        0.11            -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.access_error
      0.12 ±  3%      -0.0        0.11 ±  3%      -0.0        0.11        perf-profile.children.cycles-pp.perf_swevent_event
      0.09 ±  4%      -0.0        0.08            -0.0        0.08        perf-profile.children.cycles-pp.__irqentry_text_end
      0.06            -0.0        0.05            -0.0        0.05 ±  7%  perf-profile.children.cycles-pp.pte_alloc_one
      0.05            +0.0        0.06            +0.0        0.06 ±  8%  perf-profile.children.cycles-pp.perf_mmap__push
      0.19 ±  2%      +0.2        0.35 ±  4%      +0.1        0.30 ±  3%  perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
      2.72            +0.6        3.32 ±  2%      +0.5        3.24        perf-profile.children.cycles-pp.tlb_finish_mmu
     24.44            +2.1       26.58 ±  3%      +1.0       25.45 ±  4%  perf-profile.children.cycles-pp.set_pte_range
     22.47            +2.3       24.81 ±  4%      +1.2       23.71 ±  4%  perf-profile.children.cycles-pp.folio_add_lru_vma
     22.31            +2.4       24.70 ±  4%      +1.3       23.58 ±  4%  perf-profile.children.cycles-pp.folio_batch_move_lru
     22.59            +4.7       27.29 ±  2%      +4.3       26.92        perf-profile.children.cycles-pp.unmap_page_range
     22.59            +4.7       27.29 ±  2%      +4.3       26.92        perf-profile.children.cycles-pp.unmap_vmas
     22.59            +4.7       27.29 ±  2%      +4.3       26.92        perf-profile.children.cycles-pp.zap_pmd_range
     22.59            +4.7       27.29 ±  2%      +4.3       26.92        perf-profile.children.cycles-pp.zap_pte_range
     20.59            +5.1       25.64 ±  2%      +4.6       25.21        perf-profile.children.cycles-pp.tlb_flush_mmu
     25.34            +5.3       30.64 ±  2%      +4.9       30.19        perf-profile.children.cycles-pp.__vm_munmap
     25.34            +5.3       30.64 ±  2%      +4.9       30.19        perf-profile.children.cycles-pp.__x64_sys_munmap
     25.34            +5.3       30.65 ±  2%      +4.9       30.19        perf-profile.children.cycles-pp.__munmap
     25.34            +5.3       30.65 ±  2%      +4.9       30.20        perf-profile.children.cycles-pp.do_vmi_align_munmap
     25.34            +5.3       30.65 ±  2%      +4.9       30.20        perf-profile.children.cycles-pp.do_vmi_munmap
     25.46            +5.3       30.77 ±  2%      +4.9       30.32        perf-profile.children.cycles-pp.do_syscall_64
     25.33            +5.3       30.64 ±  2%      +4.9       30.19        perf-profile.children.cycles-pp.unmap_region
     25.46            +5.3       30.77 ±  2%      +4.9       30.32        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     23.30            +5.7       28.96 ±  2%      +5.1       28.44        perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
     23.29            +5.7       28.95 ±  2%      +5.1       28.43        perf-profile.children.cycles-pp.free_pages_and_swap_cache
     23.00            +5.7       28.73 ±  2%      +5.2       28.20        perf-profile.children.cycles-pp.folios_put_refs
     21.22            +5.9       27.13 ±  3%      +5.4       26.57        perf-profile.children.cycles-pp.__page_cache_release
     40.79            +8.4       49.20            +6.7       47.50 ±  2%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     40.78            +8.4       49.19            +6.7       47.49 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     40.64            +8.4       49.09            +6.7       47.38 ±  2%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     21.23            -3.5       17.73 ±  2%      -2.8       18.43 ±  2%  perf-profile.self.cycles-pp.copy_page
      4.99            -0.8        4.22            -0.6        4.36        perf-profile.self.cycles-pp._raw_spin_lock
      5.21            -0.7        4.53            -0.5        4.68        perf-profile.self.cycles-pp.testcase
      2.63            -0.3        2.29            -0.3        2.37 ±  2%  perf-profile.self.cycles-pp.sync_regs
      2.42            -0.3        2.09 ±  2%      -0.3        2.16 ±  2%  perf-profile.self.cycles-pp.native_irq_return_iret
      1.00            -0.2        0.83 ±  2%      -0.1        0.87 ±  2%  perf-profile.self.cycles-pp.___perf_sw_event
      0.58 ±  2%      -0.1        0.43 ±  3%      -0.1        0.46        perf-profile.self.cycles-pp._compound_head
      0.93 ±  2%      -0.1        0.80 ±  3%      -0.1        0.84 ±  6%  perf-profile.self.cycles-pp.mem_cgroup_commit_charge
      0.61            -0.1        0.50 ±  3%      -0.1        0.51 ±  3%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.51            -0.1        0.40 ±  2%      -0.1        0.42        perf-profile.self.cycles-pp.free_pages_and_swap_cache
      0.80            -0.1        0.70 ±  2%      -0.1        0.72        perf-profile.self.cycles-pp.__handle_mm_fault
      0.61 ±  2%      -0.1        0.51            -0.1        0.54        perf-profile.self.cycles-pp.lru_add_fn
      0.47            -0.1        0.39 ±  2%      -0.1        0.41        perf-profile.self.cycles-pp.get_page_from_freelist
      0.93 ±  2%      -0.1        0.86 ±  3%      -0.1        0.85 ±  3%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      0.45            -0.1        0.38            -0.1        0.40        perf-profile.self.cycles-pp.zap_present_ptes
      0.65            -0.1        0.58 ±  2%      -0.1        0.60 ±  2%  perf-profile.self.cycles-pp.mas_walk
      0.89 ±  2%      -0.1        0.83 ±  3%      -0.1        0.83 ±  2%  perf-profile.self.cycles-pp.__lruvec_stat_mod_folio
      0.44            -0.1        0.39            -0.0        0.40 ±  2%  perf-profile.self.cycles-pp.shmem_get_folio_gfp
      0.52 ±  3%      -0.1        0.46 ±  5%      -0.0        0.49 ±  8%  perf-profile.self.cycles-pp.__count_memcg_events
      0.46            -0.1        0.41 ±  3%      -0.0        0.41 ±  3%  perf-profile.self.cycles-pp.handle_mm_fault
      0.44            -0.1        0.38 ±  3%      -0.0        0.40 ±  3%  perf-profile.self.cycles-pp.xas_load
      0.32            -0.0        0.27            -0.0        0.28 ±  2%  perf-profile.self.cycles-pp.__page_cache_release
      0.34 ±  3%      -0.0        0.29 ±  3%      -0.0        0.29 ±  2%  perf-profile.self.cycles-pp.__alloc_pages_noprof
      0.39            -0.0        0.35 ±  3%      -0.0        0.36 ±  3%  perf-profile.self.cycles-pp.filemap_get_entry
      0.20 ±  4%      -0.0        0.15 ±  2%      -0.0        0.16 ±  3%  perf-profile.self.cycles-pp.page_counter_uncharge
      0.27 ±  3%      -0.0        0.22 ±  2%      -0.0        0.23 ±  2%  perf-profile.self.cycles-pp.rmqueue
      0.29            -0.0        0.25            -0.0        0.27 ±  2%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.27            -0.0        0.23 ±  2%      -0.0        0.24        perf-profile.self.cycles-pp.free_unref_folios
      0.24            -0.0        0.20            -0.0        0.21 ±  2%  perf-profile.self.cycles-pp.folio_remove_rmap_ptes
      0.26            -0.0        0.22 ±  4%      -0.0        0.23 ±  3%  perf-profile.self.cycles-pp.cgroup_rstat_updated
      0.30            -0.0        0.26            -0.0        0.27 ±  2%  perf-profile.self.cycles-pp.do_user_addr_fault
      0.23 ±  3%      -0.0        0.20 ±  3%      -0.0        0.21 ±  4%  perf-profile.self.cycles-pp.__pte_offset_map_lock
      0.22            -0.0        0.19 ±  2%      -0.0        0.19 ±  2%  perf-profile.self.cycles-pp.set_pte_range
      0.19 ±  2%      -0.0        0.16 ±  4%      -0.0        0.16 ±  3%  perf-profile.self.cycles-pp.__mod_lruvec_state
      0.13 ±  3%      -0.0        0.10 ±  3%      -0.0        0.11        perf-profile.self.cycles-pp.__mem_cgroup_charge
      0.25            -0.0        0.22 ±  2%      -0.0        0.22 ±  2%  perf-profile.self.cycles-pp.error_entry
      0.23 ±  2%      -0.0        0.20 ±  2%      -0.0        0.21        perf-profile.self.cycles-pp.do_fault
      0.21 ±  2%      -0.0        0.19 ±  2%      -0.0        0.19        perf-profile.self.cycles-pp.folio_add_new_anon_rmap
      0.19 ±  2%      -0.0        0.16 ±  2%      -0.0        0.17 ±  2%  perf-profile.self.cycles-pp.folio_add_lru_vma
      0.18            -0.0        0.15 ±  2%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.free_unref_page_commit
      0.15 ±  2%      -0.0        0.13 ±  3%      -0.0        0.13        perf-profile.self.cycles-pp.uncharge_folio
      0.12 ±  3%      -0.0        0.10            -0.0        0.10 ±  4%  perf-profile.self.cycles-pp.perf_exclude_event
      0.19 ±  2%      -0.0        0.17 ±  3%      -0.0        0.18 ±  6%  perf-profile.self.cycles-pp.get_vma_policy
      0.24            -0.0        0.22            -0.0        0.22 ±  3%  perf-profile.self.cycles-pp.try_charge_memcg
      0.14 ±  2%      -0.0        0.12            -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.__rmqueue_pcplist
      0.11 ±  3%      -0.0        0.09 ±  5%      -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.__mod_zone_page_state
      0.11 ±  3%      -0.0        0.09            -0.0        0.10 ±  4%  perf-profile.self.cycles-pp.page_counter_try_charge
      0.15 ±  2%      -0.0        0.13 ±  3%      -0.0        0.13 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.17 ±  4%      -0.0        0.15            -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.lock_vma_under_rcu
      0.15 ±  2%      -0.0        0.13            -0.0        0.14 ±  3%  perf-profile.self.cycles-pp.folio_put
      0.18            -0.0        0.16 ±  2%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.down_read_trylock
      0.21 ±  3%      -0.0        0.19 ±  4%      -0.0        0.18 ±  4%  perf-profile.self.cycles-pp.finish_fault
      0.17 ±  2%      -0.0        0.15 ±  3%      -0.0        0.15        perf-profile.self.cycles-pp.__perf_sw_event
      0.19 ±  2%      -0.0        0.17 ±  2%      -0.0        0.18        perf-profile.self.cycles-pp.asm_exc_page_fault
      0.16 ±  2%      -0.0        0.14 ±  2%      -0.0        0.14 ±  3%  perf-profile.self.cycles-pp.folio_unlock
      0.22 ±  3%      -0.0        0.20 ±  4%      -0.0        0.20 ±  2%  perf-profile.self.cycles-pp.__pte_offset_map
      0.16 ±  2%      -0.0        0.15 ±  5%      -0.0        0.15 ±  2%  perf-profile.self.cycles-pp.shmem_fault
      0.17 ±  2%      -0.0        0.15 ±  3%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.up_read
      0.10            -0.0        0.08 ±  4%      -0.0        0.09 ±  5%  perf-profile.self.cycles-pp.get_pfnblock_flags_mask
      0.11            -0.0        0.09 ±  5%      -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.perf_swevent_event
      0.10 ±  3%      -0.0        0.09 ±  5%      -0.0        0.09 ±  6%  perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
      0.11            -0.0        0.09 ±  5%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.zap_pte_range
      0.10 ±  4%      -0.0        0.09 ±  4%      -0.0        0.09 ±  5%  perf-profile.self.cycles-pp.pte_offset_map_nolock
      0.10 ±  4%      -0.0        0.08 ±  4%      -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.__do_fault
      0.12 ±  3%      -0.0        0.10 ±  3%      -0.0        0.10        perf-profile.self.cycles-pp.exc_page_fault
      0.12 ±  3%      -0.0        0.11 ±  4%      -0.0        0.11 ±  4%  perf-profile.self.cycles-pp.alloc_pages_mpol_noprof
      0.12 ±  3%      -0.0        0.10 ±  3%      -0.0        0.11 ±  4%  perf-profile.self.cycles-pp.access_error
      0.09 ±  5%      -0.0        0.08            -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.policy_nodemask
      0.12 ±  4%      -0.0        0.10 ±  3%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.vma_alloc_folio_noprof
      0.09            -0.0        0.08 ±  5%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.xas_start
      0.10            -0.0        0.09            -0.0        0.09        perf-profile.self.cycles-pp.folio_prealloc
      0.09            -0.0        0.08            -0.0        0.08        perf-profile.self.cycles-pp.__cond_resched
      0.06            -0.0        0.05            -0.0        0.05        perf-profile.self.cycles-pp.vm_normal_page
      0.38 ±  2%      +0.1        0.44            +0.1        0.44 ±  3%  perf-profile.self.cycles-pp.folio_batch_move_lru
      0.18 ±  2%      +0.2        0.34 ±  4%      +0.1        0.29 ±  4%  perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
     40.64            +8.4       49.08            +6.7       47.38 ±  2%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
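
For readers less familiar with the call paths above: the fault side
(set_pte_range -> folio_add_lru_vma -> folio_batch_move_lru) and the
munmap side (folios_put_refs -> __page_cache_release) both end up in
folio_lruvec_lock_irqsave, so the faulting and unmapping processes
serialize on the same per-lruvec spinlock, which is where the +8.4 in
native_queued_spin_lock_slowpath lands. A toy userspace model of that
convergence, with a pthread spinlock standing in for the lruvec lock
(illustrative only, not kernel code):

#include <pthread.h>
#include <stdio.h>

/* Two thread roles (fault path, munmap path) serializing on one lock,
 * the way folio_batch_move_lru and __page_cache_release serialize on
 * the per-memcg lruvec lock in the profiles above. */
static pthread_spinlock_t lruvec_lock;
static long lru_size;                   /* stands in for the LRU lists */

static void *fault_path(void *arg)
{
        for (int i = 0; i < 1000000; i++) {
                pthread_spin_lock(&lruvec_lock);   /* folio_lruvec_lock_irqsave */
                lru_size++;                        /* lru_add_fn: folio onto LRU */
                pthread_spin_unlock(&lruvec_lock);
        }
        return NULL;
}

static void *munmap_path(void *arg)
{
        for (int i = 0; i < 1000000; i++) {
                pthread_spin_lock(&lruvec_lock);   /* __page_cache_release */
                lru_size--;                        /* folio off the LRU */
                pthread_spin_unlock(&lruvec_lock);
        }
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_spin_init(&lruvec_lock, PTHREAD_PROCESS_PRIVATE);
        pthread_create(&a, NULL, fault_path, NULL);
        pthread_create(&b, NULL, munmap_path, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("final lru_size: %ld\n", lru_size);
        return 0;
}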


[3]

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   1727628 ± 22%     -24.1%    1310525 ±  7%      -5.3%    1636459 ± 30%  sched_debug.cpu.avg_idle.max
      6058 ± 41%     -47.9%       3156 ± 43%      +1.0%       6121 ± 61%  sched_debug.cpu.max_idle_balance_cost.stddev
     35617 ±  5%      -9.1%      32375 ± 21%     -26.2%      26270 ± 25%  numa-vmstat.node0.nr_slab_reclaimable
   4024866            +3.4%    4163009 ±  7%      +8.7%    4374953 ±  7%  numa-vmstat.node1.nr_file_pages
     19132 ± 10%     +17.3%      22446 ± 30%     +49.4%      28587 ± 23%  numa-vmstat.node1.nr_slab_reclaimable
  17488267            -5.6%   16505101            -6.5%   16346741        will-it-scale.224.processes
     78072            -5.6%      73683            -6.5%      72975        will-it-scale.per_process_ops
  17488267            -5.6%   16505101            -6.5%   16346741        will-it-scale.workload
    142458 ±  5%      -9.1%     129506 ± 21%     -26.2%     105066 ± 25%  numa-meminfo.node0.KReclaimable
    142458 ±  5%      -9.1%     129506 ± 21%     -26.2%     105066 ± 25%  numa-meminfo.node0.SReclaimable
  16107004            +3.3%   16635393 ±  7%      +8.6%   17491995 ±  7%  numa-meminfo.node1.FilePages
     76509 ± 10%     +17.4%      89791 ± 30%     +49.4%     114321 ± 23%  numa-meminfo.node1.KReclaimable
     76509 ± 10%     +17.4%      89791 ± 30%     +49.4%     114321 ± 23%  numa-meminfo.node1.SReclaimable
 5.296e+09            -5.6%  4.998e+09            -6.5%  4.949e+09        proc-vmstat.numa_hit
 5.291e+09            -5.6%  4.995e+09            -6.5%  4.947e+09        proc-vmstat.numa_local
 5.285e+09            -5.6%  4.989e+09            -6.5%  4.941e+09        proc-vmstat.pgalloc_normal
 5.264e+09            -5.6%  4.969e+09            -6.5%  4.921e+09        proc-vmstat.pgfault
 5.283e+09            -5.6%  4.989e+09            -6.5%  4.941e+09        proc-vmstat.pgfree
     20.16            -2.9%      19.58            -3.3%      19.50        perf-stat.i.MPKI
 2.501e+10            -2.4%   2.44e+10            -2.9%  2.428e+10        perf-stat.i.branch-instructions
  18042153            -2.8%   17539874            -3.8%   17362741        perf-stat.i.branch-misses
 2.382e+09            -5.6%  2.249e+09            -6.5%  2.228e+09        perf-stat.i.cache-misses
 2.561e+09            -5.3%  2.424e+09            -6.5%  2.394e+09        perf-stat.i.cache-references
      5.49            +2.8%       5.64            +3.3%       5.67        perf-stat.i.cpi
    274.25            +5.4%     289.07            +6.4%     291.86        perf-stat.i.cycles-between-cache-misses
 1.177e+11            -2.7%  1.145e+11            -3.2%  1.139e+11        perf-stat.i.instructions
      0.19            -2.7%       0.18            -3.2%       0.18        perf-stat.i.ipc
    155.11            -5.5%     146.59            -6.5%     145.09        perf-stat.i.metric.K/sec
  17405977            -5.5%   16441964            -6.5%   16274188        perf-stat.i.minor-faults
  17405978            -5.5%   16441964            -6.5%   16274188        perf-stat.i.page-faults
      4.41 ± 50%     +28.5%       5.66           +29.1%       5.69        perf-stat.overall.cpi
    217.50 ± 50%     +32.4%     287.87           +33.6%     290.48        perf-stat.overall.cycles-between-cache-misses
   1623235 ± 50%     +29.0%    2093187           +29.6%    2103156        perf-stat.overall.path-length
      5.48            -0.4        5.11            -0.4        5.11        perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     57.55            -0.3       57.20            -0.1       57.48        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
     56.14            -0.2       55.90            +0.0       56.16        perf-profile.calltrace.cycles-pp.testcase
      1.86            -0.2        1.71            -0.1        1.73        perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
      1.77            -0.1        1.63            -0.1        1.64        perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault
      1.17            -0.1        1.10            -0.1        1.08        perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     51.87            -0.0       51.82            +0.2       52.11        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.96            -0.0        0.91            -0.1        0.91        perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase
      0.71            -0.0        0.67            -0.0        0.66        perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault
      0.60            -0.0        0.57            -0.0        0.56        perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault
     51.39            -0.0       51.37            +0.3       51.67        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     51.03            +0.0       51.03            +0.3       51.33        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      4.86            +0.0        4.91            +0.0        4.90        perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      4.87            +0.0        4.91            +0.0        4.90        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      4.86            +0.0        4.91            +0.0        4.90        perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
      4.85            +0.0        4.90            +0.0        4.88        perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region
      4.77            +0.1        4.82            +0.0        4.81        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
     37.74            +0.3       38.01            -0.0       37.74        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
     37.74            +0.3       38.01            -0.0       37.74        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
     37.74            +0.3       38.01            -0.0       37.74        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
     37.73            +0.3       38.01            +0.0       37.74        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
     37.27            +0.3       37.57            +0.0       37.30        perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
     37.28            +0.3       37.58            +0.0       37.31        perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
     37.28            +0.3       37.58            +0.0       37.31        perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range
     37.15            +0.3       37.46            +0.0       37.20        perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range
     42.65            +0.3       42.97            +0.0       42.68        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
     42.65            +0.3       42.97            +0.0       42.68        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     42.65            +0.3       42.97            +0.0       42.68        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     42.65            +0.3       42.97            +0.0       42.68        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     42.65            +0.3       42.97            +0.0       42.68        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     36.72            +0.3       37.04            +0.1       36.79        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu
     42.65            +0.3       42.97            +0.0       42.69        perf-profile.calltrace.cycles-pp.__munmap
     42.65            +0.3       42.97            +0.0       42.69        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     42.65            +0.3       42.97            +0.0       42.69        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     41.26            +0.4       41.63            +0.1       41.38        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
     41.26            +0.4       41.64            +0.1       41.38        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
     41.23            +0.4       41.61            +0.1       41.36        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
     43.64            +0.5       44.12            +0.8       44.42        perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     41.57            +0.6       42.22            +0.9       42.50        perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
     40.93            +0.7       41.59            +1.0       41.90        perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_fault.__handle_mm_fault
     40.84            +0.7       41.50            +1.0       41.81        perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_fault
     40.19            +0.7       40.89            +1.0       41.19        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range
     40.19            +0.7       40.89            +1.0       41.19        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
     40.16            +0.7       40.87            +1.0       41.16        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma
      5.49            -0.4        5.12            -0.4        5.12        perf-profile.children.cycles-pp.copy_page
     57.05            -0.3       56.75            -0.0       57.02        perf-profile.children.cycles-pp.testcase
     55.66            -0.2       55.41            +0.0       55.70        perf-profile.children.cycles-pp.asm_exc_page_fault
      1.88            -0.2        1.73            -0.1        1.75        perf-profile.children.cycles-pp.__pte_offset_map_lock
      1.79            -0.1        1.64            -0.1        1.66        perf-profile.children.cycles-pp._raw_spin_lock
      1.19            -0.1        1.11            -0.1        1.10        perf-profile.children.cycles-pp.folio_prealloc
      0.96            -0.1        0.91            -0.1        0.91        perf-profile.children.cycles-pp.sync_regs
     51.89            -0.0       51.84            +0.2       52.13        perf-profile.children.cycles-pp.handle_mm_fault
      0.73            -0.0        0.68            -0.0        0.68        perf-profile.children.cycles-pp.vma_alloc_folio_noprof
      1.02            -0.0        0.98            -0.1        0.96        perf-profile.children.cycles-pp.native_irq_return_iret
      0.63            -0.0        0.59            -0.0        0.59        perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
      0.55            -0.0        0.51            -0.0        0.51        perf-profile.children.cycles-pp.__alloc_pages_noprof
      0.51            -0.0        0.48            -0.0        0.48        perf-profile.children.cycles-pp.__do_fault
      0.46            -0.0        0.43            -0.0        0.44        perf-profile.children.cycles-pp.shmem_fault
      0.41            -0.0        0.39            -0.0        0.38        perf-profile.children.cycles-pp.get_page_from_freelist
      0.51            -0.0        0.48            -0.0        0.50        perf-profile.children.cycles-pp.shmem_get_folio_gfp
      0.36            -0.0        0.34            -0.0        0.34        perf-profile.children.cycles-pp.___perf_sw_event
      0.42            -0.0        0.39            -0.0        0.39        perf-profile.children.cycles-pp.__perf_sw_event
      0.42            -0.0        0.40            -0.0        0.40        perf-profile.children.cycles-pp.zap_present_ptes
      0.26            -0.0        0.24            -0.0        0.24        perf-profile.children.cycles-pp.__mod_lruvec_state
      0.38            -0.0        0.36            -0.0        0.36        perf-profile.children.cycles-pp.lru_add_fn
      0.25 ±  2%      -0.0        0.23            -0.0        0.24        perf-profile.children.cycles-pp.filemap_get_entry
      0.21 ±  2%      -0.0        0.20 ±  2%      -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.21            -0.0        0.19 ±  2%      -0.0        0.20        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
     51.40            -0.0       51.39            +0.3       51.68        perf-profile.children.cycles-pp.__handle_mm_fault
      0.23 ±  2%      -0.0        0.21            -0.0        0.21 ±  2%  perf-profile.children.cycles-pp.rmqueue
      0.39            -0.0        0.38            -0.0        0.36 ±  2%  perf-profile.children.cycles-pp.__mem_cgroup_charge
      0.16 ±  2%      -0.0        0.15 ±  2%      -0.0        0.14 ±  3%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
      0.11            -0.0        0.10            -0.0        0.10 ±  5%  perf-profile.children.cycles-pp._compound_head
      0.17 ±  2%      -0.0        0.16 ±  2%      -0.0        0.16        perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.16            -0.0        0.15            -0.0        0.15 ±  2%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.27            -0.0        0.26            -0.0        0.26        perf-profile.children.cycles-pp.lock_vma_under_rcu
      0.11            -0.0        0.10            -0.0        0.10        perf-profile.children.cycles-pp.update_process_times
      0.09            -0.0        0.08            -0.0        0.08        perf-profile.children.cycles-pp.scheduler_tick
      0.06            -0.0        0.05            -0.0        0.05        perf-profile.children.cycles-pp.task_tick_fair
      0.12            -0.0        0.11            -0.0        0.11 ±  3%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.15            -0.0        0.14            -0.0        0.14        perf-profile.children.cycles-pp.hrtimer_interrupt
      0.11 ±  4%      -0.0        0.10            -0.0        0.09 ±  5%  perf-profile.children.cycles-pp.uncharge_batch
     51.07            -0.0       51.06            +0.3       51.36        perf-profile.children.cycles-pp.do_fault
      0.08            -0.0        0.08 ±  6%      -0.0        0.07        perf-profile.children.cycles-pp.page_counter_uncharge
      0.06            -0.0        0.06 ±  6%      +0.0        0.07        perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
      0.15 ±  2%      +0.0        0.16 ±  6%      +0.0        0.17 ±  4%  perf-profile.children.cycles-pp.generic_perform_write
      0.07            +0.0        0.08            +0.0        0.08 ±  4%  perf-profile.children.cycles-pp.folio_add_lru
      0.09 ±  4%      +0.0        0.10 ±  3%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.shmem_write_begin
      4.88            +0.0        4.93            +0.0        4.91        perf-profile.children.cycles-pp.tlb_finish_mmu
     37.74            +0.3       38.01            -0.0       37.74        perf-profile.children.cycles-pp.unmap_page_range
     37.74            +0.3       38.01            -0.0       37.74        perf-profile.children.cycles-pp.unmap_vmas
     37.74            +0.3       38.01            -0.0       37.74        perf-profile.children.cycles-pp.zap_pmd_range
     37.74            +0.3       38.01            -0.0       37.74        perf-profile.children.cycles-pp.zap_pte_range
     37.28            +0.3       37.58            +0.0       37.31        perf-profile.children.cycles-pp.tlb_flush_mmu
     42.65            +0.3       42.97            +0.0       42.68        perf-profile.children.cycles-pp.__x64_sys_munmap
     42.65            +0.3       42.97            +0.0       42.68        perf-profile.children.cycles-pp.__vm_munmap
     42.65            +0.3       42.97            +0.0       42.69        perf-profile.children.cycles-pp.__munmap
     42.65            +0.3       42.98            +0.0       42.69        perf-profile.children.cycles-pp.do_vmi_align_munmap
     42.65            +0.3       42.98            +0.0       42.69        perf-profile.children.cycles-pp.do_vmi_munmap
     42.86            +0.3       43.18            +0.1       42.91        perf-profile.children.cycles-pp.do_syscall_64
     42.65            +0.3       42.97            +0.0       42.69        perf-profile.children.cycles-pp.unmap_region
     42.86            +0.3       43.19            +0.1       42.91        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     42.15            +0.3       42.50            +0.1       42.22        perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
     42.12            +0.3       42.46            +0.1       42.19        perf-profile.children.cycles-pp.folios_put_refs
     42.15            +0.3       42.50            +0.1       42.22        perf-profile.children.cycles-pp.free_pages_and_swap_cache
     41.51            +0.4       41.89            +0.1       41.63        perf-profile.children.cycles-pp.__page_cache_release
     43.66            +0.5       44.15            +0.8       44.45        perf-profile.children.cycles-pp.finish_fault
     41.59            +0.6       42.24            +0.9       42.52        perf-profile.children.cycles-pp.set_pte_range
     40.94            +0.7       41.59            +1.0       41.90        perf-profile.children.cycles-pp.folio_add_lru_vma
     40.99            +0.7       41.66            +1.0       41.97        perf-profile.children.cycles-pp.folio_batch_move_lru
     81.57            +1.1       82.65            +1.1       82.68        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     81.59            +1.1       82.68            +1.1       82.72        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     81.60            +1.1       82.68            +1.1       82.72        perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      5.47            -0.4        5.10            -0.4        5.11        perf-profile.self.cycles-pp.copy_page
      1.77            -0.1        1.63            -0.1        1.64        perf-profile.self.cycles-pp._raw_spin_lock
      2.19            -0.1        2.07            -0.1        2.06        perf-profile.self.cycles-pp.testcase
      0.96            -0.0        0.91            -0.1        0.90        perf-profile.self.cycles-pp.sync_regs
      1.02            -0.0        0.98            -0.1        0.96        perf-profile.self.cycles-pp.native_irq_return_iret
      0.28            -0.0        0.26            -0.0        0.26 ±  2%  perf-profile.self.cycles-pp.___perf_sw_event
      0.19 ±  2%      -0.0        0.17            -0.0        0.17 ±  2%  perf-profile.self.cycles-pp.get_page_from_freelist
      0.20            -0.0        0.19 ±  2%      -0.0        0.19 ±  2%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.12 ±  4%      -0.0        0.10            -0.0        0.11 ±  4%  perf-profile.self.cycles-pp.filemap_get_entry
      0.11 ±  3%      -0.0        0.10            -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.free_pages_and_swap_cache
      0.21            -0.0        0.20            -0.0        0.20 ±  2%  perf-profile.self.cycles-pp.folios_put_refs
      0.16            -0.0        0.15            -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.mas_walk
      0.09            -0.0        0.08            -0.0        0.08        perf-profile.self.cycles-pp.folio_add_new_anon_rmap
      0.06            -0.0        0.05            -0.0        0.05 ±  8%  perf-profile.self.cycles-pp.down_read_trylock
      0.18            -0.0        0.17 ±  2%      -0.0        0.17        perf-profile.self.cycles-pp.lru_add_fn
      0.09 ±  4%      -0.0        0.09 ±  4%      -0.0        0.08        perf-profile.self.cycles-pp._compound_head
     81.57            +1.1       82.65            +1.1       82.68        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
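
In the profile above, all three kernels spend over 80% of cycles in
native_queued_spin_lock_slowpath, reached through folio_lruvec_lock_irqsave
from both the fault path (folio_batch_move_lru) and the munmap path
(__page_cache_release). For orientation, the serializing helper has roughly
the following shape upstream (a simplified sketch with debug hooks omitted,
not the exact source):

/*
 * Simplified sketch of folio_lruvec_lock_irqsave(): every batched LRU
 * add on the fault side and every page release on the munmap side
 * takes the same per-memcg, per-node lruvec spinlock with interrupts
 * disabled, which is why the queued-spinlock slowpath dominates all
 * three columns.
 */
struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio,
                                         unsigned long *flags)
{
        struct lruvec *lruvec = folio_lruvec(folio);

        spin_lock_irqsave(&lruvec->lru_lock, *flags);
        return lruvec;
}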


[4]

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/process/100%/debian-12-x86_64-20240206.cgz/lkp-csl-d02/page_fault2/will-it-scale

59142d87ab03b8ff a94032b35e5f97dc1023030d929 fd2296741e2686ed6ecd05187e4
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     13383           -14.7%      11416           -10.2%      12023        perf-c2c.DRAM.local
    878.00 ±  4%     +39.1%       1221 ±  6%     +11.3%     977.00 ±  4%  perf-c2c.HITM.local
      0.54 ±  3%      -0.1        0.43 ±  2%      -0.1        0.47 ±  2%  mpstat.cpu.all.irq%
      0.04 ±  6%      -0.0        0.03            +0.0        0.04 ± 11%  mpstat.cpu.all.soft%
      8.44 ±  2%      -1.1        7.32            -0.9        7.53        mpstat.cpu.all.usr%
     59743 ± 11%     -22.9%      46054 ±  7%     -15.0%      50754 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.stddev
     59744 ± 11%     -22.9%      46054 ±  7%     -15.0%      50754 ±  8%  sched_debug.cfs_rq:/.min_vruntime.stddev
      3843 ±  4%     -28.8%       2737 ±  8%     -14.2%       3296 ± 10%  sched_debug.cpu.nr_switches.min
   6749425           -19.4%    5441878           -12.1%    5929733        will-it-scale.36.processes
    187483           -19.4%     151162           -12.1%     164714        will-it-scale.per_process_ops
   6749425           -19.4%    5441878           -12.1%    5929733        will-it-scale.workload
    734606            -2.1%     718878            -1.8%     721386        proc-vmstat.nr_anon_pages
      9660            -4.0%       9278            -2.9%       9383        proc-vmstat.nr_mapped
      2999            +3.2%       3095            +2.3%       3069        proc-vmstat.nr_page_table_pages
 2.043e+09           -19.3%  1.649e+09           -12.0%  1.799e+09        proc-vmstat.numa_hit
 2.049e+09           -19.3%  1.653e+09           -12.0%  1.803e+09        proc-vmstat.numa_local
 2.036e+09           -19.2%  1.644e+09           -12.0%  1.791e+09        proc-vmstat.pgalloc_normal
 2.029e+09           -19.3%  1.639e+09           -12.0%  1.785e+09        proc-vmstat.pgfault
 2.035e+09           -19.2%  1.644e+09           -12.0%  1.791e+09        proc-vmstat.pgfree
     21123 ±  2%      +3.4%      21833            +3.9%      21942        proc-vmstat.pgreuse
     17.45            -8.6%      15.96            -6.0%      16.41        perf-stat.i.MPKI
 6.199e+09           -10.2%  5.567e+09            -5.5%  5.856e+09        perf-stat.i.branch-instructions
      0.26            -0.0        0.25            -0.0        0.25        perf-stat.i.branch-miss-rate%
  16660671           -10.6%   14902193            -7.3%   15444974        perf-stat.i.branch-misses
     87.85            -2.9       84.90            -2.8       85.02        perf-stat.i.cache-miss-rate%
 5.476e+08           -19.5%  4.407e+08           -12.3%  4.805e+08        perf-stat.i.cache-misses
 6.227e+08           -16.7%  5.186e+08            -9.3%  5.647e+08        perf-stat.i.cache-references
      4.35           +14.1%       4.96            +7.6%       4.68        perf-stat.i.cpi
     61.84 ±  2%     -16.2%      51.79           -14.1%      53.13        perf-stat.i.cpu-migrations
    251.09           +24.4%     312.35           +14.2%     286.75        perf-stat.i.cycles-between-cache-misses
 3.137e+10           -11.8%  2.768e+10            -6.6%  2.931e+10        perf-stat.i.instructions
      0.23           -11.7%       0.21            -6.5%       0.22        perf-stat.i.ipc
    373.37           -19.3%     301.36           -12.0%     328.39        perf-stat.i.metric.K/sec
   6720929           -19.3%    5424836           -12.0%    5911373        perf-stat.i.minor-faults
   6720929           -19.3%    5424836           -12.0%    5911373        perf-stat.i.page-faults
     17.45            -8.8%      15.92            -6.1%      16.39        perf-stat.overall.MPKI
      0.27            -0.0        0.27            -0.0        0.26        perf-stat.overall.branch-miss-rate%
     87.94            -3.0       84.96            -2.9       85.08        perf-stat.overall.cache-miss-rate%
      4.35           +13.4%       4.93            +7.1%       4.65        perf-stat.overall.cpi
    249.03           +24.3%     309.56           +14.0%     283.85        perf-stat.overall.cycles-between-cache-misses
      0.23           -11.8%       0.20            -6.6%       0.21        perf-stat.overall.ipc
   1400364            +9.4%    1532615            +6.5%    1491568        perf-stat.overall.path-length
 6.178e+09           -10.2%  5.548e+09            -5.5%  5.835e+09        perf-stat.ps.branch-instructions
  16578081           -10.7%   14811244            -7.4%   15346617        perf-stat.ps.branch-misses
 5.458e+08           -19.5%  4.392e+08           -12.3%  4.788e+08        perf-stat.ps.cache-misses
 6.206e+08           -16.7%  5.169e+08            -9.3%  5.628e+08        perf-stat.ps.cache-references
     61.60 ±  2%     -16.3%      51.58           -14.2%      52.85        perf-stat.ps.cpu-migrations
 3.127e+10           -11.8%  2.758e+10            -6.6%  2.921e+10        perf-stat.ps.instructions
   6698560           -19.3%    5406176           -12.1%    5890997        perf-stat.ps.minor-faults
   6698560           -19.3%    5406177           -12.1%    5890998        perf-stat.ps.page-faults
 9.451e+12           -11.8%   8.34e+12            -6.4%  8.845e+12        perf-stat.total.instructions
     78.09           -11.0       67.12            -7.4       70.68        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
     84.87 ±  2%     -10.3       74.55            -6.9       77.97        perf-profile.calltrace.cycles-pp.testcase
     68.48 ±  2%      -9.3       59.13            -6.2       62.28        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
     68.26 ±  2%      -9.3       58.94            -6.2       62.08        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     65.58 ±  2%      -8.7       56.90            -5.7       59.92        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     64.14 ±  2%      -8.5       55.61            -5.6       58.59        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     63.24 ±  2%      -8.4       54.84            -5.5       57.78        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     40.12 ±  4%      -4.1       36.02            -2.9       37.23        perf-profile.calltrace.cycles-pp.copy_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     15.19 ±  3%      -3.5       11.73            -1.9       13.28        perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      9.10 ±  8%      -3.1        6.01 ±  2%      -1.9        7.16 ±  3%  perf-profile.calltrace.cycles-pp.folio_add_lru_vma.set_pte_range.finish_fault.do_fault.__handle_mm_fault
      8.89 ±  8%      -3.1        5.83 ±  3%      -1.9        6.96 ±  3%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault.do_fault
     10.98 ±  6%      -3.0        7.97 ±  2%      -1.6        9.38 ±  2%  perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
      7.41 ± 10%      -2.9        4.49 ±  4%      -1.9        5.50 ±  4%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range
      7.42 ± 10%      -2.9        4.51 ±  4%      -1.9        5.52 ±  4%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
      7.35 ± 10%      -2.9        4.44 ±  4%      -1.9        5.45 ±  4%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru_vma
      2.14 ± 15%      -1.4        0.70 ±  6%      -1.2        0.93 ±  3%  perf-profile.calltrace.cycles-pp._compound_head.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range
      3.15 ± 11%      -1.3        1.84            -1.2        1.96        perf-profile.calltrace.cycles-pp.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      3.60 ±  3%      -0.4        3.16            -0.3        3.28        perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
      3.88            -0.4        3.46            -0.4        3.50        perf-profile.calltrace.cycles-pp.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      1.29            -0.4        0.87            -0.4        0.92        perf-profile.calltrace.cycles-pp.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      3.09 ±  3%      -0.4        2.68            -0.3        2.81        perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.finish_fault.do_fault.__handle_mm_fault
      0.96            -0.3        0.62 ±  2%      -0.3        0.65        perf-profile.calltrace.cycles-pp.mas_walk.lock_vma_under_rcu.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      3.45 ±  3%      -0.3        3.12            -0.2        3.24        perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      3.31 ±  3%      -0.3        3.00            -0.2        3.11        perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault
      3.09 ±  3%      -0.3        2.80            -0.2        2.90        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault
      2.42            -0.3        2.16            -0.3        2.14        perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault
      2.72 ±  4%      -0.2        2.50            -0.1        2.58        perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_fault
      1.55 ±  2%      -0.2        1.33            -0.2        1.38        perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.testcase
      0.87            -0.2        0.72            -0.1        0.79        perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru_vma.set_pte_range.finish_fault
      1.39 ±  3%      -0.1        1.25 ±  3%      -0.1        1.30 ±  2%  perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.folio_prealloc.do_fault.__handle_mm_fault.handle_mm_fault
      0.81            -0.1        0.70 ±  2%      -0.1        0.73 ±  2%  perf-profile.calltrace.cycles-pp.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      1.74            -0.1        1.63            -0.1        1.62        perf-profile.calltrace.cycles-pp.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault.__handle_mm_fault
      0.85 ±  2%      -0.1        0.74 ±  3%      -0.1        0.78        perf-profile.calltrace.cycles-pp.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.71            -0.1        0.62            -0.1        0.64 ±  3%  perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.handle_mm_fault.do_user_addr_fault.exc_page_fault
      1.01 ±  4%      -0.1        0.93 ±  2%      -0.1        0.94        perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault
      0.72 ±  2%      -0.1        0.64 ±  3%      -0.0        0.67        perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      1.56            -0.1        1.50            -0.1        1.48        perf-profile.calltrace.cycles-pp.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof.folio_prealloc.do_fault
      0.35 ± 81%      +0.1        0.44 ± 50%      +0.3        0.68 ±  7%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__lruvec_stat_mod_folio.set_pte_range.finish_fault.do_fault
      0.77 ±  2%      +0.1        0.87 ±  2%      +0.0        0.80 ±  2%  perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages_noprof.alloc_pages_mpol_noprof.vma_alloc_folio_noprof
      1.47 ±  2%      +0.2        1.63 ±  6%      +0.4        1.90 ±  2%  perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.set_pte_range.finish_fault.do_fault.__handle_mm_fault
      0.62 ±  5%      +0.2        0.84 ±  2%      +0.1        0.69 ±  2%  perf-profile.calltrace.cycles-pp.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range.unmap_page_range
      0.00            +0.7        0.68 ±  3%      +0.4        0.35 ± 70%  perf-profile.calltrace.cycles-pp.__lruvec_stat_mod_folio.folio_remove_rmap_ptes.zap_present_ptes.zap_pte_range.zap_pmd_range
      1.66 ± 12%      +1.2        2.86            +0.8        2.50        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      1.66 ± 12%      +1.2        2.86            +0.8        2.49        perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap
      1.66 ± 12%      +1.2        2.86            +0.8        2.49        perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      1.51 ± 15%      +1.3        2.80            +0.9        2.41        perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.unmap_region
      1.31 ± 18%      +1.3        2.64 ±  2%      +0.9        2.25        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
     16.10 ±  9%      +9.5       25.63 ±  2%      +6.4       22.50        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
     16.10 ±  9%      +9.5       25.63 ±  2%      +6.4       22.50        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
     16.10 ±  9%      +9.5       25.63 ±  2%      +6.4       22.50        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
     16.09 ±  9%      +9.5       25.62 ±  2%      +6.4       22.49        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
     17.82 ± 10%     +10.7       28.54 ±  2%      +7.2       25.03        perf-profile.calltrace.cycles-pp.__munmap
     17.81 ± 10%     +10.7       28.53 ±  2%      +7.2       25.02        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     17.81 ± 10%     +10.7       28.53 ±  2%      +7.2       25.02        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     17.81 ± 10%     +10.7       28.53            +7.2       25.02        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     17.82 ± 10%     +10.7       28.54 ±  2%      +7.2       25.03        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     17.82 ± 10%     +10.7       28.54 ±  2%      +7.2       25.03        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     17.81 ± 10%     +10.7       28.53 ±  2%      +7.2       25.02        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     17.79 ± 10%     +10.7       28.53 ±  2%      +7.2       25.02        perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
     12.80 ± 15%     +10.9       23.68 ±  2%      +7.6       20.42        perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
     12.78 ± 15%     +10.9       23.68 ±  2%      +7.6       20.41        perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range
     12.77 ± 15%     +10.9       23.67 ±  2%      +7.6       20.40        perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
     11.49 ± 18%     +11.7       23.22 ±  2%      +8.3       19.79        perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range
     10.49 ± 20%     +11.9       22.36 ±  2%      +8.4       18.90        perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu
     11.02 ± 22%     +13.4       24.43 ±  2%      +9.4       20.44        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
     11.03 ± 22%     +13.4       24.46 ±  2%      +9.4       20.46        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
     10.97 ± 22%     +13.4       24.41 ±  2%      +9.4       20.40        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
     81.97 ±  2%     -10.7       71.28            -7.2       74.78        perf-profile.children.cycles-pp.testcase
     74.32 ±  2%     -10.3       64.01            -6.9       67.40        perf-profile.children.cycles-pp.asm_exc_page_fault
     68.51 ±  2%      -9.4       59.15            -6.2       62.30        perf-profile.children.cycles-pp.exc_page_fault
     68.29 ±  2%      -9.3       58.97            -6.2       62.11        perf-profile.children.cycles-pp.do_user_addr_fault
     65.61 ±  2%      -8.7       56.92            -5.7       59.95        perf-profile.children.cycles-pp.handle_mm_fault
     64.16 ±  2%      -8.5       55.63            -5.6       58.60        perf-profile.children.cycles-pp.__handle_mm_fault
     63.27 ±  2%      -8.4       54.87            -5.5       57.82        perf-profile.children.cycles-pp.do_fault
     40.21 ±  4%      -4.1       36.11            -2.9       37.33        perf-profile.children.cycles-pp.copy_page
     15.21 ±  3%      -3.5       11.75            -1.9       13.30        perf-profile.children.cycles-pp.finish_fault
      9.10 ±  8%      -3.1        6.02 ±  2%      -1.9        7.16 ±  3%  perf-profile.children.cycles-pp.folio_add_lru_vma
      8.91 ±  8%      -3.0        5.87 ±  3%      -1.9        6.99 ±  3%  perf-profile.children.cycles-pp.folio_batch_move_lru
     10.99 ±  6%      -3.0        7.98 ±  2%      -1.6        9.40 ±  2%  perf-profile.children.cycles-pp.set_pte_range
      2.16 ± 15%      -1.4        0.71 ±  6%      -1.2        0.94 ±  4%  perf-profile.children.cycles-pp._compound_head
      3.17 ± 11%      -1.3        1.85            -1.2        1.98        perf-profile.children.cycles-pp.zap_present_ptes
      3.63 ±  3%      -0.5        3.17            -0.3        3.30        perf-profile.children.cycles-pp.__pte_offset_map_lock
      3.14 ±  3%      -0.4        2.71            -0.3        2.85        perf-profile.children.cycles-pp._raw_spin_lock
      1.30            -0.4        0.88            -0.4        0.93        perf-profile.children.cycles-pp.lock_vma_under_rcu
      3.90            -0.4        3.49            -0.4        3.53        perf-profile.children.cycles-pp.folio_prealloc
      0.97            -0.3        0.62 ±  2%      -0.3        0.66        perf-profile.children.cycles-pp.mas_walk
      3.46 ±  3%      -0.3        3.13            -0.2        3.25        perf-profile.children.cycles-pp.__do_fault
      3.31 ±  3%      -0.3        3.00            -0.2        3.12        perf-profile.children.cycles-pp.shmem_fault
      6.74 ±  4%      -0.3        6.44            -0.2        6.53        perf-profile.children.cycles-pp.native_irq_return_iret
      3.10 ±  3%      -0.3        2.82            -0.2        2.92        perf-profile.children.cycles-pp.shmem_get_folio_gfp
      2.43            -0.3        2.17            -0.3        2.15        perf-profile.children.cycles-pp.vma_alloc_folio_noprof
      1.60 ±  2%      -0.2        1.37            -0.2        1.42        perf-profile.children.cycles-pp.sync_regs
      2.73 ±  4%      -0.2        2.51            -0.1        2.58        perf-profile.children.cycles-pp.filemap_get_entry
      1.66            -0.2        1.44 ±  2%      -0.1        1.51        perf-profile.children.cycles-pp.__perf_sw_event
      0.64 ±  4%      -0.2        0.44 ±  2%      -0.1        0.53        perf-profile.children.cycles-pp.free_unref_folios
      1.45            -0.2        1.28 ±  2%      -0.1        1.33        perf-profile.children.cycles-pp.___perf_sw_event
      0.88            -0.2        0.73            -0.1        0.80        perf-profile.children.cycles-pp.lru_add_fn
      1.40 ±  3%      -0.1        1.26 ±  3%      -0.1        1.31 ±  2%  perf-profile.children.cycles-pp.__mem_cgroup_charge
      1.23 ±  9%      -0.1        1.09 ±  8%      +0.2        1.41 ±  4%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      1.83            -0.1        1.71            -0.1        1.69        perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
      0.58 ±  7%      -0.1        0.47 ±  5%      -0.1        0.51 ±  3%  perf-profile.children.cycles-pp.__count_memcg_events
      0.69 ±  3%      -0.1        0.59            -0.1        0.63 ±  4%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.33 ±  5%      -0.1        0.22 ±  2%      -0.1        0.27        perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
      0.51 ±  5%      -0.1        0.42 ±  7%      -0.1        0.46 ±  3%  perf-profile.children.cycles-pp.mem_cgroup_commit_charge
      0.53            -0.1        0.44 ±  3%      -0.1        0.43 ±  2%  perf-profile.children.cycles-pp.get_vma_policy
      1.02 ±  4%      -0.1        0.93 ±  3%      -0.1        0.94        perf-profile.children.cycles-pp.xas_load
      0.58 ±  3%      -0.1        0.50 ±  2%      -0.0        0.54 ±  5%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.57 ±  6%      -0.1        0.50 ±  3%      -0.0        0.53 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.48 ±  7%      -0.1        0.40 ±  4%      -0.0        0.44 ±  2%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.23 ±  5%      -0.1        0.16 ±  4%      -0.0        0.19        perf-profile.children.cycles-pp.free_unref_page_commit
      0.43 ±  7%      -0.1        0.36 ±  3%      -0.0        0.39 ±  2%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.43 ±  6%      -0.1        0.36 ±  3%      -0.0        0.39 ±  2%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.16 ±  4%      -0.1        0.10            -0.0        0.12        perf-profile.children.cycles-pp.get_pfnblock_flags_mask
      0.15 ±  9%      -0.1        0.09 ±  4%      -0.0        0.11 ±  6%  perf-profile.children.cycles-pp.uncharge_batch
      0.30 ±  6%      -0.1        0.24 ±  4%      -0.0        0.27 ±  3%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.37 ±  7%      -0.1        0.31 ±  3%      -0.0        0.34 ±  2%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.27 ±  8%      -0.1        0.22 ±  4%      -0.0        0.24 ±  3%  perf-profile.children.cycles-pp.update_process_times
      1.64            -0.1        1.59            -0.1        1.56        perf-profile.children.cycles-pp.__alloc_pages_noprof
      0.30 ±  9%      -0.0        0.25 ± 28%      -0.0        0.25 ±  5%  perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.16 ±  6%      -0.0        0.12 ±  3%      -0.0        0.15 ±  4%  perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
      0.25            -0.0        0.20            -0.0        0.21 ±  2%  perf-profile.children.cycles-pp.handle_pte_fault
      0.11 ± 11%      -0.0        0.07 ±  5%      -0.0        0.08        perf-profile.children.cycles-pp.page_counter_uncharge
      0.20 ±  5%      -0.0        0.16 ±  2%      -0.0        0.16 ±  4%  perf-profile.children.cycles-pp.__pte_offset_map
      0.22 ±  3%      -0.0        0.19 ±  4%      -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.error_entry
      0.10 ±  3%      -0.0        0.07 ±  7%      -0.0        0.07 ±  8%  perf-profile.children.cycles-pp.policy_nodemask
      0.16 ±  3%      -0.0        0.12 ±  3%      -0.0        0.13 ±  4%  perf-profile.children.cycles-pp.pte_offset_map_nolock
      0.15 ±  5%      -0.0        0.12 ±  3%      -0.0        0.14 ±  4%  perf-profile.children.cycles-pp.uncharge_folio
      0.22 ±  3%      -0.0        0.19 ±  3%      -0.0        0.20 ±  2%  perf-profile.children.cycles-pp.folio_add_new_anon_rmap
      0.17 ±  9%      -0.0        0.14 ±  5%      -0.0        0.16 ±  3%  perf-profile.children.cycles-pp.scheduler_tick
      0.18 ±  5%      -0.0        0.16 ±  5%      -0.0        0.17 ±  2%  perf-profile.children.cycles-pp.up_read
      0.19 ±  2%      -0.0        0.16 ±  7%      -0.0        0.15 ±  7%  perf-profile.children.cycles-pp.shmem_get_policy
      0.14 ±  4%      -0.0        0.12 ±  4%      -0.0        0.12 ±  6%  perf-profile.children.cycles-pp.down_read_trylock
      0.13 ±  6%      -0.0        0.10 ±  3%      -0.0        0.11        perf-profile.children.cycles-pp.folio_put
      0.08 ± 11%      -0.0        0.06 ± 14%      -0.0        0.06 ±  9%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.12 ±  6%      -0.0        0.09 ±  5%      -0.0        0.10 ±  4%  perf-profile.children.cycles-pp.task_tick_fair
      0.29 ±  3%      -0.0        0.27 ±  4%      -0.0        0.28 ±  3%  perf-profile.children.cycles-pp._raw_spin_trylock
      0.12 ±  6%      -0.0        0.10 ±  4%      -0.0        0.10 ±  4%  perf-profile.children.cycles-pp.folio_unlock
      0.79 ±  2%      +0.1        0.90 ±  2%      +0.0        0.83 ±  2%  perf-profile.children.cycles-pp.rmqueue
      0.24 ±  3%      +0.2        0.41 ±  6%      +0.1        0.33 ±  4%  perf-profile.children.cycles-pp.__rmqueue_pcplist
      0.10 ±  5%      +0.2        0.29 ±  9%      +0.1        0.20 ±  9%  perf-profile.children.cycles-pp.rmqueue_bulk
      0.62 ±  4%      +0.2        0.84 ±  2%      +0.1        0.70 ±  3%  perf-profile.children.cycles-pp.folio_remove_rmap_ptes
      1.89 ±  2%      +0.4        2.34 ±  5%      +0.5        2.43 ±  2%  perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
      1.67 ± 12%      +1.2        2.87            +0.8        2.50        perf-profile.children.cycles-pp.tlb_finish_mmu
     16.10 ±  9%      +9.5       25.63 ±  2%      +6.4       22.50        perf-profile.children.cycles-pp.unmap_vmas
     16.10 ±  9%      +9.5       25.63 ±  2%      +6.4       22.50        perf-profile.children.cycles-pp.unmap_page_range
     16.10 ±  9%      +9.5       25.63 ±  2%      +6.4       22.50        perf-profile.children.cycles-pp.zap_pmd_range
     16.10 ±  9%      +9.5       25.63 ±  2%      +6.4       22.50        perf-profile.children.cycles-pp.zap_pte_range
     18.48 ± 17%     +10.5       29.01            +7.5       26.01        perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     17.97 ±  9%     +10.7       28.66            +7.2       25.16        perf-profile.children.cycles-pp.do_syscall_64
     17.97 ±  9%     +10.7       28.66            +7.2       25.16        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     18.49 ± 17%     +10.7       29.20            +7.6       26.12        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     17.82 ± 10%     +10.7       28.54 ±  2%      +7.2       25.03        perf-profile.children.cycles-pp.__munmap
     17.81 ± 10%     +10.7       28.53 ±  2%      +7.2       25.02        perf-profile.children.cycles-pp.__vm_munmap
     17.81 ± 10%     +10.7       28.53 ±  2%      +7.2       25.02        perf-profile.children.cycles-pp.__x64_sys_munmap
     17.82 ± 10%     +10.7       28.54 ±  2%      +7.2       25.03        perf-profile.children.cycles-pp.do_vmi_munmap
     17.81 ± 10%     +10.7       28.54 ±  2%      +7.2       25.03        perf-profile.children.cycles-pp.do_vmi_align_munmap
     17.80 ± 10%     +10.7       28.53 ±  2%      +7.2       25.02        perf-profile.children.cycles-pp.unmap_region
     18.38 ± 17%     +10.8       29.13            +7.7       26.03        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     12.80 ± 15%     +10.9       23.68 ±  2%      +7.6       20.42        perf-profile.children.cycles-pp.tlb_flush_mmu
     14.44 ± 15%     +12.1       26.54 ±  2%      +8.5       22.91        perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
     14.43 ± 15%     +12.1       26.54 ±  2%      +8.5       22.90        perf-profile.children.cycles-pp.free_pages_and_swap_cache
     13.19 ± 17%     +13.0       26.17 ±  2%      +9.2       22.36        perf-profile.children.cycles-pp.folios_put_refs
     11.81 ± 20%     +13.2       25.01 ±  2%      +9.4       21.17        perf-profile.children.cycles-pp.__page_cache_release
     39.99 ±  4%      -4.1       35.92            -2.9       37.12        perf-profile.self.cycles-pp.copy_page
      2.14 ± 15%      -1.4        0.70 ±  5%      -1.2        0.93 ±  4%  perf-profile.self.cycles-pp._compound_head
      1.39 ± 13%      -0.9        0.48 ±  3%      -0.7        0.67 ±  4%  perf-profile.self.cycles-pp.free_pages_and_swap_cache
      4.45            -0.7        3.74            -0.5        3.92        perf-profile.self.cycles-pp.testcase
      3.12 ±  3%      -0.4        2.69            -0.3        2.83        perf-profile.self.cycles-pp._raw_spin_lock
      0.96            -0.3        0.61 ±  2%      -0.3        0.64        perf-profile.self.cycles-pp.mas_walk
      6.74 ±  4%      -0.3        6.44            -0.2        6.53        perf-profile.self.cycles-pp.native_irq_return_iret
      1.59 ±  2%      -0.2        1.36            -0.2        1.42        perf-profile.self.cycles-pp.sync_regs
      1.22 ±  2%      -0.2        1.04 ±  2%      -0.1        1.10        perf-profile.self.cycles-pp.___perf_sw_event
      1.71 ±  4%      -0.1        1.57            -0.1        1.64        perf-profile.self.cycles-pp.filemap_get_entry
      1.06 ± 10%      -0.1        0.94 ± 10%      +0.2        1.25 ±  4%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      0.48 ±  9%      -0.1        0.38 ±  6%      -0.0        0.44 ±  5%  perf-profile.self.cycles-pp.__count_memcg_events
      0.63            -0.1        0.54            -0.1        0.56        perf-profile.self.cycles-pp.__handle_mm_fault
      0.44            -0.1        0.35            -0.1        0.38 ±  2%  perf-profile.self.cycles-pp.lru_add_fn
      0.29            -0.1        0.21 ±  3%      -0.0        0.28 ±  3%  perf-profile.self.cycles-pp.__page_cache_release
      0.36 ±  3%      -0.1        0.28 ±  2%      -0.1        0.31 ±  4%  perf-profile.self.cycles-pp.get_page_from_freelist
      0.57 ±  3%      -0.1        0.49 ±  2%      -0.0        0.53 ±  5%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.25 ±  3%      -0.1        0.18 ±  2%      -0.0        0.22 ±  2%  perf-profile.self.cycles-pp.free_unref_folios
      0.23 ±  2%      -0.1        0.16 ±  2%      -0.0        0.18 ±  4%  perf-profile.self.cycles-pp.folio_remove_rmap_ptes
      0.85 ±  4%      -0.1        0.78 ±  2%      -0.1        0.80        perf-profile.self.cycles-pp.xas_load
      0.28 ±  3%      -0.1        0.21 ±  3%      -0.0        0.23 ±  3%  perf-profile.self.cycles-pp.do_user_addr_fault
      0.19 ±  2%      -0.1        0.13 ±  8%      -0.1        0.13 ±  3%  perf-profile.self.cycles-pp.set_pte_range
      0.30 ±  2%      -0.1        0.24 ±  3%      -0.0        0.26 ±  2%  perf-profile.self.cycles-pp.zap_present_ptes
      0.29 ±  2%      -0.1        0.23 ±  6%      -0.1        0.24 ±  7%  perf-profile.self.cycles-pp.get_vma_policy
      0.15 ±  2%      -0.1        0.09            -0.1        0.09 ±  4%  perf-profile.self.cycles-pp.vma_alloc_folio_noprof
      0.16 ±  5%      -0.1        0.10 ±  4%      -0.0        0.12        perf-profile.self.cycles-pp.get_pfnblock_flags_mask
      0.32 ±  5%      -0.1        0.26 ±  4%      -0.0        0.28 ±  3%  perf-profile.self.cycles-pp.__alloc_pages_noprof
      0.28 ±  4%      -0.1        0.23 ±  6%      -0.0        0.23 ±  4%  perf-profile.self.cycles-pp.rmqueue
      0.26            -0.0        0.21 ±  3%      -0.0        0.23 ±  3%  perf-profile.self.cycles-pp.asm_exc_page_fault
      0.07 ±  5%      -0.0        0.02 ±122%      -0.0        0.04 ± 45%  perf-profile.self.cycles-pp.policy_nodemask
      0.19 ±  6%      -0.0        0.14            -0.0        0.15 ±  6%  perf-profile.self.cycles-pp.lock_vma_under_rcu
      0.16 ±  5%      -0.0        0.11 ±  3%      -0.0        0.14 ±  4%  perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
      0.32 ±  3%      -0.0        0.28            -0.0        0.29 ±  2%  perf-profile.self.cycles-pp.shmem_get_folio_gfp
      0.15 ±  2%      -0.0        0.10 ±  3%      -0.0        0.13 ±  3%  perf-profile.self.cycles-pp.free_unref_page_commit
      0.20 ±  6%      -0.0        0.16 ±  5%      -0.0        0.18 ±  3%  perf-profile.self.cycles-pp.__perf_sw_event
      0.19 ±  2%      -0.0        0.14 ±  3%      -0.0        0.16 ±  3%  perf-profile.self.cycles-pp.do_fault
      0.12            -0.0        0.08 ±  5%      -0.0        0.10 ±  4%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.27 ±  9%      -0.0        0.23 ± 32%      -0.0        0.22 ±  6%  perf-profile.self.cycles-pp.cgroup_rstat_updated
      0.19 ±  5%      -0.0        0.15 ±  3%      -0.0        0.15 ±  4%  perf-profile.self.cycles-pp.__pte_offset_map
      0.10 ± 11%      -0.0        0.06 ± 10%      -0.0        0.07 ±  5%  perf-profile.self.cycles-pp.page_counter_uncharge
      0.15 ±  6%      -0.0        0.12 ±  3%      -0.0        0.14 ±  4%  perf-profile.self.cycles-pp.uncharge_folio
      0.21 ±  4%      -0.0        0.17 ±  2%      -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.error_entry
      0.09 ±  4%      -0.0        0.06 ±  6%      -0.0        0.07 ±  5%  perf-profile.self.cycles-pp.alloc_pages_mpol_noprof
      0.18            -0.0        0.15 ±  4%      -0.0        0.16 ±  2%  perf-profile.self.cycles-pp.exc_page_fault
      0.22 ±  3%      -0.0        0.19 ±  3%      -0.0        0.20 ±  2%  perf-profile.self.cycles-pp.folio_add_new_anon_rmap
      0.22 ±  4%      -0.0        0.19 ±  4%      -0.0        0.20 ±  3%  perf-profile.self.cycles-pp.shmem_fault
      0.18 ±  6%      -0.0        0.15 ±  3%      -0.0        0.16 ±  3%  perf-profile.self.cycles-pp.up_read
      0.11 ±  4%      -0.0        0.09            -0.0        0.10 ±  4%  perf-profile.self.cycles-pp.zap_pte_range
      0.13 ±  6%      -0.0        0.10 ±  3%      -0.0        0.11        perf-profile.self.cycles-pp.folio_put
      0.29            -0.0        0.26 ±  4%      -0.0        0.27 ±  3%  perf-profile.self.cycles-pp._raw_spin_trylock
      0.14 ±  2%      -0.0        0.12 ±  4%      -0.0        0.12 ±  6%  perf-profile.self.cycles-pp.down_read_trylock
      0.11 ±  4%      -0.0        0.09 ±  4%      -0.0        0.10 ±  5%  perf-profile.self.cycles-pp.__mod_lruvec_state
      0.09 ±  4%      -0.0        0.07 ±  5%      -0.0        0.07        perf-profile.self.cycles-pp.pte_offset_map_nolock
      0.12 ±  6%      -0.0        0.10 ±  4%      -0.0        0.10 ±  4%  perf-profile.self.cycles-pp.folio_unlock
      0.18 ±  4%      -0.0        0.16 ±  7%      -0.0        0.14 ±  8%  perf-profile.self.cycles-pp.shmem_get_policy
      0.07 ±  7%      -0.0        0.05 ±  7%      -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.__do_fault
      0.08 ±  5%      -0.0        0.07            -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.handle_pte_fault
      0.08            -0.0        0.07 ±  5%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.__mem_cgroup_charge
      0.40            -0.0        0.39 ±  4%      -0.0        0.38 ±  2%  perf-profile.self.cycles-pp.__pte_offset_map_lock
      0.38 ±  3%      +0.1        0.44 ±  3%      +0.1        0.47 ±  2%  perf-profile.self.cycles-pp.folio_batch_move_lru
      0.39 ±  3%      +0.1        0.46            -0.0        0.35 ±  2%  perf-profile.self.cycles-pp.folios_put_refs
      0.61 ± 13%      +0.5        1.15 ±  3%      +0.4        0.97 ±  3%  perf-profile.self.cycles-pp.__lruvec_stat_mod_folio
     18.38 ± 17%     +10.8       29.13            +7.7       26.03        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
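
Besides the lruvec lock itself, the stat-update path also gains time on this
machine (__lruvec_stat_mod_folio self time nearly doubles in the second
column). That path looks roughly like the sketch below (simplified, RCU
protection omitted; exact code varies across kernel versions):

/*
 * Simplified sketch of __lruvec_stat_mod_folio(): every folio mapped
 * at fault time and unmapped at munmap time updates the node-level
 * counter and, when the folio is charged to a memcg, the per-memcg
 * lruvec counter as well.
 */
void __lruvec_stat_mod_folio(struct folio *folio,
                             enum node_stat_item idx, int val)
{
        pg_data_t *pgdat = folio_pgdat(folio);
        struct mem_cgroup *memcg = folio_memcg(folio);

        if (!memcg) {
                __mod_node_page_state(pgdat, idx, val);
                return;
        }
        __mod_lruvec_state(mem_cgroup_lruvec(memcg, pgdat), idx, val);
}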


* Re: [linux-next:master] [memcg]  70a64b7919: will-it-scale.per_process_ops -11.9% regression
  2024-05-23  7:48               ` Oliver Sang
@ 2024-05-23 16:47                 ` Shakeel Butt
  2024-05-24  7:45                   ` Oliver Sang
  0 siblings, 1 reply; 15+ messages in thread
From: Shakeel Butt @ 2024-05-23 16:47 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin

On Thu, May 23, 2024 at 03:48:40PM +0800, Oliver Sang wrote:
> hi, Shakeel,
> 
> On Tue, May 21, 2024 at 09:18:19PM -0700, Shakeel Butt wrote:
> > On Tue, May 21, 2024 at 10:43:16AM +0800, Oliver Sang wrote:
> > > hi, Shakeel,
> > > 
> > [...]
> > > 
> > > we reported the regression on a 2-node Skylake server, so I found a 1-node
> > > Skylake desktop (we don't have a 1-node server) to check.
> > > 
> > 
> > Please try the following patch on both single node and dual node
> > machines:
> 
> 
> the regression is partially recovered by applying your patch (but one
> case regresses even further, as shown below).
> 
> details:
> 
> since you mentioned the whole patch-set behavior last time, I applied the
> patch on top of
>   a94032b35e5f9 memcg: use proper type for mod_memcg_state
> 
> below fd2296741e2686ed6ecd05187e4 = a94032b35e5f9 + patch
> 

Thanks a lot, Oliver. I have a couple of questions and requests:

1. What is the baseline kernel you are using? Is it linux-next or linus?
If linux-next, which one specifically?

2. What is the cgroup hierarchy where the workload is running? Is it
running in the root cgroup?

3. For the followup experiments when needed, can you please remove the
whole series (including 59142d87ab03b8ff) for the base numbers.

4. My experiment [1] on Cooper Lake (2 node) and Skylake (1 node) shows a
significant improvement, but I noticed that I am running
page_fault2_processes directly with -t equal to nr_cpus, while you are
running through runtest.py. Also, it seems lkp has modified runtest.py. I
will try to run the same setup as yours to reproduce.
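
For anyone reproducing by hand, here is a minimal sketch of what a
page_fault2-style worker does per iteration (the real will-it-scale
testcase differs in setup and in how iterations are counted, and /tmp is
assumed to be tmpfs as in the lkp runs):

#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define MEMSIZE (128UL * 1024 * 1024)

int main(void)
{
        char tmpfile[] = "/tmp/page_fault2_XXXXXX";
        int fd = mkstemp(tmpfile);
        long pgsize = sysconf(_SC_PAGESIZE);

        if (fd < 0 || ftruncate(fd, MEMSIZE) < 0)
                return 1;
        unlink(tmpfile);

        for (;;) {
                /* private file mapping: every first write below is a
                 * COW fault (the shmem_fault + copy_page +
                 * folio_add_lru_vma side of the profiles) */
                char *c = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE, fd, 0);
                if (c == MAP_FAILED)
                        return 1;
                for (unsigned long i = 0; i < MEMSIZE; i += pgsize)
                        c[i] = 0;
                /* the munmap side of the profiles: zap_pte_range,
                 * folios_put_refs, folio_lruvec_lock_irqsave */
                munmap(c, MEMSIZE);
        }
}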


[1] https://lore.kernel.org/all/20240523034824.1255719-1-shakeel.butt@linux.dev

thanks,
Shakeel



* Re: [linux-next:master] [memcg]  70a64b7919: will-it-scale.per_process_ops -11.9% regression
  2024-05-23 16:47                 ` Shakeel Butt
@ 2024-05-24  7:45                   ` Oliver Sang
  2024-05-24 18:06                     ` Shakeel Butt
  0 siblings, 1 reply; 15+ messages in thread
From: Oliver Sang @ 2024-05-24  7:45 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin, oliver.sang

hi, Shakeel,

On Thu, May 23, 2024 at 09:47:30AM -0700, Shakeel Butt wrote:
> On Thu, May 23, 2024 at 03:48:40PM +0800, Oliver Sang wrote:
> > hi, Shakeel,
> > 
> > On Tue, May 21, 2024 at 09:18:19PM -0700, Shakeel Butt wrote:
> > > On Tue, May 21, 2024 at 10:43:16AM +0800, Oliver Sang wrote:
> > > > hi, Shakeel,
> > > > 
> > > [...]
> > > > 
> > > > we reported the regression on a 2-node Skylake server, so I found a 1-node
> > > > Skylake desktop (we don't have a 1-node server) to check.
> > > > 
> > > 
> > > Please try the following patch on both single node and dual node
> > > machines:
> > 
> > 
> > the regression is partially recovered by applying your patch (but one
> > case regresses even further, as shown below).
> > 
> > details:
> > 
> > since you mentioned the whole patch-set behavior last time, I applied the
> > patch on top of
> >   a94032b35e5f9 memcg: use proper type for mod_memcg_state
> > 
> > below fd2296741e2686ed6ecd05187e4 = a94032b35e5f9 + patch
> > 
> 
> Thanks a lot, Oliver. I have a couple of questions and requests:

you are welcome!

> 
> 1. What is the baseline kernel you are using? Is it linux-next or linus?
> If linux-next, which one specifically?

The base is just 59142d87ab03b, which is in the current linux-next/master
and has already been merged into linus/master.

linux$ git rev-list linux-next/master | grep 59142d87ab03b
59142d87ab03b8ff969074348f65730d465f42ee

linux$ git rev-list linus/master | grep 59142d87ab03b
59142d87ab03b8ff969074348f65730d465f42ee


the data for it is in the first column of the tables we supplied.

I just applied your patch on top of a94032b35e5f9, so:

linux$ git log --oneline --graph fd2296741e2686ed6ecd05187e4
* fd2296741e268 fix for 70a64b7919 from Shakeel  <----- your fix patch
* a94032b35e5f9 memcg: use proper type for mod_memcg_state   <--- patch-set tip, I believe
* acb5fe2f1aff0 memcg: warn for unexpected events and stats
* 4715c6a753dcc mm: cleanup WORKINGSET_NODES in workingset
* 0667c7870a186 memcg: cleanup __mod_memcg_lruvec_state
* ff48c71c26aae memcg: reduce memory for the lruvec and memcg stats
* aab6103b97f1c mm: memcg: account memory used for memcg vmstats and lruvec stats
* 70a64b7919cbd memcg: dynamically allocate lruvec_stats   <--- we reported this as 'fbc' in original report
* 59142d87ab03b memcg: reduce memory size of mem_cgroup_events_index   <--- base
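
(For context, the shape of the change in 70a64b7919cbd is roughly the
following; this is a sketch for readers of the thread, not the actual diff,
and surrounding fields are elided.)

/* before: stats embedded in the per-node structure */
struct mem_cgroup_per_node {
        struct lruvec lruvec;
        struct lruvec_stats lruvec_stats;       /* large, embedded */
        /* ... */
};

/* after 70a64b7919cbd: allocated separately at memcg creation, so the
 * structure only carries a pointer; this shrinks the structure but
 * also changes which fields share cache lines with the hot lruvec
 * lock */
struct mem_cgroup_per_node {
        struct lruvec lruvec;
        struct lruvec_stats *lruvec_stats;
        /* ... */
};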


> 
> 2. What is the cgroup hierarchy where the workload is running? Is it
> running in the root cgroup?

Our test system uses systemd from the distribution (debian-12). The workload
is automatically assigned by systemd to a specific cgroup, which lies in a
sub-hierarchy of the root, so it is not running directly in the root cgroup.
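
(A trivial way to check this from inside the workload, assuming cgroup2; a
single "0::/" line would mean the root cgroup, anything longer means a
sub-hierarchy:)

#include <stdio.h>

int main(void)
{
        char line[512];
        FILE *f = fopen("/proc/self/cgroup", "r");

        if (!f)
                return 1;
        /* on cgroup2 this prints "0::<path>"; a path of "/" means the
         * root cgroup */
        while (fgets(line, sizeof(line), f))
                fputs(line, stdout);
        fclose(f);
        return 0;
}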

> 
> 3. For the followup experiments when needed, can you please remove the
> whole series (including 59142d87ab03b8ff) for the base numbers.

I don't understand this very well: if the patch is meant to fix the
regression caused by this series, it seems to me the best way is to apply
this patch on top of the series. Did I misunderstand anything here?

Anyway, I could do that. Do you mean something like v6.9, which doesn't
include this series yet? I could use it as the base, apply your patch on top
of it, and then check the diff between v6.9 and v6.9+patch.

But I still have a concern: a big improvement shown in this test cannot
guarantee the same improvement when comparing the series against the
series+patch.

> 
> 4. My experiment [1] on Cooper Lake (2 node) and Skylake (1 node) shows a
> significant improvement, but I noticed that I am running
> page_fault2_processes directly with -t equal to nr_cpus, while you are
> running through runtest.py. Also, it seems lkp has modified runtest.py. I
> will try to run the same setup as yours to reproduce.
> 
> 
> [1] https://lore.kernel.org/all/20240523034824.1255719-1-shakeel.butt@linux.dev
> 
> thanks,
> Shakeel



* Re: [linux-next:master] [memcg]  70a64b7919: will-it-scale.per_process_ops -11.9% regression
  2024-05-24  7:45                   ` Oliver Sang
@ 2024-05-24 18:06                     ` Shakeel Butt
  2024-05-28  6:30                       ` Shakeel Butt
  0 siblings, 1 reply; 15+ messages in thread
From: Shakeel Butt @ 2024-05-24 18:06 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin

On Fri, May 24, 2024 at 03:45:54PM +0800, Oliver Sang wrote:
> hi, Shakeel,
> 
[...]
> 
> > 
> > 1. What is the baseline kernel you are using? Is it linux-next or linus?
> > If linux-next, which one specifically?
> 
> base is just 59142d87ab03b, which is in current linux-next/master,
> and is already merged into linus/master now.
> 
> linux$ git rev-list linux-next/master | grep 59142d87ab03b
> 59142d87ab03b8ff969074348f65730d465f42ee
> 
> linux$ git rev-list linus/master | grep 59142d87ab03b
> 59142d87ab03b8ff969074348f65730d465f42ee
> 
> 
> the data for it is the first column in the tables we supplied.
> 
> I just applied your patch upon a94032b35e5f9, so:
> 
> linux$ git log --oneline --graph fd2296741e2686ed6ecd05187e4
> * fd2296741e268 fix for 70a64b7919 from Shakeel  <----- your fix patch
> * a94032b35e5f9 memcg: use proper type for mod_memcg_state   <--- patch-set tip, I believe
> * acb5fe2f1aff0 memcg: warn for unexpected events and stats
> * 4715c6a753dcc mm: cleanup WORKINGSET_NODES in workingset
> * 0667c7870a186 memcg: cleanup __mod_memcg_lruvec_state
> * ff48c71c26aae memcg: reduce memory for the lruvec and memcg stats
> * aab6103b97f1c mm: memcg: account memory used for memcg vmstats and lruvec stats
> * 70a64b7919cbd memcg: dynamically allocate lruvec_stats   <--- we reported this as 'fbc' in original report
> * 59142d87ab03b memcg: reduce memory size of mem_cgroup_events_index   <--- base
> 

Cool, let's stick to the linus tree. I was actually taking next-20240521
and reverting all the patches in the series to use as the base. One
request: please make the base the commit just before 59142d87ab03b,
i.e. not 59142d87ab03b itself.

> 
> > 
> > 2. What is the cgroup hierarchy where the workload is running? Is it
> > running in the root cgroup?
> 
> Our test system uses systemd from the distribution (debian-12). The workload is
> automatically placed by systemd into a cgroup within a sub-hierarchy of the
> root, so it is not running directly in the root cgroup.
> 
> > 
> > 3. For the followup experiments when needed, can you please remove the
> > whole series (including 59142d87ab03b8ff) for the base numbers.
> 
> I cannot understand this very well. If the patch is meant to fix the regression
> caused by this series, it seems to me the best way is to apply the patch on top
> of the series. Is there anything I misunderstood here?
> 

Sorry, I just meant that the 'base' case to compare against should be
the commit just before 59142d87ab03b, as I said above.

I will re-run my experiments on linus tree and report back.



* Re: [linux-next:master] [memcg]  70a64b7919: will-it-scale.per_process_ops -11.9% regression
  2024-05-24 18:06                     ` Shakeel Butt
@ 2024-05-28  6:30                       ` Shakeel Butt
  2024-05-30  6:17                         ` Oliver Sang
  0 siblings, 1 reply; 15+ messages in thread
From: Shakeel Butt @ 2024-05-28  6:30 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin

On Fri, May 24, 2024 at 11:06:54AM GMT, Shakeel Butt wrote:
> On Fri, May 24, 2024 at 03:45:54PM +0800, Oliver Sang wrote:
[...]
> I will re-run my experiments on linus tree and report back.

With the fix I have proposed applied, I am not able to reproduce the
regression, at least on my 1-node 52-CPU (Cooper Lake) and 2-node
80-CPU (Skylake) machines. Let me give more details below:

Setup instructions:
-------------------
mount -t tmpfs tmpfs /tmp
mkdir -p /sys/fs/cgroup/A
mkdir -p /sys/fs/cgroup/A/B
mkdir -p /sys/fs/cgroup/A/B/C
echo +memory > /sys/fs/cgroup/A/cgroup.subtree_control
echo +memory > /sys/fs/cgroup/A/B/cgroup.subtree_control
echo $$ > /sys/fs/cgroup/A/B/C/cgroup.procs
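
(a quick sanity check for the setup above, assuming the cgroup v2
unified hierarchy is mounted at /sys/fs/cgroup:)

cat /proc/self/cgroup                         # expect: 0::/A/B/C
cat /sys/fs/cgroup/A/cgroup.subtree_control   # expect: memory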

The base case (commit a4c43b8a0980):
------------------------------------
$ python3 ./runtest.py page_fault2 295 process 0 0 52
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
52,2796769,0.03,0,0.00,0

$ python3 ./runtest.py page_fault2 295 process 0 0 80
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
80,6755010,0.04,0,0.00,0


The regressing series (last commit a94032b35e5f)
------------------------------------------------
$ python3 ./runtest.py page_fault2 295 process 0 0 52
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
52,2684859,0.03,0,0.00,0

$ python3 ./runtest.py page_fault2 295 process 0 0 80
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
80,6010438,0.13,0,0.00,0

The fix on top of the regressing series:
------------------------------------
$ python3 ./runtest.py page_fault2 295 process 0 0 52
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
52,3812133,0.02,0,0.00,0

$ python3 ./runtest.py page_fault2 295 process 0 0 80
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
80,7979893,0.15,0,0.00,0
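
For reference, the relative changes implied by the raw counts above
(simple ratios of the per-run totals):

  52 CPUs: series vs base: 2684859/2796769 - 1 ~= -4.0%
           fix vs base:    3812133/2796769 - 1 ~= +36.3%
  80 CPUs: series vs base: 6010438/6755010 - 1 ~= -11.0%
           fix vs base:    7979893/6755010 - 1 ~= +18.1%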


As you can see, the fix improves performance over the base, at least
for me. I can only speculate that either the difference in hardware is
giving us different results (you have newer CPUs) or there is still
some disparity in the experiment setup/environment between us.

Are you disabling hyperthreading? Are the prefetching heuristics
different on your systems?

Regarding test environment, can you check my setup instructions above
and see if I am doing something wrong or different?

At the moment, I am inclined towards asking Andrew to include my fix in
an upcoming 6.10-rc*, but let's keep this report open so we can continue
to improve. Let me know if you have concerns.

thanks,
Shakeel



* Re: [linux-next:master] [memcg]  70a64b7919: will-it-scale.per_process_ops -11.9% regression
  2024-05-28  6:30                       ` Shakeel Butt
@ 2024-05-30  6:17                         ` Oliver Sang
  0 siblings, 0 replies; 15+ messages in thread
From: Oliver Sang @ 2024-05-30  6:17 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Yosry Ahmed, T.J. Mercier, Roman Gushchin, Johannes Weiner,
	Michal Hocko, Muchun Song, cgroups, ying.huang, feng.tang,
	fengwei.yin, oliver.sang

hi, Shakeel,

On Mon, May 27, 2024 at 11:30:38PM -0700, Shakeel Butt wrote:
> On Fri, May 24, 2024 at 11:06:54AM GMT, Shakeel Butt wrote:
> > On Fri, May 24, 2024 at 03:45:54PM +0800, Oliver Sang wrote:
> [...]
> > I will re-run my experiments on linus tree and report back.
> 
> With the fix I have proposed applied, I am not able to reproduce the
> regression, at least on my 1-node 52-CPU (Cooper Lake) and 2-node
> 80-CPU (Skylake) machines. Let me give more details below:
> 
> Setup instructions:
> -------------------
> mount -t tmpfs tmpfs /tmp
> mkdir -p /sys/fs/cgroup/A
> mkdir -p /sys/fs/cgroup/A/B
> mkdir -p /sys/fs/cgroup/A/B/C
> echo +memory > /sys/fs/cgroup/A/cgroup.subtree_control
> echo +memory > /sys/fs/cgroup/A/B/cgroup.subtree_control
> echo $$ > /sys/fs/cgroup/A/B/C/cgroup.procs
> 
> The base case (commit a4c43b8a0980):
> ------------------------------------
> $ python3 ./runtest.py page_fault2 295 process 0 0 52
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 52,2796769,0.03,0,0.00,0
> 
> $ python3 ./runtest.py page_fault2 295 process 0 0 80
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 80,6755010,0.04,0,0.00,0
> 
> 
> The regressing series (last commit a94032b35e5f)
> ------------------------------------------------
> $ python3 ./runtest.py page_fault2 295 process 0 0 52
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 52,2684859,0.03,0,0.00,0
> 
> $ python3 ./runtest.py page_fault2 295 process 0 0 80
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 80,6010438,0.13,0,0.00,0
> 
> The fix on top of the regressing series:
> ------------------------------------
> $ python3 ./runtest.py page_fault2 295 process 0 0 52
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 52,3812133,0.02,0,0.00,0
> 
> $ python3 ./runtest.py page_fault2 295 process 0 0 80
> tasks,processes,processes_idle,threads,threads_idle,linear
> 0,0,100,0,100,0
> 80,7979893,0.15,0,0.00,0
> 
> 
> As you can see, the fix improves performance over the base, at least
> for me. I can only speculate that either the difference in hardware is
> giving us different results (you have newer CPUs) or there is still
> some disparity in the experiment setup/environment between us.
> 
> Are you disabling hyperthreading? Are the prefetching heuristics
> different on your systems?

we don't disable hyperthreading.

for prefetching, we don't change the BIOS default settings. for the skl
server in our original report:

MLC Spatial Prefetcher     - enabled
DCU Data Prefetcher        - enabled
DCU Instruction Prefetcher - enabled
LLC Prefetch               - disabled

but we don't make these settings uniform across all our servers. for
example, on the Ice Lake server mentioned in a previous mail, "LLC
Prefetch" defaults to enabled, so we keep it enabled.
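
(if it helps to compare prefetcher state across machines directly: on
many recent Intel server parts the hardware prefetchers are controlled
via MSR 0x1a4 -- the exact bit layout is model-specific, so treat this
only as a sketch, assuming msr-tools is installed:)

modprobe msr
rdmsr -a 0x1a4   # bit0: L2 HW prefetcher, bit1: L2 adjacent line,
                 # bit2: DCU streamer, bit3: DCU IP; a set bit = disabled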

> 
> Regarding test environment, can you check my setup instructions above
> and see if I am doing something wrong or different?
> 
> At the moment, I am inclined towards asking Andrew to include my fix in
> an upcoming 6.10-rc*, but let's keep this report open so we can continue
> to improve. Let me know if you have concerns.

yeah, a different setup/environment could cause the difference. anyway,
once your fix is merged, we should be able to capture it as a performance
improvement. or if you want us to do a manual check, just let us know.
Thanks!

> 
> thanks,
> Shakeel



end of thread (newest message: 2024-05-30  6:18 UTC)

Thread overview: 15+ messages
2024-05-17  5:56 [linux-next:master] [memcg] 70a64b7919: will-it-scale.per_process_ops -11.9% regression kernel test robot
2024-05-17 23:38 ` Yosry Ahmed
2024-05-18  6:28 ` Shakeel Butt
2024-05-19  9:14   ` Oliver Sang
2024-05-19 17:20     ` Shakeel Butt
2024-05-20  2:43       ` Oliver Sang
2024-05-20  3:49         ` Shakeel Butt
2024-05-21  2:43           ` Oliver Sang
2024-05-22  4:18             ` Shakeel Butt
2024-05-23  7:48               ` Oliver Sang
2024-05-23 16:47                 ` Shakeel Butt
2024-05-24  7:45                   ` Oliver Sang
2024-05-24 18:06                     ` Shakeel Butt
2024-05-28  6:30                       ` Shakeel Butt
2024-05-30  6:17                         ` Oliver Sang
