Subject: [linus:master] [mm/vmalloc] 9c47753167: stress-ng.bigheap.realloc_calls_per_sec 21.3% regression
From: kernel test robot @ 2025-12-12 3:27 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Michal Hocko,
Baoquan He, Alexander Potapenko, Andrey Ryabinin, Marco Elver,
Michal Hocko, linux-mm, oliver.sang
Hello,
kernel test robot noticed a 21.3% regression of stress-ng.bigheap.realloc_calls_per_sec on:
commit: 9c47753167a6a585d0305663c6912f042e131c2d ("mm/vmalloc: defer freeing partly initialized vm_struct")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[still regression on linus/master c9b47175e9131118e6f221cc8fb81397d62e7c91]
[still regression on linux-next/master 008d3547aae5bc86fac3eda317489169c3fda112]
testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6767P CPU @ 2.4GHz (Granite Rapids) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: bigheap
cpufreq_governor: performance
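
For readers unfamiliar with the workload: per the stress-ng documentation, each
bigheap worker repeatedly grows its heap with realloc() and touches the newly
added pages until allocation fails (or the OOM killer strikes), then starts
over; realloc_calls_per_sec counts those grow steps. A rough user-space sketch
of such a worker (hypothetical and condensed, not the actual stress-bigheap.c
source; the sysinfo() call is inferred from the profile data further down):

<snip>
#include <stdlib.h>
#include <string.h>
#include <sys/sysinfo.h>

/*
 * Hypothetical sketch of a bigheap-style worker: grow a heap with
 * realloc(), fault the new pages in, restart from zero once allocation
 * fails.  The harness's 60 s timeout is elided.
 */
static void bigheap_worker(void)
{
	const size_t chunk = 64 * 1024;
	size_t sz = 0;
	char *heap = NULL;

	for (;;) {
		struct sysinfo si;
		char *p = realloc(heap, sz + chunk); /* one realloc call */

		if (!p) {			/* allocation failed: restart */
			free(heap);
			heap = NULL;
			sz = 0;
			continue;
		}
		heap = p;
		memset(heap + sz, 0xff, chunk);	/* touch pages -> minor faults */
		sz += chunk;

		/* the profile below shows sysinfo(2) in the hot path,
		 * presumably a low-memory check between grow steps */
		sysinfo(&si);
	}
}
<snip>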
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202512121138.986f6a6b-lkp@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251212/202512121138.986f6a6b-lkp@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp3/bigheap/stress-ng/60s
commit:
86e968d8ca ("mm/vmalloc: support non-blocking GFP flags in alloc_vmap_area()")
9c47753167 ("mm/vmalloc: defer freeing partly initialized vm_struct")
86e968d8ca6dc823 9c47753167a6a585d0305663c69
---------------- ---------------------------
%stddev %change %stddev
\ | \
209109 ± 5% -14.1% 179718 ± 6% numa-meminfo.node0.PageTables
1278595 ± 7% -10.4% 1145748 ± 2% sched_debug.cpu.max_idle_balance_cost.max
33.90 -3.6% 32.67 turbostat.RAMWatt
3.885e+08 -10.9% 3.463e+08 numa-numastat.node0.local_node
3.886e+08 -10.8% 3.466e+08 numa-numastat.node0.numa_hit
3.881e+08 -10.9% 3.46e+08 numa-numastat.node1.local_node
3.883e+08 -10.9% 3.461e+08 numa-numastat.node1.numa_hit
3.886e+08 -10.8% 3.466e+08 numa-vmstat.node0.numa_hit
3.885e+08 -10.9% 3.463e+08 numa-vmstat.node0.numa_local
3.883e+08 -10.9% 3.461e+08 numa-vmstat.node1.numa_hit
3.881e+08 -10.9% 3.46e+08 numa-vmstat.node1.numa_local
48320196 -10.9% 43072080 stress-ng.bigheap.ops
785159 -9.8% 708390 stress-ng.bigheap.ops_per_sec
879805 -21.3% 692805 stress-ng.bigheap.realloc_calls_per_sec
72414 -3.3% 70043 stress-ng.time.involuntary_context_switches
7.735e+08 -10.9% 6.895e+08 stress-ng.time.minor_page_faults
15385 -1.0% 15224 stress-ng.time.system_time
236.00 -10.5% 211.19 ± 2% stress-ng.time.user_time
0.32 ± 4% +95.1% 0.63 ± 12% perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
16.96 ± 41% +5031.1% 870.26 ± 40% perf-sched.sch_delay.max.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
0.32 ± 4% +95.1% 0.63 ± 12% perf-sched.total_sch_delay.average.ms
16.96 ± 41% +5031.1% 870.26 ± 40% perf-sched.total_sch_delay.max.ms
4750 ± 4% -12.2% 4169 ± 4% perf-sched.total_wait_and_delay.max.ms
4750 ± 4% -12.2% 4169 ± 4% perf-sched.total_wait_time.max.ms
4750 ± 4% -12.2% 4169 ± 4% perf-sched.wait_and_delay.max.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
4750 ± 4% -12.2% 4169 ± 4% perf-sched.wait_time.max.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
29568942 -2.9% 28712561 proc-vmstat.nr_active_anon
28797015 -2.8% 27991137 proc-vmstat.nr_anon_pages
99294 -3.7% 95669 proc-vmstat.nr_page_table_pages
29568950 -2.9% 28712562 proc-vmstat.nr_zone_active_anon
7.77e+08 -10.9% 6.927e+08 proc-vmstat.numa_hit
7.766e+08 -10.9% 6.923e+08 proc-vmstat.numa_local
7.785e+08 -10.8% 6.941e+08 proc-vmstat.pgalloc_normal
7.739e+08 -10.8% 6.899e+08 proc-vmstat.pgfault
7.756e+08 -10.6% 6.931e+08 proc-vmstat.pgfree
7.68 -3.8% 7.39 perf-stat.i.MPKI
2.811e+10 -4.9% 2.672e+10 perf-stat.i.branch-instructions
0.06 -0.0 0.05 perf-stat.i.branch-miss-rate%
15424402 -14.3% 13220241 perf-stat.i.branch-misses
80.75 -2.3 78.42 perf-stat.i.cache-miss-rate%
1.037e+09 -11.0% 9.233e+08 perf-stat.i.cache-misses
1.217e+09 -10.6% 1.088e+09 perf-stat.i.cache-references
2817 ± 2% -2.8% 2739 perf-stat.i.context-switches
7.16 +5.1% 7.53 perf-stat.i.cpi
1846 ± 5% +30.6% 2410 ± 5% perf-stat.i.cycles-between-cache-misses
1.298e+11 -5.9% 1.222e+11 perf-stat.i.instructions
0.14 -5.2% 0.13 perf-stat.i.ipc
103.98 -9.7% 93.94 perf-stat.i.metric.K/sec
13534286 -11.0% 12040965 perf-stat.i.minor-faults
13534286 -11.0% 12040965 perf-stat.i.page-faults
7.64 -5.3% 7.23 perf-stat.overall.MPKI
0.05 -0.0 0.05 perf-stat.overall.branch-miss-rate%
7.20 +5.3% 7.58 perf-stat.overall.cpi
942.28 +11.2% 1047 perf-stat.overall.cycles-between-cache-misses
0.14 -5.0% 0.13 perf-stat.overall.ipc
2.678e+10 -4.1% 2.569e+10 perf-stat.ps.branch-instructions
14559650 -13.3% 12627015 perf-stat.ps.branch-misses
9.434e+08 -10.0% 8.491e+08 perf-stat.ps.cache-misses
1.112e+09 -9.5% 1.006e+09 perf-stat.ps.cache-references
1.235e+11 -4.9% 1.174e+11 perf-stat.ps.instructions
12270397 -10.0% 11048367 perf-stat.ps.minor-faults
12270398 -10.0% 11048367 perf-stat.ps.page-faults
7.755e+12 -5.9% 7.3e+12 perf-stat.total.instructions
42.85 -5.2 37.62 perf-profile.calltrace.cycles-pp.__munmap
42.85 -5.2 37.62 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
42.85 -5.2 37.62 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
42.85 -5.2 37.62 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
42.85 -5.2 37.62 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
42.85 -5.2 37.62 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
42.85 -5.2 37.62 perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
42.85 -5.2 37.62 perf-profile.calltrace.cycles-pp.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
42.85 -5.2 37.62 perf-profile.calltrace.cycles-pp.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
41.78 ± 2% -5.1 36.70 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap
41.78 ± 2% -5.1 36.70 perf-profile.calltrace.cycles-pp.unmap_vmas.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
41.78 ± 2% -5.1 36.70 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.vms_clear_ptes.vms_complete_munmap_vmas
41.78 ± 2% -5.1 36.70 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.vms_clear_ptes
41.51 ± 2% -5.1 36.45 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
41.51 ± 2% -5.1 36.45 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range
41.51 ± 2% -5.1 36.45 perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
41.65 -5.1 36.60 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
41.63 -5.1 36.58 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
41.65 -5.1 36.60 perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
41.46 ± 2% -5.0 36.41 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range
40.84 ± 2% -4.9 35.90 perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu
3.89 ± 4% -2.4 1.53 ± 8% perf-profile.calltrace.cycles-pp.si_meminfo.do_sysinfo.__do_sys_sysinfo.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.84 ± 4% -2.4 1.49 ± 8% perf-profile.calltrace.cycles-pp.nr_blockdev_pages.si_meminfo.do_sysinfo.__do_sys_sysinfo.do_syscall_64
3.82 ± 4% -2.3 1.47 ± 9% perf-profile.calltrace.cycles-pp._raw_spin_lock.nr_blockdev_pages.si_meminfo.do_sysinfo.__do_sys_sysinfo
3.74 ± 4% -2.3 1.43 ± 9% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.nr_blockdev_pages.si_meminfo.do_sysinfo
3.10 ± 2% -0.6 2.45 ± 2% perf-profile.calltrace.cycles-pp.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
1.90 -0.4 1.52 perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
1.84 -0.4 1.48 perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault
1.80 -0.4 1.44 perf-profile.calltrace.cycles-pp.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page
1.70 -0.4 1.36 perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio
1.43 ± 6% -0.3 1.12 ± 2% perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
1.26 ± 4% -0.3 0.98 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.do_anonymous_page.__handle_mm_fault.handle_mm_fault
1.21 -0.3 0.95 perf-profile.calltrace.cycles-pp.prep_new_page.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof
1.16 ± 8% -0.3 0.90 ± 5% perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
1.17 -0.3 0.92 perf-profile.calltrace.cycles-pp.clear_page_erms.prep_new_page.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol
44.15 ± 2% +7.5 51.61 ± 2% perf-profile.calltrace.cycles-pp.do_sysinfo.__do_sys_sysinfo.do_syscall_64.entry_SYSCALL_64_after_hwframe.sysinfo
44.32 ± 2% +7.5 51.79 ± 2% perf-profile.calltrace.cycles-pp.sysinfo
44.30 ± 2% +7.5 51.77 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sysinfo
44.30 ± 2% +7.5 51.77 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sysinfo
44.28 ± 2% +7.5 51.75 ± 2% perf-profile.calltrace.cycles-pp.__do_sys_sysinfo.do_syscall_64.entry_SYSCALL_64_after_hwframe.sysinfo
40.25 ± 2% +9.8 50.06 ± 2% perf-profile.calltrace.cycles-pp.si_swapinfo.do_sysinfo.__do_sys_sysinfo.do_syscall_64.entry_SYSCALL_64_after_hwframe
40.24 ± 2% +9.8 50.06 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.si_swapinfo.do_sysinfo.__do_sys_sysinfo.do_syscall_64
40.08 ± 2% +9.8 49.92 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.si_swapinfo.do_sysinfo.__do_sys_sysinfo
44.76 ± 2% -6.0 38.80 ± 4% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
44.44 ± 2% -5.9 38.56 ± 4% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
42.85 -5.2 37.62 perf-profile.children.cycles-pp.__munmap
42.85 -5.2 37.62 perf-profile.children.cycles-pp.__vm_munmap
42.85 -5.2 37.62 perf-profile.children.cycles-pp.__x64_sys_munmap
42.88 -5.2 37.65 perf-profile.children.cycles-pp.do_vmi_align_munmap
42.88 -5.2 37.65 perf-profile.children.cycles-pp.vms_clear_ptes
42.88 -5.2 37.65 perf-profile.children.cycles-pp.vms_complete_munmap_vmas
42.86 -5.2 37.64 perf-profile.children.cycles-pp.do_vmi_munmap
42.62 -5.2 37.40 perf-profile.children.cycles-pp.folios_put_refs
42.60 -5.2 37.40 perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
42.60 -5.2 37.40 perf-profile.children.cycles-pp.free_pages_and_swap_cache
41.93 -5.1 36.84 perf-profile.children.cycles-pp.__page_cache_release
41.80 ± 2% -5.1 36.72 perf-profile.children.cycles-pp.unmap_page_range
41.80 ± 2% -5.1 36.72 perf-profile.children.cycles-pp.unmap_vmas
41.80 ± 2% -5.1 36.72 perf-profile.children.cycles-pp.zap_pmd_range
41.80 ± 2% -5.1 36.72 perf-profile.children.cycles-pp.zap_pte_range
41.51 ± 2% -5.1 36.45 perf-profile.children.cycles-pp.tlb_flush_mmu
3.89 ± 4% -2.4 1.53 ± 8% perf-profile.children.cycles-pp.si_meminfo
3.84 ± 4% -2.4 1.49 ± 8% perf-profile.children.cycles-pp.nr_blockdev_pages
3.11 ± 2% -0.6 2.46 ± 2% perf-profile.children.cycles-pp.alloc_anon_folio
1.90 -0.4 1.52 perf-profile.children.cycles-pp.vma_alloc_folio_noprof
1.89 -0.4 1.52 perf-profile.children.cycles-pp.alloc_pages_mpol
1.84 -0.4 1.48 perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
1.73 -0.3 1.39 perf-profile.children.cycles-pp.get_page_from_freelist
0.56 ± 72% -0.3 0.22 ±108% perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
1.45 ± 6% -0.3 1.14 ± 3% perf-profile.children.cycles-pp.__pte_offset_map_lock
1.22 -0.3 0.96 perf-profile.children.cycles-pp.prep_new_page
1.16 ± 7% -0.3 0.90 ± 5% perf-profile.children.cycles-pp.__mem_cgroup_charge
1.19 -0.3 0.93 perf-profile.children.cycles-pp.clear_page_erms
0.26 ± 8% -0.1 0.16 ± 3% perf-profile.children.cycles-pp.handle_internal_command
0.26 ± 8% -0.1 0.16 ± 3% perf-profile.children.cycles-pp.main
0.26 ± 8% -0.1 0.16 ± 3% perf-profile.children.cycles-pp.run_builtin
0.44 ± 10% -0.1 0.35 ± 6% perf-profile.children.cycles-pp.free_unref_folios
0.25 ± 9% -0.1 0.16 ± 3% perf-profile.children.cycles-pp.record__mmap_read_evlist
0.40 ± 11% -0.1 0.31 ± 6% perf-profile.children.cycles-pp.free_frozen_page_commit
0.24 ± 8% -0.1 0.16 ± 4% perf-profile.children.cycles-pp.perf_mmap__push
0.38 ± 13% -0.1 0.30 ± 7% perf-profile.children.cycles-pp.free_pcppages_bulk
0.55 -0.1 0.48 perf-profile.children.cycles-pp.sync_regs
0.48 ± 4% -0.1 0.42 ± 2% perf-profile.children.cycles-pp.native_irq_return_iret
0.37 ± 4% -0.1 0.31 ± 3% perf-profile.children.cycles-pp.rmqueue
0.35 ± 4% -0.1 0.30 ± 3% perf-profile.children.cycles-pp.rmqueue_pcplist
0.19 ± 6% -0.0 0.14 ± 3% perf-profile.children.cycles-pp.record__pushfn
0.18 ± 7% -0.0 0.13 ± 2% perf-profile.children.cycles-pp.ksys_write
0.17 ± 5% -0.0 0.13 ± 3% perf-profile.children.cycles-pp.vfs_write
0.28 ± 5% -0.0 0.24 ± 3% perf-profile.children.cycles-pp.__rmqueue_pcplist
0.31 -0.0 0.27 perf-profile.children.cycles-pp.lru_add
0.16 ± 5% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.shmem_file_write_iter
0.24 ± 6% -0.0 0.20 ± 5% perf-profile.children.cycles-pp.rmqueue_bulk
0.16 ± 4% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.generic_perform_write
0.24 ± 2% -0.0 0.20 perf-profile.children.cycles-pp.lru_gen_add_folio
0.21 -0.0 0.18 perf-profile.children.cycles-pp.lru_gen_del_folio
0.25 ± 2% -0.0 0.22 perf-profile.children.cycles-pp.zap_present_ptes
0.14 ± 2% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.lock_vma_under_rcu
0.14 ± 3% -0.0 0.12 ± 4% perf-profile.children.cycles-pp.__mod_node_page_state
0.13 -0.0 0.12 ± 4% perf-profile.children.cycles-pp.__perf_sw_event
0.06 ± 7% -0.0 0.05 perf-profile.children.cycles-pp.___pte_offset_map
0.09 ± 5% -0.0 0.08 perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios
0.08 ± 6% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.vma_merge_extend
0.11 ± 3% -0.0 0.10 perf-profile.children.cycles-pp.__free_one_page
0.07 -0.0 0.06 perf-profile.children.cycles-pp.error_entry
0.06 -0.0 0.05 perf-profile.children.cycles-pp.__mod_zone_page_state
0.11 -0.0 0.10 perf-profile.children.cycles-pp.___perf_sw_event
0.10 ± 4% +0.0 0.11 ± 4% perf-profile.children.cycles-pp.sched_tick
0.21 ± 3% +0.0 0.24 ± 5% perf-profile.children.cycles-pp.update_process_times
0.22 ± 3% +0.0 0.26 ± 7% perf-profile.children.cycles-pp.tick_nohz_handler
0.30 ± 4% +0.0 0.34 ± 6% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.29 ± 4% +0.0 0.33 ± 6% perf-profile.children.cycles-pp.hrtimer_interrupt
0.39 ± 2% +0.0 0.43 ± 2% perf-profile.children.cycles-pp.mremap
0.31 ± 4% +0.0 0.36 ± 5% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.34 ± 3% +0.0 0.39 ± 5% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.28 ± 3% +0.1 0.34 ± 2% perf-profile.children.cycles-pp.__do_sys_mremap
0.28 ± 2% +0.1 0.34 ± 3% perf-profile.children.cycles-pp.do_mremap
0.11 ± 4% +0.1 0.17 ± 2% perf-profile.children.cycles-pp.expand_vma
0.00 +0.1 0.08 perf-profile.children.cycles-pp.__vm_enough_memory
0.00 +0.1 0.09 ± 5% perf-profile.children.cycles-pp.vrm_calc_charge
0.04 ±141% +0.1 0.13 ± 16% perf-profile.children.cycles-pp.add_callchain_ip
0.04 ±142% +0.1 0.14 ± 17% perf-profile.children.cycles-pp.thread__resolve_callchain_sample
0.04 ±142% +0.1 0.17 ± 15% perf-profile.children.cycles-pp.__thread__resolve_callchain
0.04 ±142% +0.1 0.18 ± 15% perf-profile.children.cycles-pp.sample__for_each_callchain_node
0.05 ±141% +0.1 0.18 ± 14% perf-profile.children.cycles-pp.build_id__mark_dso_hit
0.05 ±141% +0.1 0.19 ± 14% perf-profile.children.cycles-pp.perf_session__deliver_event
0.05 ±141% +0.1 0.20 ± 14% perf-profile.children.cycles-pp.__ordered_events__flush
0.05 ±141% +0.1 0.20 ± 33% perf-profile.children.cycles-pp.perf_session__process_events
0.05 ±141% +0.1 0.20 ± 33% perf-profile.children.cycles-pp.record__finish_output
88.59 +1.5 90.13 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
45.34 ± 2% +7.2 52.54 ± 2% perf-profile.children.cycles-pp._raw_spin_lock
44.15 ± 2% +7.5 51.61 ± 2% perf-profile.children.cycles-pp.do_sysinfo
44.33 ± 2% +7.5 51.80 ± 2% perf-profile.children.cycles-pp.sysinfo
44.28 ± 2% +7.5 51.75 ± 2% perf-profile.children.cycles-pp.__do_sys_sysinfo
40.25 ± 2% +9.8 50.07 ± 2% perf-profile.children.cycles-pp.si_swapinfo
0.55 ± 74% -0.3 0.22 ±107% perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
1.50 ± 4% -0.3 1.17 perf-profile.self.cycles-pp._raw_spin_lock
1.18 -0.3 0.92 perf-profile.self.cycles-pp.clear_page_erms
2.01 -0.2 1.86 ± 3% perf-profile.self.cycles-pp.stress_bigheap_child
0.55 -0.1 0.48 perf-profile.self.cycles-pp.sync_regs
0.48 ± 4% -0.1 0.42 ± 2% perf-profile.self.cycles-pp.native_irq_return_iret
0.14 ± 3% -0.0 0.12 ± 4% perf-profile.self.cycles-pp.get_page_from_freelist
0.14 ± 8% -0.0 0.12 ± 3% perf-profile.self.cycles-pp.do_anonymous_page
0.14 ± 2% -0.0 0.12 ± 3% perf-profile.self.cycles-pp.rmqueue_bulk
0.14 -0.0 0.12 perf-profile.self.cycles-pp.lru_gen_del_folio
0.11 ± 3% -0.0 0.09 ± 4% perf-profile.self.cycles-pp.__handle_mm_fault
0.15 ± 2% -0.0 0.13 perf-profile.self.cycles-pp.lru_gen_add_folio
0.12 ± 3% -0.0 0.10 ± 3% perf-profile.self.cycles-pp.zap_present_ptes
0.12 ± 4% -0.0 0.11 perf-profile.self.cycles-pp.__mod_node_page_state
0.07 ± 6% -0.0 0.06 perf-profile.self.cycles-pp.lock_vma_under_rcu
0.10 -0.0 0.09 ± 4% perf-profile.self.cycles-pp.__free_one_page
0.11 ± 3% -0.0 0.10 perf-profile.self.cycles-pp.folios_put_refs
0.07 -0.0 0.06 perf-profile.self.cycles-pp.___perf_sw_event
0.07 -0.0 0.06 perf-profile.self.cycles-pp.do_user_addr_fault
0.07 -0.0 0.06 perf-profile.self.cycles-pp.lru_add
0.07 -0.0 0.06 perf-profile.self.cycles-pp.mas_walk
0.08 -0.0 0.07 perf-profile.self.cycles-pp.__alloc_frozen_pages_noprof
0.06 -0.0 0.05 perf-profile.self.cycles-pp.handle_mm_fault
0.06 -0.0 0.05 perf-profile.self.cycles-pp.page_counter_uncharge
0.00 +0.1 0.08 perf-profile.self.cycles-pp.__vm_enough_memory
88.36 +1.5 89.85 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
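
For context on what the bisected commit changed: it routes error-path frees of
partly initialized vm_structs through a lock-free llist drained by a workqueue
(the mechanism is visible in the fix proposed below). A generic sketch of that
deferral pattern, using hypothetical names rather than the actual mm/vmalloc.c
code:

<snip>
#include <linux/llist.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

/* Generic sketch of the defer-free pattern used by the bisected commit
 * (hypothetical names; the real code frees vm areas, not kmalloc'd
 * objects). */
struct deferred_obj {
	struct llist_node llnode;
};

static LLIST_HEAD(pending_free);

static void drain_pending(struct work_struct *work)
{
	struct llist_node *head = llist_del_all(&pending_free);
	struct deferred_obj *obj, *tmp;

	llist_for_each_entry_safe(obj, tmp, head, llnode)
		kfree(obj);	/* the real work item frees the vm area */
}
static DECLARE_WORK(drain_work, drain_pending);

static void defer_free(struct deferred_obj *obj)
{
	/* llist_add() returns true when the list was previously empty,
	 * i.e. exactly one caller needs to kick the work item */
	if (llist_add(&obj->llnode, &pending_free))
		schedule_work(&drain_work);
}
<snip>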
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Subject: Re: [linus:master] [mm/vmalloc] 9c47753167: stress-ng.bigheap.realloc_calls_per_sec 21.3% regression
From: Uladzislau Rezki @ 2025-12-15 12:19 UTC (permalink / raw)
To: kernel test robot
Cc: Uladzislau Rezki, oe-lkp, lkp, linux-kernel, Andrew Morton,
    Michal Hocko, Baoquan He, Alexander Potapenko, Andrey Ryabinin,
    Marco Elver, Michal Hocko, linux-mm

On Fri, Dec 12, 2025 at 11:27:27AM +0800, kernel test robot wrote:
> Hello,
>
> kernel test robot noticed a 21.3% regression of stress-ng.bigheap.realloc_calls_per_sec on:
>
> commit: 9c47753167a6a585d0305663c6912f042e131c2d ("mm/vmalloc: defer freeing partly initialized vm_struct")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [...]

Could you please test the below patch and confirm whether it solves the regression:

<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ecbac900c35f..118de1a8348c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3746,6 +3746,15 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 	return nr_allocated;
 }
 
+static void
+__vm_area_cleanup(struct vm_struct *area)
+{
+	if (area->pages)
+		vfree(area->addr);
+	else
+		free_vm_area(area);
+}
+
 static LLIST_HEAD(pending_vm_area_cleanup);
 static void cleanup_vm_area_work(struct work_struct *work)
 {
@@ -3756,12 +3765,8 @@ static void cleanup_vm_area_work(struct work_struct *work)
 	if (!head)
 		return;
 
-	llist_for_each_entry_safe(area, tmp, head, llnode) {
-		if (!area->pages)
-			free_vm_area(area);
-		else
-			vfree(area->addr);
-	}
+	llist_for_each_entry_safe(area, tmp, head, llnode)
+		__vm_area_cleanup(area);
 }
 
 /*
@@ -3769,8 +3774,11 @@ static void cleanup_vm_area_work(struct work_struct *work)
  * of partially initialized vm_struct in error paths.
  */
 static DECLARE_WORK(cleanup_vm_area, cleanup_vm_area_work);
-static void defer_vm_area_cleanup(struct vm_struct *area)
+static void vm_area_cleanup(struct vm_struct *area, bool can_block)
 {
+	if (can_block)
+		return __vm_area_cleanup(area);
+
 	if (llist_add(&area->llnode, &pending_vm_area_cleanup))
 		schedule_work(&cleanup_vm_area);
 }
@@ -3915,7 +3923,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	return area->addr;
 
 fail:
-	defer_vm_area_cleanup(area);
+	vm_area_cleanup(area, gfpflags_allow_blocking(gfp_mask));
 	return NULL;
 }
<snip>

--
Uladzislau Rezki
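
A note on the gate used above: gfpflags_allow_blocking() is the kernel's
standard may-this-context-sleep predicate and reduces to a
__GFP_DIRECT_RECLAIM test, so GFP_KERNEL error paths go back to freeing the
half-built area synchronously while only genuinely atomic callers keep the
workqueue deferral. A minimal illustration (not part of the fix):

<snip>
#include <linux/gfp.h>

/*
 * Illustration only: which cleanup path common GFP masks take under the
 * patch above.  gfpflags_allow_blocking() tests __GFP_DIRECT_RECLAIM:
 *
 *   GFP_KERNEL -> true  -> __vm_area_cleanup() runs synchronously
 *   GFP_ATOMIC -> false -> area queued and freed from the workqueue
 *   GFP_NOWAIT -> false -> likewise deferred
 */
static bool cleanup_runs_inline(gfp_t gfp_mask)
{
	return gfpflags_allow_blocking(gfp_mask);
}
<snip>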
Subject: Re: [linus:master] [mm/vmalloc] 9c47753167: stress-ng.bigheap.realloc_calls_per_sec 21.3% regression
From: Oliver Sang @ 2025-12-17 5:27 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Michal Hocko,
    Baoquan He, Alexander Potapenko, Andrey Ryabinin, Marco Elver,
    Michal Hocko, linux-mm, oliver.sang

hi, Uladzislau Rezki,

On Mon, Dec 15, 2025 at 01:19:14PM +0100, Uladzislau Rezki wrote:
> On Fri, Dec 12, 2025 at 11:27:27AM +0800, kernel test robot wrote:
> > kernel test robot noticed a 21.3% regression of stress-ng.bigheap.realloc_calls_per_sec on:
> >
> > commit: 9c47753167a6a585d0305663c6912f042e131c2d ("mm/vmalloc: defer freeing partly initialized vm_struct")
> > [...]
>
> Could you please test the below patch and confirm whether it solves the regression:
> [...]

we directly applied the patch on top of 9c47753167, so our test branch looks like below

* f7991e8a0136cb <---- below patch from you
* 9c47753167a6a5 mm/vmalloc: defer freeing partly initialized vm_struct
* 86e968d8ca6dc8 mm/vmalloc: support non-blocking GFP flags in alloc_vmap_area()

but found it has little performance impact:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp3/bigheap/stress-ng/60s

86e968d8ca6dc823  9c47753167a6a585d0305663c69  f7991e8a0136cb0fdf35f11e28a
----------------  ---------------------------  ---------------------------
       %stddev        %change    %stddev          %change    %stddev
           \              |          \                |          \
  48320196           -10.9%   43072080          -10.8%   43116499        stress-ng.bigheap.ops
    785159            -9.8%     708390           -9.7%     708644        stress-ng.bigheap.ops_per_sec
    879805           -21.3%     692805          -20.7%     697312        stress-ng.bigheap.realloc_calls_per_sec

the full comparison is as below [1]

if this patch depends on other patches, i.e. needs another base to apply, please let us know.

thanks

[1]
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp3/bigheap/stress-ng/60s

86e968d8ca6dc823  9c47753167a6a585d0305663c69  f7991e8a0136cb0fdf35f11e28a
----------------  ---------------------------  ---------------------------
       %stddev        %change    %stddev          %change    %stddev
           \              |          \                |          \
     33.90            -3.6%      32.67           -3.4%      32.73        turbostat.RAMWatt
      3165            -2.0%       3101           +4.2%       3297        vmstat.system.cs
    336633 ± 17%      -7.8%     310279 ± 29%    -59.1%     137568 ± 48%  sched_debug.cpu.avg_idle.min
   1278595 ±  7%     -10.4%    1145748 ±  2%     -5.5%    1208901 ±  8%  sched_debug.cpu.max_idle_balance_cost.max
 3.885e+08           -10.9%  3.463e+08          -10.8%  3.465e+08        numa-numastat.node0.local_node
 3.886e+08           -10.8%  3.466e+08          -10.8%  3.467e+08        numa-numastat.node0.numa_hit
 3.881e+08           -10.9%   3.46e+08          -10.7%  3.468e+08        numa-numastat.node1.local_node
 3.883e+08           -10.9%  3.461e+08          -10.7%  3.469e+08        numa-numastat.node1.numa_hit
     72314 ± 23%     +15.6%      83599 ± 34%    +40.1%     101308 ± 14%  numa-meminfo.node0.KReclaimable
    209109 ±  5%     -14.1%     179718 ±  6%    -11.8%     184353 ±  5%  numa-meminfo.node0.PageTables
     72314 ± 23%     +15.6%      83599 ± 34%    +40.1%     101308 ± 14%  numa-meminfo.node0.SReclaimable
    100786 ± 14%      -9.0%      91704 ± 31%    -26.8%      73823 ± 20%  numa-meminfo.node1.KReclaimable
    100786 ± 14%      -9.0%      91704 ± 31%    -26.8%      73823 ± 20%  numa-meminfo.node1.SReclaimable
     18075 ± 23%     +15.6%      20900 ± 34%    +40.1%      25327 ± 14%  numa-vmstat.node0.nr_slab_reclaimable
 3.886e+08           -10.8%  3.466e+08          -10.8%  3.467e+08        numa-vmstat.node0.numa_hit
 3.885e+08           -10.9%  3.463e+08          -10.8%  3.465e+08        numa-vmstat.node0.numa_local
     25179 ± 14%      -9.0%      22913 ± 31%    -26.7%      18451 ± 20%  numa-vmstat.node1.nr_slab_reclaimable
 3.883e+08           -10.9%  3.461e+08          -10.7%  3.469e+08        numa-vmstat.node1.numa_hit
 3.881e+08           -10.9%   3.46e+08          -10.7%  3.468e+08        numa-vmstat.node1.numa_local
  48320196           -10.9%   43072080          -10.8%   43116499        stress-ng.bigheap.ops
    785159            -9.8%     708390           -9.7%     708644        stress-ng.bigheap.ops_per_sec
    879805           -21.3%     692805          -20.7%     697312        stress-ng.bigheap.realloc_calls_per_sec
     72414            -3.3%      70043           -2.7%      70486 ±  2%  stress-ng.time.involuntary_context_switches
 7.735e+08           -10.9%  6.895e+08          -10.8%  6.902e+08        stress-ng.time.minor_page_faults
     15385            -1.0%      15224           -1.0%      15233        stress-ng.time.system_time
    236.00           -10.5%     211.19 ±  2%    -10.5%     211.25 ±  2%  stress-ng.time.user_time
     61.74            -1.0%      61.14           -0.9%      61.16        time.elapsed_time
     61.74            -1.0%      61.14           -0.9%      61.16        time.elapsed_time.max
     72414            -3.3%      70043           -2.7%      70486 ±  2%  time.involuntary_context_switches
 7.735e+08           -10.9%  6.895e+08          -10.8%  6.902e+08        time.minor_page_faults
     15385            -1.0%      15224           -1.0%      15233        time.system_time
    236.00           -10.5%     211.19 ±  2%    -10.5%     211.25 ±  2%  time.user_time
      2033 ±  3%     -18.3%       1662 ±  5%    -16.1%       1705 ±  8%  time.voluntary_context_switches
      0.32 ±  4%     +95.1%       0.63 ± 12%   +111.1%       0.68 ± 31%  perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
     16.96 ± 41%   +5031.1%     870.26 ± 40%  +4560.8%     790.50 ± 51%  perf-sched.sch_delay.max.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
      0.32 ±  4%     +95.1%       0.63 ± 12%   +111.1%       0.68 ± 31%  perf-sched.total_sch_delay.average.ms
     16.96 ± 41%   +5031.1%     870.26 ± 40%  +4560.8%     790.50 ± 51%  perf-sched.total_sch_delay.max.ms
      4750 ±  4%     -12.2%       4169 ±  4%    -10.2%       4267 ±  8%  perf-sched.total_wait_and_delay.max.ms
      4750 ±  4%     -12.2%       4169 ±  4%    -10.2%       4266 ±  8%  perf-sched.total_wait_time.max.ms
      4750 ±  4%     -12.2%       4169 ±  4%    -10.2%       4267 ±  8%  perf-sched.wait_and_delay.max.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
      4750 ±  4%     -12.2%       4169 ±  4%    -10.2%       4266 ±  8%  perf-sched.wait_time.max.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
  29568942            -2.9%   28712561           -3.3%   28599966 ±  2%  proc-vmstat.nr_active_anon
  28797015            -2.8%   27991137           -3.2%   27872472 ±  2%  proc-vmstat.nr_anon_pages
     99294            -3.7%      95669           -3.5%      95835        proc-vmstat.nr_page_table_pages
  29568950            -2.9%   28712562           -3.3%   28599954 ±  2%  proc-vmstat.nr_zone_active_anon
  7.77e+08           -10.9%  6.927e+08          -10.7%  6.936e+08        proc-vmstat.numa_hit
 7.766e+08           -10.9%  6.923e+08          -10.7%  6.933e+08        proc-vmstat.numa_local
 7.785e+08           -10.8%  6.941e+08          -10.8%  6.948e+08        proc-vmstat.pgalloc_normal
 7.739e+08           -10.8%  6.899e+08          -10.8%  6.906e+08        proc-vmstat.pgfault
 7.756e+08           -10.6%  6.931e+08          -10.6%  6.931e+08        proc-vmstat.pgfree
      7.68            -3.8%       7.39           -3.1%       7.44        perf-stat.i.MPKI
 2.811e+10            -4.9%  2.672e+10           -4.7%  2.678e+10        perf-stat.i.branch-instructions
      0.06            -0.0        0.05           -0.0        0.05        perf-stat.i.branch-miss-rate%
  15424402           -14.3%   13220241          -14.9%   13127907 ±  2%  perf-stat.i.branch-misses
     80.75            -2.3       78.42           -1.9       78.86        perf-stat.i.cache-miss-rate%
 1.037e+09           -11.0%  9.233e+08          -10.3%  9.306e+08        perf-stat.i.cache-misses
 1.217e+09           -10.6%  1.088e+09          -10.1%  1.094e+09        perf-stat.i.cache-references
      2817 ±  2%      -2.8%       2739           +3.4%       2914        perf-stat.i.context-switches
      7.16            +5.1%       7.53           +5.0%       7.52        perf-stat.i.cpi
      1846 ±  5%     +30.6%       2410 ±  5%    +30.1%       2402 ±  4%  perf-stat.i.cycles-between-cache-misses
 1.298e+11            -5.9%  1.222e+11           -5.6%  1.225e+11        perf-stat.i.instructions
      0.14            -5.2%       0.13           -5.1%       0.13        perf-stat.i.ipc
    103.98            -9.7%      93.94           -9.1%      94.48        perf-stat.i.metric.K/sec
  13534286           -11.0%   12040965          -10.3%   12135552        perf-stat.i.minor-faults
  13534286           -11.0%   12040965          -10.3%   12135553        perf-stat.i.page-faults
      7.64            -5.3%       7.23           -5.1%       7.25        perf-stat.overall.MPKI
      0.05            -0.0        0.05           -0.0        0.05        perf-stat.overall.branch-miss-rate%
      7.20            +5.3%       7.58           +5.3%       7.58        perf-stat.overall.cpi
    942.28           +11.2%       1047          +10.9%       1044        perf-stat.overall.cycles-between-cache-misses
      0.14            -5.0%       0.13           -5.0%       0.13        perf-stat.overall.ipc
 2.678e+10            -4.1%  2.569e+10           -4.1%  2.569e+10        perf-stat.ps.branch-instructions
  14559650           -13.3%   12627015          -13.9%   12531491        perf-stat.ps.branch-misses
 9.434e+08           -10.0%  8.491e+08           -9.8%  8.514e+08        perf-stat.ps.cache-misses
 1.112e+09            -9.5%  1.006e+09           -9.4%  1.007e+09        perf-stat.ps.cache-references
      2663            -1.3%       2629           +4.1%       2772        perf-stat.ps.context-switches
 1.235e+11            -4.9%  1.174e+11           -4.9%  1.174e+11        perf-stat.ps.instructions
  12270397           -10.0%   11048367           -9.8%   11072654        perf-stat.ps.minor-faults
  12270398           -10.0%   11048367           -9.8%   11072655        perf-stat.ps.page-faults
 7.755e+12            -5.9%    7.3e+12           -6.0%  7.287e+12        perf-stat.total.instructions
     42.85             -5.2       37.62           -5.8       37.03 ±  2%  perf-profile.calltrace.cycles-pp.__munmap
     42.85             -5.2       37.62           -5.8       37.03 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     42.85             -5.2       37.62           -5.8       37.03 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     42.85             -5.2       37.62           -5.8       37.03 ±  2%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     42.85             -5.2       37.62           -5.8       37.03 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     42.85             -5.2       37.62           -5.8       37.03 ±  2%  perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     42.85             -5.2       37.62           -5.8       37.03 ±  2%  perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     42.85             -5.2       37.62           -5.8       37.03 ±  2%  perf-profile.calltrace.cycles-pp.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
     42.85             -5.2       37.62           -5.8       37.03 ±  2%  perf-profile.calltrace.cycles-pp.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
     41.78 ±  2%       -5.1       36.70           -5.4       36.41 ±  3%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap
     41.78 ±  2%       -5.1       36.70           -5.4       36.41 ±  3%  perf-profile.calltrace.cycles-pp.unmap_vmas.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
     41.78 ±  2%       -5.1       36.70           -5.4       36.41 ±  3%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.vms_clear_ptes.vms_complete_munmap_vmas
     41.78 ±  2%       -5.1       36.70           -5.4       36.41 ±  3%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.vms_clear_ptes
     41.51 ±  2%       -5.1       36.45           -5.3       36.17 ±  3%  perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range
     41.51 ±  2%       -5.1       36.45           -5.3       36.18 ±  3%  perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range
     41.51 ±  2%       -5.1       36.45           -5.3       36.18 ±  3%  perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
     41.65             -5.1       36.60           -5.7       35.97 ±  3%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache
     41.63             -5.1       36.58           -5.7       35.95 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs
     41.65             -5.1       36.60           -5.7       35.97 ±  2%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages
     41.46 ±  2%       -5.0       36.41           -5.3       36.13 ±  3%  perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu.zap_pte_range
     40.84 ±  2%       -4.9       35.90           -5.2       35.62 ±  3%  perf-profile.calltrace.cycles-pp.__page_cache_release.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_flush_mmu
      3.89 ±  4%       -2.4        1.53 ±  8%     -2.4        1.44 ±  7%  perf-profile.calltrace.cycles-pp.si_meminfo.do_sysinfo.__do_sys_sysinfo.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.84 ±  4%       -2.4        1.49 ±  8%     -2.4        1.40 ±  8%  perf-profile.calltrace.cycles-pp.nr_blockdev_pages.si_meminfo.do_sysinfo.__do_sys_sysinfo.do_syscall_64
      3.82 ±  4%       -2.3        1.47 ±  9%     -2.4        1.39 ±  8%  perf-profile.calltrace.cycles-pp._raw_spin_lock.nr_blockdev_pages.si_meminfo.do_sysinfo.__do_sys_sysinfo
      3.74 ±  4%       -2.3        1.43 ±  9%     -2.4        1.34 ±  8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.nr_blockdev_pages.si_meminfo.do_sysinfo
      3.10 ±  2%       -0.6        2.45 ±  2%     -0.6        2.49 ±  2%  perf-profile.calltrace.cycles-pp.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      1.90             -0.4        1.52           -0.4        1.54        perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      1.84             -0.4        1.48           -0.4        1.49        perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page.__handle_mm_fault
      1.80             -0.4        1.44           -0.4        1.45        perf-profile.calltrace.cycles-pp.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio.do_anonymous_page
      1.70             -0.4        1.36           -0.3        1.37        perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.alloc_anon_folio
      1.43 ±  6%       -0.3        1.12 ±  2%     -0.3        1.12 ±  3%  perf-profile.calltrace.cycles-pp.__pte_offset_map_lock.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      1.26 ±  4%       -0.3        0.98 ±  2%     -0.3        0.98 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock.__pte_offset_map_lock.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      1.21             -0.3        0.95           -0.3        0.96        perf-profile.calltrace.cycles-pp.prep_new_page.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof
      1.16 ±  8%       -0.3        0.90 ±  5%     -0.2        0.91 ±  6%  perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.alloc_anon_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      1.17             -0.3        0.92           -0.2        0.93        perf-profile.calltrace.cycles-pp.clear_page_erms.prep_new_page.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol
     44.15 ±  2%       +7.5       51.61 ±  2%     +8.4       52.53        perf-profile.calltrace.cycles-pp.do_sysinfo.__do_sys_sysinfo.do_syscall_64.entry_SYSCALL_64_after_hwframe.sysinfo
     44.32 ±  2%       +7.5       51.79 ±  2%     +8.4       52.72        perf-profile.calltrace.cycles-pp.sysinfo
     44.30 ±  2%       +7.5       51.77 ±  2%     +8.4       52.69        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sysinfo
     44.30 ±  2%       +7.5       51.77 ±  2%     +8.4       52.70        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sysinfo
     44.28 ±  2%       +7.5       51.75 ±  2%     +8.4       52.68        perf-profile.calltrace.cycles-pp.__do_sys_sysinfo.do_syscall_64.entry_SYSCALL_64_after_hwframe.sysinfo
     40.25 ±  2%       +9.8       50.06 ±  2%    +10.8       51.07        perf-profile.calltrace.cycles-pp.si_swapinfo.do_sysinfo.__do_sys_sysinfo.do_syscall_64.entry_SYSCALL_64_after_hwframe
     40.24 ±  2%       +9.8       50.06 ±  2%    +10.8       51.06        perf-profile.calltrace.cycles-pp._raw_spin_lock.si_swapinfo.do_sysinfo.__do_sys_sysinfo.do_syscall_64
     40.08 ±  2%       +9.8       49.92 ±  2%    +10.8       50.92        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.si_swapinfo.do_sysinfo.__do_sys_sysinfo
     44.76 ±  2%       -6.0       38.80 ±  4%     -6.9       37.87 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     44.44 ±  2%       -5.9       38.56 ±  4%     -6.8       37.63 ±  2%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     42.85             -5.2       37.62           -5.8       37.03 ±  2%  perf-profile.children.cycles-pp.__vm_munmap
     42.85             -5.2       37.62           -5.8       37.03 ±  2%  perf-profile.children.cycles-pp.__x64_sys_munmap
     42.88             -5.2       37.65           -5.8       37.06 ±  2%  perf-profile.children.cycles-pp.do_vmi_align_munmap
     42.88             -5.2       37.65           -5.8       37.06 ±  2%  perf-profile.children.cycles-pp.vms_clear_ptes
     42.88             -5.2       37.65           -5.8       37.06 ±  2%  perf-profile.children.cycles-pp.vms_complete_munmap_vmas
     42.85             -5.2       37.62           -5.8       37.03 ±  2%  perf-profile.children.cycles-pp.__munmap
     42.86             -5.2       37.64           -5.8       37.05 ±  2%  perf-profile.children.cycles-pp.do_vmi_munmap
     42.62             -5.2       37.40           -5.8       36.82 ±  2%  perf-profile.children.cycles-pp.folios_put_refs
     42.60             -5.2       37.40           -5.8       36.81 ±  2%  perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
     42.60             -5.2       37.40           -5.8       36.81 ±  2%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
     41.93             -5.1       36.84           -5.7       36.25 ±  2%  perf-profile.children.cycles-pp.__page_cache_release
     41.80 ±  2%       -5.1       36.72           -5.4       36.43 ±  3%  perf-profile.children.cycles-pp.unmap_page_range
     41.80 ±  2%       -5.1       36.72           -5.4       36.43 ±  3%  perf-profile.children.cycles-pp.unmap_vmas
     41.80 ±  2%       -5.1       36.72           -5.4       36.43 ±  3%  perf-profile.children.cycles-pp.zap_pmd_range
     41.80 ±  2%       -5.1       36.72           -5.4       36.43 ±  3%  perf-profile.children.cycles-pp.zap_pte_range
     41.51 ±  2%       -5.1       36.45           -5.3       36.18 ±  3%  perf-profile.children.cycles-pp.tlb_flush_mmu
      3.89 ±  4%       -2.4        1.53 ±  8%     -2.4        1.44 ±  7%  perf-profile.children.cycles-pp.si_meminfo
      3.84 ±  4%       -2.4        1.49 ±  8%     -2.4        1.40 ±  8%  perf-profile.children.cycles-pp.nr_blockdev_pages
      3.11 ±  2%       -0.6        2.46 ±  2%     -0.6        2.50 ±  2%  perf-profile.children.cycles-pp.alloc_anon_folio
      1.90             -0.4        1.52           -0.4        1.54        perf-profile.children.cycles-pp.vma_alloc_folio_noprof
      1.89             -0.4        1.52           -0.4        1.54        perf-profile.children.cycles-pp.alloc_pages_mpol
      1.84             -0.4        1.48           -0.3        1.50        perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
      1.73             -0.3        1.39           -0.3        1.40        perf-profile.children.cycles-pp.get_page_from_freelist
      0.56 ± 72%       -0.3        0.22 ±108%     -0.3        0.28 ± 99%  perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
      1.45 ±  6%       -0.3        1.14 ±  3%     -0.3        1.14 ±  3%  perf-profile.children.cycles-pp.__pte_offset_map_lock
      1.22             -0.3        0.96           -0.3        0.97        perf-profile.children.cycles-pp.prep_new_page
      1.16 ±  7%       -0.3        0.90 ±  5%     -0.2        0.92 ±  6%  perf-profile.children.cycles-pp.__mem_cgroup_charge
      1.19             -0.3        0.93           -0.2        0.94        perf-profile.children.cycles-pp.clear_page_erms
      0.26 ±  8%       -0.1        0.16 ±  3%     -0.1        0.16 ±  5%  perf-profile.children.cycles-pp.handle_internal_command
      0.26 ±  8%       -0.1        0.16 ±  3%     -0.1        0.16 ±  5%  perf-profile.children.cycles-pp.main
      0.26 ±  8%       -0.1        0.16 ±  3%     -0.1        0.16 ±  5%  perf-profile.children.cycles-pp.run_builtin
      0.44 ± 10%       -0.1        0.35 ±  6%     -0.1        0.35 ±  6%  perf-profile.children.cycles-pp.free_unref_folios
      0.25 ±  9%       -0.1        0.16 ±  3%     -0.1        0.16 ±  6%  perf-profile.children.cycles-pp.record__mmap_read_evlist
      0.40 ± 11%       -0.1        0.31 ±  6%     -0.1        0.32 ±  6%  perf-profile.children.cycles-pp.free_frozen_page_commit
      0.24 ±  8%       -0.1        0.16 ±  4%     -0.1        0.16 ±  5%  perf-profile.children.cycles-pp.perf_mmap__push
      0.38 ± 13%       -0.1        0.30 ±  7%     -0.1        0.30 ±  7%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.55             -0.1        0.48           -0.1        0.48        perf-profile.children.cycles-pp.sync_regs
      0.48 ±  4%       -0.1        0.42 ±  2%     -0.1        0.41 ±  4%  perf-profile.children.cycles-pp.native_irq_return_iret
0.37 ± 4% -0.1 0.31 ± 3% -0.0 0.32 ± 5% perf-profile.children.cycles-pp.rmqueue 0.35 ± 4% -0.1 0.30 ± 3% -0.0 0.30 ± 5% perf-profile.children.cycles-pp.rmqueue_pcplist 0.19 ± 6% -0.0 0.14 ± 3% -0.1 0.13 ± 4% perf-profile.children.cycles-pp.record__pushfn 0.18 ± 7% -0.0 0.13 ± 2% -0.0 0.13 ± 5% perf-profile.children.cycles-pp.ksys_write 0.17 ± 5% -0.0 0.13 ± 3% -0.0 0.12 ± 5% perf-profile.children.cycles-pp.vfs_write 0.31 -0.0 0.27 -0.0 0.27 perf-profile.children.cycles-pp.lru_add 0.28 ± 5% -0.0 0.24 ± 3% -0.0 0.25 ± 6% perf-profile.children.cycles-pp.__rmqueue_pcplist 0.16 ± 5% -0.0 0.12 ± 3% -0.0 0.12 ± 5% perf-profile.children.cycles-pp.shmem_file_write_iter 0.24 ± 6% -0.0 0.20 ± 5% -0.0 0.21 ± 7% perf-profile.children.cycles-pp.rmqueue_bulk 0.16 ± 4% -0.0 0.12 ± 3% -0.0 0.12 ± 5% perf-profile.children.cycles-pp.generic_perform_write 0.24 ± 2% -0.0 0.20 -0.0 0.20 ± 2% perf-profile.children.cycles-pp.lru_gen_add_folio 0.21 -0.0 0.18 -0.0 0.18 ± 3% perf-profile.children.cycles-pp.lru_gen_del_folio 0.25 ± 2% -0.0 0.22 -0.0 0.22 ± 3% perf-profile.children.cycles-pp.zap_present_ptes 0.14 ± 2% -0.0 0.12 ± 3% -0.0 0.12 ± 4% perf-profile.children.cycles-pp.lock_vma_under_rcu 0.14 ± 3% -0.0 0.12 ± 4% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.__mod_node_page_state 0.13 -0.0 0.12 ± 4% -0.0 0.12 perf-profile.children.cycles-pp.__perf_sw_event 0.06 ± 7% -0.0 0.05 -0.0 0.05 perf-profile.children.cycles-pp.___pte_offset_map 0.09 ± 5% -0.0 0.08 -0.0 0.08 perf-profile.children.cycles-pp.__mem_cgroup_uncharge_folios 0.08 ± 6% -0.0 0.06 ± 6% -0.0 0.06 perf-profile.children.cycles-pp.vma_merge_extend 0.11 ± 3% -0.0 0.10 -0.0 0.10 ± 4% perf-profile.children.cycles-pp.__free_one_page 0.07 -0.0 0.06 -0.0 0.06 perf-profile.children.cycles-pp.error_entry 0.06 -0.0 0.05 -0.0 0.05 ± 6% perf-profile.children.cycles-pp.__mod_zone_page_state 0.11 -0.0 0.10 -0.0 0.10 perf-profile.children.cycles-pp.___perf_sw_event 0.10 ± 4% +0.0 0.11 ± 4% +0.0 0.11 ± 5% perf-profile.children.cycles-pp.sched_tick 0.21 ± 3% +0.0 0.24 ± 5% +0.0 0.23 ± 6% perf-profile.children.cycles-pp.update_process_times 0.22 ± 3% +0.0 0.26 ± 7% +0.0 0.24 ± 7% perf-profile.children.cycles-pp.tick_nohz_handler 0.30 ± 4% +0.0 0.34 ± 6% +0.0 0.33 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.29 ± 4% +0.0 0.33 ± 6% +0.0 0.32 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt 0.39 ± 2% +0.0 0.43 ± 2% +0.0 0.43 perf-profile.children.cycles-pp.mremap 0.31 ± 4% +0.0 0.36 ± 5% +0.0 0.34 ± 3% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 0.34 ± 3% +0.0 0.39 ± 5% +0.0 0.38 ± 3% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt 0.28 ± 3% +0.1 0.34 ± 2% +0.1 0.34 perf-profile.children.cycles-pp.__do_sys_mremap 0.28 ± 2% +0.1 0.34 ± 3% +0.1 0.34 perf-profile.children.cycles-pp.do_mremap 0.11 ± 4% +0.1 0.17 ± 2% +0.1 0.16 ± 3% perf-profile.children.cycles-pp.expand_vma 0.00 +0.1 0.08 +0.1 0.08 perf-profile.children.cycles-pp.__vm_enough_memory 0.00 +0.1 0.09 ± 5% +0.1 0.09 ± 3% perf-profile.children.cycles-pp.vrm_calc_charge 0.04 ±141% +0.1 0.13 ± 16% +0.1 0.11 ± 30% perf-profile.children.cycles-pp.add_callchain_ip 0.04 ±142% +0.1 0.14 ± 17% +0.1 0.11 ± 29% perf-profile.children.cycles-pp.thread__resolve_callchain_sample 0.04 ±142% +0.1 0.17 ± 15% +0.1 0.15 ± 30% perf-profile.children.cycles-pp.__thread__resolve_callchain 0.04 ±142% +0.1 0.18 ± 15% +0.1 0.15 ± 29% perf-profile.children.cycles-pp.sample__for_each_callchain_node 0.05 ±141% +0.1 0.18 ± 14% +0.1 0.15 ± 30% 
perf-profile.children.cycles-pp.build_id__mark_dso_hit 0.05 ±141% +0.1 0.19 ± 14% +0.1 0.16 ± 28% perf-profile.children.cycles-pp.perf_session__deliver_event 0.05 ±141% +0.1 0.20 ± 14% +0.1 0.17 ± 28% perf-profile.children.cycles-pp.__ordered_events__flush 0.05 ±141% +0.1 0.20 ± 33% +0.1 0.18 ± 35% perf-profile.children.cycles-pp.perf_session__process_events 0.05 ±141% +0.1 0.20 ± 33% +0.1 0.18 ± 35% perf-profile.children.cycles-pp.record__finish_output 88.59 +1.5 90.13 +1.5 90.13 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath 45.34 ± 2% +7.2 52.54 ± 2% +8.1 53.46 perf-profile.children.cycles-pp._raw_spin_lock 44.15 ± 2% +7.5 51.61 ± 2% +8.4 52.53 perf-profile.children.cycles-pp.do_sysinfo 44.33 ± 2% +7.5 51.80 ± 2% +8.4 52.72 perf-profile.children.cycles-pp.sysinfo 44.28 ± 2% +7.5 51.75 ± 2% +8.4 52.68 perf-profile.children.cycles-pp.__do_sys_sysinfo 40.25 ± 2% +9.8 50.07 ± 2% +10.8 51.07 perf-profile.children.cycles-pp.si_swapinfo 0.55 ± 74% -0.3 0.22 ±107% -0.3 0.28 ± 99% perf-profile.self.cycles-pp.get_mem_cgroup_from_mm 1.50 ± 4% -0.3 1.17 -0.3 1.17 perf-profile.self.cycles-pp._raw_spin_lock 1.18 -0.3 0.92 -0.2 0.93 perf-profile.self.cycles-pp.clear_page_erms 2.01 -0.2 1.86 ± 3% -0.1 1.88 ± 2% perf-profile.self.cycles-pp.stress_bigheap_child 0.55 -0.1 0.48 -0.1 0.48 perf-profile.self.cycles-pp.sync_regs 0.48 ± 4% -0.1 0.42 ± 2% -0.1 0.41 ± 4% perf-profile.self.cycles-pp.native_irq_return_iret 0.14 ± 3% -0.0 0.12 ± 4% -0.0 0.11 ± 3% perf-profile.self.cycles-pp.get_page_from_freelist 0.14 ± 8% -0.0 0.12 ± 3% -0.0 0.13 ± 3% perf-profile.self.cycles-pp.do_anonymous_page 0.14 ± 2% -0.0 0.12 ± 3% -0.0 0.12 ± 3% perf-profile.self.cycles-pp.rmqueue_bulk 0.14 -0.0 0.12 -0.0 0.12 ± 2% perf-profile.self.cycles-pp.lru_gen_del_folio 0.11 ± 3% -0.0 0.09 ± 4% -0.0 0.10 ± 7% perf-profile.self.cycles-pp.__handle_mm_fault 0.15 ± 2% -0.0 0.13 -0.0 0.13 ± 2% perf-profile.self.cycles-pp.lru_gen_add_folio 0.12 ± 3% -0.0 0.10 ± 3% -0.0 0.10 perf-profile.self.cycles-pp.zap_present_ptes 0.12 ± 4% -0.0 0.11 -0.0 0.11 ± 3% perf-profile.self.cycles-pp.__mod_node_page_state 0.07 ± 6% -0.0 0.06 -0.0 0.06 ± 6% perf-profile.self.cycles-pp.lock_vma_under_rcu 0.10 -0.0 0.09 ± 4% -0.0 0.09 ± 4% perf-profile.self.cycles-pp.__free_one_page 0.11 ± 3% -0.0 0.10 -0.0 0.10 ± 3% perf-profile.self.cycles-pp.folios_put_refs 0.07 -0.0 0.06 -0.0 0.06 perf-profile.self.cycles-pp.___perf_sw_event 0.07 -0.0 0.06 -0.0 0.06 perf-profile.self.cycles-pp.do_user_addr_fault 0.07 -0.0 0.06 -0.0 0.06 perf-profile.self.cycles-pp.lru_add 0.07 -0.0 0.06 -0.0 0.06 ± 5% perf-profile.self.cycles-pp.mas_walk 0.08 -0.0 0.07 -0.0 0.07 perf-profile.self.cycles-pp.__alloc_frozen_pages_noprof 0.06 -0.0 0.05 -0.0 0.05 perf-profile.self.cycles-pp.handle_mm_fault 0.06 -0.0 0.05 -0.0 0.05 perf-profile.self.cycles-pp.page_counter_uncharge 0.13 ± 3% +0.0 0.14 ± 3% +0.0 0.14 ± 2% perf-profile.self.cycles-pp._copy_to_user 0.00 +0.1 0.08 +0.1 0.08 perf-profile.self.cycles-pp.__vm_enough_memory 88.36 +1.5 89.85 +1.5 89.86 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath ^ permalink raw reply [flat|nested] 7+ messages in thread
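The calltrace data above is dominated by the sysinfo(2) path: roughly half of all cycles land under __do_sys_sysinfo -> do_sysinfo -> si_swapinfo/si_meminfo, i.e. in global kernel locks taken by a statistics syscall, while the munmap/zap side shrinks. That pattern is consistent with a stressor that polls sysinfo() while growing its heap. A minimal userspace sketch of such a loop -- a hypothetical simplification for illustration, not stress-ng's actual bigheap source -- could look like:

<snip>
#include <stdlib.h>
#include <string.h>
#include <sys/sysinfo.h>

int main(void)
{
	struct sysinfo info;

	for (;;) {		/* one pass per "bigheap" grow/free cycle */
		size_t size = 4096;
		char *buf = NULL, *tmp;

		for (;;) {
			/* the operation counted as realloc_calls_per_sec */
			tmp = realloc(buf, size);
			if (!tmp)
				break;		/* allocation failed: restart */
			buf = tmp;
			/* fault in the newly grown part of the buffer */
			memset(buf + size / 2, 1, size / 2);

			/*
			 * Poll sysinfo() to back off before the OOM killer
			 * steps in; in the kernel this path takes global
			 * locks in si_meminfo()/si_swapinfo().
			 */
			if (sysinfo(&info) == 0 &&
			    info.freeram < info.totalram / 16)
				break;

			size += size / 4;	/* keep growing */
		}
		free(buf);
	}
	return 0;
}
<snip>

With 256 threads running a loop of this shape, every pass hammers the same global spinlocks, so the benchmark measures those locks at least as much as it measures realloc() itself.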
* Re: [linus:master] [mm/vmalloc] 9c47753167: stress-ng.bigheap.realloc_calls_per_sec 21.3% regression
  2025-12-17  5:27       ` Oliver Sang
@ 2025-12-17 11:04         ` Uladzislau Rezki
  2025-12-17 11:52           ` Mateusz Guzik
  2025-12-18  4:37           ` Oliver Sang
  0 siblings, 2 replies; 7+ messages in thread
From: Uladzislau Rezki @ 2025-12-17 11:04 UTC (permalink / raw)
To: Oliver Sang
Cc: Uladzislau Rezki, oe-lkp, lkp, linux-kernel, Andrew Morton,
    Michal Hocko, Baoquan He, Alexander Potapenko, Andrey Ryabinin,
    Marco Elver, Michal Hocko, linux-mm

Hello, Oliver.

> > > kernel test robot noticed a 21.3% regression of stress-ng.bigheap.realloc_calls_per_sec on:
> > >
> > > commit: 9c47753167a6a585d0305663c6912f042e131c2d ("mm/vmalloc: defer freeing partly initialized vm_struct")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > [still regression on linus/master c9b47175e9131118e6f221cc8fb81397d62e7c91]
> > > [still regression on linux-next/master 008d3547aae5bc86fac3eda317489169c3fda112]
> > >
> > [...]
> >
> > > Could you please test the below patch and confirm if it solves the regression:
> >
> > we directly apply the patch upon 9c47753167, so our test branch looks like below
> >
> > * f7991e8a0136cb <---- below patch from you
> > * 9c47753167a6a5 mm/vmalloc: defer freeing partly initialized vm_struct
> > * 86e968d8ca6dc8 mm/vmalloc: support non-blocking GFP flags in alloc_vmap_area()
> >
> > but found it has little performance impact
> >
> > =========================================================================================
> > compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> >   gcc-14/performance/x86_64-rhel-9.4/100%/debian-13-x86_64-20250902.cgz/lkp-gnr-2sp3/bigheap/stress-ng/60s
> >
> > 86e968d8ca6dc823 9c47753167a6a585d0305663c69 f7991e8a0136cb0fdf35f11e28a
> > ---------------- --------------------------- ---------------------------
> >      %stddev     %change         %stddev     %change         %stddev
> >          \          |                \          |                \
> >   48320196  -10.9%  43072080  -10.8%  43116499  stress-ng.bigheap.ops
> >     785159   -9.8%    708390   -9.7%    708644  stress-ng.bigheap.ops_per_sec
> >     879805  -21.3%    692805  -20.7%    697312  stress-ng.bigheap.realloc_calls_per_sec
> >
Thank you for testing. I had the same expectations: no difference.
Honestly, I cannot figure out how:

* 9c47753167a6a5 mm/vmalloc: defer freeing partly initialized vm_struct
* 86e968d8ca6dc8 mm/vmalloc: support non-blocking GFP flags in alloc_vmap_area()

can affect performance. I am not doing anything related to performance.

I would like to ask you if you could test one more thing. I see that

[still regression on linus/master c9b47175e9131118e6f221cc8fb81397d62e7c91]

also contains the below patch:

<snip>
commit a0615780439938e8e61343f1f92a4c54a71dc6a5
mm/vmalloc: request large order pages from buddy allocator
<snip>

where we try to use larger order pages for vmalloc. Could you please
revert it and rerun the same tests?

Thank you in advance!
--
Uladzislau Rezki
* Re: [linus:master] [mm/vmalloc] 9c47753167: stress-ng.bigheap.realloc_calls_per_sec 21.3% regression
  2025-12-17 11:04         ` Uladzislau Rezki
@ 2025-12-17 11:52           ` Mateusz Guzik
  0 siblings, 0 replies; 7+ messages in thread
From: Mateusz Guzik @ 2025-12-17 11:52 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Oliver Sang, oe-lkp, lkp, linux-kernel, Andrew Morton,
    Michal Hocko, Baoquan He, Alexander Potapenko, Andrey Ryabinin,
    Marco Elver, Michal Hocko, linux-mm

On Wed, Dec 17, 2025 at 12:04:20PM +0100, Uladzislau Rezki wrote:
> Hello, Oliver.
>
[...]
> Thank you for testing. I had the same expectations: no difference.
> Honestly, I cannot figure out how:
>
> * 9c47753167a6a5 mm/vmalloc: defer freeing partly initialized vm_struct
> * 86e968d8ca6dc8 mm/vmalloc: support non-blocking GFP flags in alloc_vmap_area()
>
> can affect performance. I am not doing anything related to performance.
> I would like to ask you if you could test one more thing. I see that
>
> [still regression on linus/master c9b47175e9131118e6f221cc8fb81397d62e7c91]
>
> also contains the below patch:
>
> <snip>
> commit a0615780439938e8e61343f1f92a4c54a71dc6a5
> mm/vmalloc: request large order pages from buddy allocator
> <snip>
>
> where we try to use larger order pages for vmalloc. Could you please
> revert it and rerun the same tests?
>

This being stress-ng, it is not doing what you think it is doing.

The profile shows increased contention on the swapinfo spinlock:

     %stddev     %change         %stddev
         \          |                \
      40.08 ± 2%      +9.8       49.92 ± 2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.si_swapinfo.do_sysinfo.__do_sys_sysinfo

The spinlock and the data it operates on are not annotated. The commit
deferring the freeing adds 2 global vars, which most likely shifted
things around enough to add cacheline bouncing.

That's the third such case in the last few weeks that I know of. I asked
the gcc people to do something about it; so far no takers:
https://gcc.gnu.org/pipermail/gcc/2024-October/245004.html
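Mateusz's layout argument can be made concrete with a small sketch. The symbol names below are hypothetical stand-ins, not the actual swapfile or vmalloc globals; the point is only the mechanism by which two new, rarely touched globals can slow down an unrelated hot lock:

<snip>
#include <linux/atomic.h>
#include <linux/cache.h>
#include <linux/spinlock.h>

/*
 * Unannotated globals are laid out in definition/link order, so adding
 * two new objects can shift everything placed after them by 16 bytes:
 */
static void *deferred_vm;		/* new global from the commit */
static unsigned long deferred_cnt;	/* new global from the commit */

/*
 * After the shift, a hot lock that previously had a 64-byte cacheline
 * to itself may now share that line with frequently written data:
 */
static DEFINE_SPINLOCK(hot_lock);	/* taken on every sysinfo() */
static atomic_long_t hot_counter;	/* written constantly elsewhere */

/*
 * Every atomic_long_inc(&hot_counter) then invalidates hot_lock's line
 * on all other CPUs, making the lock more expensive even though the
 * locking code itself did not change.  The usual mitigation is to pin
 * the hot lock (or the hot data) to its own cacheline:
 */
static __cacheline_aligned_in_smp DEFINE_SPINLOCK(hot_lock_isolated);
<snip>

Whether this is what actually happened here would have to be confirmed by comparing the symbol layout of the two builds (for instance, where swap_lock lands in System.map before and after the commit); the sketch only illustrates the mechanism.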
* Re: [linus:master] [mm/vmalloc] 9c47753167: stress-ng.bigheap.realloc_calls_per_sec 21.3% regression
  2025-12-17 11:04         ` Uladzislau Rezki
  2025-12-17 11:52           ` Mateusz Guzik
@ 2025-12-18  4:37           ` Oliver Sang
  2025-12-18 17:37             ` Uladzislau Rezki
  1 sibling, 1 reply; 7+ messages in thread
From: Oliver Sang @ 2025-12-18 4:37 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Michal Hocko,
    Baoquan He, Alexander Potapenko, Andrey Ryabinin, Marco Elver,
    Michal Hocko, linux-mm, oliver.sang

hi, Uladzislau Rezki,

On Wed, Dec 17, 2025 at 12:04:20PM +0100, Uladzislau Rezki wrote:
> Hello, Oliver.
>
[...]
> I would like to ask you if you could test one more thing. I see that
>
> [still regression on linus/master c9b47175e9131118e6f221cc8fb81397d62e7c91]
>
> also contains the below patch:
>
> <snip>
> commit a0615780439938e8e61343f1f92a4c54a71dc6a5
> mm/vmalloc: request large order pages from buddy allocator
> <snip>
>
> where we try to use larger order pages for vmalloc. Could you please
> revert it and rerun the same tests?
>
> Thank you in advance!

we've seen comments from Mateusz Guzik in
https://lore.kernel.org/all/e4b6sjeh22uqhxhxudsbanlnyo2potwowuy7mkrp6tvxnftjn4@mcjyes2s3eu6/
that this could be related to cacheline bouncing.

not sure if you still want us to do the test you mentioned?

if so, sorry that I cannot make a clean revert of a061578043993 on
c9b47175e9131. could you prepare a patch for us? thanks

> --
> Uladzislau Rezki
* Re: [linus:master] [mm/vmalloc] 9c47753167: stress-ng.bigheap.realloc_calls_per_sec 21.3% regression
  2025-12-18  4:37           ` Oliver Sang
@ 2025-12-18 17:37             ` Uladzislau Rezki
  0 siblings, 0 replies; 7+ messages in thread
From: Uladzislau Rezki @ 2025-12-18 17:37 UTC (permalink / raw)
To: Oliver Sang
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Michal Hocko,
    Baoquan He, Alexander Potapenko, Andrey Ryabinin, Marco Elver,
    Michal Hocko, linux-mm

Hello, Oliver.

No, thank you! As I mentioned, I do not see how the patches pointed
out above can affect performance.

--
Uladzislau Rezki

On Thu, Dec 18, 2025 at 5:38 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Uladzislau Rezki,
>
[...]
> we've seen comments from Mateusz Guzik in
> https://lore.kernel.org/all/e4b6sjeh22uqhxhxudsbanlnyo2potwowuy7mkrp6tvxnftjn4@mcjyes2s3eu6/
> that this could be related to cacheline bouncing.
>
> not sure if you still want us to do the test you mentioned?
>
> if so, sorry that I cannot make a clean revert of a061578043993 on
> c9b47175e9131. could you prepare a patch for us? thanks
>
> > --
> > Uladzislau Rezki

--
Uladzislau Rezki
Thread overview: 7+ messages, end of thread (newest: 2025-12-18 17:37 UTC)
2025-12-12  3:27 [linus:master] [mm/vmalloc] 9c47753167: stress-ng.bigheap.realloc_calls_per_sec 21.3% regression kernel test robot
2025-12-15 12:19 ` Uladzislau Rezki
2025-12-17  5:27   ` Oliver Sang
2025-12-17 11:04     ` Uladzislau Rezki
2025-12-17 11:52       ` Mateusz Guzik
2025-12-18  4:37       ` Oliver Sang
2025-12-18 17:37         ` Uladzislau Rezki