[linux-next:master] [mm] cc8cb3697a: stress-ng.pkey.ops_per_sec 4.4% improvement
From: kernel test robot
Date: 2024-09-16 8:11 UTC
To: Lorenzo Stoakes
Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
Liam R. Howlett, Mark Brown, Vlastimil Babka, Bert Karwatzki,
Jeff Xu, Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
Paul E. McKenney, Paul Moore, Sidhartha Kumar,
Suren Baghdasaryan, linux-kernel, ying.huang, feng.tang,
fengwei.yin, oliver.sang
Hello,
kernel test robot noticed a 4.4% improvement of stress-ng.pkey.ops_per_sec on:
commit: cc8cb3697a8d8eabe1fb9acb8768b11c1ab607d8 ("mm: refactor vma_merge() into modify-only vma_merge_existing_range()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: pkey
cpufreq_governor: performance
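For context on what this workload exercises: stress-ng's pkey stressor loops over the memory-protection-key syscalls, and each pkey_mprotect() call funnels into do_mprotect_pkey() and the VMA split/merge paths that the commit under test refactors (the do_mprotect_pkey and vma_merge frames in the perf-sched data below are exactly these). A minimal sketch of that syscall pattern, as an illustration only -- the loop bound and buffer size are arbitrary, and this is not the actual stress-ng source:

#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 1;

	for (int i = 0; i < 1000000; i++) {
		/* allocate a protection key denying writes */
		int pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
		if (pkey < 0)
			return 1; /* pkeys unsupported or exhausted */
		/* tag the mapping with the key; in the kernel this goes
		 * through do_mprotect_pkey() -> mprotect_fixup() and the
		 * VMA merge/split logic touched by the commit */
		pkey_mprotect(buf, page, PROT_READ | PROT_WRITE, pkey);
		pkey_free(pkey);
	}
	munmap(buf, page);
	return 0;
}

Compile with gcc -O2; it needs an x86 CPU and a glibc with pkey wrappers (glibc >= 2.27).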
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240916/202409161559.af0a1b99-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/pkey/stress-ng/60s
commit:
65e0aa64df ("mm: introduce commit_merge(), abstracting final commit of merge")
cc8cb3697a ("mm: refactor vma_merge() into modify-only vma_merge_existing_range()")
65e0aa64df916861 cc8cb3697a8d8eabe1fb9acb876
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
159916 ± 5% +14.9% 183809 ± 10% meminfo.DirectMap4k
15.42 ± 23% +46.5% 22.58 ± 17% sched_debug.cpu.nr_uninterruptible.max
2.158e+08 +4.4% 2.253e+08 stress-ng.pkey.ops
3596484 +4.4% 3755565 stress-ng.pkey.ops_per_sec
196.30 +4.9% 205.86 stress-ng.time.user_time
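As a sanity check on the headline number: with the 60s testtime, ops_per_sec is simply ops divided by the run length, e.g. for the patched kernel

    ops_per_sec = 2.253e+08 ops / 60 s ~= 3755565

which matches the reported value, so the 4.4% gains in ops and ops_per_sec reflect the same effect, not two independent ones.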
25782400 +3.4% 26666903 proc-vmstat.numa_hit
25707363 +3.5% 26600006 proc-vmstat.numa_local
44223158 +3.4% 45721027 proc-vmstat.pgalloc_normal
39763569 +3.5% 41151044 proc-vmstat.pgfree
3.568e+10 +1.4% 3.619e+10 perf-stat.i.branch-instructions
87058419 ± 2% +3.1% 89795461 perf-stat.i.branch-misses
1.482e+08 +2.7% 1.521e+08 perf-stat.i.cache-references
1854 -2.2% 1813 perf-stat.i.cycles-between-cache-misses
1.68e+11 +1.1% 1.699e+11 perf-stat.i.instructions
0.64 +1.8% 0.65 perf-stat.overall.MPKI
1812 -2.5% 1766 perf-stat.overall.cycles-between-cache-misses
1.045e+08 +2.6% 1.073e+08 perf-stat.ps.cache-misses
1.446e+08 +2.8% 1.486e+08 perf-stat.ps.cache-references
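Assuming MPKI here carries its usual meaning of cache misses per thousand instructions, the 0.65 figure above is consistent with the raw counters:

    MPKI ~= 1.073e+08 misses/s / (1.699e+11 instructions/s / 1000) ~= 0.63

with the small residual gap expected, since the .i (interval) and .ps (per-second) counters are averaged over slightly different windows.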
25.66 ±116% -96.6% 0.86 ±168% perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
9.35 ± 40% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
9.63 ± 36% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
3.87 ± 38% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
10.81 ± 36% -76.2% 2.57 ±142% perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
3.74 ± 55% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
2.32 ± 34% -100.0% 0.00 perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
1.32 ±104% -80.1% 0.26 ±221% perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
19.81 ±188% -99.3% 0.14 ±142% perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.39 ± 57% -81.6% 0.07 ±153% perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
180.87 ±203% -99.1% 1.55 ±153% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
0.36 ±108% -96.5% 0.01 ±187% perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
1.44 ± 19% -85.8% 0.20 ±171% perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
40.94 ±115% -99.8% 0.10 ±143% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
112.73 ±118% -98.9% 1.19 ±142% perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
335.83 ± 29% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
301.19 ± 35% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_merge.constprop.0
22.34 ± 98% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
473.14 ± 21% -76.2% 112.54 ±144% perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
6.84 ± 51% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
7.07 ± 72% -100.0% 0.00 perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
0.42 ±147% -98.1% 0.01 ±141% perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
2373 ± 40% -78.6% 507.50 ±152% perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
1.70 ±111% -96.7% 0.06 ±212% perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
1309 ± 77% -99.6% 5.06 ±165% perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2745 ± 25% -81.5% 507.97 ±152% perf-sched.total_sch_delay.max.ms
10044 ± 4% -74.3% 2576 ±141% perf-sched.total_wait_and_delay.count.ms
6234 ± 21% -77.2% 1421 ±141% perf-sched.total_wait_and_delay.max.ms
18.71 ± 40% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
19.26 ± 36% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
21.62 ± 36% -76.2% 5.15 ±142% perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
885.96 ± 42% -79.1% 185.28 ±142% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
144.50 ± 24% -86.4% 19.67 ±145% perf-sched.wait_and_delay.count.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
131.83 ± 9% -73.7% 34.67 ±141% perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
159.83 ± 8% -75.8% 38.67 ±144% perf-sched.wait_and_delay.count.__cond_resched.change_pmd_range.isra.0.change_pud_range
227.83 ± 9% -77.0% 52.33 ±141% perf-sched.wait_and_delay.count.__cond_resched.change_pud_range.isra.0.change_protection_range
75.00 ± 8% -71.6% 21.33 ±143% perf-sched.wait_and_delay.count.__cond_resched.down_write.__x64_sys_pkey_free.do_syscall_64.entry_SYSCALL_64_after_hwframe
82.00 ± 9% -76.4% 19.33 ±141% perf-sched.wait_and_delay.count.__cond_resched.down_write.anon_vma_clone.__split_vma.vma_modify
412.67 ± 7% -82.6% 71.83 ±141% perf-sched.wait_and_delay.count.__cond_resched.down_write.mprotect_fixup.do_mprotect_pkey.__x64_sys_pkey_mprotect
125.83 ± 9% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
86.83 ± 14% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.down_write.vma_merge.constprop.0
225.33 ± 7% -76.1% 53.83 ±142% perf-sched.wait_and_delay.count.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
314.67 ± 31% -87.1% 40.50 ±142% perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
118.17 ± 12% -80.0% 23.67 ±143% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma
206.50 ± 8% -77.2% 47.17 ±141% perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vma_modify
76.33 ± 23% -90.6% 7.17 ±223% perf-sched.wait_and_delay.count.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
45.33 ± 21% -83.1% 7.67 ±148% perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
626.00 ± 66% -92.6% 46.17 ±142% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
10.33 ± 14% -83.9% 1.67 ±223% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
54.17 ± 27% -70.2% 16.17 ±141% perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
1976 ± 7% -77.3% 447.67 ±141% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1760 ± 10% -74.8% 443.33 ±147% perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
195.50 ± 9% -75.9% 47.17 ±141% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
671.66 ± 29% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
602.38 ± 35% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.down_write.vma_merge.constprop.0
946.28 ± 21% -76.2% 225.08 ±144% perf-sched.wait_and_delay.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
4225 ± 39% -75.8% 1022 ±141% perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
2837 ± 31% -88.2% 334.64 ±223% perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
4535 ± 33% -74.1% 1173 ±143% perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
25.66 ±116% -96.6% 0.86 ±168% perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
9.36 ± 40% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
9.63 ± 36% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
3.87 ± 38% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
10.81 ± 36% -76.2% 2.57 ±142% perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
2.32 ± 34% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
286.97 ±115% -99.8% 0.71 ±182% perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
0.39 ± 57% -81.6% 0.07 ±153% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
705.09 ± 56% -73.9% 183.73 ±142% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
112.73 ±118% -98.9% 1.19 ±142% perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
335.83 ± 29% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
301.19 ± 35% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_merge.constprop.0
22.34 ± 98% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
473.14 ± 21% -76.2% 112.54 ±144% perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
7.07 ± 72% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
835.83 ±107% -99.8% 1.31 ±200% perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
1536 ± 83% -77.9% 339.71 ±141% perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
2836 ± 31% -88.2% 334.51 ±223% perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki