* [linux-next:master] [mm]  cc8cb3697a:  stress-ng.pkey.ops_per_sec 4.4% improvement
From: kernel test robot @ 2024-09-16  8:11 UTC
  To: Lorenzo Stoakes
  Cc: oe-lkp, lkp, Linux Memory Management List, Andrew Morton,
	Liam R. Howlett, Mark Brown, Vlastimil Babka, Bert Karwatzki,
	Jeff Xu, Jiri Olsa, Kees Cook, Lorenzo Stoakes, Matthew Wilcox,
	Paul E. McKenney, Paul Moore, Sidhartha Kumar,
	Suren Baghdasaryan, linux-kernel, ying.huang, feng.tang,
	fengwei.yin, oliver.sang



Hello,

The kernel test robot noticed a 4.4% improvement in stress-ng.pkey.ops_per_sec on:


commit: cc8cb3697a8d8eabe1fb9acb8768b11c1ab607d8 ("mm: refactor vma_merge() into modify-only vma_merge_existing_range()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: pkey
	cpufreq_governor: performance
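
For readers who want a rough local approximation of these parameters, the
sketch below builds a stress-ng command line in Python. It is an assumption,
not the command the LKP job actually ran: build_pkey_cmd is a hypothetical
helper, "nr_threads: 100%" is read as one pkey stressor instance per online
CPU, and the cpufreq governor has to be set separately.

```python
import os
import shlex


def build_pkey_cmd(nr_threads_pct=100, testtime_s=60, cpus=None):
    """Return an argv list approximating the job parameters above.

    Hypothetical helper: the real LKP job may pass additional options.
    """
    if cpus is None:
        cpus = os.cpu_count() or 1
    # Interpret nr_threads as a percentage of online CPUs (assumption).
    instances = max(1, cpus * nr_threads_pct // 100)
    return [
        "stress-ng",
        "--pkey", str(instances),       # number of pkey stressor instances
        "--timeout", f"{testtime_s}s",  # run time, from the testtime parameter
        "--metrics-brief",              # print a bogo-ops summary at the end
    ]


if __name__ == "__main__":
    # 64 matches the thread count of the test machine described above.
    print(shlex.join(build_pkey_cmd(cpus=64)))
```

The exact kernel config and job files from the archive link in this report
remain the authoritative way to reproduce the result.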






Details are below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240916/202409161559.af0a1b99-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/pkey/stress-ng/60s

commit: 
  65e0aa64df ("mm: introduce commit_merge(), abstracting final commit of merge")
  cc8cb3697a ("mm: refactor vma_merge() into modify-only vma_merge_existing_range()")

65e0aa64df916861 cc8cb3697a8d8eabe1fb9acb876 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    159916 ±  5%     +14.9%     183809 ± 10%  meminfo.DirectMap4k
     15.42 ± 23%     +46.5%      22.58 ± 17%  sched_debug.cpu.nr_uninterruptible.max
 2.158e+08            +4.4%  2.253e+08        stress-ng.pkey.ops
   3596484            +4.4%    3755565        stress-ng.pkey.ops_per_sec
    196.30            +4.9%     205.86        stress-ng.time.user_time
  25782400            +3.4%   26666903        proc-vmstat.numa_hit
  25707363            +3.5%   26600006        proc-vmstat.numa_local
  44223158            +3.4%   45721027        proc-vmstat.pgalloc_normal
  39763569            +3.5%   41151044        proc-vmstat.pgfree
 3.568e+10            +1.4%  3.619e+10        perf-stat.i.branch-instructions
  87058419 ±  2%      +3.1%   89795461        perf-stat.i.branch-misses
 1.482e+08            +2.7%  1.521e+08        perf-stat.i.cache-references
      1854            -2.2%       1813        perf-stat.i.cycles-between-cache-misses
  1.68e+11            +1.1%  1.699e+11        perf-stat.i.instructions
      0.64            +1.8%       0.65        perf-stat.overall.MPKI
      1812            -2.5%       1766        perf-stat.overall.cycles-between-cache-misses
 1.045e+08            +2.6%  1.073e+08        perf-stat.ps.cache-misses
 1.446e+08            +2.8%  1.486e+08        perf-stat.ps.cache-references
     25.66 ±116%     -96.6%       0.86 ±168%  perf-sched.sch_delay.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
      9.35 ± 40%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
      9.63 ± 36%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
      3.87 ± 38%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
     10.81 ± 36%     -76.2%       2.57 ±142%  perf-sched.sch_delay.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      3.74 ± 55%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
      2.32 ± 34%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
      1.32 ±104%     -80.1%       0.26 ±221%  perf-sched.sch_delay.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
     19.81 ±188%     -99.3%       0.14 ±142%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.39 ± 57%     -81.6%       0.07 ±153%  perf-sched.sch_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
    180.87 ±203%     -99.1%       1.55 ±153%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
      0.36 ±108%     -96.5%       0.01 ±187%  perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      1.44 ± 19%     -85.8%       0.20 ±171%  perf-sched.sch_delay.avg.ms.wait_for_partner.fifo_open.do_dentry_open.vfs_open
     40.94 ±115%     -99.8%       0.10 ±143%  perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    112.73 ±118%     -98.9%       1.19 ±142%  perf-sched.sch_delay.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
    335.83 ± 29%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
    301.19 ± 35%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_merge.constprop.0
     22.34 ± 98%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
    473.14 ± 21%     -76.2%     112.54 ±144%  perf-sched.sch_delay.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      6.84 ± 51%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.khugepaged.kthread.ret_from_fork.ret_from_fork_asm
      7.07 ± 72%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
      0.42 ±147%     -98.1%       0.01 ±141%  perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
      2373 ± 40%     -78.6%     507.50 ±152%  perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      1.70 ±111%     -96.7%       0.06 ±212%  perf-sched.sch_delay.max.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
      1309 ± 77%     -99.6%       5.06 ±165%  perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2745 ± 25%     -81.5%     507.97 ±152%  perf-sched.total_sch_delay.max.ms
     10044 ±  4%     -74.3%       2576 ±141%  perf-sched.total_wait_and_delay.count.ms
      6234 ± 21%     -77.2%       1421 ±141%  perf-sched.total_wait_and_delay.max.ms
     18.71 ± 40%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
     19.26 ± 36%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
     21.62 ± 36%     -76.2%       5.15 ±142%  perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
    885.96 ± 42%     -79.1%     185.28 ±142%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
    144.50 ± 24%     -86.4%      19.67 ±145%  perf-sched.wait_and_delay.count.__cond_resched.__alloc_pages_noprof.alloc_pages_mpol_noprof.folio_alloc_mpol_noprof.shmem_alloc_folio
    131.83 ±  9%     -73.7%      34.67 ±141%  perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    159.83 ±  8%     -75.8%      38.67 ±144%  perf-sched.wait_and_delay.count.__cond_resched.change_pmd_range.isra.0.change_pud_range
    227.83 ±  9%     -77.0%      52.33 ±141%  perf-sched.wait_and_delay.count.__cond_resched.change_pud_range.isra.0.change_protection_range
     75.00 ±  8%     -71.6%      21.33 ±143%  perf-sched.wait_and_delay.count.__cond_resched.down_write.__x64_sys_pkey_free.do_syscall_64.entry_SYSCALL_64_after_hwframe
     82.00 ±  9%     -76.4%      19.33 ±141%  perf-sched.wait_and_delay.count.__cond_resched.down_write.anon_vma_clone.__split_vma.vma_modify
    412.67 ±  7%     -82.6%      71.83 ±141%  perf-sched.wait_and_delay.count.__cond_resched.down_write.mprotect_fixup.do_mprotect_pkey.__x64_sys_pkey_mprotect
    125.83 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
     86.83 ± 14%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.down_write.vma_merge.constprop.0
    225.33 ±  7%     -76.1%      53.83 ±142%  perf-sched.wait_and_delay.count.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
    314.67 ± 31%     -87.1%      40.50 ±142%  perf-sched.wait_and_delay.count.__cond_resched.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
    118.17 ± 12%     -80.0%      23.67 ±143%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.__split_vma
    206.50 ±  8%     -77.2%      47.17 ±141%  perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc_noprof.vm_area_dup.__split_vma.vma_modify
     76.33 ± 23%     -90.6%       7.17 ±223%  perf-sched.wait_and_delay.count.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_write_begin
     45.33 ± 21%     -83.1%       7.67 ±148%  perf-sched.wait_and_delay.count.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    626.00 ± 66%     -92.6%      46.17 ±142%  perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
     10.33 ± 14%     -83.9%       1.67 ±223%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
     54.17 ± 27%     -70.2%      16.17 ±141%  perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      1976 ±  7%     -77.3%     447.67 ±141%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1760 ± 10%     -74.8%     443.33 ±147%  perf-sched.wait_and_delay.count.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
    195.50 ±  9%     -75.9%      47.17 ±141%  perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    671.66 ± 29%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
    602.38 ± 35%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.down_write.vma_merge.constprop.0
    946.28 ± 21%     -76.2%     225.08 ±144%  perf-sched.wait_and_delay.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      4225 ± 39%     -75.8%       1022 ±141%  perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      2837 ± 31%     -88.2%     334.64 ±223%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      4535 ± 33%     -74.1%       1173 ±143%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
     25.66 ±116%     -96.6%       0.86 ±168%  perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
      9.36 ± 40%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
      9.63 ± 36%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_merge.constprop.0
      3.87 ± 38%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
     10.81 ± 36%     -76.2%       2.57 ±142%  perf-sched.wait_time.avg.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      2.32 ± 34%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
    286.97 ±115%     -99.8%       0.71 ±182%  perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      0.39 ± 57%     -81.6%       0.07 ±153%  perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown]
    705.09 ± 56%     -73.9%     183.73 ±142%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
    112.73 ±118%     -98.9%       1.19 ±142%  perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_noprof.alloc_slab_obj_exts.allocate_slab.___slab_alloc
    335.83 ± 29%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.unlink_anon_vmas.vma_complete.vma_merge
    301.19 ± 35%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_merge.constprop.0
     22.34 ± 98%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.vma_prepare.vma_merge.constprop
    473.14 ± 21%     -76.2%     112.54 ±144%  perf-sched.wait_time.max.ms.__cond_resched.down_write_killable.do_mprotect_pkey.__x64_sys_pkey_mprotect.do_syscall_64
      7.07 ± 72%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.mas_alloc_nodes.mas_preallocate.vma_merge
    835.83 ±107%     -99.8%       1.31 ±200%  perf-sched.wait_time.max.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
      1536 ± 83%     -77.9%     339.71 ±141%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      2836 ± 31%     -88.2%     334.51 ±223%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
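
As a sanity check on how to read the table, the sketch below recomputes the
%change column from the two mean values in a row. It assumes %change is the
relative difference between the means of the two commits; pct_change is a
hypothetical helper name, and the numbers are copied from the rows above.

```python
def pct_change(base, new):
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


# Values copied from the stress-ng.pkey.ops_per_sec row.
base_ops_per_sec = 3596484  # 65e0aa64df (parent commit)
new_ops_per_sec = 3755565   # cc8cb3697a (commit under test)

change = pct_change(base_ops_per_sec, new_ops_per_sec)
print(f"{change:+.1f}%")  # prints "+4.4%"
```

Rows that show -100.0% with a new value of 0.00 correspond to call chains
through vma_merge, which no longer appear in the samples for the second
commit.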




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


