Hello,

kernel test robot noticed an 11.6% improvement of stress-ng.sendfile.ops_per_sec on:


commit: 2cb1e08985e3dc59d0a4ebf770a87e3e2410d985 ("splice: Use filemap_splice_read() instead of generic_file_splice_read()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

        nr_threads: 100%
        testtime: 60s
        class: pipe
        test: sendfile
        cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.
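The comparison that follows reports two kinds of change. A %change value that ends in a % sign, such as +11.6% for stress-ng.sendfile.ops_per_sec, is a relative change. Values without a % sign, such as -73.3 for generic_file_splice_read or -0.1 for branch-miss-rate%, appear to be plain differences in percentage points. The Python sketch below recomputes a few rows under that reading. The helper names are hypothetical. The rounding convention is inferred from the table itself and is not taken from the lkp-tests source.

```python
# Hypothetical helpers: the column semantics are inferred from the report's
# rows (a trailing '%' on %change means relative change), not from lkp-tests.

def relative_change(old: float, new: float) -> float:
    """Relative change in percent (rows whose %change carries a '%' sign)."""
    return (new - old) / old * 100.0


def absolute_change(old: float, new: float) -> float:
    """Difference in percentage points (perf-profile and *-rate% rows)."""
    return new - old


# (metric, value at base commit ab82513126, value at 2cb1e08985), from the table.
relative_rows = [
    ("stress-ng.sendfile.ops_per_sec", 638671, 712807),
    ("stress-ng.sendfile.MB_per_sec_sent_to_/dev/null", 39953, 44609),
    ("perf-stat.i.branch-misses", 61342100, 23631851),
]
absolute_rows = [
    ("perf-stat.i.branch-miss-rate%", 0.18, 0.11),
    ("perf-profile.children.cycles-pp.filemap_splice_read", 0.00, 70.52),
]

if __name__ == "__main__":
    # Print each recomputed change next to its metric name.
    for name, old, new in relative_rows:
        print(f"{relative_change(old, new):+.1f}%  {name}")
    for name, old, new in absolute_rows:
        print(f"{absolute_change(old, new):+.1f}   {name}")
```

Run as a script, it prints +11.6%, +11.7% and -61.5% for the three counters, and -0.1 and +70.5 for the two percentage-point rows. These match the values reported in the table.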
=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  pipe/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp8/sendfile/stress-ng/60s

commit:
  ab82513126 ("cifs: Use filemap_splice_read()")
  2cb1e08985 ("splice: Use filemap_splice_read() instead of generic_file_splice_read()")

ab82513126f8b426 2cb1e08985e3dc59d0a4ebf770a
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    568180           -1.5%     559667        proc-vmstat.pgalloc_normal
    348772           -1.7%     342744        proc-vmstat.pgfault
     39953          +11.7%      44609        stress-ng.sendfile.MB_per_sec_sent_to_/dev/null
  38320456          +11.6%   42768635        stress-ng.sendfile.ops
    638671          +11.6%     712807        stress-ng.sendfile.ops_per_sec
      0.18 ±  6%      -0.1       0.11 ±  8%  perf-stat.i.branch-miss-rate%
  61342100          -61.5%   23631851 ±  3%  perf-stat.i.branch-misses
      0.74           +3.7%       0.77        perf-stat.i.cpi
      0.28 ±222%      -0.3       0.00 ±  4%  perf-stat.i.dTLB-load-miss-rate%
 7.958e+11 ±223%   -100.0%     105622 ±  6%  perf-stat.i.dTLB-load-misses
 8.398e+10          -10.4%  7.528e+10        perf-stat.i.dTLB-loads
 4.702e+10          -17.3%  3.888e+10        perf-stat.i.dTLB-stores
 2.965e+11           -4.7%  2.825e+11        perf-stat.i.instructions
      1.36           -4.0%       1.31        perf-stat.i.ipc
      1632           -7.4%       1511        perf-stat.i.metric.M/sec
      0.11            -0.1       0.04 ±  3%  perf-stat.overall.branch-miss-rate%
      0.73           +4.8%       0.76        perf-stat.overall.cpi
     16.38 ±223%     -16.4       0.00 ±  6%  perf-stat.overall.dTLB-load-miss-rate%
      0.00 ±  3%      +0.0       0.00 ±  3%  perf-stat.overall.dTLB-store-miss-rate%
      1.38           -4.6%       1.32        perf-stat.overall.ipc
  60279316          -61.5%   23221084 ±  3%  perf-stat.ps.branch-misses
  7.58e+11 ±223%   -100.0%     104910 ±  6%  perf-stat.ps.dTLB-load-misses
 8.264e+10          -10.4%  7.408e+10        perf-stat.ps.dTLB-loads
 4.628e+10          -17.3%  3.826e+10        perf-stat.ps.dTLB-stores
 2.918e+11           -4.7%   2.78e+11        perf-stat.ps.instructions
 1.832e+13           -4.8%  1.745e+13        perf-stat.total.instructions
     73.32           -73.3       0.00        perf-profile.calltrace.cycles-pp.generic_file_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64
     68.75           -68.7       0.00        perf-profile.calltrace.cycles-pp.filemap_read.generic_file_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
     24.87 ±  3%     -24.9       0.00        perf-profile.calltrace.cycles-pp.filemap_get_pages.filemap_read.generic_file_splice_read.splice_direct_to_actor.do_splice_direct
     23.72 ±  4%     -23.7       0.00        perf-profile.calltrace.cycles-pp.filemap_get_read_batch.filemap_get_pages.filemap_read.generic_file_splice_read.splice_direct_to_actor
     20.27 ±  3%     -20.3       0.00        perf-profile.calltrace.cycles-pp.copy_page_to_iter_pipe.filemap_read.generic_file_splice_read.splice_direct_to_actor.do_splice_direct
      0.58            +0.1       0.65        perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_splice_read.splice_direct_to_actor.do_splice_direct
      0.80            +0.1       0.89        perf-profile.calltrace.cycles-pp.security_file_permission.vfs_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
      1.78            +0.1       1.88        perf-profile.calltrace.cycles-pp.page_cache_pipe_buf_confirm.__splice_from_pipe.splice_from_pipe.direct_splice_actor.splice_direct_to_actor
      1.33 ±  2%      +0.1       1.47        perf-profile.calltrace.cycles-pp.__fsnotify_parent.vfs_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
      3.07            +0.3       3.39        perf-profile.calltrace.cycles-pp.vfs_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64
      0.00            +0.6       0.58        perf-profile.calltrace.cycles-pp.xas_descend.xas_load.filemap_get_read_batch.filemap_get_pages.filemap_splice_read
      0.00            +0.6       0.61        perf-profile.calltrace.cycles-pp.current_time.atime_needs_update.touch_atime.filemap_splice_read.splice_direct_to_actor
      0.00            +1.3       1.30        perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.filemap_splice_read.splice_direct_to_actor.do_splice_direct
      0.00            +1.6       1.58        perf-profile.calltrace.cycles-pp.xas_load.filemap_get_read_batch.filemap_get_pages.filemap_splice_read.splice_direct_to_actor
      0.00            +1.7       1.65        perf-profile.calltrace.cycles-pp.touch_atime.filemap_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
     10.13            +1.7      11.83        perf-profile.calltrace.cycles-pp.page_cache_pipe_buf_release.__splice_from_pipe.splice_from_pipe.direct_splice_actor.splice_direct_to_actor
      0.00            +2.2       2.20        perf-profile.calltrace.cycles-pp.folio_mark_accessed.filemap_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
     22.17            +2.6      24.73        perf-profile.calltrace.cycles-pp.splice_from_pipe.direct_splice_actor.splice_direct_to_actor.do_splice_direct.do_sendfile
     22.48            +2.6      25.04        perf-profile.calltrace.cycles-pp.direct_splice_actor.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64
     20.17            +2.8      22.99        perf-profile.calltrace.cycles-pp.__splice_from_pipe.splice_from_pipe.direct_splice_actor.splice_direct_to_actor.do_splice_direct
      0.00           +13.8      13.80        perf-profile.calltrace.cycles-pp.release_pages.__pagevec_release.filemap_splice_read.splice_direct_to_actor.do_splice_direct
      0.00           +14.4      14.44        perf-profile.calltrace.cycles-pp.__pagevec_release.filemap_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
      0.00           +18.5      18.54        perf-profile.calltrace.cycles-pp.splice_folio_into_pipe.filemap_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
      0.00           +25.9      25.92        perf-profile.calltrace.cycles-pp.filemap_get_read_batch.filemap_get_pages.filemap_splice_read.splice_direct_to_actor.do_splice_direct
      0.00           +27.2      27.15        perf-profile.calltrace.cycles-pp.filemap_get_pages.filemap_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
      0.00           +69.0      69.03        perf-profile.calltrace.cycles-pp.filemap_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64
     73.48           -73.5       0.00        perf-profile.children.cycles-pp.generic_file_splice_read
     69.92           -69.9       0.00        perf-profile.children.cycles-pp.filemap_read
     20.75           -20.8       0.00        perf-profile.children.cycles-pp.copy_page_to_iter_pipe
      3.04            -1.3       1.75        perf-profile.children.cycles-pp.touch_atime
      2.54            -1.1       1.47        perf-profile.children.cycles-pp.atime_needs_update
      1.20            -0.5       0.69        perf-profile.children.cycles-pp.current_time
      2.84            -0.2       2.64        perf-profile.children.cycles-pp.folio_mark_accessed
      0.34            -0.1       0.19 ±  2%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
      0.26 ±  2%      -0.1       0.15 ±  4%  perf-profile.children.cycles-pp.make_vfsgid
      0.25 ±  2%      -0.1       0.16 ±  3%  perf-profile.children.cycles-pp.make_vfsuid
      0.08            +0.0       0.09        perf-profile.children.cycles-pp.pipe_unlock
      0.26            +0.0       0.28        perf-profile.children.cycles-pp.__get_task_ioprio
      0.26            +0.0       0.29        perf-profile.children.cycles-pp.aa_file_perm
      0.30 ±  3%      +0.0       0.33        perf-profile.children.cycles-pp.fsnotify_perm
      0.18 ±  2%      +0.0       0.20 ±  2%  perf-profile.children.cycles-pp.rw_verify_area
      0.69            +0.0       0.72        perf-profile.children.cycles-pp.xas_descend
      0.42            +0.0       0.45        perf-profile.children.cycles-pp.xas_start
      0.28 ±  2%      +0.0       0.32        perf-profile.children.cycles-pp.splice_from_pipe_next
      0.29            +0.0       0.33 ±  2%  perf-profile.children.cycles-pp.rcu_all_qs
      0.68            +0.1       0.75        perf-profile.children.cycles-pp.apparmor_file_permission
      0.95            +0.1       1.04        perf-profile.children.cycles-pp.pipe_to_null
      0.95            +0.1       1.06        perf-profile.children.cycles-pp.security_file_permission
      1.76            +0.1       1.86        perf-profile.children.cycles-pp.xas_load
      0.00            +0.1       0.11        perf-profile.children.cycles-pp.mlock_drain_local
      0.76 ±  2%      +0.1       0.87        perf-profile.children.cycles-pp.__cond_resched
      2.20            +0.1       2.33        perf-profile.children.cycles-pp.page_cache_pipe_buf_confirm
      1.43 ±  2%      +0.2       1.58        perf-profile.children.cycles-pp.__fsnotify_parent
      0.00            +0.2       0.25        perf-profile.children.cycles-pp.free_unref_page_list
      0.00            +0.3       0.27 ±  3%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      3.13            +0.3       3.46        perf-profile.children.cycles-pp.vfs_splice_read
      0.00            +0.5       0.47        perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
     10.17            +1.7      11.83        perf-profile.children.cycles-pp.page_cache_pipe_buf_release
     23.91 ±  4%      +2.2      26.14        perf-profile.children.cycles-pp.filemap_get_read_batch
     24.98 ±  3%      +2.3      27.29        perf-profile.children.cycles-pp.filemap_get_pages
     21.14            +2.4      23.56        perf-profile.children.cycles-pp.__splice_from_pipe
     22.34            +2.6      24.91        perf-profile.children.cycles-pp.splice_from_pipe
     22.54            +2.6      25.11        perf-profile.children.cycles-pp.direct_splice_actor
      0.00           +14.0      13.99        perf-profile.children.cycles-pp.release_pages
      0.00           +14.6      14.62        perf-profile.children.cycles-pp.__pagevec_release
      0.00           +18.3      18.30        perf-profile.children.cycles-pp.splice_folio_into_pipe
      0.00           +70.5      70.52        perf-profile.children.cycles-pp.filemap_splice_read
     16.46           -16.5       0.00        perf-profile.self.cycles-pp.filemap_read
     16.16           -16.2       0.00        perf-profile.self.cycles-pp.copy_page_to_iter_pipe
      0.95 ±  2%      -0.4       0.54        perf-profile.self.cycles-pp.atime_needs_update
      0.86            -0.4       0.50        perf-profile.self.cycles-pp.current_time
      0.50            -0.2       0.25 ±  2%  perf-profile.self.cycles-pp.touch_atime
      2.32 ±  2%      -0.1       2.19        perf-profile.self.cycles-pp.folio_mark_accessed
      0.27            -0.1       0.15 ±  3%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
      0.20 ±  3%      -0.1       0.11 ±  4%  perf-profile.self.cycles-pp.make_vfsgid
      0.19 ±  3%      -0.1       0.12 ±  3%  perf-profile.self.cycles-pp.make_vfsuid
      0.23 ±  2%      +0.0       0.25        perf-profile.self.cycles-pp.__get_task_ioprio
      0.23 ±  2%      +0.0       0.25 ±  2%  perf-profile.self.cycles-pp.aa_file_perm
      0.27 ±  3%      +0.0       0.29 ±  2%  perf-profile.self.cycles-pp.fsnotify_perm
      0.56            +0.0       0.58        perf-profile.self.cycles-pp.xas_descend
      0.14 ±  2%      +0.0       0.16 ±  2%  perf-profile.self.cycles-pp.rw_verify_area
      0.35            +0.0       0.38 ±  2%  perf-profile.self.cycles-pp.xas_start
      0.19 ±  2%      +0.0       0.22 ±  2%  perf-profile.self.cycles-pp.rcu_all_qs
      0.33            +0.0       0.35 ±  2%  perf-profile.self.cycles-pp.splice_direct_to_actor
      0.25 ±  2%      +0.0       0.28        perf-profile.self.cycles-pp.splice_from_pipe_next
      0.37            +0.0       0.41        perf-profile.self.cycles-pp.apparmor_file_permission
      0.48            +0.0       0.52        perf-profile.self.cycles-pp.pipe_to_null
      0.72            +0.0       0.76        perf-profile.self.cycles-pp.xas_load
      0.31 ±  2%      +0.0       0.35 ±  2%  perf-profile.self.cycles-pp.security_file_permission
      0.50 ±  2%      +0.0       0.54        perf-profile.self.cycles-pp.vfs_splice_read
      0.48 ±  2%      +0.1       0.54        perf-profile.self.cycles-pp.__cond_resched
      0.00            +0.1       0.07 ±  5%  perf-profile.self.cycles-pp.mlock_drain_local
      1.75            +0.1       1.85        perf-profile.self.cycles-pp.page_cache_pipe_buf_confirm
      1.02            +0.1       1.14        perf-profile.self.cycles-pp.filemap_get_pages
      1.39 ±  2%      +0.1       1.54        perf-profile.self.cycles-pp.__fsnotify_parent
      1.10 ±  2%      +0.2       1.25        perf-profile.self.cycles-pp.splice_from_pipe
      0.00            +0.2       0.19 ±  2%  perf-profile.self.cycles-pp.free_unref_page_list
      0.00            +0.2       0.24 ±  2%  perf-profile.self.cycles-pp.lru_add_drain_cpu
      0.00            +0.3       0.32 ±  2%  perf-profile.self.cycles-pp.__pagevec_release
      0.00            +0.4       0.40        perf-profile.self.cycles-pp.__mem_cgroup_uncharge_list
      8.70            +0.6       9.31        perf-profile.self.cycles-pp.__splice_from_pipe
      9.60            +1.6      11.20        perf-profile.self.cycles-pp.page_cache_pipe_buf_release
     22.00 ±  4%      +2.1      24.10        perf-profile.self.cycles-pp.filemap_get_read_batch
      0.00            +6.5       6.50        perf-profile.self.cycles-pp.filemap_splice_read
      0.00           +13.3      13.29        perf-profile.self.cycles-pp.release_pages
      0.00           +17.5      17.53        perf-profile.self.cycles-pp.splice_folio_into_pipe




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki