Greetings,

FYI, we noticed a 13.7% improvement of will-it-scale.per_thread_ops due to commit:

commit: 4601e2fc8b57840660ce1a1ee98aea873fa15eee ("shmem: convert shmem_file_read_iter() to use shmem_get_folio()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
with the following parameters:

        nr_task: 100%
        mode: thread
        test: pread2
        cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

Details are as below:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-11/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp5/pread2/will-it-scale

commit:
  eff1f906c2 ("shmem: convert shmem_write_begin() to use shmem_get_folio()")
  4601e2fc8b ("shmem: convert shmem_file_read_iter() to use shmem_get_folio()")

eff1f906c2dcd83c 4601e2fc8b57840660ce1a1ee98
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   1508791 ±  3%      +13.7%   1715505 ±  2%  will-it-scale.128.threads
     11786 ±  3%      +13.7%     13401 ±  2%  will-it-scale.per_thread_ops
   1508791 ±  3%      +13.7%   1715505 ±  2%  will-it-scale.workload
      2.92 ± 15%      +43.7%      4.20 ± 16%  turbostat.CPU%c1
     58550 ±  4%      -16.4%     48936 ±  5%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.20 ±  9%      +17.5%      0.23 ±  5%  sched_debug.cfs_rq:/.nr_running.stddev
     58605 ±  5%      -16.5%     48957 ±  5%  sched_debug.cfs_rq:/.spread0.stddev
    191.02 ±  4%      +16.1%    221.72 ±  5%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
      0.23 ±  3%      +11.1%      0.25 ±  4%  sched_debug.cpu.nr_running.stddev
     12.20                   -1.1%     12.07        perf-stat.i.cpi
      0.00 ±  9%       -0.0       0.00 ±  5%  perf-stat.i.dTLB-store-miss-rate%
 9.003e+08 ±  2%       +6.4%  9.582e+08        perf-stat.i.dTLB-stores
     82.71                   +2.2      84.95        perf-stat.i.node-store-miss-rate%
   5815837                  +10.2%   6408731        perf-stat.i.node-store-misses
   1223798 ±  2%       -6.6%   1142824 ±  2%  perf-stat.i.node-stores
     12.19                   -1.0%     12.06        perf-stat.overall.cpi
      0.01 ±  3%       -0.0       0.00 ±  5%  perf-stat.overall.dTLB-store-miss-rate%
     82.60                   +2.2      84.85        perf-stat.overall.node-store-miss-rate%
   6712074 ±  2%      -12.0%   5904631 ±  2%  perf-stat.overall.path-length
 8.981e+08 ±  2%       +6.4%  9.558e+08        perf-stat.ps.dTLB-stores
   5796378                  +10.2%   6387291        perf-stat.ps.node-store-misses
   1220724 ±  2%       -6.6%   1140426 ±  2%  perf-stat.ps.node-stores
     41.14                  -41.1       0.00        perf-profile.calltrace.cycles-pp.shmem_getpage.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
     41.10                  -41.1       0.00        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_getpage.shmem_file_read_iter.vfs_read.__x64_sys_pread64
     41.04                  -41.0       0.00        perf-profile.calltrace.cycles-pp.__filemap_get_folio.shmem_get_folio_gfp.shmem_getpage.shmem_file_read_iter.vfs_read
     40.18                  -40.2       0.00        perf-profile.calltrace.cycles-pp.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_getpage.shmem_file_read_iter
     39.18                  -39.2       0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_getpage
      0.00                   +0.6       0.59 ±  7%  perf-profile.calltrace.cycles-pp.io_schedule.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter
      0.00                  +39.4      39.45        perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter
      0.00                  +40.5      40.46        perf-profile.calltrace.cycles-pp.folio_wait_bit_common.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read
      0.00                  +41.2      41.24        perf-profile.calltrace.cycles-pp.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64
      0.00                  +41.3      41.30        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.__x64_sys_pread64.do_syscall_64
     41.14                  -41.1       0.00        perf-profile.children.cycles-pp.shmem_getpage
      0.10 ±  4%       +0.0       0.12 ±  4%  perf-profile.children.cycles-pp.copyout
      0.12 ±  3%       +0.0       0.14 ±  3%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      0.07                   +0.0       0.09        perf-profile.children.cycles-pp.folio_unlock
      0.12 ±  3%       +0.0       0.14 ±  3%  perf-profile.children.cycles-pp._copy_to_iter
      0.13 ±  2%       +0.0       0.15 ±  4%  perf-profile.children.cycles-pp.copy_page_to_iter
      0.00                   +0.1       0.06 ±  9%  perf-profile.children.cycles-pp.PageHeadHuge
      0.46                   -0.1       0.37 ±  3%  perf-profile.self.cycles-pp.shmem_file_read_iter
      0.82 ±  2%       -0.1       0.74 ±  4%  perf-profile.self.cycles-pp.__filemap_get_folio
      0.12 ±  3%       +0.0       0.14 ±  3%  perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      0.07                   +0.0       0.09        perf-profile.self.cycles-pp.folio_unlock

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://01.org/lkp
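
As a sanity check on the %change column, here is a minimal Python sketch that recomputes it as (new - base) / base from a few values copied out of the comparison above. It is not part of the LKP tooling; the helper name pct_change and the choice of rows are assumptions made for illustration only. It applies to rows whose change carries a % sign; rows that are already percentages (miss rates, perf-profile cycles) appear to report an absolute difference instead.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (hypothetical helper)."""
    return (new - base) / base * 100.0


# (parent commit eff1f906c2, tested commit 4601e2fc8b), copied from the report.
rows = {
    "will-it-scale.per_thread_ops": (11786, 13401),
    "will-it-scale.workload": (1508791, 1715505),
    "perf-stat.overall.path-length": (6712074, 5904631),
    "perf-stat.i.node-store-misses": (5815837, 6408731),
}

for name, (base, new) in rows.items():
    # One decimal place, with sign, matching the report's formatting.
    print(f"{name:32s} {pct_change(base, new):+6.1f}%")
```

The reported values are averages over several runs, so a recomputed figure can differ from the table in the last digit.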