Hi Chuck Lever,

We just reported a -15.5% regression of will-it-scale.per_thread_ops for this commit at
https://lore.kernel.org/oe-lkp/202307171436.29248fcf-oliver.sang@intel.com/
but then we found that the commit is already in linux-next/master and causes a
regression on a different platform for a different test, so we are reporting it again FYI.

Hello,

kernel test robot noticed a -19.0% regression of aim9.disk_src.ops_per_sec on:

commit: ad9717ca487a35b0cbd9b0a9b5472c0e4005a473 ("shmem: stable directory offsets")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

testcase: aim9
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
parameters:

        testtime: 300s
        test: disk_src
        cpufreq_governor: performance

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot
| Closes: https://lore.kernel.org/oe-lkp/202307171640.e299f8d5-oliver.sang@intel.com

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached to this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.
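If the full lkp/aim9 setup is not at hand, a tight create/unlink loop on tmpfs might be a lighter way to check whether a kernel takes the new paths that show up in the profile below (shmem_mknod -> simple_offset_add on create, shmem_unlink -> simple_offset_remove on unlink). The following is only a sketch and is not part of lkp-tests or aim9. churn_tmpfs is a hypothetical helper name, and the sketch assumes that DIR lives on a tmpfs mount such as /dev/shm.

```shell
#!/bin/sh
# Hypothetical micro-exerciser (not from lkp-tests or aim9).
# Repeatedly creates and unlinks one file in DIR, which is ASSUMED to be
# on tmpfs, so that each iteration goes through the shmem create and
# unlink paths. Prints the number of iterations performed.
# Usage: churn_tmpfs DIR [ITERATIONS]
churn_tmpfs() {
    dir=$1
    n=${2:-10000}
    i=0
    while [ "$i" -lt "$n" ]; do
        # open(O_CREAT|O_TRUNC) on a new name: on tmpfs this is expected
        # to go through lookup_open -> shmem_mknod
        : > "$dir/churn.$i"
        # unlink the file again: on tmpfs this is expected to reach shmem_unlink
        rm -f "$dir/churn.$i"
        i=$((i + 1))
    done
    echo "$n"
}
```

For example, with the function saved in a file (here hypothetically ./churn.sh), `perf record -g -- sh -c '. ./churn.sh && churn_tmpfs /dev/shm 100000'` followed by `perf report` should show whether simple_offset_add and simple_offset_remove are being hit. Because rm is an external command, process creation dominates every iteration, so this loop can only confirm the code path. It is not a substitute for the aim9 measurement.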
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/disk_src/aim9/300s

commit:
  c1fdf67ddd ("shmem: Refactor shmem_symlink()")
  ad9717ca48 ("shmem: stable directory offsets")

c1fdf67ddd6264d5 ad9717ca487a35b0cbd9b0a9b54
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      1483 ±  5%       -5.5%        1401 ±  2%  vmstat.system.cs
      0.26 ±  6%       +0.1         0.36 ±  3%  mpstat.cpu.all.soft%
      0.61             -0.1         0.52        mpstat.cpu.all.usr%
    202486            -19.0%      163938        aim9.disk_src.ops_per_sec
     94.83             -4.0%       91.00        aim9.time.percent_of_cpu_this_job_got
     73.72            -16.3%       61.72        aim9.time.user_time
     23497             +6.6%       25048        proc-vmstat.nr_slab_reclaimable
   1383577 ± 18%     +397.0%     6876327        proc-vmstat.numa_hit
   1333301 ± 19%     +412.2%     6829356        proc-vmstat.numa_local
   4877549 ± 26%     +453.1%    26979997        proc-vmstat.pgalloc_normal
   4851765 ± 26%     +454.6%    26908162        proc-vmstat.pgfree
    146.51 ± 69%      -60.4%       58.03 ± 20%  sched_debug.cfs_rq:/.load_avg.avg
    114882 ± 30%      -47.2%       60675 ± 17%  sched_debug.cfs_rq:/.min_vruntime.max
     21297 ± 29%      -38.2%       13153 ± 12%  sched_debug.cfs_rq:/.min_vruntime.stddev
     21297 ± 29%      -38.2%       13154 ± 12%  sched_debug.cfs_rq:/.spread0.stddev
      0.00 ±  2%      -20.0%        0.00 ± 18%  sched_debug.cpu.next_balance.stddev
     11.44 ±  3%      +21.9%       13.94        perf-stat.i.MPKI
 7.193e+08             -4.6%   6.862e+08        perf-stat.i.branch-instructions
      2.75 ±  2%       +0.2         2.91        perf-stat.i.branch-miss-rate%
  20223977             +1.6%    20554390        perf-stat.i.branch-misses
     17.34             -2.7        14.66        perf-stat.i.cache-miss-rate%
  40769426            +18.7%    48406438        perf-stat.i.cache-references
     55.22 ±  2%       +3.2%       56.99        perf-stat.i.cpu-migrations
 1.006e+09             -4.2%   9.636e+08        perf-stat.i.dTLB-loads
      0.26 ±  5%       -0.0         0.23 ±  2%  perf-stat.i.dTLB-store-miss-rate%
  8.46e+08             +5.2%   8.901e+08        perf-stat.i.dTLB-stores
 3.696e+09             -3.5%   3.565e+09        perf-stat.i.instructions
      0.57             -3.7%        0.55        perf-stat.i.ipc
    857.81             -6.4%      802.65        perf-stat.i.metric.K/sec
     47.64             -2.6        45.06        perf-stat.i.node-load-miss-rate%
    103372 ±  6%      +15.6%      119451 ±  2%  perf-stat.i.node-loads
    102852 ±  3%      +29.9%      133625 ±  6%  perf-stat.i.node-stores
     11.03            +23.1%       13.58        perf-stat.overall.MPKI
      2.81             +0.2         3.00        perf-stat.overall.branch-miss-rate%
     17.25             -2.6        14.62        perf-stat.overall.cache-miss-rate%
      1.74             +3.9%        1.81        perf-stat.overall.cpi
      0.26 ±  5%       -0.0         0.23 ±  2%  perf-stat.overall.dTLB-store-miss-rate%
      0.57             -3.7%        0.55        perf-stat.overall.ipc
     31.17 ±  5%       -5.8        25.41 ±  9%  perf-stat.overall.node-store-miss-rate%
 7.168e+08             -4.6%   6.839e+08        perf-stat.ps.branch-instructions
  20156378             +1.6%    20483447        perf-stat.ps.branch-misses
  40634537            +18.7%    48244265        perf-stat.ps.cache-references
     55.05 ±  2%       +3.2%       56.83        perf-stat.ps.cpu-migrations
 1.002e+09             -4.2%   9.604e+08        perf-stat.ps.dTLB-loads
 8.431e+08             +5.2%   8.871e+08        perf-stat.ps.dTLB-stores
 3.684e+09             -3.5%   3.553e+09        perf-stat.ps.instructions
    103023 ±  6%      +15.6%      119065 ±  2%  perf-stat.ps.node-loads
    102443 ±  3%      +30.0%      133176 ±  6%  perf-stat.ps.node-stores
 1.113e+12             -3.7%   1.071e+12        perf-stat.total.instructions

      1.02 ± 10%       -0.3         0.76 ± 12%  perf-profile.calltrace.cycles-pp.__d_alloc.d_alloc.d_alloc_parallel.lookup_open.open_last_lookups
      0.85 ±  9%       -0.2         0.65 ±  9%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.__d_alloc.d_alloc.d_alloc_parallel.lookup_open
      0.00             +0.7         0.68 ±  9%  perf-profile.calltrace.cycles-pp.__call_rcu_common.xas_store.__xa_erase.xa_erase.simple_offset_remove
      0.00             +0.8         0.76 ± 13%  perf-profile.calltrace.cycles-pp.radix_tree_node_ctor.shuffle_freelist.allocate_slab.___slab_alloc.kmem_cache_alloc_lru
      1.59 ± 19%       +0.8         2.44 ± 10%  perf-profile.calltrace.cycles-pp.vfs_unlink.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00             +0.9         0.88 ± 29%  perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_lru.xas_alloc.xas_create.xas_store
      0.00             +0.9         0.92 ± 12%  perf-profile.calltrace.cycles-pp.shuffle_freelist.allocate_slab.___slab_alloc.kmem_cache_alloc_lru.xas_alloc
      0.00             +1.1         1.06 ± 31%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.xas_alloc.xas_expand.xas_create.xas_store
      0.00             +1.1         1.12 ± 21%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.xas_alloc.xas_create.xas_store.__xa_alloc
      0.00             +1.1         1.14 ± 12%  perf-profile.calltrace.cycles-pp.xas_store.__xa_erase.xa_erase.simple_offset_remove.shmem_unlink
      0.00             +1.2         1.15 ± 29%  perf-profile.calltrace.cycles-pp.xas_alloc.xas_expand.xas_create.xas_store.__xa_alloc
      0.00             +1.2         1.16 ± 12%  perf-profile.calltrace.cycles-pp.__xa_erase.xa_erase.simple_offset_remove.shmem_unlink.vfs_unlink
      0.00             +1.2         1.19 ± 20%  perf-profile.calltrace.cycles-pp.xas_alloc.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic
      0.00             +1.2         1.21 ± 12%  perf-profile.calltrace.cycles-pp.xa_erase.simple_offset_remove.shmem_unlink.vfs_unlink.do_unlinkat
      0.00             +1.2         1.22 ± 12%  perf-profile.calltrace.cycles-pp.simple_offset_remove.shmem_unlink.vfs_unlink.do_unlinkat.__x64_sys_unlink
      0.00             +1.3         1.31 ± 27%  perf-profile.calltrace.cycles-pp.xas_expand.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic
      0.00             +1.5         1.47 ± 10%  perf-profile.calltrace.cycles-pp.shmem_unlink.vfs_unlink.do_unlinkat.__x64_sys_unlink.do_syscall_64
      6.16 ± 13%       +2.1         8.21 ±  9%  perf-profile.calltrace.cycles-pp.lookup_open.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
      0.46 ± 85%       +2.5         2.94 ± 88%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.rest_init.arch_call_rest_init
      0.47 ± 85%       +2.5         2.95 ± 87%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.rest_init.arch_call_rest_init.start_kernel
      0.47 ± 85%       +2.5         2.95 ± 87%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.rest_init.arch_call_rest_init.start_kernel.x86_64_start_reservations
      0.47 ± 85%       +2.5         2.95 ± 87%  perf-profile.calltrace.cycles-pp.x86_64_start_kernel.secondary_startup_64_no_verify
      0.47 ± 85%       +2.5         2.95 ± 87%  perf-profile.calltrace.cycles-pp.x86_64_start_reservations.x86_64_start_kernel.secondary_startup_64_no_verify
      0.47 ± 85%       +2.5         2.95 ± 87%  perf-profile.calltrace.cycles-pp.start_kernel.x86_64_start_reservations.x86_64_start_kernel.secondary_startup_64_no_verify
      0.47 ± 85%       +2.5         2.95 ± 87%  perf-profile.calltrace.cycles-pp.arch_call_rest_init.start_kernel.x86_64_start_reservations.x86_64_start_kernel.secondary_startup_64_no_verify
      0.47 ± 85%       +2.5         2.95 ± 87%  perf-profile.calltrace.cycles-pp.rest_init.arch_call_rest_init.start_kernel.x86_64_start_reservations.x86_64_start_kernel
      0.35 ±112%       +2.5         2.88 ± 91%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.rest_init
      3.43 ± 14%       +2.6         6.01 ±  9%  perf-profile.calltrace.cycles-pp.shmem_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
      0.00             +2.7         2.68 ±  9%  perf-profile.calltrace.cycles-pp.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add
      0.00             +2.8         2.79 ±  8%  perf-profile.calltrace.cycles-pp.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod
      0.00             +3.1         3.05 ±  9%  perf-profile.calltrace.cycles-pp.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open
      0.00             +3.1         3.08 ±  9%  perf-profile.calltrace.cycles-pp.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups
      0.00             +3.2         3.19 ±  9%  perf-profile.calltrace.cycles-pp.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups.path_openat

      1.02 ± 10%       -0.3         0.77 ± 12%  perf-profile.children.cycles-pp.__d_alloc
      0.84 ±  5%       -0.2         0.68 ± 10%  perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
      0.55 ± 14%       -0.2         0.39 ± 23%  perf-profile.children.cycles-pp.mod_objcg_state
      0.33 ±  6%       -0.1         0.18 ± 33%  perf-profile.children.cycles-pp.shmem_get_partial_folio
      0.26 ± 18%       -0.1         0.14 ± 28%  perf-profile.children.cycles-pp.filemap_get_entry
      0.47 ± 14%       -0.1         0.36 ± 14%  perf-profile.children.cycles-pp._IO_fgets
      0.29 ± 22%       -0.1         0.21 ± 28%  perf-profile.children.cycles-pp.current_time
      0.31 ± 13%       -0.1         0.23 ± 18%  perf-profile.children.cycles-pp.dentry_unlink_inode
      0.27 ± 14%       -0.1         0.20 ± 13%  perf-profile.children.cycles-pp.apparmor_file_open
      0.16 ± 13%       -0.1         0.10 ± 29%  perf-profile.children.cycles-pp.up_write
      0.10 ± 25%       -0.1         0.05 ± 45%  perf-profile.children.cycles-pp.file_free_rcu
      0.20 ± 17%       -0.1         0.15 ± 13%  perf-profile.children.cycles-pp.apparmor_path_unlink
      0.15 ±  8%       -0.0         0.11 ± 20%  perf-profile.children.cycles-pp._IO_default_xsputn
      0.10 ± 23%       -0.0         0.06 ± 29%  perf-profile.children.cycles-pp.refill_obj_stock
      0.14 ± 12%       +0.1         0.20 ± 23%  perf-profile.children.cycles-pp.rb_next
      0.00             +0.1         0.07 ± 15%  perf-profile.children.cycles-pp.shmem_destroy_inode
      0.38 ± 13%       +0.1         0.47 ±  9%  perf-profile.children.cycles-pp.timerqueue_add
      0.00             +0.1         0.09 ± 21%  perf-profile.children.cycles-pp.rmqueue
      0.00             +0.1         0.10 ± 30%  perf-profile.children.cycles-pp.xas_find_marked
      0.03 ±100%       +0.1         0.13 ± 26%  perf-profile.children.cycles-pp.__unfreeze_partials
      0.69 ± 12%       +0.1         0.83 ±  9%  perf-profile.children.cycles-pp.tick_nohz_next_event
      0.10 ± 30%       +0.1         0.25 ± 18%  perf-profile.children.cycles-pp.rcu_segcblist_enqueue
      0.03 ±102%       +0.2         0.20 ± 14%  perf-profile.children.cycles-pp.xas_descend
      0.01 ±223%       +0.2         0.18 ± 19%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.04 ± 45%       +0.2         0.24 ± 21%  perf-profile.children.cycles-pp.__alloc_pages
      0.49 ± 28%       +0.4         0.92 ± 11%  perf-profile.children.cycles-pp.__slab_free
      0.58 ± 18%       +0.6         1.16 ±  9%  perf-profile.children.cycles-pp.__call_rcu_common
      0.00             +0.8         0.76 ± 17%  perf-profile.children.cycles-pp.radix_tree_node_rcu_free
      1.60 ± 19%       +0.8         2.44 ± 10%  perf-profile.children.cycles-pp.vfs_unlink
      0.37 ± 23%       +1.1         1.48 ± 10%  perf-profile.children.cycles-pp.shmem_unlink
      0.00             +1.1         1.11 ±  9%  perf-profile.children.cycles-pp.radix_tree_node_ctor
      0.00             +1.2         1.16 ± 12%  perf-profile.children.cycles-pp.__xa_erase
      0.00             +1.2         1.21 ± 12%  perf-profile.children.cycles-pp.xa_erase
      0.00             +1.2         1.22 ± 12%  perf-profile.children.cycles-pp.simple_offset_remove
      0.24 ± 23%       +1.2         1.47 ± 10%  perf-profile.children.cycles-pp.shuffle_freelist
      0.00             +1.3         1.31 ± 27%  perf-profile.children.cycles-pp.xas_expand
      3.25 ± 16%       +1.4         4.68 ±  8%  perf-profile.children.cycles-pp.__do_softirq
      1.82 ± 29%       +1.4         3.25 ± 11%  perf-profile.children.cycles-pp.rcu_core
      1.70 ± 30%       +1.4         3.15 ± 12%  perf-profile.children.cycles-pp.rcu_do_batch
      0.33 ± 18%       +1.5         1.80 ± 11%  perf-profile.children.cycles-pp.allocate_slab
      0.46 ± 14%       +1.6         2.03 ±  9%  perf-profile.children.cycles-pp.___slab_alloc
      3.22 ± 17%       +1.7         4.91 ±  7%  perf-profile.children.cycles-pp.__irq_exit_rcu
      1.83 ± 11%       +1.8         3.67 ±  9%  perf-profile.children.cycles-pp.kmem_cache_alloc_lru
      6.18 ± 13%       +2.0         8.22 ±  9%  perf-profile.children.cycles-pp.lookup_open
      0.00             +2.3         2.34 ±  9%  perf-profile.children.cycles-pp.xas_alloc
      0.55 ± 62%       +2.4         2.95 ± 87%  perf-profile.children.cycles-pp.x86_64_start_kernel
      0.55 ± 62%       +2.4         2.95 ± 87%  perf-profile.children.cycles-pp.x86_64_start_reservations
      0.55 ± 62%       +2.4         2.95 ± 87%  perf-profile.children.cycles-pp.start_kernel
      0.55 ± 62%       +2.4         2.95 ± 87%  perf-profile.children.cycles-pp.arch_call_rest_init
      0.55 ± 62%       +2.4         2.95 ± 87%  perf-profile.children.cycles-pp.rest_init
      3.44 ± 14%       +2.6         6.01 ±  9%  perf-profile.children.cycles-pp.shmem_mknod
      0.00             +2.7         2.70 ±  8%  perf-profile.children.cycles-pp.xas_create
      0.00             +3.1         3.05 ±  9%  perf-profile.children.cycles-pp.__xa_alloc
      0.00             +3.1         3.08 ±  9%  perf-profile.children.cycles-pp.__xa_alloc_cyclic
      0.00             +3.2         3.20 ±  9%  perf-profile.children.cycles-pp.simple_offset_add
      0.00             +3.9         3.95 ±  9%  perf-profile.children.cycles-pp.xas_store

      0.64 ± 13%       -0.1         0.50 ± 12%  perf-profile.self.cycles-pp.path_init
      0.44 ± 15%       -0.1         0.30 ± 16%  perf-profile.self.cycles-pp._IO_fgets
      0.30 ± 15%       -0.1         0.21 ± 28%  perf-profile.self.cycles-pp.__fput
      0.25 ± 12%       -0.1         0.17 ± 22%  perf-profile.self.cycles-pp.apparmor_file_open
      0.53 ±  3%       -0.1         0.46 ±  7%  perf-profile.self.cycles-pp.memcg_slab_post_alloc_hook
      0.18 ± 15%       -0.1         0.10 ± 25%  perf-profile.self.cycles-pp.filemap_get_entry
      0.10 ± 21%       -0.1         0.03 ±100%  perf-profile.self.cycles-pp.shmem_mknod
      0.21 ± 21%       -0.1         0.14 ± 18%  perf-profile.self.cycles-pp.apparmor_file_alloc_security
      0.17 ± 20%       -0.1         0.11 ± 31%  perf-profile.self.cycles-pp.inode_init_always
      0.16 ± 14%       -0.1         0.10 ± 26%  perf-profile.self.cycles-pp.up_write
      0.14 ± 16%       -0.1         0.08 ± 23%  perf-profile.self.cycles-pp.security_inode_init_security
      0.09 ± 26%       -0.1         0.04 ± 71%  perf-profile.self.cycles-pp.file_free_rcu
      0.11 ± 14%       -0.1         0.06 ± 54%  perf-profile.self.cycles-pp.__destroy_inode
      0.18 ± 17%       -0.0         0.13 ± 12%  perf-profile.self.cycles-pp.apparmor_path_unlink
      0.14 ± 13%       -0.0         0.10 ± 19%  perf-profile.self.cycles-pp._IO_default_xsputn
      0.10 ± 13%       -0.0         0.05 ± 51%  perf-profile.self.cycles-pp.obj_cgroup_charge
      0.10 ± 26%       -0.0         0.06 ± 29%  perf-profile.self.cycles-pp.refill_obj_stock
      0.11 ± 17%       -0.0         0.08 ± 21%  perf-profile.self.cycles-pp.ihold
      0.12 ± 11%       +0.1         0.18 ± 21%  perf-profile.self.cycles-pp.rb_next
      0.02 ±144%       +0.1         0.09 ± 23%  perf-profile.self.cycles-pp.rcu_do_batch
      0.00             +0.1         0.10 ± 27%  perf-profile.self.cycles-pp.xas_find_marked
      0.00             +0.1         0.11 ± 34%  perf-profile.self.cycles-pp.xas_create
      0.03 ±100%       +0.1         0.16 ± 19%  perf-profile.self.cycles-pp.xas_descend
      0.09 ± 27%       +0.1         0.24 ± 19%  perf-profile.self.cycles-pp.rcu_segcblist_enqueue
      0.00             +0.1         0.14 ± 15%  perf-profile.self.cycles-pp.xas_alloc
      0.00             +0.2         0.15 ± 18%  perf-profile.self.cycles-pp.xas_expand
      0.25 ± 16%       +0.3         0.54 ± 14%  perf-profile.self.cycles-pp.kmem_cache_alloc_lru
      0.38 ± 21%       +0.3         0.68 ±  8%  perf-profile.self.cycles-pp.__call_rcu_common
      0.00             +0.3         0.31 ± 18%  perf-profile.self.cycles-pp.xas_store
      0.49 ± 28%       +0.4         0.91 ± 10%  perf-profile.self.cycles-pp.__slab_free
      0.00             +0.7         0.75 ± 17%  perf-profile.self.cycles-pp.radix_tree_node_rcu_free
      0.00             +1.0         1.01 ± 10%  perf-profile.self.cycles-pp.radix_tree_node_ctor


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
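The %change column in the table appears to use two conventions. For ordinary metrics it is the relative change of the patched mean against the base mean. For metrics that are already percentages (names ending in %, and all perf-profile rows) it is the absolute difference in percentage points, printed without a % sign. The following sketch recomputes a few rows under that assumption. pct_change and pp_change are hypothetical helpers, not lkp-tests functions.

```shell
#!/bin/sh
# Hypothetical helpers (not from lkp-tests) that recompute the %change
# column from the two mean values. ASSUMED conventions:
#   - ordinary metrics:   relative change (new - old) / old, in percent
#   - percentage metrics: absolute difference, in percentage points

# Relative change, e.g. for aim9.disk_src.ops_per_sec
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.1f%%\n", (new - old) / old * 100 }'
}

# Absolute difference, e.g. for mpstat.cpu.all.usr% or perf-profile rows
pp_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.1f\n", new - old }'
}

pct_change 202486 163938    # aim9.disk_src.ops_per_sec            -> -19.0%
pct_change 1383577 6876327  # proc-vmstat.numa_hit                 -> +397.0%
pp_change 0.61 0.52         # mpstat.cpu.all.usr%                  -> -0.1
pp_change 0.00 3.20         # perf-profile ...simple_offset_add    -> +3.2
```

All four results match the corresponding rows in the table. This supports reading, for example, the +3.2 on simple_offset_add as 3.2 percentage points of total cycles, not as a 3.2% relative increase.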
-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki