Hello,

kernel test robot noticed a -18.6% regression of aim9.disk_src.ops_per_sec on:

commit: 77323f99e9314a98a2535de3eb50f3559053fd1f ("shmem: stable directory cookies")
git://git.kernel.org/cgit/linux/kernel/git/cel/linux topic-shmem-stable-dir-cookies

testcase: aim9
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
parameters:

	testtime: 300s
	test: disk_src
	cpufreq_governor: performance

If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot
| Link: https://lore.kernel.org/oe-lkp/202304292120.e5736a73-oliver.sang@intel.com

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.
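
The headline figure can be cross-checked against the aim9.disk_src.ops_per_sec row of the comparison table in this report (197551 for v6.3-rc7, 160860 for 77323f99e9). The following is a minimal Python sketch. It assumes the %change column is the plain relative difference of the two means. The helper name percent_change is illustrative only and is not part of lkp-tests.

```python
def percent_change(base: float, patched: float) -> float:
    """Relative change of `patched` versus `base`, in percent.

    Assumption: this mirrors how the report's %change column is derived;
    the function itself is hypothetical and not taken from lkp-tests.
    """
    if base == 0:
        raise ValueError("base must be non-zero")
    return (patched - base) / base * 100.0


# Means copied from the aim9.disk_src.ops_per_sec row of the report.
base_ops = 197551     # v6.3-rc7
patched_ops = 160860  # 77323f99e9 ("shmem: stable directory cookies")

print(f"{percent_change(base_ops, patched_ops):+.1f}%")  # prints -18.6%
```

The same helper reproduces the other percentage rows, for example proc-vmstat.numa_hit (1500956 to 6760894, +350.4%). Rows for rate-style metrics such as mpstat.cpu.all.soft% carry a change with no percent sign, which appears to be an absolute difference in percentage points rather than a relative one.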

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-11/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/disk_src/aim9/300s

commit:
  v6.3-rc7
  77323f99e9 ("shmem: stable directory cookies")

        v6.3-rc7 77323f99e9314a98a2535de3eb5
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     18.00 ±  2%      +3.6%      18.64        turbostat.RAMWatt
      0.26 ±  6%      +0.1        0.36 ±  3%  mpstat.cpu.all.soft%
      0.60            -0.1        0.52        mpstat.cpu.all.usr%
    197551           -18.6%     160860        aim9.disk_src.ops_per_sec
      9037 ± 81%     -87.8%       1102 ± 17%  aim9.time.involuntary_context_switches
     95.00            -4.2%      91.00        aim9.time.percent_of_cpu_this_job_got
     72.86           -16.5%      60.87        aim9.time.user_time
     53647 ±  9%     -18.4%      43763 ±  6%  sched_debug.cfs_rq:/.load.avg
    585894 ± 22%     -22.0%     457111 ± 14%  sched_debug.cfs_rq:/.load.max
    150918 ± 10%     -18.3%     123227 ±  7%  sched_debug.cfs_rq:/.load.stddev
     32854 ±  8%     -18.6%      26750 ± 14%  sched_debug.cfs_rq:/.min_vruntime.avg
     23424            +5.9%      24807        proc-vmstat.nr_slab_reclaimable
   1500956 ± 17%    +350.4%    6760894        proc-vmstat.numa_hit
   1461265 ± 18%    +359.7%    6717790        proc-vmstat.numa_local
   5178282 ± 23%    +410.0%   26410329        proc-vmstat.pgalloc_normal
   5155883 ± 23%    +410.8%   26334081        proc-vmstat.pgfree
     10.85           +27.1%      13.78 ±  2%  perf-stat.i.MPKI
 7.294e+08            -4.6%   6.96e+08        perf-stat.i.branch-instructions
      2.64            +0.2        2.84 ±  3%  perf-stat.i.branch-miss-rate%
     17.54            -2.9       14.65        perf-stat.i.cache-miss-rate%
  40048936           +20.0%   48046774        perf-stat.i.cache-references
      1.71            +5.2%       1.80 ±  2%  perf-stat.i.cpi
  1.06e+09            -4.3%  1.015e+09        perf-stat.i.dTLB-loads
      0.20 ±  2%      -0.0        0.17 ±  5%  perf-stat.i.dTLB-store-miss-rate%
   1669316 ±  2%      -7.4%    1546270 ±  5%  perf-stat.i.dTLB-store-misses
 8.508e+08            +4.7%  8.905e+08        perf-stat.i.dTLB-stores
 3.787e+09            -3.7%  3.647e+09        perf-stat.i.instructions
      0.59            -3.9%       0.57        perf-stat.i.ipc
    875.82           -41.7%     510.99 ± 46%  perf-stat.i.metric.K/sec
     47.70            -1.5       46.23        perf-stat.i.node-load-miss-rate%
    101351 ±  3%      +8.5%     110004 ±  2%  perf-stat.i.node-loads
     98742 ±  3%     +21.2%     119649 ±  3%  perf-stat.i.node-stores
     10.57           +24.5%      13.16        perf-stat.overall.MPKI
      2.73            +0.2        2.91        perf-stat.overall.branch-miss-rate%
     17.55            -2.9       14.64        perf-stat.overall.cache-miss-rate%
      1.68            +3.7%       1.74        perf-stat.overall.cpi
      0.20 ±  2%      -0.0        0.17 ±  5%  perf-stat.overall.dTLB-store-miss-rate%
      0.59            -3.6%       0.57        perf-stat.overall.ipc
     44.97            -1.8       43.20        perf-stat.overall.node-load-miss-rate%
 7.271e+08            -4.5%  6.942e+08        perf-stat.ps.branch-instructions
  19823815            +1.7%   20169956        perf-stat.ps.branch-misses
  39915110           +20.0%   47885795        perf-stat.ps.cache-references
 1.057e+09            -4.2%  1.012e+09        perf-stat.ps.dTLB-loads
   1663740 ±  2%      -7.4%    1541057 ±  5%  perf-stat.ps.dTLB-store-misses
  8.48e+08            +4.7%  8.876e+08        perf-stat.ps.dTLB-stores
 3.776e+09            -3.7%  3.638e+09        perf-stat.ps.instructions
    101092 ±  3%      +8.8%     109960 ±  2%  perf-stat.ps.node-loads
     98399 ±  3%     +21.7%     119795 ±  2%  perf-stat.ps.node-stores
 1.137e+12            -3.7%  1.096e+12        perf-stat.total.instructions
      0.00            +0.7        0.68 ± 14%  perf-profile.calltrace.cycles-pp.__call_rcu_common.xas_store.__xa_erase.xa_erase.shmem_unlink
      0.00            +0.8        0.80 ± 34%  perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_lru.xas_alloc.xas_create.xas_store
      0.00            +1.0        1.00 ± 29%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.xas_alloc.xas_create.xas_store.__xa_alloc
      0.00            +1.1        1.08 ± 27%  perf-profile.calltrace.cycles-pp.xas_alloc.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic
      0.00            +1.1        1.11 ± 27%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.xas_alloc.xas_expand.xas_create.xas_store
      0.00            +1.2        1.17 ± 13%  perf-profile.calltrace.cycles-pp.xas_store.__xa_erase.xa_erase.shmem_unlink.vfs_unlink
      0.00            +1.2        1.19 ± 13%  perf-profile.calltrace.cycles-pp.__xa_erase.xa_erase.shmem_unlink.vfs_unlink.do_unlinkat
      0.00            +1.2        1.21 ± 25%  perf-profile.calltrace.cycles-pp.xas_alloc.xas_expand.xas_create.xas_store.__xa_alloc
      0.00            +1.3        1.26 ± 13%  perf-profile.calltrace.cycles-pp.xa_erase.shmem_unlink.vfs_unlink.do_unlinkat.__x64_sys_unlink
      1.23 ± 26%      +1.3        2.55 ± 16%  perf-profile.calltrace.cycles-pp.vfs_unlink.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +1.4        1.37 ± 24%  perf-profile.calltrace.cycles-pp.xas_expand.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic
      0.12 ±223%      +1.4        1.51 ± 15%  perf-profile.calltrace.cycles-pp.shmem_unlink.vfs_unlink.do_unlinkat.__x64_sys_unlink.do_syscall_64
      0.00            +2.6        2.60 ± 18%  perf-profile.calltrace.cycles-pp.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic.shmem_doff_add
      0.00            +2.7        2.72 ± 18%  perf-profile.calltrace.cycles-pp.xas_store.__xa_alloc.__xa_alloc_cyclic.shmem_doff_add.shmem_mknod
      0.00            +3.0        2.97 ± 17%  perf-profile.calltrace.cycles-pp.__xa_alloc.__xa_alloc_cyclic.shmem_doff_add.shmem_mknod.lookup_open
      0.00            +3.0        3.01 ± 17%  perf-profile.calltrace.cycles-pp.__xa_alloc_cyclic.shmem_doff_add.shmem_mknod.lookup_open.open_last_lookups
      0.00            +3.1        3.13 ± 17%  perf-profile.calltrace.cycles-pp.shmem_doff_add.shmem_mknod.lookup_open.open_last_lookups.path_openat
      2.56 ± 15%      +3.4        5.96 ± 17%  perf-profile.calltrace.cycles-pp.shmem_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
      4.76 ± 15%      +3.7        8.42 ± 17%  perf-profile.calltrace.cycles-pp.lookup_open.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
      5.40 ± 17%      +3.7        9.11 ± 16%  perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_creat
     10.34 ± 17%      +3.9       14.23 ± 15%  perf-profile.calltrace.cycles-pp.__x64_sys_creat.do_syscall_64.entry_SYSCALL_64_after_hwframe.creat64
      9.02 ± 17%      +3.9       12.91 ± 15%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_creat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      8.93 ± 17%      +3.9       12.82 ± 15%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_creat.do_syscall_64
     10.26 ± 17%      +3.9       14.16 ± 15%  perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_creat.do_syscall_64.entry_SYSCALL_64_after_hwframe.creat64
     11.94 ± 17%      +4.1       16.02 ± 14%  perf-profile.calltrace.cycles-pp.creat64
      0.22 ± 23%      -0.1        0.16 ± 12%  perf-profile.children.cycles-pp.rb_erase
      0.17 ± 18%      +0.1        0.22 ± 14%  perf-profile.children.cycles-pp.__update_blocked_fair
      0.00            +0.1        0.06 ± 17%  perf-profile.children.cycles-pp.get_any_partial
      0.01 ±223%      +0.1        0.08 ± 27%  perf-profile.children.cycles-pp.free_unref_page
      0.02 ±144%      +0.1        0.09 ± 34%  perf-profile.children.cycles-pp.rcu_cblist_dequeue
      0.09 ± 55%      +0.1        0.16 ± 27%  perf-profile.children.cycles-pp.find_get_entries
      0.14 ± 23%      +0.1        0.24 ± 29%  perf-profile.children.cycles-pp.inode_init_once
      0.02 ±144%      +0.1        0.12 ± 18%  perf-profile.children.cycles-pp.rcu_nocb_try_bypass
      0.00            +0.1        0.11 ± 24%  perf-profile.children.cycles-pp.__unfreeze_partials
      0.11 ± 28%      +0.1        0.22 ± 19%  perf-profile.children.cycles-pp.xas_start
      0.00            +0.1        0.12 ± 18%  perf-profile.children.cycles-pp.rmqueue
      0.00            +0.1        0.13 ± 19%  perf-profile.children.cycles-pp.xas_find_marked
      0.09 ± 53%      +0.1        0.23 ± 24%  perf-profile.children.cycles-pp.rcu_segcblist_enqueue
      0.06 ± 17%      +0.2        0.21 ± 13%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.08 ± 19%      +0.2        0.29 ± 11%  perf-profile.children.cycles-pp.__alloc_pages
      0.20 ± 17%      +0.2        0.41 ± 17%  perf-profile.children.cycles-pp.xas_load
      0.12 ± 37%      +0.2        0.37 ± 50%  perf-profile.children.cycles-pp.syscall_enter_from_user_mode
      0.33 ± 38%      +0.5        0.88 ± 19%  perf-profile.children.cycles-pp.__slab_free
      0.48 ± 30%      +0.6        1.06 ± 15%  perf-profile.children.cycles-pp.__call_rcu_common
      0.00            +0.8        0.77 ± 17%  perf-profile.children.cycles-pp.radix_tree_node_rcu_free
      0.00            +1.0        1.04 ± 22%  perf-profile.children.cycles-pp.radix_tree_node_ctor
      0.36 ± 54%      +1.2        1.51 ± 15%  perf-profile.children.cycles-pp.shmem_unlink
      0.16 ± 26%      +1.2        1.33 ± 20%  perf-profile.children.cycles-pp.setup_object
      0.00            +1.2        1.19 ± 13%  perf-profile.children.cycles-pp.__xa_erase
      0.00            +1.3        1.26 ± 13%  perf-profile.children.cycles-pp.xa_erase
      0.21 ± 27%      +1.3        1.51 ± 20%  perf-profile.children.cycles-pp.shuffle_freelist
      1.25 ± 26%      +1.3        2.56 ± 16%  perf-profile.children.cycles-pp.vfs_unlink
      0.00            +1.4        1.37 ± 24%  perf-profile.children.cycles-pp.xas_expand
      0.27 ± 23%      +1.6        1.88 ± 18%  perf-profile.children.cycles-pp.allocate_slab
      0.38 ± 15%      +1.7        2.11 ± 18%  perf-profile.children.cycles-pp.___slab_alloc
      1.36 ± 37%      +1.8        3.17 ± 12%  perf-profile.children.cycles-pp.rcu_core
     12.34 ±  8%      +1.8       14.15 ±  4%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      1.22 ± 37%      +1.8        3.07 ± 13%  perf-profile.children.cycles-pp.rcu_do_batch
      2.84 ± 18%      +1.9        4.70 ±  9%  perf-profile.children.cycles-pp.__do_softirq
     11.62 ±  8%      +2.0       13.59 ±  4%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      2.69 ±  8%      +2.2        4.92 ±  8%  perf-profile.children.cycles-pp.__irq_exit_rcu
      0.00            +2.3        2.30 ± 18%  perf-profile.children.cycles-pp.xas_alloc
      1.38 ± 14%      +2.4        3.76 ± 17%  perf-profile.children.cycles-pp.kmem_cache_alloc_lru
      0.00            +2.6        2.61 ± 18%  perf-profile.children.cycles-pp.xas_create
      0.00            +3.0        2.97 ± 17%  perf-profile.children.cycles-pp.__xa_alloc
      0.00            +3.0        3.01 ± 17%  perf-profile.children.cycles-pp.__xa_alloc_cyclic
      0.00            +3.1        3.14 ± 17%  perf-profile.children.cycles-pp.shmem_doff_add
      2.56 ± 15%      +3.4        5.97 ± 17%  perf-profile.children.cycles-pp.shmem_mknod
      4.78 ± 15%      +3.7        8.45 ± 16%  perf-profile.children.cycles-pp.lookup_open
      5.44 ± 17%      +3.7        9.14 ± 16%  perf-profile.children.cycles-pp.open_last_lookups
      9.12 ± 16%      +3.9       13.00 ± 15%  perf-profile.children.cycles-pp.path_openat
     10.34 ± 17%      +3.9       14.23 ± 15%  perf-profile.children.cycles-pp.__x64_sys_creat
      9.18 ± 16%      +3.9       13.08 ± 15%  perf-profile.children.cycles-pp.do_filp_open
      0.00            +3.9        3.90 ± 16%  perf-profile.children.cycles-pp.xas_store
     10.47 ± 17%      +3.9       14.38 ± 15%  perf-profile.children.cycles-pp.do_sys_openat2
     11.99 ± 17%      +4.1       16.07 ± 14%  perf-profile.children.cycles-pp.creat64
      0.15 ± 28%      -0.1        0.10 ± 16%  perf-profile.self.cycles-pp.dput
      0.21 ± 24%      -0.1        0.16 ± 11%  perf-profile.self.cycles-pp.rb_erase
      0.12 ± 23%      -0.0        0.07 ± 53%  perf-profile.self.cycles-pp.current_time
      0.06 ± 50%      +0.1        0.12 ± 18%  perf-profile.self.cycles-pp.___slab_alloc
      0.12 ± 28%      +0.1        0.19 ± 15%  perf-profile.self.cycles-pp.inode_init_once
      0.08 ± 12%      +0.1        0.16 ± 36%  perf-profile.self.cycles-pp.xas_load
      0.01 ±223%      +0.1        0.10 ± 16%  perf-profile.self.cycles-pp.rcu_nocb_try_bypass
      0.10 ± 25%      +0.1        0.20 ± 22%  perf-profile.self.cycles-pp.xas_start
      0.02 ±144%      +0.1        0.14 ± 36%  perf-profile.self.cycles-pp.shuffle_freelist
      0.00            +0.1        0.12 ± 22%  perf-profile.self.cycles-pp.xas_find_marked
      0.09 ± 51%      +0.1        0.22 ± 28%  perf-profile.self.cycles-pp.rcu_segcblist_enqueue
      0.00            +0.1        0.14 ± 26%  perf-profile.self.cycles-pp.xas_alloc
      0.00            +0.1        0.14 ± 30%  perf-profile.self.cycles-pp.xas_create
      0.00            +0.2        0.15 ± 18%  perf-profile.self.cycles-pp.xas_expand
      0.00            +0.3        0.30 ± 11%  perf-profile.self.cycles-pp.xas_store
      0.32 ± 32%      +0.3        0.63 ± 14%  perf-profile.self.cycles-pp.__call_rcu_common
      0.16 ± 27%      +0.4        0.53 ± 15%  perf-profile.self.cycles-pp.kmem_cache_alloc_lru
      0.33 ± 39%      +0.5        0.79 ± 21%  perf-profile.self.cycles-pp.__slab_free
      0.00            +0.8        0.76 ± 16%  perf-profile.self.cycles-pp.radix_tree_node_rcu_free
      0.00            +1.0        0.95 ± 21%  perf-profile.self.cycles-pp.radix_tree_node_ctor


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests