Hello,

kernel test robot noticed a -7.3% regression of stress-ng.sock.ops_per_sec on:


commit: dfa2f0483360d4d6f2324405464c9f281156bd87 ("tcp: get rid of sysctl_tcp_adv_win_scale")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

testcase: stress-ng
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
parameters:

        nr_threads: 1
        disk: 1HDD
        testtime: 60s
        fs: ext4
        class: os
        test: sock
        cpufreq_governor: performance


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot
| Closes: https://lore.kernel.org/oe-lkp/202307312121.d8479e5e-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/sock/stress-ng/60s

commit:
  63c8778d91 ("Merge branch 'net-mana-fix-doorbell-access-for-receive-queues'")
  dfa2f04833 ("tcp: get rid of sysctl_tcp_adv_win_scale")

63c8778d9149d5df dfa2f0483360d4d6f2324405464
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   8094125           +21.5%    9832824 ± 18%  cpuidle..usage
      5.04            -6.1%       4.73 ± 10%  iostat.cpu.system
    330990 ±  2%     -32.3%     223958 ±  3%  turbostat.C1
   4685666           +22.3%    5729557        turbostat.POLL
     23600 ±  8%     +51.9%      35849 ± 25%  sched_debug.cfs_rq:/.min_vruntime.max
      4907 ±  7%     +44.2%       7073 ± 45%  sched_debug.cfs_rq:/.min_vruntime.stddev
      4911 ±  7%     +44.1%       7075 ± 45%  sched_debug.cfs_rq:/.spread0.stddev
     43.08 ± 15%     -41.0%      25.42 ± 32%  perf-sched.wait_and_delay.avg.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
    269948 ±  2%      +8.1%     291932 ±  2%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
     43.08 ± 15%     -41.0%      25.42 ± 32%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
      0.02 ± 31%     +35.0%       0.03 ±  5%  perf-sched.wait_time.max.ms.__cond_resched.aa_sk_perm.security_socket_sendmsg.sock_sendmsg.__sys_sendto
     93552            -7.3%      86706        stress-ng.sock.ops
      1559            -7.3%       1445        stress-ng.sock.ops_per_sec
    139.17            -3.4%     134.50        stress-ng.time.percent_of_cpu_this_job_got
   5092570           +18.6%    6039727        stress-ng.time.voluntary_context_switches
      1.45            +1.4        2.83 ±105%  perf-stat.i.branch-miss-rate%
   1620951 ± 30%     -39.7%     977769 ± 37%  perf-stat.i.dTLB-store-misses
    911.68            -3.6%     878.55        perf-stat.i.instructions-per-iTLB-miss
      1.54            +0.2        1.69 ± 15%  perf-stat.overall.branch-miss-rate%
      0.16 ± 30%      -0.1        0.10 ± 22%  perf-stat.overall.dTLB-store-miss-rate%
    742.16            -4.3%     710.16        perf-stat.overall.instructions-per-iTLB-miss
   1595258 ± 30%     -39.6%     962800 ± 37%  perf-stat.ps.dTLB-store-misses
     67709           +12.6%      76211 ± 14%  proc-vmstat.nr_active_anon
     73849           +11.0%      81975 ± 11%  proc-vmstat.nr_shmem
     67709           +12.6%      76211 ± 14%  proc-vmstat.nr_zone_active_anon
   6320969            -6.7%    5895784        proc-vmstat.numa_hit
   6314894            -6.8%    5885708        proc-vmstat.numa_local
    102508            +5.9%     108525        proc-vmstat.pgactivate
  48068383            -7.3%   44558110        proc-vmstat.pgalloc_normal
  47937851            -7.3%   44421205        proc-vmstat.pgfree
      0.70 ± 14%      +0.2        0.88 ± 14%  perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry
      0.48 ± 47%      +0.2        0.70 ± 14%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      2.76 ±  9%      +0.5        3.30 ±  2%  perf-profile.calltrace.cycles-pp.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish
      0.39 ± 72%      +0.6        0.95 ± 24%  perf-profile.calltrace.cycles-pp.try_to_wake_up.__wake_up_common.__wake_up_common_lock.sock_def_readable.tcp_data_queue
      3.32 ± 10%      +0.7        4.00        perf-profile.calltrace.cycles-pp.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core
      6.88 ±  7%      +0.8        7.71 ±  2%  perf-profile.calltrace.cycles-pp.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action.__do_softirq
      7.18 ±  7%      +0.8        8.02 ±  2%  perf-profile.calltrace.cycles-pp.__napi_poll.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip
      7.16 ±  7%      +0.9        8.02 ±  2%  perf-profile.calltrace.cycles-pp.process_backlog.__napi_poll.net_rx_action.__do_softirq.do_softirq
      8.90 ±  6%      +1.0        9.89        perf-profile.calltrace.cycles-pp.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit
      9.37 ±  6%      +1.0       10.40        perf-profile.calltrace.cycles-pp.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb
      9.33 ±  6%      +1.0       10.37        perf-profile.calltrace.cycles-pp.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit
      9.26 ±  6%      +1.0       10.30        perf-profile.calltrace.cycles-pp.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2
      2.48 ± 17%      +1.3        3.82 ±  2%  perf-profile.calltrace.cycles-pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_sendmsg_locked
      2.61 ± 17%      +1.3        3.96 ±  2%  perf-profile.calltrace.cycles-pp.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_sendmsg_locked.tcp_sendmsg
      0.80 ± 15%      -0.4        0.43 ± 10%  perf-profile.children.cycles-pp.tcp_rcv_space_adjust
      1.35 ±  5%      -0.2        1.19 ±  6%  perf-profile.children.cycles-pp.__entry_text_start
      0.56 ± 15%      -0.2        0.40 ± 11%  perf-profile.children.cycles-pp.__x64_sys_connect
      0.56 ± 15%      -0.2        0.40 ± 11%  perf-profile.children.cycles-pp.__sys_connect
      0.55 ± 14%      -0.2        0.40 ± 12%  perf-profile.children.cycles-pp.inet_stream_connect
      0.55 ± 15%      -0.1        0.40 ± 12%  perf-profile.children.cycles-pp.__inet_stream_connect
      0.38 ± 11%      -0.1        0.28 ± 21%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
      0.44 ±  9%      -0.1        0.33 ± 13%  perf-profile.children.cycles-pp.__close
      0.37 ± 12%      -0.1        0.27 ± 20%  perf-profile.children.cycles-pp.task_work_run
      0.77 ±  5%      -0.1        0.68 ±  8%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.34 ± 12%      -0.1        0.26 ± 21%  perf-profile.children.cycles-pp.__fput
      0.31 ± 11%      -0.1        0.23 ± 18%  perf-profile.children.cycles-pp.tcp_v4_connect
      0.22 ± 14%      -0.1        0.16 ± 22%  perf-profile.children.cycles-pp.__sock_release
      0.22 ± 14%      -0.1        0.16 ± 22%  perf-profile.children.cycles-pp.sock_close
      0.23 ± 19%      -0.1        0.16 ± 14%  perf-profile.children.cycles-pp.tcp_try_coalesce
      0.09 ± 14%      -0.0        0.05 ± 48%  perf-profile.children.cycles-pp.new_inode_pseudo
      0.07 ± 12%      -0.0        0.04 ± 72%  perf-profile.children.cycles-pp.__ns_get_path
      0.17 ±  8%      +0.0        0.22 ±  8%  perf-profile.children.cycles-pp.ip_send_check
      0.23 ±  7%      +0.0        0.28 ±  7%  perf-profile.children.cycles-pp.ip_local_out
      0.09 ± 22%      +0.0        0.14 ± 10%  perf-profile.children.cycles-pp.available_idle_cpu
      0.22 ±  9%      +0.1        0.26 ±  7%  perf-profile.children.cycles-pp.__ip_local_out
      0.46 ± 11%      +0.1        0.56 ±  4%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
      0.92 ±  3%      +0.1        1.06 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      7.10 ±  2%      +0.7        7.76 ±  3%  perf-profile.children.cycles-pp.tcp_v4_rcv
      7.21 ±  2%      +0.7        7.90 ±  3%  perf-profile.children.cycles-pp.ip_protocol_deliver_rcu
      7.42 ±  2%      +0.7        8.12 ±  3%  perf-profile.children.cycles-pp.ip_local_deliver_finish
      8.00 ±  2%      +0.7        8.71 ±  2%  perf-profile.children.cycles-pp.__netif_receive_skb_one_core
      8.34 ±  2%      +0.7        9.06 ±  2%  perf-profile.children.cycles-pp.__napi_poll
      8.32 ±  2%      +0.7        9.05 ±  2%  perf-profile.children.cycles-pp.process_backlog
     11.71 ±  3%      +0.9       12.63 ±  2%  perf-profile.children.cycles-pp.__dev_queue_xmit
     13.86 ±  2%      +0.9       14.78 ±  2%  perf-profile.children.cycles-pp.__tcp_transmit_skb
     11.92 ±  2%      +0.9       12.86 ±  2%  perf-profile.children.cycles-pp.ip_finish_output2
     10.05 ±  3%      +0.9       10.99 ±  2%  perf-profile.children.cycles-pp.net_rx_action
     12.66 ±  2%      +1.0       13.62 ±  2%  perf-profile.children.cycles-pp.__ip_queue_xmit
     10.56 ±  3%      +1.0       11.53 ±  2%  perf-profile.children.cycles-pp.do_softirq
     10.82 ±  3%      +1.0       11.80 ±  2%  perf-profile.children.cycles-pp.__local_bh_enable_ip
     10.94 ±  4%      +1.0       11.94 ±  2%  perf-profile.children.cycles-pp.__do_softirq
      0.52 ± 21%      -0.4        0.16 ± 16%  perf-profile.self.cycles-pp.tcp_rcv_space_adjust
      0.62 ±  7%      -0.1        0.48 ± 10%  perf-profile.self.cycles-pp.tcp_sendmsg
      0.63 ±  5%      -0.1        0.55 ±  7%  perf-profile.self.cycles-pp.__entry_text_start
      0.10 ± 15%      +0.0        0.14 ± 13%  perf-profile.self.cycles-pp.schedule_timeout
      0.10 ± 20%      +0.0        0.14 ± 16%  perf-profile.self.cycles-pp.enqueue_entity
      0.08 ± 22%      +0.1        0.14 ± 11%  perf-profile.self.cycles-pp.available_idle_cpu
      0.37 ±  8%      +0.1        0.44 ±  4%  perf-profile.self.cycles-pp.net_rx_action
      0.92 ±  3%      +0.1        1.06 ±  5%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki