Greeting, FYI, we noticed a 1097.4% improvement of stress-ng.klog.ops_per_sec due to commit: commit: 996e966640ddea7b535cbe7b749e87a3b24f37e8 ("printk: remove logbuf_lock") https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master in testcase: stress-ng on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory with following parameters: nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: klog cpufreq_governor: performance ucode: 0x5003006 Details are as below: --------------------------------------------------------------------------------------------------> To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp install job.yaml # job file is attached in this email bin/lkp split-job --compatible job.yaml bin/lkp run compatible-job.yaml ========================================================================================= class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode: os/gcc-9/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp5/klog/stress-ng/60s/0x5003006 commit: f9f3f02db9 ("printk: introduce a kmsg_dump iterator") 996e966640 ("printk: remove logbuf_lock") f9f3f02db98bbe67 996e966640ddea7b535cbe7b749 ---------------- --------------------------- %stddev %change %stddev \ | \ 26314 +1097.3% 315061 stress-ng.klog.ops 438.54 +1097.4% 5250 stress-ng.klog.ops_per_sec 961.83 ± 2% -15.9% 809.00 stress-ng.time.involuntary_context_switches 8450 ± 39% +4572.7% 394845 ±162% cpuidle.POLL.usage 7225 ± 22% +27.2% 9187 ± 11% softirqs.CPU2.SCHED 1844 -3.7% 1775 vmstat.system.cs 176153 ± 70% -50.0% 88051 ±141% numa-vmstat.node0.nr_dirtied 176152 ± 70% -50.0% 88051 ±141% numa-vmstat.node0.nr_written 88056 ±141% +100.1% 176157 ± 70% numa-vmstat.node1.nr_dirtied 88055 ±141% +100.1% 176156 ± 70% numa-vmstat.node1.nr_written 38516 +39.4% 53679 ± 27% sched_debug.cpu.clock.avg 38521 +39.4% 53683 ± 27% sched_debug.cpu.clock.max 38510 +39.4% 53674 ± 27% sched_debug.cpu.clock.min 38388 +39.2% 53441 ± 27% sched_debug.cpu.clock_task.avg 38507 +39.1% 53581 ± 27% sched_debug.cpu.clock_task.max 38513 +39.4% 53676 ± 27% sched_debug.cpu_clk 38019 +39.9% 53181 ± 28% sched_debug.ktime 38879 +39.0% 54033 ± 27% sched_debug.sched_clk 4.47 ± 7% -92.3% 0.35 ± 37% perf-stat.i.MPKI 1.824e+09 +709.2% 1.476e+10 perf-stat.i.branch-instructions 0.49 ± 7% -0.2 0.28 ± 4% perf-stat.i.branch-miss-rate% 8126698 ± 3% +291.8% 31840686 perf-stat.i.branch-misses 30.49 ± 10% -14.0 16.46 ± 3% perf-stat.i.cache-miss-rate% 9809108 ± 8% -83.9% 1581490 ± 3% perf-stat.i.cache-misses 32576128 ± 4% -70.0% 9756759 ± 6% perf-stat.i.cache-references 1700 -4.5% 1623 perf-stat.i.context-switches 3.61 -86.3% 0.49 perf-stat.i.cpi 106.88 -1.6% 105.21 perf-stat.i.cpu-migrations 2857 ± 7% +557.6% 18789 ± 4% perf-stat.i.cycles-between-cache-misses 1.543e+09 +608.5% 1.093e+10 ± 2% perf-stat.i.dTLB-loads 7.862e+08 +893.9% 7.813e+09 perf-stat.i.dTLB-stores 46.51 +3.6 50.13 ± 5% perf-stat.i.iTLB-load-miss-rate% 1764355 +14.1% 2013317 perf-stat.i.iTLB-load-misses 7.571e+09 +699.3% 6.051e+10 perf-stat.i.instructions 4308 +598.5% 30094 perf-stat.i.instructions-per-iTLB-miss 0.29 +636.7% 2.13 perf-stat.i.ipc 12.32 -8.6% 11.26 perf-stat.i.major-faults 0.63 ± 11% +72.3% 1.08 ± 12% perf-stat.i.metric.K/sec 43.60 +700.6% 349.10 perf-stat.i.metric.M/sec 3214 -2.0% 3149 perf-stat.i.minor-faults 97.27 -10.4 86.91 perf-stat.i.node-load-miss-rate% 2442015 ± 9% -89.9% 246595 ± 17% perf-stat.i.node-load-misses 96.91 -11.0 85.92 ± 8% perf-stat.i.node-store-miss-rate% 1985853 ± 9% -97.0% 59228 ± 6% perf-stat.i.node-store-misses 3226 -2.0% 3160 perf-stat.i.page-faults 4.29 ± 3% -96.2% 0.16 ± 6% perf-stat.overall.MPKI 0.44 ± 3% -0.2 0.22 perf-stat.overall.branch-miss-rate% 30.37 ± 11% -14.1 16.27 ± 6% perf-stat.overall.cache-miss-rate% 3.64 -87.4% 0.46 perf-stat.overall.cpi 2824 ± 7% +522.1% 17572 ± 3% perf-stat.overall.cycles-between-cache-misses 0.00 ± 81% -0.0 0.00 ± 74% perf-stat.overall.dTLB-load-miss-rate% 0.00 ± 31% -0.0 0.00 ± 30% perf-stat.overall.dTLB-store-miss-rate% 46.46 +3.7 50.11 ± 5% perf-stat.overall.iTLB-load-miss-rate% 4306 +598.3% 30070 perf-stat.overall.instructions-per-iTLB-miss 0.27 +694.3% 2.18 perf-stat.overall.ipc 98.28 -13.2 85.05 perf-stat.overall.node-load-miss-rate% 99.06 -21.7 77.41 ± 6% perf-stat.overall.node-store-miss-rate% 1.795e+09 +709.1% 1.452e+10 perf-stat.ps.branch-instructions 7911307 ± 3% +296.0% 31331201 perf-stat.ps.branch-misses 9665170 ± 8% -83.9% 1555589 ± 3% perf-stat.ps.cache-misses 31965660 ± 4% -70.0% 9598649 ± 6% perf-stat.ps.cache-references 1664 -4.1% 1597 perf-stat.ps.context-switches 104.80 -1.3% 103.49 perf-stat.ps.cpu-migrations 1.518e+09 +608.7% 1.076e+10 ± 2% perf-stat.ps.dTLB-loads 7.733e+08 +894.4% 7.689e+09 perf-stat.ps.dTLB-stores 1730085 +14.5% 1980925 perf-stat.ps.iTLB-load-misses 7.449e+09 +699.5% 5.955e+10 perf-stat.ps.instructions 2408929 ± 9% -89.9% 242555 ± 17% perf-stat.ps.node-load-misses 1959947 ± 9% -97.0% 58243 ± 6% perf-stat.ps.node-store-misses 4.701e+11 +699.2% 3.757e+12 perf-stat.total.instructions 58.25 ± 7% -58.3 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.syslog_print_all.do_syslog.__x64_sys_syslog.do_syscall_64 57.16 ± 7% -57.2 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.syslog_print_all.do_syslog.__x64_sys_syslog 0.80 ± 8% +0.2 1.00 ± 4% perf-profile.calltrace.cycles-pp.desc_read.desc_read_finalized_seq._prb_read_valid.prb_read_valid.syslog_print_all 0.00 +0.6 0.57 ± 4% perf-profile.calltrace.cycles-pp.put_dec_trunc8.number.vsnprintf.sprintf.info_print_prefix 0.00 +0.6 0.58 ± 3% perf-profile.calltrace.cycles-pp.memmove.record_print_text.syslog_print_all.do_syslog.__x64_sys_syslog 0.00 +0.6 0.61 ± 7% perf-profile.calltrace.cycles-pp.copy_user_generic_unrolled._copy_to_user.syslog_print_all.do_syslog.__x64_sys_syslog 0.88 ± 8% +0.7 1.61 ± 3% perf-profile.calltrace.cycles-pp.desc_read_finalized_seq._prb_read_valid.prb_read_valid.syslog_print_all.do_syslog 0.00 +0.7 0.74 ± 4% perf-profile.calltrace.cycles-pp.memcpy_erms._prb_read_valid.prb_read_valid.syslog_print_all.do_syslog 0.00 +0.8 0.84 ± 5% perf-profile.calltrace.cycles-pp.memcpy_erms.record_print_text.syslog_print_all.do_syslog.__x64_sys_syslog 0.00 +1.3 1.28 ± 4% perf-profile.calltrace.cycles-pp.desc_read.desc_read_finalized_seq._prb_read_valid.prb_read_valid_info.find_first_fitting_seq 0.00 +1.4 1.40 ± 4% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string._copy_to_user.syslog_print_all.do_syslog.__x64_sys_syslog 0.00 +1.4 1.44 ± 4% perf-profile.calltrace.cycles-pp.__check_heap_object.__check_object_size.syslog_print_all.do_syslog.__x64_sys_syslog 0.00 +2.0 2.02 ± 4% perf-profile.calltrace.cycles-pp.memchr.record_print_text.syslog_print_all.do_syslog.__x64_sys_syslog 0.00 +2.0 2.03 ± 2% perf-profile.calltrace.cycles-pp.desc_read_finalized_seq._prb_read_valid.prb_read_valid_info.find_first_fitting_seq.syslog_print_all 0.00 +2.2 2.20 ± 4% perf-profile.calltrace.cycles-pp.memchr._prb_read_valid.prb_read_valid_info.find_first_fitting_seq.syslog_print_all 0.00 +2.9 2.93 ± 2% perf-profile.calltrace.cycles-pp.__check_object_size.syslog_print_all.do_syslog.__x64_sys_syslog.do_syscall_64 0.00 +2.9 2.93 ± 3% perf-profile.calltrace.cycles-pp._copy_to_user.syslog_print_all.do_syslog.__x64_sys_syslog.do_syscall_64 1.12 ± 8% +3.3 4.44 ± 4% perf-profile.calltrace.cycles-pp._prb_read_valid.prb_read_valid.syslog_print_all.do_syslog.__x64_sys_syslog 1.13 ± 8% +3.5 4.63 ± 4% perf-profile.calltrace.cycles-pp.prb_read_valid.syslog_print_all.do_syslog.__x64_sys_syslog.do_syscall_64 0.00 +3.6 3.63 ± 3% perf-profile.calltrace.cycles-pp.memcpy_erms.vsnprintf.sprintf.info_print_prefix.record_print_text 0.00 +3.9 3.86 ± 3% perf-profile.calltrace.cycles-pp.number.vsnprintf.sprintf.info_print_prefix.record_print_text 0.00 +4.2 4.24 ± 2% perf-profile.calltrace.cycles-pp.memcpy_erms.vsnprintf.sprintf.info_print_prefix.find_first_fitting_seq 0.00 +4.6 4.56 ± 4% perf-profile.calltrace.cycles-pp.number.vsnprintf.sprintf.info_print_prefix.find_first_fitting_seq 0.00 +5.1 5.09 ± 3% perf-profile.calltrace.cycles-pp.format_decode.vsnprintf.sprintf.info_print_prefix.record_print_text 0.00 +6.1 6.08 ± 6% perf-profile.calltrace.cycles-pp.format_decode.vsnprintf.sprintf.info_print_prefix.find_first_fitting_seq 0.47 ± 44% +6.4 6.84 ± 4% perf-profile.calltrace.cycles-pp._prb_read_valid.prb_read_valid_info.find_first_fitting_seq.syslog_print_all.do_syslog 0.49 ± 44% +6.6 7.12 ± 3% perf-profile.calltrace.cycles-pp.prb_read_valid_info.find_first_fitting_seq.syslog_print_all.do_syslog.__x64_sys_syslog 1.41 ± 6% +15.7 17.08 ± 3% perf-profile.calltrace.cycles-pp.vsnprintf.sprintf.info_print_prefix.record_print_text.syslog_print_all 1.50 ± 7% +16.8 18.34 ± 3% perf-profile.calltrace.cycles-pp.sprintf.info_print_prefix.record_print_text.syslog_print_all.do_syslog 1.57 ± 7% +17.7 19.25 ± 3% perf-profile.calltrace.cycles-pp.info_print_prefix.record_print_text.syslog_print_all.do_syslog.__x64_sys_syslog 1.61 ± 6% +18.4 20.00 ± 3% perf-profile.calltrace.cycles-pp.vsnprintf.sprintf.info_print_prefix.find_first_fitting_seq.syslog_print_all 1.70 ± 6% +19.8 21.54 ± 3% perf-profile.calltrace.cycles-pp.sprintf.info_print_prefix.find_first_fitting_seq.syslog_print_all.do_syslog 1.78 ± 6% +20.8 22.59 ± 3% perf-profile.calltrace.cycles-pp.info_print_prefix.find_first_fitting_seq.syslog_print_all.do_syslog.__x64_sys_syslog 2.08 ± 7% +22.4 24.43 ± 3% perf-profile.calltrace.cycles-pp.record_print_text.syslog_print_all.do_syslog.__x64_sys_syslog.do_syscall_64 2.39 ± 7% +27.8 30.14 ± 3% perf-profile.calltrace.cycles-pp.find_first_fitting_seq.syslog_print_all.do_syslog.__x64_sys_syslog.do_syscall_64 58.35 ± 7% -58.3 0.04 ± 71% perf-profile.children.cycles-pp._raw_spin_lock 57.20 ± 7% -57.2 0.00 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath 0.00 +0.1 0.11 ± 7% perf-profile.children.cycles-pp.check_stack_object 0.00 +0.1 0.12 ± 3% perf-profile.children.cycles-pp.put_dec 0.00 +0.1 0.14 ± 5% perf-profile.children.cycles-pp.___might_sleep 0.00 +0.1 0.15 ± 5% perf-profile.children.cycles-pp.__printk_safe_enter 0.00 +0.2 0.18 ± 4% perf-profile.children.cycles-pp.__might_sleep 0.01 ±223% +0.4 0.41 ± 5% perf-profile.children.cycles-pp.__virt_addr_valid 0.01 ±223% +0.4 0.43 ± 4% perf-profile.children.cycles-pp.__might_fault 0.01 ±223% +0.5 0.46 ± 7% perf-profile.children.cycles-pp.memcpy 0.06 ± 9% +0.6 0.62 ± 3% perf-profile.children.cycles-pp.memmove 0.07 ± 14% +0.7 0.72 ± 7% perf-profile.children.cycles-pp.copy_user_generic_unrolled 0.07 ± 19% +0.7 0.79 ± 5% perf-profile.children.cycles-pp.get_data 0.09 ± 11% +0.9 1.01 ± 3% perf-profile.children.cycles-pp.put_dec_trunc8 0.09 ± 16% +1.1 1.19 ± 3% perf-profile.children.cycles-pp.skip_atoi 0.12 ± 16% +1.4 1.48 ± 4% perf-profile.children.cycles-pp.__check_heap_object 0.14 ± 7% +1.4 1.54 ± 4% perf-profile.children.cycles-pp.copy_user_enhanced_fast_string 0.90 ± 8% +1.4 2.31 ± 4% perf-profile.children.cycles-pp.desc_read 0.26 ± 6% +2.7 2.97 ± 3% perf-profile.children.cycles-pp._copy_to_user 1.04 ± 8% +2.8 3.81 ± 3% perf-profile.children.cycles-pp.desc_read_finalized_seq 0.24 ± 10% +2.8 3.06 ± 2% perf-profile.children.cycles-pp.__check_object_size 1.14 ± 8% +3.5 4.68 ± 4% perf-profile.children.cycles-pp.prb_read_valid 0.40 ± 7% +4.3 4.67 ± 4% perf-profile.children.cycles-pp.memchr 0.57 ± 8% +6.6 7.17 ± 3% perf-profile.children.cycles-pp.prb_read_valid_info 0.70 ± 9% +7.9 8.56 ± 4% perf-profile.children.cycles-pp.number 0.83 ± 8% +9.3 10.13 ± 2% perf-profile.children.cycles-pp.memcpy_erms 1.69 ± 7% +9.8 11.47 ± 3% perf-profile.children.cycles-pp._prb_read_valid 0.89 ± 4% +10.3 11.20 ± 3% perf-profile.children.cycles-pp.format_decode 2.09 ± 7% +22.4 24.50 ± 3% perf-profile.children.cycles-pp.record_print_text 2.39 ± 7% +27.8 30.18 ± 3% perf-profile.children.cycles-pp.find_first_fitting_seq 3.08 ± 6% +35.2 38.28 ± 3% perf-profile.children.cycles-pp.vsnprintf 3.21 ± 6% +36.9 40.07 ± 3% perf-profile.children.cycles-pp.sprintf 3.37 ± 6% +38.6 41.98 ± 3% perf-profile.children.cycles-pp.info_print_prefix 57.20 ± 7% -57.2 0.00 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath 1.15 ± 10% -1.1 0.04 ± 71% perf-profile.self.cycles-pp._raw_spin_lock 0.00 +0.1 0.08 ± 11% perf-profile.self.cycles-pp.check_stack_object 0.00 +0.1 0.10 ± 10% perf-profile.self.cycles-pp.__might_fault 0.00 +0.1 0.11 ± 5% perf-profile.self.cycles-pp.__printk_safe_enter 0.00 +0.1 0.12 ± 3% perf-profile.self.cycles-pp.put_dec 0.00 +0.1 0.14 ± 5% perf-profile.self.cycles-pp.___might_sleep 0.00 +0.2 0.15 ± 7% perf-profile.self.cycles-pp.prb_read_valid 0.00 +0.2 0.18 ± 4% perf-profile.self.cycles-pp.__might_sleep 0.00 +0.3 0.26 ± 5% perf-profile.self.cycles-pp.prb_read_valid_info 0.00 +0.3 0.31 ± 4% perf-profile.self.cycles-pp._copy_to_user 0.00 +0.4 0.38 ± 3% perf-profile.self.cycles-pp.__virt_addr_valid 0.00 +0.4 0.38 ± 7% perf-profile.self.cycles-pp.find_first_fitting_seq 0.01 ±223% +0.4 0.42 ± 8% perf-profile.self.cycles-pp.memcpy 0.14 ± 14% +0.5 0.61 ± 5% perf-profile.self.cycles-pp.syslog_print_all 0.06 ± 9% +0.6 0.61 ± 3% perf-profile.self.cycles-pp.memmove 0.06 ± 14% +0.6 0.70 ± 6% perf-profile.self.cycles-pp.copy_user_generic_unrolled 0.06 ± 21% +0.7 0.75 ± 5% perf-profile.self.cycles-pp.get_data 0.09 ± 11% +0.9 1.01 ± 3% perf-profile.self.cycles-pp.put_dec_trunc8 0.08 ± 18% +1.0 1.10 ± 3% perf-profile.self.cycles-pp.skip_atoi 0.09 ± 12% +1.0 1.13 ± 4% perf-profile.self.cycles-pp.__check_object_size 0.90 ± 8% +1.3 2.17 ± 4% perf-profile.self.cycles-pp.desc_read 0.16 ± 9% +1.3 1.44 ± 5% perf-profile.self.cycles-pp.record_print_text 0.11 ± 15% +1.3 1.44 ± 4% perf-profile.self.cycles-pp.__check_heap_object 0.13 ± 5% +1.4 1.50 ± 4% perf-profile.self.cycles-pp.copy_user_enhanced_fast_string 0.14 ± 11% +1.4 1.56 ± 4% perf-profile.self.cycles-pp.desc_read_finalized_seq 0.14 ± 6% +1.6 1.71 ± 4% perf-profile.self.cycles-pp.sprintf 0.16 ± 13% +1.8 1.98 ± 5% perf-profile.self.cycles-pp.info_print_prefix 0.30 ± 7% +3.4 3.70 ± 4% perf-profile.self.cycles-pp._prb_read_valid 0.40 ± 8% +4.2 4.64 ± 4% perf-profile.self.cycles-pp.memchr 0.59 ± 8% +6.6 7.23 ± 4% perf-profile.self.cycles-pp.number 0.78 ± 3% +8.5 9.26 ± 4% perf-profile.self.cycles-pp.format_decode 0.79 ± 8% +8.9 9.71 ± 2% perf-profile.self.cycles-pp.memcpy_erms 0.84 ± 6% +10.3 11.11 ± 4% perf-profile.self.cycles-pp.vsnprintf stress-ng.klog.ops 350000 +------------------------------------------------------------------+ |O OO OO OO OO OO OO OO OO OO OO OO OO OO OO OO | 300000 |-+ | | | 250000 |-+ | | | 200000 |-+ | | | 150000 |-+ | | | 100000 |-+ | | | 50000 |-+ | |+.++.++.++.++.++.++.++.++.++.++.++.++.++.++.++.++.++.++.++.++.++.+| 0 +------------------------------------------------------------------+ stress-ng.klog.ops_per_sec 5500 +--------------------------------------------------------------------+ 5000 |-OO OO OO OO OO OO OO O OO OO OO OO OO OO OO O O | | | 4500 |-+ | 4000 |-+ | 3500 |-+ | 3000 |-+ | | | 2500 |-+ | 2000 |-+ | 1500 |-+ | 1000 |-+ | | | 500 |.++.++.++.++.++.++.++.+.++.++.++.++.++.++.++.+.++.++.++.++.++.++.++.| 0 +--------------------------------------------------------------------+ [*] bisect-good sample [O] bisect-bad sample Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. --- 0DAY/LKP+ Test Infrastructure Open Source Technology Center https://lists.01.org/hyperkitty/list/lkp@lists.01.org Intel Corporation Thanks, Oliver Sang