Please note that we could not find profiling data that explains this
improvement, but the data is very stable in our tests.

for this commit:
  "will-it-scale.per_process_ops": [
    75636, 75623, 75642, 75613, 75592, 75628, 74016, 75637, 75622, 75617,
    75605, 74013, 75629, 75618, 75595, 75614, 75611, 75619, 75629, 75619
  ],

for parent:
  "will-it-scale.per_process_ops": [
    72665, 72628, 72665, 72656, 72668, 72642, 72660, 72642, 72648, 72661,
    72650, 72648, 72651, 72639, 72630, 72641, 72655, 72650, 72624, 72649
  ],

Thanks a lot to Fengwei (Cc'ed), who helped review this and made the
comments below:

"This patch could bring performance improvement. It use RCU lock to replace
mutex for some consoles. I expect some lock contention reduction. Didn't find
from perf calltrace profiling. Could see some in perf self profiling. But
without calltrace, we don't know who is the owner of the lock."

So we are sending the report below FYI.


Greeting,

FYI, we noticed a 3.9% improvement of will-it-scale.per_process_ops due to commit:


commit: 8bdbdd7f43cd74c7faca6add8a62d541503ae21d ("printk: Prepare for SRCU console list protection")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: will-it-scale
on test machine: 128 threads 4 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
with following parameters:

        nr_task: 50%
        mode: process
        test: open1
        cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-11/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp2/open1/will-it-scale

commit:
  318eb6d938 ("printk: Convert console_drivers list to hlist")
  8bdbdd7f43 ("printk: Prepare for SRCU console list protection")

318eb6d938484a5a 8bdbdd7f43cd74c7faca6add8a6
---------------- ---------------------------
         %stddev  %change          %stddev
            \        |                \
   4649546          +3.9%    4829404        will-it-scale.64.processes
     72648          +3.9%      75458        will-it-scale.per_process_ops
   4649546          +3.9%    4829404        will-it-scale.workload
    346119         +12.7%     390049        meminfo.SUnreclaim
      3480         -22.8%       2688 ±  2%  vmstat.system.cs
     86566 ±  7%   +31.1%     113460 ±  5%  numa-meminfo.node0.SUnreclaim
     65513 ± 13%   +40.6%      92118 ±  6%  numa-meminfo.node1.SUnreclaim
     86566         +12.7%      97547        proc-vmstat.nr_slab_unreclaimable
  20789147          +6.0%   22031599        proc-vmstat.numa_hit
  20615296          +6.0%   21857885        proc-vmstat.numa_local
  80423862          +6.2%   85382185        proc-vmstat.pgalloc_normal
  80433923          +6.2%   85390274        proc-vmstat.pgfree
   8319772 ±  6%   -20.4%    6621861 ±  9%  sched_debug.cfs_rq:/.min_vruntime.max
   1246763 ± 12%   -46.0%     673096 ± 12%  sched_debug.cfs_rq:/.min_vruntime.stddev
  -3067572         -58.9%   -1261852        sched_debug.cfs_rq:/.spread0.min
   1246909 ± 12%   -46.0%     673159 ± 12%  sched_debug.cfs_rq:/.spread0.stddev
      6632 ±  3%   -15.0%       5638 ±  5%  sched_debug.cpu.nr_switches.avg
      2796 ±  9%   -22.2%       2176 ±  9%  sched_debug.cpu.nr_switches.min
   1594604        +206.3%    4884774        numa-numastat.node0.local_node
   1644243        +199.8%    4929305        numa-numastat.node0.numa_hit
   1592483        +206.3%    4878540        numa-numastat.node1.local_node
   1638420        +200.1%    4916699        numa-numastat.node1.numa_hit
   8690370         -30.6%    6028530        numa-numastat.node2.local_node
   8729078         -30.4%    6074883        numa-numastat.node2.numa_hit
   8734995         -30.6%    6063318        numa-numastat.node3.local_node
   8774563         -30.4%    6107989        numa-numastat.node3.numa_hit
     21691 ±  7%   +32.4%      28715 ±  5%  numa-vmstat.node0.nr_slab_unreclaimable
   1644113        +199.8%    4929234        numa-vmstat.node0.numa_hit
   1594474        +206.4%    4884703        numa-vmstat.node0.numa_local
     16427 ± 12%   +42.0%      23321 ±  7%  numa-vmstat.node1.nr_slab_unreclaimable
   1638283        +200.1%    4916648        numa-vmstat.node1.numa_hit
   1592346        +206.4%    4878490        numa-vmstat.node1.numa_local
   8728930         -30.4%    6074798        numa-vmstat.node2.numa_hit
   8690222         -30.6%    6028445        numa-vmstat.node2.numa_local
   8774491         -30.4%    6108017        numa-vmstat.node3.numa_hit
   8734922         -30.6%    6063345        numa-vmstat.node3.numa_local
     43.31           -0.9      42.42        perf-profile.calltrace.cycles-pp.security_file_open.do_dentry_open.do_open.path_openat.do_filp_open
     43.27           -0.9      42.38        perf-profile.calltrace.cycles-pp.apparmor_file_open.security_file_open.do_dentry_open.do_open.path_openat
     43.85           -0.9      42.97        perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2
     44.30           -0.9      43.41        perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
      0.63 ±  6%     +0.2       0.80 ± 28%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.apparmor_file_open.security_file_open.do_dentry_open.do_open
      0.59 ±  6%     +0.2       0.77 ± 29%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.apparmor_file_open.security_file_open.do_dentry_open
     43.33           -0.9      42.43        perf-profile.children.cycles-pp.security_file_open
     43.31           -0.9      42.41        perf-profile.children.cycles-pp.apparmor_file_open
     43.89           -0.9      43.00        perf-profile.children.cycles-pp.do_dentry_open
     44.32           -0.9      43.43        perf-profile.children.cycles-pp.do_open
      0.24 ±  9%     -0.1       0.16 ± 11%  perf-profile.children.cycles-pp.menu_select
      0.19 ± 10%     -0.1       0.12 ± 15%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
      0.14 ±  9%     -0.1       0.08 ± 11%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.08 ± 10%     -0.0       0.03 ± 82%  perf-profile.children.cycles-pp.get_next_timer_interrupt
      0.11 ± 13%     -0.0       0.08 ± 18%  perf-profile.children.cycles-pp.tick_nohz_next_event
      0.12 ±  7%     +0.0       0.15 ±  4%  perf-profile.children.cycles-pp.shuffle_freelist
      0.14 ±  7%     +0.0       0.18 ±  2%  perf-profile.children.cycles-pp.allocate_slab
      0.20 ±  5%     +0.0       0.25 ±  3%  perf-profile.children.cycles-pp.___slab_alloc
     42.69           -1.1      41.62 ±  2%  perf-profile.self.cycles-pp.apparmor_file_open
      0.66 ± 11%     -0.5       0.16 ± 10%  perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.14 ±  9%     -0.1       0.08 ± 13%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.10 ±  7%     +0.0       0.13 ±  3%  perf-profile.self.cycles-pp.shuffle_freelist
      2.99          +3.5%       3.09        perf-stat.i.MPKI
 5.498e+09          +3.8%  5.706e+09        perf-stat.i.branch-instructions
     34.60           -1.7      32.95        perf-stat.i.cache-miss-rate%
  28569086          +2.3%   29217344        perf-stat.i.cache-misses
  82561656          +7.4%   88636951        perf-stat.i.cache-references
      3383         -23.5%       2587 ±  2%  perf-stat.i.context-switches
    224.62 ±  3%    -7.1%     208.68 ±  3%  perf-stat.i.cpu-migrations
 8.243e+09          +3.7%  8.545e+09        perf-stat.i.dTLB-loads
 4.682e+09          +3.6%  4.849e+09        perf-stat.i.dTLB-stores
 2.764e+10          +3.8%  2.868e+10        perf-stat.i.instructions
    785.42          +5.8%     831.18        perf-stat.i.metric.K/sec
    143.88          +3.7%     149.17        perf-stat.i.metric.M/sec
   6893040          +4.8%    7226385        perf-stat.i.node-load-misses
    206202 ±  2%   +10.1%     227052        perf-stat.i.node-loads
     76.95           -2.8      74.10        perf-stat.i.node-store-miss-rate%
   8358863          -7.7%    7714585        perf-stat.i.node-store-misses
   2506806 ±  2%    +7.6%    2698200 ±  2%  perf-stat.i.node-stores
      2.99          +3.5%       3.09        perf-stat.overall.MPKI
     34.61           -1.6      32.99        perf-stat.overall.cache-miss-rate%
     76.93           -2.8      74.09        perf-stat.overall.node-store-miss-rate%
 5.479e+09          +3.8%  5.687e+09        perf-stat.ps.branch-instructions
  28481165          +2.3%   29144556        perf-stat.ps.cache-misses
  82289248          +7.4%   88361058        perf-stat.ps.cache-references
      3371         -23.6%       2577 ±  2%  perf-stat.ps.context-switches
    223.97 ±  3%    -7.0%     208.27 ±  3%  perf-stat.ps.cpu-migrations
 8.214e+09          +3.7%  8.517e+09        perf-stat.ps.dTLB-loads
 4.665e+09          +3.6%  4.833e+09        perf-stat.ps.dTLB-stores
 2.754e+10          +3.8%  2.858e+10        perf-stat.ps.instructions
   6871767          +4.9%    7207616        perf-stat.ps.node-load-misses
    205592 ±  2%   +10.1%     226439        perf-stat.ps.node-loads
   8330952          -7.7%    7692044        perf-stat.ps.node-store-misses
   2498719 ±  2%    +7.7%    2690666 ±  2%  perf-stat.ps.node-stores
  8.33e+12          +3.9%  8.654e+12        perf-stat.total.instructions


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://01.org/lkp