* [linux-next:master] [cpuidle] 38f83090f5: fsmark.app_overhead 51.9% regression
@ 2024-10-07 14:43 kernel test robot
2024-10-07 15:18 ` Rafael J. Wysocki
2024-10-07 19:07 ` Christian Loehle
0 siblings, 2 replies; 4+ messages in thread
From: kernel test robot @ 2024-10-07 14:43 UTC (permalink / raw)
To: Christian Loehle
Cc: oe-lkp, lkp, Linux Memory Management List, Rafael J. Wysocki,
linux-pm, ying.huang, feng.tang, fengwei.yin, oliver.sang
Hello,
kernel test robot noticed a 51.9% regression of fsmark.app_overhead on:
(
but there is no performance difference for fsmark.files_per_sec
18.58 -0.2% 18.55 fsmark.files_per_sec
)
commit: 38f83090f515b4b5d59382dfada1e7457f19aa47 ("cpuidle: menu: Remove iowait influence")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
testcase: fsmark
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:
iterations: 1x
nr_threads: 1t
disk: 1HDD
fs: btrfs
fs2: nfsv4
filesize: 4K
test_size: 40M
sync_method: fsyncBeforeClose
nr_files_per_directory: 1fpd
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202410072214.11d18a3c-oliver.sang@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241007/202410072214.11d18a3c-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark
commit:
v6.12-rc1
38f83090f5 ("cpuidle: menu: Remove iowait influence")
v6.12-rc1 38f83090f515b4b5d59382dfada
---------------- ---------------------------
%stddev %change %stddev
\ | \
2032015 ± 3% +51.9% 3087623 fsmark.app_overhead
18.58 -0.2% 18.55 fsmark.files_per_sec
2944 -2.9% 2858 vmstat.system.cs
0.02 +0.0 0.02 mpstat.cpu.all.irq%
0.01 ± 2% +0.0 0.01 mpstat.cpu.all.soft%
0.04 ± 2% +0.0 0.05 ± 3% mpstat.cpu.all.sys%
4.07 ± 18% -53.4% 1.90 ± 53% sched_debug.cfs_rq:/.removed.runnable_avg.avg
267.72 ± 38% -62.7% 99.92 ± 75% sched_debug.cfs_rq:/.removed.runnable_avg.max
30.08 ± 29% -58.5% 12.50 ± 63% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
4.07 ± 18% -53.5% 1.89 ± 53% sched_debug.cfs_rq:/.removed.util_avg.avg
267.67 ± 38% -62.7% 99.92 ± 75% sched_debug.cfs_rq:/.removed.util_avg.max
30.08 ± 29% -58.5% 12.49 ± 63% sched_debug.cfs_rq:/.removed.util_avg.stddev
20.43 ± 17% -25.5% 15.21 ± 16% sched_debug.cfs_rq:/.util_est.stddev
7.85 ± 14% +21.6% 9.55 ± 12% sched_debug.cpu.clock.stddev
0.00 ± 25% -47.7% 0.00 ± 44% sched_debug.cpu.next_balance.stddev
0.02 ± 10% -18.9% 0.02 ± 11% sched_debug.cpu.nr_running.avg
0.14 ± 5% -14.5% 0.12 ± 4% sched_debug.cpu.nr_running.stddev
5.19 +0.6 5.79 perf-stat.i.branch-miss-rate%
4096977 ± 4% +8.4% 4442600 ± 2% perf-stat.i.branch-misses
1.79 ± 7% -0.2 1.59 ± 3% perf-stat.i.cache-miss-rate%
11620307 +22.2% 14202690 perf-stat.i.cache-references
2925 -3.2% 2830 perf-stat.i.context-switches
1.68 +38.6% 2.32 perf-stat.i.cpi
4.457e+08 ± 3% +23.8% 5.518e+08 ± 2% perf-stat.i.cpu-cycles
1630 ± 8% +28.6% 2096 ± 4% perf-stat.i.cycles-between-cache-misses
0.63 -25.5% 0.47 perf-stat.i.ipc
5.26 +0.2 5.48 perf-stat.overall.branch-miss-rate%
1.16 +18.4% 1.38 perf-stat.overall.cpi
0.86 -15.6% 0.73 perf-stat.overall.ipc
4103944 ± 4% +7.9% 4429579 perf-stat.ps.branch-misses
11617199 +22.1% 14186503 perf-stat.ps.cache-references
2919 -3.2% 2825 perf-stat.ps.context-switches
4.492e+08 ± 3% +23.2% 5.534e+08 ± 2% perf-stat.ps.cpu-cycles
0.12 ± 4% -23.1% 0.09 ± 12% perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.__synchronize_srcu.fsnotify_connector_destroy_workfn.process_one_work
0.00 ± 8% +52.0% 0.01 ± 17% perf-sched.sch_delay.avg.ms.btrfs_start_ordered_extent.btrfs_wait_ordered_range.btrfs_sync_file.btrfs_do_write_iter
0.01 ± 13% +178.9% 0.03 ± 11% perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range
0.01 ± 13% +47.2% 0.01 ± 14% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.00 ± 14% +257.1% 0.01 ± 20% perf-sched.sch_delay.avg.ms.rpc_wait_bit_killable.__wait_on_bit.out_of_line_wait_on_bit.nfs4_run_open_task
0.06 ± 2% +11.8% 0.06 ± 3% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
0.12 ± 7% -14.5% 0.10 ± 4% perf-sched.sch_delay.avg.ms.schedule_timeout.__wait_for_common.__synchronize_srcu.fsnotify_connector_destroy_workfn
0.09 ± 8% +13.9% 0.10 ± 9% perf-sched.sch_delay.avg.ms.schedule_timeout.kcompactd.kthread.ret_from_fork
0.03 ± 2% +27.5% 0.03 ± 14% perf-sched.sch_delay.max.ms.irq_thread.kthread.ret_from_fork.ret_from_fork_asm
0.14 ± 6% +20.5% 0.17 ± 11% perf-sched.sch_delay.max.ms.rpc_wait_bit_killable.__wait_on_bit.out_of_line_wait_on_bit.__rpc_execute
0.01 ± 2% +867.1% 0.14 ± 21% perf-sched.sch_delay.max.ms.rpc_wait_bit_killable.__wait_on_bit.out_of_line_wait_on_bit.nfs4_run_open_task
0.27 ± 17% +652.4% 2.03 ±183% perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.11 ± 7% +67.0% 0.19 ± 12% perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range
33.20 ± 3% +29.7% 43.08 ± 4% perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.84 ± 4% -16.6% 0.70 ± 10% perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.__synchronize_srcu.fsnotify_mark_destroy_workfn
2121 ± 3% -23.3% 1626 ± 5% perf-sched.wait_and_delay.count.pipe_read.vfs_read.ksys_read.do_syscall_64
2.98 ±143% +5665.5% 171.90 ±215% perf-sched.wait_and_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.56 ± 14% +38.0% 0.77 ± 7% perf-sched.wait_and_delay.max.ms.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range
0.35 ± 9% +50.9% 0.53 ± 8% perf-sched.wait_and_delay.max.ms.rpc_wait_bit_killable.__wait_on_bit.out_of_line_wait_on_bit.nfs4_run_open_task
3768 ± 16% +17.0% 4411 ± 5% perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.16 ± 4% +20.4% 0.20 ± 4% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.__synchronize_srcu.fsnotify_connector_destroy_workfn.process_one_work
0.89 ± 14% -35.2% 0.58 ± 17% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.__synchronize_srcu.fsnotify_mark_destroy_workfn.process_one_work
0.24 ± 20% +728.0% 1.99 ±186% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.10 ± 9% +57.0% 0.16 ± 12% perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range
33.20 ± 3% +29.7% 43.07 ± 4% perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.79 ± 5% -19.7% 0.63 ± 12% perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.__synchronize_srcu.fsnotify_mark_destroy_workfn
0.27 ± 8% +13.3% 0.30 ± 5% perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.__synchronize_srcu.fsnotify_connector_destroy_workfn.process_one_work
0.50 ± 17% +33.3% 0.67 ± 8% perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.folio_wait_writeback.__filemap_fdatawait_range
0.34 ± 9% +23.1% 0.42 ± 8% perf-sched.wait_time.max.ms.rpc_wait_bit_killable.__wait_on_bit.out_of_line_wait_on_bit.nfs4_run_open_task
5.60 ± 7% -1.3 4.32 ± 6% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
5.57 ± 7% -1.3 4.30 ± 6% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.33 ± 7% -0.5 0.86 ± 15% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
1.32 ± 7% -0.5 0.86 ± 15% perf-profile.calltrace.cycles-pp.mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
1.30 ± 8% -0.5 0.85 ± 15% perf-profile.calltrace.cycles-pp.exit_mmap.mmput.exit_mm.do_exit.do_group_exit
1.88 ± 12% -0.4 1.48 ± 8% perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.78 ± 13% +0.2 0.95 ± 12% perf-profile.calltrace.cycles-pp.call_transmit.__rpc_execute.rpc_async_schedule.process_one_work.worker_thread
0.83 ± 15% +0.2 1.06 ± 11% perf-profile.calltrace.cycles-pp.__vfork
1.11 ± 15% +0.2 1.34 ± 10% perf-profile.calltrace.cycles-pp.update_sg_lb_stats.update_sd_lb_stats.sched_balance_find_src_group.sched_balance_rq.sched_balance_newidle
0.39 ± 71% +0.3 0.72 ± 9% perf-profile.calltrace.cycles-pp.start_ordered_ops.btrfs_sync_file.btrfs_do_write_iter.do_iter_readv_writev.vfs_iter_write
0.54 ± 46% +0.4 0.91 ± 16% perf-profile.calltrace.cycles-pp.evsel__read_counter.read_counters.process_interval.dispatch_events.cmd_stat
0.92 ± 20% +0.4 1.32 ± 11% perf-profile.calltrace.cycles-pp.irq_enter_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
0.18 ±141% +0.4 0.61 ± 10% perf-profile.calltrace.cycles-pp.readn.evsel__read_counter.read_counters.process_interval.dispatch_events
0.84 ± 20% +0.4 1.28 ± 10% perf-profile.calltrace.cycles-pp.tick_irq_enter.irq_enter_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
0.09 ±223% +0.5 0.60 ± 10% perf-profile.calltrace.cycles-pp.read.readn.evsel__read_counter.read_counters.process_interval
0.00 +0.6 0.58 ± 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read.readn.evsel__read_counter
0.00 +0.6 0.58 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read.readn.evsel__read_counter.read_counters
3.34 ± 8% +0.8 4.16 ± 9% perf-profile.calltrace.cycles-pp.update_process_times.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
17.29 ± 2% +0.9 18.18 ± 3% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
17.29 ± 2% +0.9 18.18 ± 3% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
17.29 ± 2% +0.9 18.18 ± 3% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
4.19 ± 5% +1.3 5.51 ± 4% perf-profile.calltrace.cycles-pp.tick_nohz_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
5.20 ± 8% +1.5 6.67 ± 3% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
6.40 ± 9% +1.6 7.96 ± 4% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
6.53 ± 10% +1.7 8.21 ± 3% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
11.90 ± 7% +2.4 14.29 ± 3% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
14.08 ± 7% +2.6 16.65 ± 4% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
1.04 ± 29% -0.9 0.12 ± 77% perf-profile.children.cycles-pp.poll_idle
1.87 ± 9% -0.5 1.35 ± 7% perf-profile.children.cycles-pp.mmput
1.84 ± 11% -0.5 1.34 ± 7% perf-profile.children.cycles-pp.exit_mmap
2.08 ± 11% -0.5 1.62 ± 8% perf-profile.children.cycles-pp.x64_sys_call
1.94 ± 12% -0.5 1.48 ± 7% perf-profile.children.cycles-pp.do_exit
1.76 ± 12% -0.5 1.30 ± 11% perf-profile.children.cycles-pp.__x64_sys_exit_group
1.75 ± 12% -0.5 1.30 ± 11% perf-profile.children.cycles-pp.do_group_exit
1.42 ± 7% -0.4 1.00 ± 11% perf-profile.children.cycles-pp.exit_mm
0.94 ± 12% -0.4 0.59 ± 11% perf-profile.children.cycles-pp.unmap_vmas
1.03 ± 14% -0.3 0.72 ± 7% perf-profile.children.cycles-pp.btree_write_cache_pages
0.85 ± 9% -0.3 0.54 ± 11% perf-profile.children.cycles-pp.unmap_page_range
0.96 ± 17% -0.3 0.67 ± 7% perf-profile.children.cycles-pp.submit_eb_page
0.82 ± 8% -0.3 0.53 ± 10% perf-profile.children.cycles-pp.zap_pmd_range
0.78 ± 9% -0.3 0.50 ± 11% perf-profile.children.cycles-pp.zap_pte_range
0.65 ± 15% -0.3 0.39 ± 14% perf-profile.children.cycles-pp.zap_present_ptes
0.70 ± 19% -0.2 0.46 ± 22% perf-profile.children.cycles-pp.sched_clock_cpu
0.61 ± 23% -0.2 0.37 ± 21% perf-profile.children.cycles-pp.sched_clock
0.65 ± 8% -0.2 0.45 ± 27% perf-profile.children.cycles-pp.dl_server_stop
0.55 ± 6% -0.2 0.38 ± 5% perf-profile.children.cycles-pp.btrfs_commit_transaction
0.45 ± 6% -0.2 0.29 ± 28% perf-profile.children.cycles-pp.alloc_empty_file
0.65 ± 15% -0.2 0.49 ± 16% perf-profile.children.cycles-pp.read_tsc
0.23 ± 36% -0.1 0.09 ± 27% perf-profile.children.cycles-pp.vma_alloc_folio_noprof
0.37 ± 5% -0.1 0.25 perf-profile.children.cycles-pp.__btrfs_run_delayed_refs
0.37 ± 5% -0.1 0.25 perf-profile.children.cycles-pp.btrfs_run_delayed_refs
0.19 ± 53% -0.1 0.07 ± 89% perf-profile.children.cycles-pp.mas_next_slot
0.36 ± 6% -0.1 0.25 ± 5% perf-profile.children.cycles-pp.btrfs_run_delayed_refs_for_head
0.35 ± 4% -0.1 0.24 ± 9% perf-profile.children.cycles-pp.run_delayed_data_ref
0.34 ± 9% -0.1 0.25 ± 19% perf-profile.children.cycles-pp.try_address
0.14 ± 27% -0.1 0.06 ± 81% perf-profile.children.cycles-pp.number
0.15 ± 35% -0.1 0.07 ± 58% perf-profile.children.cycles-pp.obj_cgroup_charge
0.14 ± 32% -0.1 0.05 ±109% perf-profile.children.cycles-pp.wake_up_q
0.29 ± 5% -0.1 0.22 ± 25% perf-profile.children.cycles-pp.i2c_outb
0.35 ± 6% -0.1 0.27 ± 19% perf-profile.children.cycles-pp.__i2c_transfer
0.35 ± 6% -0.1 0.27 ± 19% perf-profile.children.cycles-pp.ast_vga_connector_helper_detect_ctx
0.35 ± 6% -0.1 0.27 ± 19% perf-profile.children.cycles-pp.bit_xfer
0.35 ± 6% -0.1 0.27 ± 19% perf-profile.children.cycles-pp.drm_connector_helper_detect_from_ddc
0.35 ± 6% -0.1 0.27 ± 19% perf-profile.children.cycles-pp.drm_do_probe_ddc_edid
0.35 ± 6% -0.1 0.27 ± 19% perf-profile.children.cycles-pp.drm_helper_probe_detect_ctx
0.35 ± 6% -0.1 0.27 ± 19% perf-profile.children.cycles-pp.drm_probe_ddc
0.35 ± 6% -0.1 0.27 ± 19% perf-profile.children.cycles-pp.i2c_transfer
0.35 ± 6% -0.1 0.27 ± 19% perf-profile.children.cycles-pp.output_poll_execute
0.16 ± 21% -0.1 0.09 ± 30% perf-profile.children.cycles-pp.btrfs_write_and_wait_transaction
0.03 ± 70% +0.0 0.08 ± 31% perf-profile.children.cycles-pp.sd_setup_read_write_cmnd
0.06 ± 57% +0.1 0.11 ± 30% perf-profile.children.cycles-pp._copy_from_user
0.06 ± 89% +0.1 0.13 ± 30% perf-profile.children.cycles-pp.blk_mq_requeue_work
0.03 ±111% +0.1 0.12 ± 27% perf-profile.children.cycles-pp.rcu_accelerate_cbs
0.25 ± 29% +0.1 0.37 ± 14% perf-profile.children.cycles-pp.extent_writepage_io
0.11 ± 31% +0.1 0.24 ± 20% perf-profile.children.cycles-pp.rw_verify_area
0.17 ± 63% +0.1 0.30 ± 27% perf-profile.children.cycles-pp.perf_event_read
0.07 ± 62% +0.1 0.21 ± 30% perf-profile.children.cycles-pp.svc_pool_wake_idle_thread
0.14 ± 58% +0.2 0.29 ± 26% perf-profile.children.cycles-pp.svc_tcp_read_marker
0.15 ± 55% +0.2 0.31 ± 34% perf-profile.children.cycles-pp.__write_extent_buffer
0.15 ± 64% +0.2 0.31 ± 28% perf-profile.children.cycles-pp.smp_call_function_single
0.35 ± 18% +0.2 0.52 ± 16% perf-profile.children.cycles-pp.arch_cpu_idle_enter
0.11 ± 75% +0.2 0.30 ± 20% perf-profile.children.cycles-pp.copy_extent_buffer_full
0.54 ± 15% +0.2 0.72 ± 9% perf-profile.children.cycles-pp.start_ordered_ops
0.15 ± 22% +0.2 0.34 ± 41% perf-profile.children.cycles-pp.nfsd_set_fh_dentry
0.28 ± 27% +0.2 0.48 ± 30% perf-profile.children.cycles-pp.__fh_verify
0.28 ± 27% +0.2 0.48 ± 29% perf-profile.children.cycles-pp.fh_verify
0.41 ± 24% +0.2 0.62 ± 10% perf-profile.children.cycles-pp.perf_read
0.45 ± 20% +0.2 0.66 ± 6% perf-profile.children.cycles-pp.__memcpy
0.18 ± 48% +0.2 0.40 ± 15% perf-profile.children.cycles-pp.generic_exec_single
0.83 ± 15% +0.2 1.06 ± 11% perf-profile.children.cycles-pp.__vfork
0.28 ± 30% +0.2 0.53 ± 52% perf-profile.children.cycles-pp._nfs4_proc_access
0.33 ± 22% +0.3 0.58 ± 19% perf-profile.children.cycles-pp.tick_nohz_stop_idle
0.28 ± 30% +0.3 0.54 ± 51% perf-profile.children.cycles-pp.nfs4_proc_access
0.63 ± 14% +0.3 0.92 ± 16% perf-profile.children.cycles-pp.evsel__read_counter
0.46 ± 26% +0.3 0.76 ± 20% perf-profile.children.cycles-pp.finish_task_switch
0.81 ± 16% +0.3 1.14 ± 9% perf-profile.children.cycles-pp.readn
1.10 ± 15% +0.4 1.48 ± 20% perf-profile.children.cycles-pp.xs_tcp_send_request
1.15 ± 12% +0.4 1.53 ± 20% perf-profile.children.cycles-pp.call_transmit
1.14 ± 12% +0.4 1.51 ± 21% perf-profile.children.cycles-pp.xprt_transmit
1.12 ± 13% +0.4 1.50 ± 21% perf-profile.children.cycles-pp.xprt_request_transmit
1.74 ± 6% +0.4 2.14 ± 11% perf-profile.children.cycles-pp.sched_tick
3.24 ± 9% +0.4 3.67 ± 6% perf-profile.children.cycles-pp.cmd_stat
3.24 ± 9% +0.4 3.67 ± 6% perf-profile.children.cycles-pp.dispatch_events
1.07 ± 15% +0.4 1.50 ± 3% perf-profile.children.cycles-pp.irq_enter_rcu
3.24 ± 9% +0.4 3.66 ± 6% perf-profile.children.cycles-pp.process_interval
1.01 ± 16% +0.4 1.45 ± 4% perf-profile.children.cycles-pp.tick_irq_enter
1.54 ± 17% +0.5 2.02 ± 14% perf-profile.children.cycles-pp._raw_spin_lock
3.08 ± 9% +0.5 3.58 ± 6% perf-profile.children.cycles-pp.read_counters
5.90 ± 8% +0.8 6.71 ± 6% perf-profile.children.cycles-pp.handle_softirqs
17.29 ± 2% +0.9 18.18 ± 3% perf-profile.children.cycles-pp.kthread
3.57 ± 7% +0.9 4.50 ± 10% perf-profile.children.cycles-pp.update_process_times
17.42 ± 2% +1.0 18.38 ± 2% perf-profile.children.cycles-pp.ret_from_fork
17.44 ± 2% +1.0 18.41 ± 2% perf-profile.children.cycles-pp.ret_from_fork_asm
4.47 ± 4% +1.4 5.85 ± 5% perf-profile.children.cycles-pp.tick_nohz_handler
5.49 ± 6% +1.6 7.05 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues
6.76 ± 8% +1.7 8.42 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt
6.90 ± 9% +1.8 8.68 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
12.60 ± 7% +2.6 15.18 ± 3% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
13.99 ± 7% +2.7 16.65 ± 4% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.98 ± 27% -0.9 0.11 ± 74% perf-profile.self.cycles-pp.poll_idle
0.53 ± 24% -0.2 0.28 ± 28% perf-profile.self.cycles-pp.__schedule
0.28 ± 21% -0.2 0.11 ± 24% perf-profile.self.cycles-pp._find_next_bit
0.32 ± 29% -0.1 0.20 ± 23% perf-profile.self.cycles-pp.zap_present_ptes
0.06 ± 57% +0.0 0.10 ± 31% perf-profile.self.cycles-pp._copy_from_user
0.06 ± 58% +0.1 0.11 ± 28% perf-profile.self.cycles-pp.__virt_addr_valid
0.06 ± 68% +0.1 0.20 ± 37% perf-profile.self.cycles-pp.tick_nohz_handler
0.44 ± 19% +0.2 0.66 ± 7% perf-profile.self.cycles-pp.__memcpy
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linux-next:master] [cpuidle] 38f83090f5: fsmark.app_overhead 51.9% regression
2024-10-07 14:43 [linux-next:master] [cpuidle] 38f83090f5: fsmark.app_overhead 51.9% regression kernel test robot
@ 2024-10-07 15:18 ` Rafael J. Wysocki
2024-10-07 15:58 ` Christian Loehle
2024-10-07 19:07 ` Christian Loehle
1 sibling, 1 reply; 4+ messages in thread
From: Rafael J. Wysocki @ 2024-10-07 15:18 UTC (permalink / raw)
To: kernel test robot
Cc: Christian Loehle, oe-lkp, lkp, Linux Memory Management List,
Rafael J. Wysocki, linux-pm, ying.huang, feng.tang, fengwei.yin
On Mon, Oct 7, 2024 at 4:44 PM kernel test robot <oliver.sang@intel.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a 51.9% regression of fsmark.app_overhead on:
What exactly is fsmark.app_overhead? What does it measure?
> [...]
* Re: [linux-next:master] [cpuidle] 38f83090f5: fsmark.app_overhead 51.9% regression
2024-10-07 15:18 ` Rafael J. Wysocki
@ 2024-10-07 15:58 ` Christian Loehle
0 siblings, 0 replies; 4+ messages in thread
From: Christian Loehle @ 2024-10-07 15:58 UTC (permalink / raw)
To: Rafael J. Wysocki, kernel test robot
Cc: oe-lkp, lkp, Linux Memory Management List, Rafael J. Wysocki,
linux-pm, ying.huang, feng.tang, fengwei.yin
On 10/7/24 16:18, Rafael J. Wysocki wrote:
> On Mon, Oct 7, 2024 at 4:44 PM kernel test robot <oliver.sang@intel.com> wrote:
>>
>>
>>
>> Hello,
>>
>> kernel test robot noticed a 51.9% regression of fsmark.app_overhead on:
>
> What exactly is fsmark.app_overhead? What does it measure?
"App overhead is time in microseconds spent in the test not doing file writing related system calls."
So the loop is:
/*
* MAIN FILE WRITE LOOP:
* This loop measures the specific steps in creating files:
* Step 1: Make up a file name
* Step 2: Creat(file_name);
* Step 3: write file data
* Step 4: fsync() file data (optional)
* Step 5: close() file descriptor
*/
fs_mark takes timestamps before and after each syscall and then subtracts all
of those times (i.e. the time spent in syscalls) from the total time.
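A minimal userspace sketch of that accounting (paraphrased for illustration,
not the actual fs_mark source; the file name is hypothetical and the 4K buffer
is chosen to match the test's filesize parameter):

/*
 * Sketch of fs_mark's app_overhead accounting: timestamp each syscall
 * in the write loop, subtract the syscall time from the loop's total;
 * the remainder is "app overhead".
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static uint64_t now_us(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return (uint64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}

/* Time a single call and accumulate its duration into *acc. */
#define TIMED(acc, call) do {			\
	uint64_t t0 = now_us();			\
	(call);					\
	*(acc) += now_us() - t0;		\
} while (0)

int main(void)
{
	char buf[4096] = { 0 };		/* 4K, as in the test parameters */
	uint64_t syscall_us = 0;
	uint64_t start = now_us();
	int fd = -1;

	TIMED(&syscall_us, fd = creat("fsmark_file", 0644));	/* Step 2 */
	if (fd < 0)
		return 1;
	TIMED(&syscall_us, write(fd, buf, sizeof(buf)));	/* Step 3 */
	TIMED(&syscall_us, fsync(fd));				/* Step 4 */
	TIMED(&syscall_us, close(fd));				/* Step 5 */

	/* Everything not spent in the syscalls counts as app overhead. */
	printf("app_overhead: %llu us\n",
	       (unsigned long long)(now_us() - start - syscall_us));
	return 0;
}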
I'll take a look tomorrow. What's interesting is that the exit latency
(which is the thing that is supposed to cause regressions here) should show
up as time spent blocked in syscalls.
If that is all correct, that's exactly the part that shouldn't regress with a
cpuidle change.
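For context, a toy userspace model of the iowait bias the commit removes (a
paraphrase of the old menu governor heuristic, not the kernel code itself;
the predicted-idle and latency-limit values below are hypothetical):

#include <stdint.h>
#include <stdio.h>

/*
 * Paraphrase of the bias removed by 38f83090f5: each task in iowait on
 * the CPU made the menu governor tolerate roughly 10x less exit
 * latency, steering it toward shallower idle states.
 */
static int performance_multiplier(unsigned int nr_iowaiters)
{
	return 1 + 10 * nr_iowaiters;
}

int main(void)
{
	uint64_t predicted_ns = 200000;	/* hypothetical predicted idle: 200 us */
	uint64_t latency_req = 100000;	/* hypothetical latency limit: 100 us */
	unsigned int nr_iowaiters;

	for (nr_iowaiters = 0; nr_iowaiters <= 2; nr_iowaiters++) {
		uint64_t req = predicted_ns /
			       performance_multiplier(nr_iowaiters);

		if (req > latency_req)
			req = latency_req;
		printf("nr_iowaiters=%u -> max tolerated exit latency %llu ns\n",
		       nr_iowaiters, (unsigned long long)req);
	}
	return 0;
}

Without the bias, the governor is free to pick deeper states even while tasks
are blocked in iowait, which is why the idle exit latency is the suspect here.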
* Re: [linux-next:master] [cpuidle] 38f83090f5: fsmark.app_overhead 51.9% regression
2024-10-07 14:43 [linux-next:master] [cpuidle] 38f83090f5: fsmark.app_overhead 51.9% regression kernel test robot
2024-10-07 15:18 ` Rafael J. Wysocki
@ 2024-10-07 19:07 ` Christian Loehle
1 sibling, 0 replies; 4+ messages in thread
From: Christian Loehle @ 2024-10-07 19:07 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, Linux Memory Management List, Rafael J. Wysocki,
linux-pm, ying.huang, feng.tang, fengwei.yin
On 10/7/24 15:43, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a 51.9% regression of fsmark.app_overhead on:
>
> [...]
>
> 5.19 +0.6 5.79 perf-stat.i.branch-miss-rate%
> 4096977 ± 4% +8.4% 4442600 ± 2% perf-stat.i.branch-misses
> 1.79 ± 7% -0.2 1.59 ± 3% perf-stat.i.cache-miss-rate%
> 11620307 +22.2% 14202690 perf-stat.i.cache-references
> 2925 -3.2% 2830 perf-stat.i.context-switches
> 1.68 +38.6% 2.32 perf-stat.i.cpi
> 4.457e+08 ± 3% +23.8% 5.518e+08 ± 2% perf-stat.i.cpu-cycles
> 1630 ± 8% +28.6% 2096 ± 4% perf-stat.i.cycles-between-cache-misses
> 0.63 -25.5% 0.47 perf-stat.i.ipc
> 5.26 +0.2 5.48 perf-stat.overall.branch-miss-rate%
> 1.16 +18.4% 1.38 perf-stat.overall.cpi
> 0.86 -15.6% 0.73 perf-stat.overall.ipc
> 4103944 ± 4% +7.9% 4429579 perf-stat.ps.branch-misses
> 11617199 +22.1% 14186503 perf-stat.ps.cache-references
> 2919 -3.2% 2825 perf-stat.ps.context-switches
> 4.492e+08 ± 3% +23.2% 5.534e+08 ± 2% perf-stat.ps.cpu-cycles
The other obvious guess would be increased cache misses due to the deeper idle
states flushing the caches. The reduced IPC and increased cycle count would be
consistent with that, but IMO the cache misses don't seem large enough to
account for it.
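One way to test that hypothesis would be to re-run the benchmark with the
deeper idle states disabled through the standard cpuidle sysfs interface and
compare the perf-stat numbers. A minimal sketch, assuming the usual
/sys/devices/system/cpu layout (run as root; the CPU count is hardcoded to
the 128 threads of the test machine):

#include <stdio.h>

/*
 * Disable every idle state deeper than state1 on all CPUs via the
 * cpuidle sysfs knobs, then re-run fsmark and compare the cache-miss
 * and IPC deltas against the baseline.
 */
int main(void)
{
	char path[128];
	int cpu, state;

	for (cpu = 0; cpu < 128; cpu++) {
		for (state = 2; ; state++) {	/* keep state0 (POLL) and state1 */
			FILE *f;

			snprintf(path, sizeof(path),
				 "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/disable",
				 cpu, state);
			f = fopen(path, "w");
			if (!f)
				break;		/* no deeper states on this CPU */
			fputs("1\n", f);
			fclose(f);
		}
	}
	return 0;
}

If the IPC and cache-miss deltas disappear with the deep states disabled, that
would support the cache hypothesis; if they persist, something else is going on.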