* [linux-next:master] [nfsd] 8cb33389f6: fsmark.app_overhead 81.6% regression
From: kernel test robot @ 2024-09-16 8:41 UTC (permalink / raw)
To: Jeff Layton
Cc: oe-lkp, lkp, Linux Memory Management List, Chuck Lever,
linux-nfs, ying.huang, feng.tang, fengwei.yin, oliver.sang
Hello,
kernel test robot noticed an 81.6% regression of fsmark.app_overhead on:
commit: 8cb33389f66441dc4e54b28fe0d9bd4bcd9b796d ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
testcase: fsmark
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:
iterations: 1x
nr_threads: 1t
disk: 1HDD
fs: btrfs
fs2: nfsv4
filesize: 4K
test_size: 40M
sync_method: fsyncBeforeClose
nr_files_per_directory: 1fpd
cpufreq_governor: performance
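
With these parameters fs_mark runs single-threaded and, for each file, does
roughly a create, a 4 KiB write, an fsync, and a close on the NFSv4 mount.
A simplified sketch of that per-file sequence (illustrative only, not the
fs_mark source; error handling omitted):

/* filesize=4K, sync_method=fsyncBeforeClose */
#include <fcntl.h>
#include <unistd.h>

void create_one_file(const char *path)
{
	char buf[4096] = { 0 };		/* 4K of file data */

	int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
	write(fd, buf, sizeof(buf));	/* write the file data */
	fsync(fd);			/* fsync before... */
	close(fd);			/* ...close */
}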
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202409161645.d44bced5-oliver.sang@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240916/202409161645.d44bced5-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark
commit:
e29c78a693 ("nfsd: add support for FATTR4_OPEN_ARGUMENTS")
8cb33389f6 ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
e29c78a6936e7422 8cb33389f66441dc4e54b28fe0d
---------------- ---------------------------
%stddev %change %stddev
\ | \
24388 ± 20% -32.8% 16400 ± 18% numa-vmstat.node0.nr_slab_reclaimable
61.50 ± 4% -10.6% 55.00 ± 6% perf-c2c.HITM.local
0.20 ± 3% +23.0% 0.24 ± 13% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
2977 -6.1% 2796 vmstat.system.cs
2132466 ± 2% +81.6% 3871852 fsmark.app_overhead
53442 -17.3% 44172 fsmark.time.voluntary_context_switches
2907 -5.7% 2742 perf-stat.i.context-switches
2902 -5.7% 2737 perf-stat.ps.context-switches
1724787 -1.0% 1706808 proc-vmstat.numa_hit
1592345 -1.1% 1574310 proc-vmstat.numa_local
24.87 ± 33% -38.9% 15.20 ± 12% sched_debug.cpu.nr_uninterruptible.max
4.36 ± 9% -17.1% 3.61 ± 10% sched_debug.cpu.nr_uninterruptible.stddev
97541 ± 20% -32.7% 65610 ± 18% numa-meminfo.node0.KReclaimable
97541 ± 20% -32.7% 65610 ± 18% numa-meminfo.node0.SReclaimable
256796 ± 9% -18.7% 208805 ± 13% numa-meminfo.node0.Slab
2307911 ± 52% +68.5% 3888971 ± 5% numa-meminfo.node1.MemUsed
193326 ± 12% +24.7% 241049 ± 12% numa-meminfo.node1.Slab
0.90 ± 27% -0.5 0.36 ±103% perf-profile.calltrace.cycles-pp.evsel__read_counter.read_counters.process_interval.dispatch_events.cmd_stat
0.36 ± 70% +0.2 0.58 ± 3% perf-profile.calltrace.cycles-pp.btrfs_commit_transaction.btrfs_sync_file.btrfs_do_write_iter.do_iter_readv_writev.vfs_iter_write
0.52 ± 47% +0.3 0.78 ± 8% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
1.62 ± 12% +0.3 1.93 ± 9% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
1.22 ± 21% -0.3 0.89 ± 10% perf-profile.children.cycles-pp.readn
0.46 ± 32% -0.2 0.24 ± 34% perf-profile.children.cycles-pp.__close
0.45 ± 32% -0.2 0.22 ± 15% perf-profile.children.cycles-pp.__x64_sys_close
0.40 ± 29% -0.2 0.18 ± 38% perf-profile.children.cycles-pp.__fput
0.31 ± 23% -0.2 0.16 ± 33% perf-profile.children.cycles-pp.irq_work_tick
0.17 ± 51% -0.1 0.03 ±111% perf-profile.children.cycles-pp.nfs_file_release
0.16 ± 43% -0.1 0.03 ±111% perf-profile.children.cycles-pp.__put_nfs_open_context
0.26 ± 18% -0.1 0.15 ± 34% perf-profile.children.cycles-pp.perf_event_task_tick
0.15 ± 41% -0.1 0.03 ±108% perf-profile.children.cycles-pp.get_free_pages_noprof
0.18 ± 55% -0.1 0.06 ± 32% perf-profile.children.cycles-pp.native_apic_mem_eoi
0.18 ± 32% -0.1 0.07 ± 81% perf-profile.children.cycles-pp.flush_end_io
0.17 ± 41% -0.1 0.07 ± 93% perf-profile.children.cycles-pp.mas_store_gfp
0.52 ± 5% +0.1 0.58 ± 3% perf-profile.children.cycles-pp.btrfs_commit_transaction
0.02 ±141% +0.1 0.08 ± 42% perf-profile.children.cycles-pp.uptime_proc_show
0.02 ±141% +0.1 0.08 ± 44% perf-profile.children.cycles-pp.get_zeroed_page_noprof
0.02 ±141% +0.1 0.09 ± 35% perf-profile.children.cycles-pp.__rmqueue_pcplist
0.14 ± 12% +0.1 0.28 ± 29% perf-profile.children.cycles-pp.hrtimer_next_event_without
0.47 ± 27% +0.2 0.67 ± 19% perf-profile.children.cycles-pp.__mmap
0.70 ± 21% +0.2 0.91 ± 7% perf-profile.children.cycles-pp.vfs_write
0.74 ± 20% +0.2 0.96 ± 9% perf-profile.children.cycles-pp.ksys_write
0.73 ± 21% +0.3 1.00 ± 7% perf-profile.children.cycles-pp.copy_process
1.05 ± 13% +0.3 1.38 ± 10% perf-profile.children.cycles-pp.kernel_clone
0.28 ± 22% -0.1 0.13 ± 35% perf-profile.self.cycles-pp.irq_work_tick
0.18 ± 55% -0.1 0.06 ± 32% perf-profile.self.cycles-pp.native_apic_mem_eoi
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linux-next:master] [nfsd] 8cb33389f6: fsmark.app_overhead 81.6% regression
From: Chuck Lever @ 2024-09-17 2:43 UTC (permalink / raw)
To: kernel test robot
Cc: Jeff Layton, oe-lkp, lkp, Linux Memory Management List,
linux-nfs, ying.huang, feng.tang, fengwei.yin
On Mon, Sep 16, 2024 at 04:41:35PM +0800, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a 81.6% regression of fsmark.app_overhead on:
>
>
> commit: 8cb33389f66441dc4e54b28fe0d9bd4bcd9b796d ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> testcase: fsmark
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
> parameters:
>
> iterations: 1x
> nr_threads: 1t
> disk: 1HDD
> fs: btrfs
> fs2: nfsv4
> filesize: 4K
> test_size: 40M
> sync_method: fsyncBeforeClose
> nr_files_per_directory: 1fpd
> cpufreq_governor: performance
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202409161645.d44bced5-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240916/202409161645.d44bced5-oliver.sang@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
> gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark
>
> commit:
> e29c78a693 ("nfsd: add support for FATTR4_OPEN_ARGUMENTS")
> 8cb33389f6 ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
>
> e29c78a6936e7422 8cb33389f66441dc4e54b28fe0d
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 24388 ± 20% -32.8% 16400 ± 18% numa-vmstat.node0.nr_slab_reclaimable
> 61.50 ± 4% -10.6% 55.00 ± 6% perf-c2c.HITM.local
> 0.20 ± 3% +23.0% 0.24 ± 13% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> 2977 -6.1% 2796 vmstat.system.cs
> 2132466 ± 2% +81.6% 3871852 fsmark.app_overhead
> 53442 -17.3% 44172 fsmark.time.voluntary_context_switches
> 2907 -5.7% 2742 perf-stat.i.context-switches
> 2902 -5.7% 2737 perf-stat.ps.context-switches
> 1724787 -1.0% 1706808 proc-vmstat.numa_hit
> 1592345 -1.1% 1574310 proc-vmstat.numa_local
> 24.87 ± 33% -38.9% 15.20 ± 12% sched_debug.cpu.nr_uninterruptible.max
> 4.36 ± 9% -17.1% 3.61 ± 10% sched_debug.cpu.nr_uninterruptible.stddev
> 97541 ± 20% -32.7% 65610 ± 18% numa-meminfo.node0.KReclaimable
> 97541 ± 20% -32.7% 65610 ± 18% numa-meminfo.node0.SReclaimable
> 256796 ± 9% -18.7% 208805 ± 13% numa-meminfo.node0.Slab
> 2307911 ± 52% +68.5% 3888971 ± 5% numa-meminfo.node1.MemUsed
> 193326 ± 12% +24.7% 241049 ± 12% numa-meminfo.node1.Slab
> 0.90 ± 27% -0.5 0.36 ±103% perf-profile.calltrace.cycles-pp.evsel__read_counter.read_counters.process_interval.dispatch_events.cmd_stat
> 0.36 ± 70% +0.2 0.58 ± 3% perf-profile.calltrace.cycles-pp.btrfs_commit_transaction.btrfs_sync_file.btrfs_do_write_iter.do_iter_readv_writev.vfs_iter_write
> 0.52 ± 47% +0.3 0.78 ± 8% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 1.62 ± 12% +0.3 1.93 ± 9% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
> 1.22 ± 21% -0.3 0.89 ± 10% perf-profile.children.cycles-pp.readn
> 0.46 ± 32% -0.2 0.24 ± 34% perf-profile.children.cycles-pp.__close
> 0.45 ± 32% -0.2 0.22 ± 15% perf-profile.children.cycles-pp.__x64_sys_close
> 0.40 ± 29% -0.2 0.18 ± 38% perf-profile.children.cycles-pp.__fput
> 0.31 ± 23% -0.2 0.16 ± 33% perf-profile.children.cycles-pp.irq_work_tick
> 0.17 ± 51% -0.1 0.03 ±111% perf-profile.children.cycles-pp.nfs_file_release
> 0.16 ± 43% -0.1 0.03 ±111% perf-profile.children.cycles-pp.__put_nfs_open_context
> 0.26 ± 18% -0.1 0.15 ± 34% perf-profile.children.cycles-pp.perf_event_task_tick
> 0.15 ± 41% -0.1 0.03 ±108% perf-profile.children.cycles-pp.get_free_pages_noprof
> 0.18 ± 55% -0.1 0.06 ± 32% perf-profile.children.cycles-pp.native_apic_mem_eoi
> 0.18 ± 32% -0.1 0.07 ± 81% perf-profile.children.cycles-pp.flush_end_io
> 0.17 ± 41% -0.1 0.07 ± 93% perf-profile.children.cycles-pp.mas_store_gfp
> 0.52 ± 5% +0.1 0.58 ± 3% perf-profile.children.cycles-pp.btrfs_commit_transaction
> 0.02 ±141% +0.1 0.08 ± 42% perf-profile.children.cycles-pp.uptime_proc_show
> 0.02 ±141% +0.1 0.08 ± 44% perf-profile.children.cycles-pp.get_zeroed_page_noprof
> 0.02 ±141% +0.1 0.09 ± 35% perf-profile.children.cycles-pp.__rmqueue_pcplist
> 0.14 ± 12% +0.1 0.28 ± 29% perf-profile.children.cycles-pp.hrtimer_next_event_without
> 0.47 ± 27% +0.2 0.67 ± 19% perf-profile.children.cycles-pp.__mmap
> 0.70 ± 21% +0.2 0.91 ± 7% perf-profile.children.cycles-pp.vfs_write
> 0.74 ± 20% +0.2 0.96 ± 9% perf-profile.children.cycles-pp.ksys_write
> 0.73 ± 21% +0.3 1.00 ± 7% perf-profile.children.cycles-pp.copy_process
> 1.05 ± 13% +0.3 1.38 ± 10% perf-profile.children.cycles-pp.kernel_clone
> 0.28 ± 22% -0.1 0.13 ± 35% perf-profile.self.cycles-pp.irq_work_tick
> 0.18 ± 55% -0.1 0.06 ± 32% perf-profile.self.cycles-pp.native_apic_mem_eoi
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
Hi Oliver, thank you for the report.
Jeff and I discussed this regression earlier today, and we have
decided to drop this patch for the v6.12 merge window. It's a new
feature, so it can wait for v6.13.
--
Chuck Lever
* Re: [linux-next:master] [nfsd] 8cb33389f6: fsmark.app_overhead 81.6% regression
From: Oliver Sang @ 2024-09-18 4:59 UTC (permalink / raw)
To: Chuck Lever
Cc: Jeff Layton, oe-lkp, lkp, Linux Memory Management List,
linux-nfs, ying.huang, feng.tang, fengwei.yin, oliver.sang
hi, Chuck Lever,
On Mon, Sep 16, 2024 at 10:43:59PM -0400, Chuck Lever wrote:
[...]
>
> Hi Oliver, thank you for the report.
>
> Jeff and I discussed this regression earlier today, and we have
> decided to drop this patch for the v6.12 merge window. It's a new
> feature, so it can wait for v6.13.
thanks a lot for the information!
>
>
> --
> Chuck Lever
* Re: [linux-next:master] [nfsd] 8cb33389f6: fsmark.app_overhead 81.6% regression
From: Jeff Layton @ 2024-10-01 12:44 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, Linux Memory Management List, Chuck Lever,
linux-nfs, ying.huang, feng.tang, fengwei.yin
On Mon, 2024-09-16 at 16:41 +0800, kernel test robot wrote:
>
> Hello,
>
> kernel test robot noticed a 81.6% regression of fsmark.app_overhead on:
>
>
> commit: 8cb33389f66441dc4e54b28fe0d9bd4bcd9b796d ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> testcase: fsmark
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
> parameters:
>
> iterations: 1x
> nr_threads: 1t
> disk: 1HDD
> fs: btrfs
> fs2: nfsv4
> filesize: 4K
> test_size: 40M
> sync_method: fsyncBeforeClose
> nr_files_per_directory: 1fpd
> cpufreq_governor: performance
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Closes: https://lore.kernel.org/oe-lkp/202409161645.d44bced5-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240916/202409161645.d44bced5-oliver.sang@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
> gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark
>
> commit:
> e29c78a693 ("nfsd: add support for FATTR4_OPEN_ARGUMENTS")
> 8cb33389f6 ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
>
> e29c78a6936e7422 8cb33389f66441dc4e54b28fe0d
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 24388 ± 20% -32.8% 16400 ± 18% numa-vmstat.node0.nr_slab_reclaimable
> 61.50 ± 4% -10.6% 55.00 ± 6% perf-c2c.HITM.local
> 0.20 ± 3% +23.0% 0.24 ± 13% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> 2977 -6.1% 2796 vmstat.system.cs
> 2132466 ± 2% +81.6% 3871852 fsmark.app_overhead
I have been unable to reproduce this result with fs_mark. I've run a
number of repeated tests, and I can create files just as fast with or
without this patch (roughly 46 files/s on my test machine).
I'm particularly suspicious of the fsmark.app_overhead value above. The
fsmark output says:
# App overhead is time in microseconds spent in the test not doing file writing related system calls.
That seems outside the purview of anything we're altering here, so I
have to wonder if something else is going on. Oliver, can you rerun
this test and see if this regression is reproducible?
Thanks,
> 53442 -17.3% 44172 fsmark.time.voluntary_context_switches
> 2907 -5.7% 2742 perf-stat.i.context-switches
> 2902 -5.7% 2737 perf-stat.ps.context-switches
> 1724787 -1.0% 1706808 proc-vmstat.numa_hit
> 1592345 -1.1% 1574310 proc-vmstat.numa_local
> 24.87 ± 33% -38.9% 15.20 ± 12% sched_debug.cpu.nr_uninterruptible.max
> 4.36 ± 9% -17.1% 3.61 ± 10% sched_debug.cpu.nr_uninterruptible.stddev
> 97541 ± 20% -32.7% 65610 ± 18% numa-meminfo.node0.KReclaimable
> 97541 ± 20% -32.7% 65610 ± 18% numa-meminfo.node0.SReclaimable
> 256796 ± 9% -18.7% 208805 ± 13% numa-meminfo.node0.Slab
> 2307911 ± 52% +68.5% 3888971 ± 5% numa-meminfo.node1.MemUsed
> 193326 ± 12% +24.7% 241049 ± 12% numa-meminfo.node1.Slab
> 0.90 ± 27% -0.5 0.36 ±103% perf-profile.calltrace.cycles-pp.evsel__read_counter.read_counters.process_interval.dispatch_events.cmd_stat
> 0.36 ± 70% +0.2 0.58 ± 3% perf-profile.calltrace.cycles-pp.btrfs_commit_transaction.btrfs_sync_file.btrfs_do_write_iter.do_iter_readv_writev.vfs_iter_write
> 0.52 ± 47% +0.3 0.78 ± 8% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> 1.62 ± 12% +0.3 1.93 ± 9% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
> 1.22 ± 21% -0.3 0.89 ± 10% perf-profile.children.cycles-pp.readn
> 0.46 ± 32% -0.2 0.24 ± 34% perf-profile.children.cycles-pp.__close
> 0.45 ± 32% -0.2 0.22 ± 15% perf-profile.children.cycles-pp.__x64_sys_close
> 0.40 ± 29% -0.2 0.18 ± 38% perf-profile.children.cycles-pp.__fput
> 0.31 ± 23% -0.2 0.16 ± 33% perf-profile.children.cycles-pp.irq_work_tick
> 0.17 ± 51% -0.1 0.03 ±111% perf-profile.children.cycles-pp.nfs_file_release
> 0.16 ± 43% -0.1 0.03 ±111% perf-profile.children.cycles-pp.__put_nfs_open_context
> 0.26 ± 18% -0.1 0.15 ± 34% perf-profile.children.cycles-pp.perf_event_task_tick
> 0.15 ± 41% -0.1 0.03 ±108% perf-profile.children.cycles-pp.get_free_pages_noprof
> 0.18 ± 55% -0.1 0.06 ± 32% perf-profile.children.cycles-pp.native_apic_mem_eoi
> 0.18 ± 32% -0.1 0.07 ± 81% perf-profile.children.cycles-pp.flush_end_io
> 0.17 ± 41% -0.1 0.07 ± 93% perf-profile.children.cycles-pp.mas_store_gfp
> 0.52 ± 5% +0.1 0.58 ± 3% perf-profile.children.cycles-pp.btrfs_commit_transaction
> 0.02 ±141% +0.1 0.08 ± 42% perf-profile.children.cycles-pp.uptime_proc_show
> 0.02 ±141% +0.1 0.08 ± 44% perf-profile.children.cycles-pp.get_zeroed_page_noprof
> 0.02 ±141% +0.1 0.09 ± 35% perf-profile.children.cycles-pp.__rmqueue_pcplist
> 0.14 ± 12% +0.1 0.28 ± 29% perf-profile.children.cycles-pp.hrtimer_next_event_without
> 0.47 ± 27% +0.2 0.67 ± 19% perf-profile.children.cycles-pp.__mmap
> 0.70 ± 21% +0.2 0.91 ± 7% perf-profile.children.cycles-pp.vfs_write
> 0.74 ± 20% +0.2 0.96 ± 9% perf-profile.children.cycles-pp.ksys_write
> 0.73 ± 21% +0.3 1.00 ± 7% perf-profile.children.cycles-pp.copy_process
> 1.05 ± 13% +0.3 1.38 ± 10% perf-profile.children.cycles-pp.kernel_clone
> 0.28 ± 22% -0.1 0.13 ± 35% perf-profile.self.cycles-pp.irq_work_tick
> 0.18 ± 55% -0.1 0.06 ± 32% perf-profile.self.cycles-pp.native_apic_mem_eoi
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
--
Jeff Layton <jlayton@kernel.org>
* Re: [linux-next:master] [nfsd] 8cb33389f6: fsmark.app_overhead 81.6% regression
From: Oliver Sang @ 2024-10-08 8:01 UTC (permalink / raw)
To: Jeff Layton
Cc: oe-lkp, lkp, Linux Memory Management List, Chuck Lever,
linux-nfs, ying.huang, feng.tang, fengwei.yin, oliver.sang
hi, Jeff Layton,
On Tue, Oct 01, 2024 at 08:44:43AM -0400, Jeff Layton wrote:
> On Mon, 2024-09-16 at 16:41 +0800, kernel test robot wrote:
> >
> > Hello,
> >
> > kernel test robot noticed a 81.6% regression of fsmark.app_overhead on:
> >
> >
> > commit: 8cb33389f66441dc4e54b28fe0d9bd4bcd9b796d ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >
> > testcase: fsmark
> > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
> > parameters:
> >
> > iterations: 1x
> > nr_threads: 1t
> > disk: 1HDD
> > fs: btrfs
> > fs2: nfsv4
> > filesize: 4K
> > test_size: 40M
> > sync_method: fsyncBeforeClose
> > nr_files_per_directory: 1fpd
> > cpufreq_governor: performance
> >
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > > Reported-by: kernel test robot <oliver.sang@intel.com>
> > > Closes: https://lore.kernel.org/oe-lkp/202409161645.d44bced5-oliver.sang@intel.com
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20240916/202409161645.d44bced5-oliver.sang@intel.com
> >
> > =========================================================================================
> > compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
> > gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark
> >
> > commit:
> > e29c78a693 ("nfsd: add support for FATTR4_OPEN_ARGUMENTS")
> > 8cb33389f6 ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
> >
> > e29c78a6936e7422 8cb33389f66441dc4e54b28fe0d
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 24388 ± 20% -32.8% 16400 ± 18% numa-vmstat.node0.nr_slab_reclaimable
> > 61.50 ± 4% -10.6% 55.00 ± 6% perf-c2c.HITM.local
> > 0.20 ± 3% +23.0% 0.24 ± 13% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
> > 2977 -6.1% 2796 vmstat.system.cs
> > 2132466 ± 2% +81.6% 3871852 fsmark.app_overhead
>
> I have been unable to reproduce this result with fs_mark. I've run a
> number of repeated tests, and I can create files just as fast with or
> without this patch (roughly ~46 files/s on my test machine).
>
> I'm particularly suspicious of the fsmark.app_overhead value above. The
> fsmark output says:
>
> # App overhead is time in microseconds spent in the test not doing file writing related system calls.
>
> That seems outside the purview of anything we're altering here, so I
> have to wonder if something else is going on. Oliver, can you rerun
> this test and see if this regression is reproducible?
we rebuilt the kernel and reran the tests for both the parent and this commit;
the data is stable and the regression is still reproducible.
One thing I want to mention, though, is that there is no difference for
"fsmark.files_per_sec":
=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark
e29c78a6936e7422 8cb33389f66441dc4e54b28fe0d
---------------- ---------------------------
%stddev %change %stddev
\ | \
2080540 ± 2% +86.6% 3881862 fsmark.app_overhead
18.56 +0.0% 18.56 fsmark.files_per_sec
we happened to file another report with similar behavior (a big change in
fsmark.app_overhead but little difference in fsmark.files_per_sec):
https://lore.kernel.org/all/202410072214.11d18a3c-oliver.sang@intel.com/
Christian Loehle <christian.loehle@arm.com> shared some information in
https://lore.kernel.org/all/01c3073b-84d0-4986-b6d5-a8877ae8a046@arm.com/
"App overhead is time in microseconds spent in the test not doing file writing related system calls."
So the loop is:
/*
* MAIN FILE WRITE LOOP:
* This loop measures the specific steps in creating files:
* Step 1: Make up a file name
* Step 2: Creat(file_name);
* Step 3: write file data
* Step 4: fsync() file data (optional)
* Step 5: close() file descriptor
*/
And it gets the timestamps before and after each syscall.
It then subtracts all those times (spent in syscalls) from the total time.
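
In other words, the accounting is roughly the following (a simplified sketch
of the idea, not the actual fs_mark source; names and values are illustrative):

#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static uint64_t now_us(void)
{
	struct timeval tv;
	gettimeofday(&tv, NULL);
	return (uint64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}

int main(void)
{
	uint64_t start = now_us(), in_syscalls = 0, t;

	/* In the loop above this wrapping is done around each of the
	 * creat/write/fsync/close calls for every file; a single fsync
	 * stands in for all of them here. */
	t = now_us();
	fsync(STDOUT_FILENO);
	in_syscalls += now_us() - t;

	/* Whatever is left over (file-name generation, page faults,
	 * scheduling delays, statistics collection) is reported as app
	 * overhead, even if files_per_sec does not change. */
	uint64_t app_overhead = (now_us() - start) - in_syscalls;
	printf("app overhead: %llu us\n",
	       (unsigned long long)app_overhead);
	return 0;
}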
I'm not sure whether this information is helpful to you.
>
> Thanks,
>
> > 53442 -17.3% 44172 fsmark.time.voluntary_context_switches
> > 2907 -5.7% 2742 perf-stat.i.context-switches
> > 2902 -5.7% 2737 perf-stat.ps.context-switches
> > 1724787 -1.0% 1706808 proc-vmstat.numa_hit
> > 1592345 -1.1% 1574310 proc-vmstat.numa_local
> > 24.87 ± 33% -38.9% 15.20 ± 12% sched_debug.cpu.nr_uninterruptible.max
> > 4.36 ± 9% -17.1% 3.61 ± 10% sched_debug.cpu.nr_uninterruptible.stddev
> > 97541 ± 20% -32.7% 65610 ± 18% numa-meminfo.node0.KReclaimable
> > 97541 ± 20% -32.7% 65610 ± 18% numa-meminfo.node0.SReclaimable
> > 256796 ± 9% -18.7% 208805 ± 13% numa-meminfo.node0.Slab
> > 2307911 ± 52% +68.5% 3888971 ± 5% numa-meminfo.node1.MemUsed
> > 193326 ± 12% +24.7% 241049 ± 12% numa-meminfo.node1.Slab
> > 0.90 ± 27% -0.5 0.36 ±103% perf-profile.calltrace.cycles-pp.evsel__read_counter.read_counters.process_interval.dispatch_events.cmd_stat
> > 0.36 ± 70% +0.2 0.58 ± 3% perf-profile.calltrace.cycles-pp.btrfs_commit_transaction.btrfs_sync_file.btrfs_do_write_iter.do_iter_readv_writev.vfs_iter_write
> > 0.52 ± 47% +0.3 0.78 ± 8% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
> > 1.62 ± 12% +0.3 1.93 ± 9% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
> > 1.22 ± 21% -0.3 0.89 ± 10% perf-profile.children.cycles-pp.readn
> > 0.46 ± 32% -0.2 0.24 ± 34% perf-profile.children.cycles-pp.__close
> > 0.45 ± 32% -0.2 0.22 ± 15% perf-profile.children.cycles-pp.__x64_sys_close
> > 0.40 ± 29% -0.2 0.18 ± 38% perf-profile.children.cycles-pp.__fput
> > 0.31 ± 23% -0.2 0.16 ± 33% perf-profile.children.cycles-pp.irq_work_tick
> > 0.17 ± 51% -0.1 0.03 ±111% perf-profile.children.cycles-pp.nfs_file_release
> > 0.16 ± 43% -0.1 0.03 ±111% perf-profile.children.cycles-pp.__put_nfs_open_context
> > 0.26 ± 18% -0.1 0.15 ± 34% perf-profile.children.cycles-pp.perf_event_task_tick
> > 0.15 ± 41% -0.1 0.03 ±108% perf-profile.children.cycles-pp.get_free_pages_noprof
> > 0.18 ± 55% -0.1 0.06 ± 32% perf-profile.children.cycles-pp.native_apic_mem_eoi
> > 0.18 ± 32% -0.1 0.07 ± 81% perf-profile.children.cycles-pp.flush_end_io
> > 0.17 ± 41% -0.1 0.07 ± 93% perf-profile.children.cycles-pp.mas_store_gfp
> > 0.52 ± 5% +0.1 0.58 ± 3% perf-profile.children.cycles-pp.btrfs_commit_transaction
> > 0.02 ±141% +0.1 0.08 ± 42% perf-profile.children.cycles-pp.uptime_proc_show
> > 0.02 ±141% +0.1 0.08 ± 44% perf-profile.children.cycles-pp.get_zeroed_page_noprof
> > 0.02 ±141% +0.1 0.09 ± 35% perf-profile.children.cycles-pp.__rmqueue_pcplist
> > 0.14 ± 12% +0.1 0.28 ± 29% perf-profile.children.cycles-pp.hrtimer_next_event_without
> > 0.47 ± 27% +0.2 0.67 ± 19% perf-profile.children.cycles-pp.__mmap
> > 0.70 ± 21% +0.2 0.91 ± 7% perf-profile.children.cycles-pp.vfs_write
> > 0.74 ± 20% +0.2 0.96 ± 9% perf-profile.children.cycles-pp.ksys_write
> > 0.73 ± 21% +0.3 1.00 ± 7% perf-profile.children.cycles-pp.copy_process
> > 1.05 ± 13% +0.3 1.38 ± 10% perf-profile.children.cycles-pp.kernel_clone
> > 0.28 ± 22% -0.1 0.13 ± 35% perf-profile.self.cycles-pp.irq_work_tick
> > 0.18 ± 55% -0.1 0.06 ± 32% perf-profile.self.cycles-pp.native_apic_mem_eoi
> >
> >
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
>
> --
> Jeff Layton <jlayton@kernel.org>