* [linux-next:master] [nfsd]  8cb33389f6:  fsmark.app_overhead 81.6% regression
From: kernel test robot @ 2024-09-16  8:41 UTC
  To: Jeff Layton
  Cc: oe-lkp, lkp, Linux Memory Management List, Chuck Lever,
	linux-nfs, ying.huang, feng.tang, fengwei.yin, oliver.sang



Hello,

kernel test robot noticed an 81.6% regression of fsmark.app_overhead on:


commit: 8cb33389f66441dc4e54b28fe0d9bd4bcd9b796d ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

testcase: fsmark
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

	iterations: 1x
	nr_threads: 1t
	disk: 1HDD
	fs: btrfs
	fs2: nfsv4
	filesize: 4K
	test_size: 40M
	sync_method: fsyncBeforeClose
	nr_files_per_directory: 1fpd
	cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202409161645.d44bced5-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240916/202409161645.d44bced5-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark

commit: 
  e29c78a693 ("nfsd: add support for FATTR4_OPEN_ARGUMENTS")
  8cb33389f6 ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")

e29c78a6936e7422 8cb33389f66441dc4e54b28fe0d 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     24388 ± 20%     -32.8%      16400 ± 18%  numa-vmstat.node0.nr_slab_reclaimable
     61.50 ±  4%     -10.6%      55.00 ±  6%  perf-c2c.HITM.local
      0.20 ±  3%     +23.0%       0.24 ± 13%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      2977            -6.1%       2796        vmstat.system.cs
   2132466 ±  2%     +81.6%    3871852        fsmark.app_overhead
     53442           -17.3%      44172        fsmark.time.voluntary_context_switches
      2907            -5.7%       2742        perf-stat.i.context-switches
      2902            -5.7%       2737        perf-stat.ps.context-switches
   1724787            -1.0%    1706808        proc-vmstat.numa_hit
   1592345            -1.1%    1574310        proc-vmstat.numa_local
     24.87 ± 33%     -38.9%      15.20 ± 12%  sched_debug.cpu.nr_uninterruptible.max
      4.36 ±  9%     -17.1%       3.61 ± 10%  sched_debug.cpu.nr_uninterruptible.stddev
     97541 ± 20%     -32.7%      65610 ± 18%  numa-meminfo.node0.KReclaimable
     97541 ± 20%     -32.7%      65610 ± 18%  numa-meminfo.node0.SReclaimable
    256796 ±  9%     -18.7%     208805 ± 13%  numa-meminfo.node0.Slab
   2307911 ± 52%     +68.5%    3888971 ±  5%  numa-meminfo.node1.MemUsed
    193326 ± 12%     +24.7%     241049 ± 12%  numa-meminfo.node1.Slab
      0.90 ± 27%      -0.5        0.36 ±103%  perf-profile.calltrace.cycles-pp.evsel__read_counter.read_counters.process_interval.dispatch_events.cmd_stat
      0.36 ± 70%      +0.2        0.58 ±  3%  perf-profile.calltrace.cycles-pp.btrfs_commit_transaction.btrfs_sync_file.btrfs_do_write_iter.do_iter_readv_writev.vfs_iter_write
      0.52 ± 47%      +0.3        0.78 ±  8%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      1.62 ± 12%      +0.3        1.93 ±  9%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      1.22 ± 21%      -0.3        0.89 ± 10%  perf-profile.children.cycles-pp.readn
      0.46 ± 32%      -0.2        0.24 ± 34%  perf-profile.children.cycles-pp.__close
      0.45 ± 32%      -0.2        0.22 ± 15%  perf-profile.children.cycles-pp.__x64_sys_close
      0.40 ± 29%      -0.2        0.18 ± 38%  perf-profile.children.cycles-pp.__fput
      0.31 ± 23%      -0.2        0.16 ± 33%  perf-profile.children.cycles-pp.irq_work_tick
      0.17 ± 51%      -0.1        0.03 ±111%  perf-profile.children.cycles-pp.nfs_file_release
      0.16 ± 43%      -0.1        0.03 ±111%  perf-profile.children.cycles-pp.__put_nfs_open_context
      0.26 ± 18%      -0.1        0.15 ± 34%  perf-profile.children.cycles-pp.perf_event_task_tick
      0.15 ± 41%      -0.1        0.03 ±108%  perf-profile.children.cycles-pp.get_free_pages_noprof
      0.18 ± 55%      -0.1        0.06 ± 32%  perf-profile.children.cycles-pp.native_apic_mem_eoi
      0.18 ± 32%      -0.1        0.07 ± 81%  perf-profile.children.cycles-pp.flush_end_io
      0.17 ± 41%      -0.1        0.07 ± 93%  perf-profile.children.cycles-pp.mas_store_gfp
      0.52 ±  5%      +0.1        0.58 ±  3%  perf-profile.children.cycles-pp.btrfs_commit_transaction
      0.02 ±141%      +0.1        0.08 ± 42%  perf-profile.children.cycles-pp.uptime_proc_show
      0.02 ±141%      +0.1        0.08 ± 44%  perf-profile.children.cycles-pp.get_zeroed_page_noprof
      0.02 ±141%      +0.1        0.09 ± 35%  perf-profile.children.cycles-pp.__rmqueue_pcplist
      0.14 ± 12%      +0.1        0.28 ± 29%  perf-profile.children.cycles-pp.hrtimer_next_event_without
      0.47 ± 27%      +0.2        0.67 ± 19%  perf-profile.children.cycles-pp.__mmap
      0.70 ± 21%      +0.2        0.91 ±  7%  perf-profile.children.cycles-pp.vfs_write
      0.74 ± 20%      +0.2        0.96 ±  9%  perf-profile.children.cycles-pp.ksys_write
      0.73 ± 21%      +0.3        1.00 ±  7%  perf-profile.children.cycles-pp.copy_process
      1.05 ± 13%      +0.3        1.38 ± 10%  perf-profile.children.cycles-pp.kernel_clone
      0.28 ± 22%      -0.1        0.13 ± 35%  perf-profile.self.cycles-pp.irq_work_tick
      0.18 ± 55%      -0.1        0.06 ± 32%  perf-profile.self.cycles-pp.native_apic_mem_eoi




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




* Re: [linux-next:master] [nfsd]  8cb33389f6:  fsmark.app_overhead 81.6% regression
From: Chuck Lever @ 2024-09-17  2:43 UTC
  To: kernel test robot
  Cc: Jeff Layton, oe-lkp, lkp, Linux Memory Management List,
	linux-nfs, ying.huang, feng.tang, fengwei.yin

On Mon, Sep 16, 2024 at 04:41:35PM +0800, kernel test robot wrote:
> Hello,
> 
> kernel test robot noticed an 81.6% regression of fsmark.app_overhead on:
> 
> commit: 8cb33389f66441dc4e54b28fe0d9bd4bcd9b796d ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> [...]

Hi Oliver, thank you for the report.

Jeff and I discussed this regression earlier today, and we have
decided to drop this patch for the v6.12 merge window. It's a new
feature, so it can wait for v6.13.


-- 
Chuck Lever



* Re: [linux-next:master] [nfsd]  8cb33389f6:  fsmark.app_overhead 81.6% regression
From: Oliver Sang @ 2024-09-18  4:59 UTC
  To: Chuck Lever
  Cc: Jeff Layton, oe-lkp, lkp, Linux Memory Management List,
	linux-nfs, ying.huang, feng.tang, fengwei.yin, oliver.sang

Hi, Chuck Lever,

On Mon, Sep 16, 2024 at 10:43:59PM -0400, Chuck Lever wrote:

[...]

> 
> Hi Oliver, thank you for the report.
> 
> Jeff and I discussed this regression earlier today, and we have
> decided to drop this patch for the v6.12 merge window. It's a new
> feature, so it can wait for v6.13.

Thanks a lot for the information!

> 
> 
> -- 
> Chuck Lever



* Re: [linux-next:master] [nfsd]  8cb33389f6:  fsmark.app_overhead 81.6% regression
From: Jeff Layton @ 2024-10-01 12:44 UTC
  To: kernel test robot
  Cc: oe-lkp, lkp, Linux Memory Management List, Chuck Lever,
	linux-nfs, ying.huang, feng.tang, fengwei.yin

On Mon, 2024-09-16 at 16:41 +0800, kernel test robot wrote:
> 
> Hello,
> 
> kernel test robot noticed an 81.6% regression of fsmark.app_overhead on:
> 
> commit: 8cb33389f66441dc4e54b28fe0d9bd4bcd9b796d ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> [...]
>    2132466 ±  2%     +81.6%    3871852        fsmark.app_overhead

I have been unable to reproduce this result with fs_mark. I've run a
number of repeated tests, and I can create files just as fast with or
without this patch (roughly 46 files/s on my test machine).

I'm particularly suspicious of the fsmark.app_overhead value above. The
fsmark output says:

#       App overhead is time in microseconds spent in the test not doing file writing related system calls.

That seems outside the purview of anything we're altering here, so I
have to wonder if something else is going on. Oliver, can you rerun
this test and see if this regression is reproducible?

Thanks,


-- 
Jeff Layton <jlayton@kernel.org>



* Re: [linux-next:master] [nfsd]  8cb33389f6:  fsmark.app_overhead 81.6% regression
From: Oliver Sang @ 2024-10-08  8:01 UTC
  To: Jeff Layton
  Cc: oe-lkp, lkp, Linux Memory Management List, Chuck Lever,
	linux-nfs, ying.huang, feng.tang, fengwei.yin, oliver.sang

Hi, Jeff Layton,

On Tue, Oct 01, 2024 at 08:44:43AM -0400, Jeff Layton wrote:
> On Mon, 2024-09-16 at 16:41 +0800, kernel test robot wrote:
> > [...]
> >    2132466 ±  2%     +81.6%    3871852        fsmark.app_overhead
> 
> I have been unable to reproduce this result with fs_mark. I've run a
> number of repeated tests, and I can create files just as fast with or
> without this patch (roughly 46 files/s on my test machine).
> 
> I'm particularly suspicious of the fsmark.app_overhead value above. The
> fsmark output says:
> 
> #       App overhead is time in microseconds spent in the test not doing file writing related system calls.
> 
> That seems outside the purview of anything we're altering here, so I
> have to wonder if something else is going on. Oliver, can you rerun
> this test and see if this regression is reproducible?

We rebuilt the kernel and reran the tests for both the parent and this
commit; the data is stable and the regression is still reproducible.

One thing I want to mention, though, is that there is no difference in
"fsmark.files_per_sec":

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark

e29c78a6936e7422 8cb33389f66441dc4e54b28fe0d
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   2080540 ±  2%     +86.6%    3881862        fsmark.app_overhead
     18.56            +0.0%      18.56        fsmark.files_per_sec


We happened to file another report with similar behavior: a big change in
fsmark.app_overhead but little difference in fsmark.files_per_sec:
https://lore.kernel.org/all/202410072214.11d18a3c-oliver.sang@intel.com/

Christian Loehle <christian.loehle@arm.com> shared some information in
https://lore.kernel.org/all/01c3073b-84d0-4986-b6d5-a8877ae8a046@arm.com/

"App overhead is time in microseconds spent in the test not doing file writing related system calls."
So the loop is:
	/*
	 * MAIN FILE WRITE LOOP:
	 * This loop measures the specific steps in creating files:
	 *      Step 1: Make up a file name
	 *      Step 2: Creat(file_name);
	 *      Step 3: write file data
	 *      Step 4: fsync() file data (optional)
	 *      Step 5: close() file descriptor
	 */

And it gets the timestamps before and after each syscall.
It then subtracts all those times (spent in syscalls) from the total time.
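
To make that concrete, here is a minimal, self-contained sketch of that
accounting for a single 4K file (illustrative only; this is not the
actual fs_mark source, and the file name and error handling are made up):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

/* Microseconds elapsed between two gettimeofday() samples. */
static long long usecs(const struct timeval *a, const struct timeval *b)
{
	return (b->tv_sec - a->tv_sec) * 1000000LL +
	       (b->tv_usec - a->tv_usec);
}

int main(void)
{
	struct timeval start, end, t0, t1;
	long long sys_us = 0;
	static char buf[4096];		/* filesize: 4K */
	int fd;

	gettimeofday(&start, NULL);	/* Step 1 (name generation) would
					 * happen here, outside the timed
					 * syscalls. */

	/* Step 2: creat(file_name) */
	gettimeofday(&t0, NULL);
	fd = open("testfile", O_CREAT | O_WRONLY | O_TRUNC, 0644);
	gettimeofday(&t1, NULL);
	sys_us += usecs(&t0, &t1);
	if (fd < 0)
		return 1;

	/* Step 3: write file data */
	gettimeofday(&t0, NULL);
	write(fd, buf, sizeof(buf));
	gettimeofday(&t1, NULL);
	sys_us += usecs(&t0, &t1);

	/* Step 4: fsync() file data (sync_method: fsyncBeforeClose) */
	gettimeofday(&t0, NULL);
	fsync(fd);
	gettimeofday(&t1, NULL);
	sys_us += usecs(&t0, &t1);

	/* Step 5: close() file descriptor */
	gettimeofday(&t0, NULL);
	close(fd);
	gettimeofday(&t1, NULL);
	sys_us += usecs(&t0, &t1);

	gettimeofday(&end, NULL);

	/*
	 * Whatever remains after subtracting the timed syscalls counts as
	 * "app overhead": file-name generation, bookkeeping, and any
	 * stalls that land between the timestamp pairs.
	 */
	printf("app overhead: %lld usec\n", usecs(&start, &end) - sys_us);
	return 0;
}

One consequence of this accounting: if test_size / filesize means
40M / 4K = 10240 files, then at 18.56 files/sec the run lasts roughly
550 seconds, while the app_overhead growth above is only about
1.8 seconds in total, so a shift of that size would not be expected to
move fsmark.files_per_sec.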


I'm not sure whether this information is helpful to you.



