Hello,

kernel test robot noticed a 2.1% improvement of fsmark.files_per_sec on:


commit: e5b9a37505880cb3d76ebddca25a7242fd9d6f91 ("NFSD: Enable write delegation support")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

testcase: fsmark
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
parameters:

	iterations: 1x
	nr_threads: 32t
	disk: 1SSD
	fs: btrfs
	fs2: nfsv4
	filesize: 9B
	test_size: 400M
	sync_method: fsyncBeforeClose
	nr_directories: 16d
	nr_files_per_directory: 256fpd
	cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  gcc-12/performance/1SSD/9B/nfsv4/btrfs/1x/x86_64-rhel-8.3/16d/256fpd/32t/debian-11.1-x86_64-20220510.cgz/fsyncBeforeClose/lkp-ivb-2ep1/400M/fsmark
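
The two lines above pair a slash-separated list of parameter names with a
slash-separated list of values. A minimal sketch of recovering the mapping
follows; parse_axes is a hypothetical helper written for illustration, not
part of lkp-tests, and it assumes no individual value contains a "/".

```python
def parse_axes(keys_line: str, values_line: str) -> dict[str, str]:
    """Pair slash-separated parameter names with their values."""
    # The names line ends with a colon in the report; drop it before splitting.
    keys = keys_line.strip().rstrip(":").split("/")
    values = values_line.strip().split("/")
    if len(keys) != len(values):
        raise ValueError(f"{len(keys)} names but {len(values)} values")
    return dict(zip(keys, values))


KEYS = ("compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/"
        "nr_directories/nr_files_per_directory/nr_threads/rootfs/sync_method/"
        "tbox_group/test_size/testcase:")
VALUES = ("gcc-12/performance/1SSD/9B/nfsv4/btrfs/1x/x86_64-rhel-8.3/16d/"
          "256fpd/32t/debian-11.1-x86_64-20220510.cgz/fsyncBeforeClose/"
          "lkp-ivb-2ep1/400M/fsmark")

axes = parse_axes(KEYS, VALUES)
for name, value in axes.items():
    print(f"{name:>24}: {value}")
```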

commit: 
  af7c14f91a ("NFSD: Enforce flush-on-close for write delegations")
  e5b9a37505 ("NFSD: Enable write delegation support")

af7c14f91a306eee e5b9a37505880cb3d76ebddca25 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   4426369           -12.5%    3873415        cpuidle..usage
      5.93 ±  2%      -0.5        5.47        mpstat.cpu.all.sys%
      9.48            +2.1%       9.68        iostat.cpu.iowait
      7.84            -6.0%       7.37        iostat.cpu.system
    310366 ±  2%     -14.8%     264523        vmstat.system.cs
     69011            -5.3%      65342        vmstat.system.in
  11823460           -11.4%   10471571        fsmark.app_overhead
      5058            +2.1%       5165        fsmark.files_per_sec
     55.00            +2.4%      56.33        fsmark.time.percent_of_cpu_this_job_got
    477760            -9.2%     433730        fsmark.time.voluntary_context_switches
      5.77 ± 69%      -4.4        1.36 ±148%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      5.77 ± 69%      -4.4        1.36 ±148%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
      7.35 ± 68%      -5.5        1.86 ±142%  perf-profile.children.cycles-pp.__x64_sys_openat
      7.35 ± 68%      -5.5        1.86 ±142%  perf-profile.children.cycles-pp.do_sys_openat2
      5.77 ± 69%      -4.4        1.36 ±148%  perf-profile.children.cycles-pp.do_filp_open
      5.77 ± 69%      -4.4        1.36 ±148%  perf-profile.children.cycles-pp.path_openat
      5.10 ± 88%      -3.2        1.92 ±142%  perf-profile.children.cycles-pp.sched_setaffinity
   2453462           -20.3%    1954727        turbostat.C1
     14.80            -1.8       13.03        turbostat.C1%
     18.26            +0.8       19.04        turbostat.C1E%
     92.44            -1.7%      90.91        turbostat.CorWatt
     32589           -34.7%      21295 ±  2%  turbostat.POLL
      0.13 ±  2%      -0.0        0.09        turbostat.POLL%
    120.46            -1.3%     118.92        turbostat.PkgWatt
    712376            -1.4%     702472        proc-vmstat.nr_dirtied
     46109            -6.9%      42910        proc-vmstat.nr_slab_unreclaimable
    705524            -1.4%     695317        proc-vmstat.nr_written
   2314060            -3.9%    2223615        proc-vmstat.numa_hit
    489398 ±  9%     -12.0%     430864 ±  9%  proc-vmstat.numa_other
    218171            -1.6%     214672        proc-vmstat.pgactivate
   2765841            -3.3%    2673651        proc-vmstat.pgalloc_normal
   1981302            -7.0%    1842873 ±  3%  proc-vmstat.pgfree
   2140457            -1.8%    2102818        proc-vmstat.pgpgout
 2.073e+09            -4.5%   1.98e+09        perf-stat.i.branch-instructions
      4.89            +0.2        5.10        perf-stat.i.branch-miss-rate%
      6.86            -0.3        6.60        perf-stat.i.cache-miss-rate%
  16995704            -6.9%   15830167        perf-stat.i.cache-misses
 2.565e+08            -3.3%   2.48e+08        perf-stat.i.cache-references
    355982           -13.9%     306606        perf-stat.i.context-switches
 1.713e+10            -5.2%  1.623e+10        perf-stat.i.cpu-cycles
      1706            -6.8%       1589 ±  2%  perf-stat.i.cpu-migrations
 2.443e+09            -4.5%  2.334e+09        perf-stat.i.dTLB-loads
 1.196e+09            -5.0%  1.137e+09        perf-stat.i.dTLB-stores
     52.68            +1.9       54.58        perf-stat.i.iTLB-load-miss-rate%
   2927293            -9.6%    2646648        perf-stat.i.iTLB-loads
 1.018e+10            -4.2%  9.752e+09        perf-stat.i.instructions
      0.60            +1.0%       0.60        perf-stat.i.ipc
      0.36            -5.2%       0.34        perf-stat.i.metric.GHz
    669.99           -10.0%     602.81        perf-stat.i.metric.K/sec
    124.30            -4.5%     118.67        perf-stat.i.metric.M/sec
   8004496           -12.0%    7040516        perf-stat.i.node-load-misses
   8642285           -11.0%    7688294        perf-stat.i.node-loads
     35.10            -1.0       34.11        perf-stat.i.node-store-miss-rate%
   3782168           -10.9%    3370639        perf-stat.i.node-store-misses
   6991064            -7.0%    6505160        perf-stat.i.node-stores
      4.87            +0.2        5.06        perf-stat.overall.branch-miss-rate%
      6.63            -0.2        6.38        perf-stat.overall.cache-miss-rate%
     52.17            +1.9       54.08        perf-stat.overall.iTLB-load-miss-rate%
      0.59            +1.1%       0.60        perf-stat.overall.ipc
     35.11            -1.0       34.13        perf-stat.overall.node-store-miss-rate%
 1.979e+09            -4.7%  1.886e+09        perf-stat.ps.branch-instructions
  16221561            -7.1%   15076188        perf-stat.ps.cache-misses
 2.448e+08            -3.5%  2.362e+08        perf-stat.ps.cache-references
    339800           -14.1%     292030        perf-stat.ps.context-switches
 1.635e+10            -5.4%  1.546e+10        perf-stat.ps.cpu-cycles
      1628            -7.0%       1513 ±  2%  perf-stat.ps.cpu-migrations
 2.332e+09            -4.7%  2.223e+09        perf-stat.ps.dTLB-loads
 1.141e+09            -5.2%  1.082e+09        perf-stat.ps.dTLB-stores
   2794208            -9.8%    2520816        perf-stat.ps.iTLB-loads
 9.712e+09            -4.4%  9.288e+09        perf-stat.ps.instructions
   7640561           -12.2%    6705747        perf-stat.ps.node-load-misses
   8249263           -11.2%    7322640        perf-stat.ps.node-loads
   3609780           -11.1%    3209985        perf-stat.ps.node-store-misses
   6672465            -7.2%    6195169        perf-stat.ps.node-stores
 2.143e+11            -8.7%  1.956e+11        perf-stat.total.instructions
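
As a reading aid for the %change column: judging from the rows above, metrics
whose name ends in "%" appear to be reported as an absolute difference in
percentage points, and all other metrics as a relative change. The sketch
below reproduces a few rows under that assumption. format_change is a
hypothetical helper, not the robot's own code, and because the table prints
rounded means, a recomputed value can differ from the printed one in the last
digit.

```python
def format_change(metric: str, base: float, new: float) -> str:
    """Format the base-to-new change in the style of the table above."""
    if metric.endswith("%"):
        # Assumption: percentage metrics show an absolute point difference.
        return f"{new - base:+.1f}"
    # Everything else: relative change against the base commit.
    return f"{(new - base) / base * 100:+.1f}%"


rows = [
    ("fsmark.files_per_sec", 5058, 5165),
    ("cpuidle..usage", 4426369, 3873415),
    ("mpstat.cpu.all.sys%", 5.93, 5.47),
    ("turbostat.C1%", 14.80, 13.03),
]
for metric, base, new in rows:
    print(f"{format_change(metric, base, new):>8}  {metric}")
```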


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki