linux-mm.kvack.org archive mirror
* [linus:master] [mm/slub]  306c4ac989:  stress-ng.seal.ops_per_sec 5.2% improvement
@ 2024-07-25  8:04 kernel test robot
  2024-07-25 10:11 ` Vlastimil Babka
  0 siblings, 1 reply; 2+ messages in thread
From: kernel test robot @ 2024-07-25  8:04 UTC (permalink / raw)
  To: Hyunmin Lee
  Cc: oe-lkp, lkp, linux-kernel, Vlastimil Babka, Jeungwoo Yoo,
	Sangyun Kim, Hyeonggon Yoo, Gwan-gyeong Mun, Christoph Lameter,
	David Rientjes, linux-mm, ying.huang, feng.tang, fengwei.yin,
	oliver.sang



Hello,

kernel test robot noticed a 5.2% improvement of stress-ng.seal.ops_per_sec on:


commit: 306c4ac9896b07b8872293eb224058ff83f81fac ("mm/slub: create kmalloc 96 and 192 caches regardless cache size order")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: seal
	cpufreq_governor: performance
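For reference, a roughly equivalent standalone invocation of the seal stressor (an assumption based on the parameters above, not the lkp harness's exact command line) would be:

```shell
# run the memfd-sealing stressor with one worker per CPU for 60s,
# reporting total ops and ops/sec at the end
stress-ng --seal "$(nproc)" --timeout 60s --metrics-brief
```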






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240725/202407251553.12f35198-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/seal/stress-ng/60s

commit: 
  844776cb65 ("mm/slub: mark racy access on slab->freelist")
  306c4ac989 ("mm/slub: create kmalloc 96 and 192 caches regardless cache size order")

844776cb65a77ef2 306c4ac9896b07b8872293eb224 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2.51 ± 27%      +1.9        4.44 ± 35%  mpstat.cpu.all.idle%
    975100 ± 19%     +29.5%    1262643 ± 16%  numa-meminfo.node1.AnonPages.max
    187.06 ±  4%     -11.5%     165.63 ± 10%  sched_debug.cfs_rq:/.runnable_avg.stddev
      0.05 ± 18%     -40.0%       0.03 ± 58%  vmstat.procs.b
  58973718            +5.2%   62024061        stress-ng.seal.ops
    982893            +5.2%    1033732        stress-ng.seal.ops_per_sec
  59045344            +5.2%   62095668        stress-ng.time.minor_page_faults
    174957            +1.4%     177400        proc-vmstat.nr_slab_unreclaimable
  63634761            +5.5%   67148443        proc-vmstat.numa_hit
  63399995            +5.5%   66914221        proc-vmstat.numa_local
  73601172            +6.1%   78073549        proc-vmstat.pgalloc_normal
  59870250            +5.3%   63063514        proc-vmstat.pgfault
  72718474            +6.0%   77106313        proc-vmstat.pgfree
 1.983e+10            +1.3%   2.01e+10        perf-stat.i.branch-instructions
  66023349            +5.6%   69728143        perf-stat.i.cache-misses
 2.023e+08            +4.7%  2.117e+08        perf-stat.i.cache-references
      7.22            -1.9%       7.08        perf-stat.i.cpi
      9738            -5.6%       9196        perf-stat.i.cycles-between-cache-misses
 8.799e+10            +1.6%  8.939e+10        perf-stat.i.instructions
      0.14            +1.6%       0.14        perf-stat.i.ipc
      8.71            +5.1%       9.16        perf-stat.i.metric.K/sec
    983533            +4.7%    1029816        perf-stat.i.minor-faults
    983533            +4.7%    1029816        perf-stat.i.page-faults
      7.30           -18.4%       5.96 ± 44%  perf-stat.overall.cpi
      9735           -21.3%       7658 ± 44%  perf-stat.overall.cycles-between-cache-misses
      0.52            +0.1        0.62 ±  7%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ftruncate64
      0.56            +0.1        0.67 ±  7%  perf-profile.calltrace.cycles-pp.ftruncate64
      0.34 ± 70%      +0.3        0.60 ±  7%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
     48.29            +0.6       48.86        perf-profile.calltrace.cycles-pp.__close
     48.27            +0.6       48.84        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
     48.27            +0.6       48.84        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close
     48.26            +0.6       48.83        perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
      0.00            +0.6        0.58 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
     48.21            +0.6       48.80        perf-profile.calltrace.cycles-pp.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
     48.03            +0.6       48.68        perf-profile.calltrace.cycles-pp.dput.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
     48.02            +0.6       48.66        perf-profile.calltrace.cycles-pp.__dentry_kill.dput.__fput.__x64_sys_close.do_syscall_64
     47.76            +0.7       48.47        perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dput.__fput.__x64_sys_close
     47.19            +0.7       47.92        perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.dput.__fput
     47.11            +0.8       47.88        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.dput
      0.74            -0.3        0.48 ±  8%  perf-profile.children.cycles-pp.__munmap
      0.69            -0.2        0.44 ±  9%  perf-profile.children.cycles-pp.__x64_sys_munmap
      0.68            -0.2        0.44 ±  9%  perf-profile.children.cycles-pp.__vm_munmap
      0.68            -0.2        0.45 ±  9%  perf-profile.children.cycles-pp.do_vmi_munmap
      0.65            -0.2        0.42 ±  8%  perf-profile.children.cycles-pp.do_vmi_align_munmap
      0.44            -0.2        0.28 ±  7%  perf-profile.children.cycles-pp.unmap_region
      0.48            -0.1        0.36 ±  7%  perf-profile.children.cycles-pp.asm_exc_page_fault
      0.42            -0.1        0.32 ±  7%  perf-profile.children.cycles-pp.do_user_addr_fault
      0.42 ±  2%      -0.1        0.32 ±  7%  perf-profile.children.cycles-pp.exc_page_fault
      0.38 ±  2%      -0.1        0.29 ±  7%  perf-profile.children.cycles-pp.handle_mm_fault
      0.35 ±  2%      -0.1        0.27 ±  7%  perf-profile.children.cycles-pp.__handle_mm_fault
      0.33 ±  2%      -0.1        0.26 ±  6%  perf-profile.children.cycles-pp.do_fault
      0.21 ±  2%      -0.1        0.14 ±  8%  perf-profile.children.cycles-pp.lru_add_drain
      0.22            -0.1        0.15 ± 11%  perf-profile.children.cycles-pp.alloc_inode
      0.21 ±  2%      -0.1        0.15 ±  9%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      0.18 ±  2%      -0.1        0.12 ±  8%  perf-profile.children.cycles-pp.unmap_vmas
      0.21 ±  2%      -0.1        0.14 ±  7%  perf-profile.children.cycles-pp.folio_batch_move_lru
      0.17            -0.1        0.11 ±  8%  perf-profile.children.cycles-pp.unmap_page_range
      0.16 ±  2%      -0.1        0.10 ±  9%  perf-profile.children.cycles-pp.zap_pte_range
      0.16 ±  2%      -0.1        0.10 ±  9%  perf-profile.children.cycles-pp.zap_pmd_range
      0.26 ±  2%      -0.1        0.20 ±  7%  perf-profile.children.cycles-pp.shmem_fault
      0.50            -0.1        0.45 ±  8%  perf-profile.children.cycles-pp.mmap_region
      0.26 ±  2%      -0.1        0.20 ±  7%  perf-profile.children.cycles-pp.__do_fault
      0.26            -0.1        0.21 ±  6%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
      0.19 ±  2%      -0.1        0.14 ± 14%  perf-profile.children.cycles-pp.write
      0.22 ±  3%      -0.0        0.18 ±  5%  perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
      0.11 ±  4%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.mas_store_gfp
      0.16 ±  2%      -0.0        0.12 ± 11%  perf-profile.children.cycles-pp.mas_wr_store_entry
      0.14            -0.0        0.10 ± 10%  perf-profile.children.cycles-pp.mas_wr_node_store
      0.08            -0.0        0.04 ± 45%  perf-profile.children.cycles-pp.msync
      0.06            -0.0        0.02 ± 99%  perf-profile.children.cycles-pp.mas_find
      0.12 ±  4%      -0.0        0.08 ± 11%  perf-profile.children.cycles-pp.inode_init_always
      0.10 ±  3%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.shmem_alloc_inode
      0.16            -0.0        0.13 ±  9%  perf-profile.children.cycles-pp.__x64_sys_fcntl
      0.11 ±  4%      -0.0        0.08 ± 11%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.10 ±  4%      -0.0        0.08 ±  8%  perf-profile.children.cycles-pp.do_fcntl
      0.15            -0.0        0.13 ±  8%  perf-profile.children.cycles-pp.destroy_inode
      0.16 ±  3%      -0.0        0.14 ±  7%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      0.22 ±  3%      -0.0        0.20 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.08            -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.___slab_alloc
      0.15 ±  3%      -0.0        0.12 ±  8%  perf-profile.children.cycles-pp.__destroy_inode
      0.07 ±  7%      -0.0        0.04 ± 45%  perf-profile.children.cycles-pp.__call_rcu_common
      0.13 ±  2%      -0.0        0.11 ±  8%  perf-profile.children.cycles-pp.perf_event_mmap
      0.09            -0.0        0.07 ±  9%  perf-profile.children.cycles-pp.memfd_fcntl
      0.06            -0.0        0.04 ± 44%  perf-profile.children.cycles-pp.native_irq_return_iret
      0.08 ±  6%      -0.0        0.06 ±  8%  perf-profile.children.cycles-pp.shmem_add_to_page_cache
      0.12            -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.perf_event_mmap_event
      0.11 ±  3%      -0.0        0.09 ±  7%  perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
      0.10            -0.0        0.08 ±  8%  perf-profile.children.cycles-pp.uncharge_batch
      0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.05            +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.__d_alloc
      0.05            +0.0        0.07 ± 10%  perf-profile.children.cycles-pp.d_alloc_pseudo
      0.07            +0.0        0.09 ±  7%  perf-profile.children.cycles-pp.file_init_path
      0.06 ±  6%      +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.security_file_alloc
      0.07 ±  7%      +0.0        0.09 ±  7%  perf-profile.children.cycles-pp.errseq_sample
      0.04 ± 44%      +0.0        0.07 ± 10%  perf-profile.children.cycles-pp.apparmor_file_alloc_security
      0.09            +0.0        0.12 ±  5%  perf-profile.children.cycles-pp.init_file
      0.15            +0.0        0.18 ±  7%  perf-profile.children.cycles-pp.common_perm_cond
      0.15 ±  3%      +0.0        0.19 ±  8%  perf-profile.children.cycles-pp.security_file_truncate
      0.20            +0.0        0.24 ±  7%  perf-profile.children.cycles-pp.notify_change
      0.06            +0.0        0.10 ±  6%  perf-profile.children.cycles-pp.inode_init_owner
      0.13            +0.0        0.18 ±  5%  perf-profile.children.cycles-pp.alloc_empty_file
      0.10            +0.1        0.16 ±  7%  perf-profile.children.cycles-pp.clear_nlink
      0.47            +0.1        0.56 ±  7%  perf-profile.children.cycles-pp.do_ftruncate
      0.49            +0.1        0.59 ±  7%  perf-profile.children.cycles-pp.__x64_sys_ftruncate
      0.59            +0.1        0.70 ±  7%  perf-profile.children.cycles-pp.ftruncate64
      0.28            +0.1        0.40 ±  6%  perf-profile.children.cycles-pp.alloc_file_pseudo
     98.62            +0.2       98.77        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     98.58            +0.2       98.74        perf-profile.children.cycles-pp.do_syscall_64
     48.30            +0.6       48.86        perf-profile.children.cycles-pp.__close
     48.26            +0.6       48.83        perf-profile.children.cycles-pp.__x64_sys_close
     48.21            +0.6       48.80        perf-profile.children.cycles-pp.__fput
     48.04            +0.6       48.68        perf-profile.children.cycles-pp.dput
     48.02            +0.6       48.67        perf-profile.children.cycles-pp.__dentry_kill
     47.77            +0.7       48.47        perf-profile.children.cycles-pp.evict
      0.30            -0.1        0.23 ±  7%  perf-profile.self.cycles-pp._raw_spin_lock
      0.10 ±  4%      -0.0        0.06 ±  7%  perf-profile.self.cycles-pp.__fput
      0.08 ±  6%      -0.0        0.05 ±  8%  perf-profile.self.cycles-pp.inode_init_always
      0.06            -0.0        0.04 ± 44%  perf-profile.self.cycles-pp.native_irq_return_iret
      0.08            -0.0        0.06 ±  7%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.09            -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.07            +0.0        0.09 ±  7%  perf-profile.self.cycles-pp.__shmem_get_inode
      0.06 ±  7%      +0.0        0.09 ±  9%  perf-profile.self.cycles-pp.errseq_sample
      0.15 ±  2%      +0.0        0.18 ±  7%  perf-profile.self.cycles-pp.common_perm_cond
      0.03 ± 70%      +0.0        0.06 ±  7%  perf-profile.self.cycles-pp.apparmor_file_alloc_security
      0.06            +0.0        0.10 ±  7%  perf-profile.self.cycles-pp.inode_init_owner
      0.10            +0.1        0.16 ±  6%  perf-profile.self.cycles-pp.clear_nlink




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




* Re: [linus:master] [mm/slub] 306c4ac989: stress-ng.seal.ops_per_sec 5.2% improvement
  2024-07-25  8:04 [linus:master] [mm/slub] 306c4ac989: stress-ng.seal.ops_per_sec 5.2% improvement kernel test robot
@ 2024-07-25 10:11 ` Vlastimil Babka
  0 siblings, 0 replies; 2+ messages in thread
From: Vlastimil Babka @ 2024-07-25 10:11 UTC (permalink / raw)
  To: kernel test robot, Hyunmin Lee
  Cc: oe-lkp, lkp, linux-kernel, Jeungwoo Yoo, Sangyun Kim,
	Hyeonggon Yoo, Gwan-gyeong Mun, Christoph Lameter,
	David Rientjes, linux-mm, ying.huang, feng.tang, fengwei.yin

On 7/25/24 10:04 AM, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed a 5.2% improvement of stress-ng.seal.ops_per_sec on:
> 
> 
> commit: 306c4ac9896b07b8872293eb224058ff83f81fac ("mm/slub: create kmalloc 96 and 192 caches regardless cache size order")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

Well, that's great news, but it's also highly unlikely that the commit itself
would cause such an improvement, as it only optimizes a once-per-boot
operation in create_kmalloc_caches(). Maybe there are secondary effects from
the different order of slab cache creation resulting in a different CPU cache
layout, but such an improvement would be machine- and compiler-specific and
overall fragile.

> [...]



