* [linus:master] [mm/slub] 306c4ac989: stress-ng.seal.ops_per_sec 5.2% improvement
@ 2024-07-25 8:04 kernel test robot
From: kernel test robot @ 2024-07-25 8:04 UTC (permalink / raw)
To: Hyunmin Lee
Cc: oe-lkp, lkp, linux-kernel, Vlastimil Babka, Jeungwoo Yoo,
Sangyun Kim, Hyeonggon Yoo, Gwan-gyeong Mun, Christoph Lameter,
David Rientjes, linux-mm, ying.huang, feng.tang, fengwei.yin,
oliver.sang
Hello,
kernel test robot noticed a 5.2% improvement of stress-ng.seal.ops_per_sec on:
commit: 306c4ac9896b07b8872293eb224058ff83f81fac ("mm/slub: create kmalloc 96 and 192 caches regardless cache size order")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
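For context: kmalloc-96 and kmalloc-192 are the two non-power-of-two size
classes that sit between the power-of-two caches to reduce internal
fragmentation. A toy userspace model of that rounding (illustration only;
the names kmalloc_classes/kmalloc_class below are made up, and the real
table lives in the kernel's slab code):

#include <stdio.h>

/*
 * Toy model of the small kmalloc size classes, including the
 * non-power-of-two 96 and 192 byte caches this commit touches.
 * Conceptual only -- not the kernel's actual lookup.
 */
static const unsigned int kmalloc_classes[] = {
	8, 16, 32, 64, 96, 128, 192, 256
};

static unsigned int kmalloc_class(unsigned int size)
{
	for (unsigned int i = 0; i < sizeof(kmalloc_classes) / sizeof(kmalloc_classes[0]); i++)
		if (size <= kmalloc_classes[i])
			return kmalloc_classes[i];
	return 0;	/* larger requests go to bigger caches / page allocator */
}

int main(void)
{
	/* a 72-byte request lands in kmalloc-96 rather than kmalloc-128,
	 * and a 160-byte one in kmalloc-192 rather than kmalloc-256 */
	printf("72  -> kmalloc-%u\n", kmalloc_class(72));
	printf("160 -> kmalloc-%u\n", kmalloc_class(160));
	return 0;
}

The commit itself only changes how these two caches are created at boot,
which is relevant to the discussion in the reply below.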
testcase: stress-ng
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:
nr_threads: 100%
testtime: 60s
test: seal
cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240725/202407251553.12f35198-oliver.sang@intel.com
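For orientation, the seal stressor hammers the memfd sealing syscalls. A
rough, self-contained approximation of one iteration (a sketch based on the
documented stress-ng behavior, not its actual source; needs glibc >= 2.27
for memfd_create()) shows why ftruncate() and especially close() dominate
the profile below:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Approximation of one iteration of stress-ng's seal stressor (sketch
 * only): create a memfd, size it, seal it, tear it down.
 */
static void seal_once(void)
{
	int fd = memfd_create("stress-seal", MFD_ALLOW_SEALING);

	if (fd < 0)
		return;
	ftruncate(fd, 4096);	/* the ftruncate64 seen in the profile */
	fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE);
	close(fd);		/* __close -> __fput -> dput -> evict */
}

int main(void)
{
	for (int i = 0; i < 1000000; i++)
		seal_once();
	return 0;
}

Under this pattern, nearly all contended cycles end up in the final
close(), consistent with the ~48% in the evict/_raw_spin_lock path in the
perf-profile data below.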
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/seal/stress-ng/60s
commit:
844776cb65 ("mm/slub: mark racy access on slab->freelist")
306c4ac989 ("mm/slub: create kmalloc 96 and 192 caches regardless cache size order")
844776cb65a77ef2 306c4ac9896b07b8872293eb224
---------------- ---------------------------
%stddev %change %stddev
2.51 ± 27% +1.9 4.44 ± 35% mpstat.cpu.all.idle%
975100 ± 19% +29.5% 1262643 ± 16% numa-meminfo.node1.AnonPages.max
187.06 ± 4% -11.5% 165.63 ± 10% sched_debug.cfs_rq:/.runnable_avg.stddev
0.05 ± 18% -40.0% 0.03 ± 58% vmstat.procs.b
58973718 +5.2% 62024061 stress-ng.seal.ops
982893 +5.2% 1033732 stress-ng.seal.ops_per_sec
59045344 +5.2% 62095668 stress-ng.time.minor_page_faults
174957 +1.4% 177400 proc-vmstat.nr_slab_unreclaimable
63634761 +5.5% 67148443 proc-vmstat.numa_hit
63399995 +5.5% 66914221 proc-vmstat.numa_local
73601172 +6.1% 78073549 proc-vmstat.pgalloc_normal
59870250 +5.3% 63063514 proc-vmstat.pgfault
72718474 +6.0% 77106313 proc-vmstat.pgfree
1.983e+10 +1.3% 2.01e+10 perf-stat.i.branch-instructions
66023349 +5.6% 69728143 perf-stat.i.cache-misses
2.023e+08 +4.7% 2.117e+08 perf-stat.i.cache-references
7.22 -1.9% 7.08 perf-stat.i.cpi
9738 -5.6% 9196 perf-stat.i.cycles-between-cache-misses
8.799e+10 +1.6% 8.939e+10 perf-stat.i.instructions
0.14 +1.6% 0.14 perf-stat.i.ipc
8.71 +5.1% 9.16 perf-stat.i.metric.K/sec
983533 +4.7% 1029816 perf-stat.i.minor-faults
983533 +4.7% 1029816 perf-stat.i.page-faults
7.30 -18.4% 5.96 ± 44% perf-stat.overall.cpi
9735 -21.3% 7658 ± 44% perf-stat.overall.cycles-between-cache-misses
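(Reading the derived rows above: cpi = cycles / instructions, ipc = 1 / cpi,
and cycles-between-cache-misses = cycles / cache-misses; e.g. 1 / 7.22 ≈ 0.14,
matching the ipc row, so the cpi and ipc shifts are two views of the same
change.)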
0.52 +0.1 0.62 ± 7% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ftruncate64
0.56 +0.1 0.67 ± 7% perf-profile.calltrace.cycles-pp.ftruncate64
0.34 ± 70% +0.3 0.60 ± 7% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
48.29 +0.6 48.86 perf-profile.calltrace.cycles-pp.__close
48.27 +0.6 48.84 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
48.27 +0.6 48.84 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close
48.26 +0.6 48.83 perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
0.00 +0.6 0.58 ± 7% perf-profile.calltrace.cycles-pp.__x64_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
48.21 +0.6 48.80 perf-profile.calltrace.cycles-pp.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
48.03 +0.6 48.68 perf-profile.calltrace.cycles-pp.dput.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
48.02 +0.6 48.66 perf-profile.calltrace.cycles-pp.__dentry_kill.dput.__fput.__x64_sys_close.do_syscall_64
47.76 +0.7 48.47 perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dput.__fput.__x64_sys_close
47.19 +0.7 47.92 perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.dput.__fput
47.11 +0.8 47.88 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.dput
0.74 -0.3 0.48 ± 8% perf-profile.children.cycles-pp.__munmap
0.69 -0.2 0.44 ± 9% perf-profile.children.cycles-pp.__x64_sys_munmap
0.68 -0.2 0.44 ± 9% perf-profile.children.cycles-pp.__vm_munmap
0.68 -0.2 0.45 ± 9% perf-profile.children.cycles-pp.do_vmi_munmap
0.65 -0.2 0.42 ± 8% perf-profile.children.cycles-pp.do_vmi_align_munmap
0.44 -0.2 0.28 ± 7% perf-profile.children.cycles-pp.unmap_region
0.48 -0.1 0.36 ± 7% perf-profile.children.cycles-pp.asm_exc_page_fault
0.42 -0.1 0.32 ± 7% perf-profile.children.cycles-pp.do_user_addr_fault
0.42 ± 2% -0.1 0.32 ± 7% perf-profile.children.cycles-pp.exc_page_fault
0.38 ± 2% -0.1 0.29 ± 7% perf-profile.children.cycles-pp.handle_mm_fault
0.35 ± 2% -0.1 0.27 ± 7% perf-profile.children.cycles-pp.__handle_mm_fault
0.33 ± 2% -0.1 0.26 ± 6% perf-profile.children.cycles-pp.do_fault
0.21 ± 2% -0.1 0.14 ± 8% perf-profile.children.cycles-pp.lru_add_drain
0.22 -0.1 0.15 ± 11% perf-profile.children.cycles-pp.alloc_inode
0.21 ± 2% -0.1 0.15 ± 9% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.18 ± 2% -0.1 0.12 ± 8% perf-profile.children.cycles-pp.unmap_vmas
0.21 ± 2% -0.1 0.14 ± 7% perf-profile.children.cycles-pp.folio_batch_move_lru
0.17 -0.1 0.11 ± 8% perf-profile.children.cycles-pp.unmap_page_range
0.16 ± 2% -0.1 0.10 ± 9% perf-profile.children.cycles-pp.zap_pte_range
0.16 ± 2% -0.1 0.10 ± 9% perf-profile.children.cycles-pp.zap_pmd_range
0.26 ± 2% -0.1 0.20 ± 7% perf-profile.children.cycles-pp.shmem_fault
0.50 -0.1 0.45 ± 8% perf-profile.children.cycles-pp.mmap_region
0.26 ± 2% -0.1 0.20 ± 7% perf-profile.children.cycles-pp.__do_fault
0.26 -0.1 0.21 ± 6% perf-profile.children.cycles-pp.shmem_get_folio_gfp
0.19 ± 2% -0.1 0.14 ± 14% perf-profile.children.cycles-pp.write
0.22 ± 3% -0.0 0.18 ± 5% perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
0.11 ± 4% -0.0 0.07 ± 10% perf-profile.children.cycles-pp.mas_store_gfp
0.16 ± 2% -0.0 0.12 ± 11% perf-profile.children.cycles-pp.mas_wr_store_entry
0.14 -0.0 0.10 ± 10% perf-profile.children.cycles-pp.mas_wr_node_store
0.08 -0.0 0.04 ± 45% perf-profile.children.cycles-pp.msync
0.06 -0.0 0.02 ± 99% perf-profile.children.cycles-pp.mas_find
0.12 ± 4% -0.0 0.08 ± 11% perf-profile.children.cycles-pp.inode_init_always
0.10 ± 3% -0.0 0.07 ± 11% perf-profile.children.cycles-pp.shmem_alloc_inode
0.16 -0.0 0.13 ± 9% perf-profile.children.cycles-pp.__x64_sys_fcntl
0.11 ± 4% -0.0 0.08 ± 11% perf-profile.children.cycles-pp.shmem_file_write_iter
0.10 ± 4% -0.0 0.08 ± 8% perf-profile.children.cycles-pp.do_fcntl
0.15 -0.0 0.13 ± 8% perf-profile.children.cycles-pp.destroy_inode
0.16 ± 3% -0.0 0.14 ± 7% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
0.22 ± 3% -0.0 0.20 ± 5% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.08 -0.0 0.06 ± 11% perf-profile.children.cycles-pp.___slab_alloc
0.15 ± 3% -0.0 0.12 ± 8% perf-profile.children.cycles-pp.__destroy_inode
0.07 ± 7% -0.0 0.04 ± 45% perf-profile.children.cycles-pp.__call_rcu_common
0.13 ± 2% -0.0 0.11 ± 8% perf-profile.children.cycles-pp.perf_event_mmap
0.09 -0.0 0.07 ± 9% perf-profile.children.cycles-pp.memfd_fcntl
0.06 -0.0 0.04 ± 44% perf-profile.children.cycles-pp.native_irq_return_iret
0.08 ± 6% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.shmem_add_to_page_cache
0.12 -0.0 0.10 ± 6% perf-profile.children.cycles-pp.perf_event_mmap_event
0.11 ± 3% -0.0 0.09 ± 7% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
0.10 -0.0 0.08 ± 8% perf-profile.children.cycles-pp.uncharge_batch
0.12 ± 4% -0.0 0.10 ± 6% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.05 +0.0 0.07 ± 5% perf-profile.children.cycles-pp.__d_alloc
0.05 +0.0 0.07 ± 10% perf-profile.children.cycles-pp.d_alloc_pseudo
0.07 +0.0 0.09 ± 7% perf-profile.children.cycles-pp.file_init_path
0.06 ± 6% +0.0 0.08 ± 8% perf-profile.children.cycles-pp.security_file_alloc
0.07 ± 7% +0.0 0.09 ± 7% perf-profile.children.cycles-pp.errseq_sample
0.04 ± 44% +0.0 0.07 ± 10% perf-profile.children.cycles-pp.apparmor_file_alloc_security
0.09 +0.0 0.12 ± 5% perf-profile.children.cycles-pp.init_file
0.15 +0.0 0.18 ± 7% perf-profile.children.cycles-pp.common_perm_cond
0.15 ± 3% +0.0 0.19 ± 8% perf-profile.children.cycles-pp.security_file_truncate
0.20 +0.0 0.24 ± 7% perf-profile.children.cycles-pp.notify_change
0.06 +0.0 0.10 ± 6% perf-profile.children.cycles-pp.inode_init_owner
0.13 +0.0 0.18 ± 5% perf-profile.children.cycles-pp.alloc_empty_file
0.10 +0.1 0.16 ± 7% perf-profile.children.cycles-pp.clear_nlink
0.47 +0.1 0.56 ± 7% perf-profile.children.cycles-pp.do_ftruncate
0.49 +0.1 0.59 ± 7% perf-profile.children.cycles-pp.__x64_sys_ftruncate
0.59 +0.1 0.70 ± 7% perf-profile.children.cycles-pp.ftruncate64
0.28 +0.1 0.40 ± 6% perf-profile.children.cycles-pp.alloc_file_pseudo
98.62 +0.2 98.77 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
98.58 +0.2 98.74 perf-profile.children.cycles-pp.do_syscall_64
48.30 +0.6 48.86 perf-profile.children.cycles-pp.__close
48.26 +0.6 48.83 perf-profile.children.cycles-pp.__x64_sys_close
48.21 +0.6 48.80 perf-profile.children.cycles-pp.__fput
48.04 +0.6 48.68 perf-profile.children.cycles-pp.dput
48.02 +0.6 48.67 perf-profile.children.cycles-pp.__dentry_kill
47.77 +0.7 48.47 perf-profile.children.cycles-pp.evict
0.30 -0.1 0.23 ± 7% perf-profile.self.cycles-pp._raw_spin_lock
0.10 ± 4% -0.0 0.06 ± 7% perf-profile.self.cycles-pp.__fput
0.08 ± 6% -0.0 0.05 ± 8% perf-profile.self.cycles-pp.inode_init_always
0.06 -0.0 0.04 ± 44% perf-profile.self.cycles-pp.native_irq_return_iret
0.08 -0.0 0.06 ± 7% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.09 -0.0 0.08 ± 4% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.07 +0.0 0.09 ± 7% perf-profile.self.cycles-pp.__shmem_get_inode
0.06 ± 7% +0.0 0.09 ± 9% perf-profile.self.cycles-pp.errseq_sample
0.15 ± 2% +0.0 0.18 ± 7% perf-profile.self.cycles-pp.common_perm_cond
0.03 ± 70% +0.0 0.06 ± 7% perf-profile.self.cycles-pp.apparmor_file_alloc_security
0.06 +0.0 0.10 ± 7% perf-profile.self.cycles-pp.inode_init_owner
0.10 +0.1 0.16 ± 6% perf-profile.self.cycles-pp.clear_nlink
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linus:master] [mm/slub] 306c4ac989: stress-ng.seal.ops_per_sec 5.2% improvement
@ 2024-07-25 10:11 Vlastimil Babka
From: Vlastimil Babka @ 2024-07-25 10:11 UTC (permalink / raw)
To: kernel test robot, Hyunmin Lee
Cc: oe-lkp, lkp, linux-kernel, Jeungwoo Yoo, Sangyun Kim,
Hyeonggon Yoo, Gwan-gyeong Mun, Christoph Lameter,
David Rientjes, linux-mm, ying.huang, feng.tang, fengwei.yin
On 7/25/24 10:04 AM, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a 5.2% improvement of stress-ng.seal.ops_per_sec on:
>
>
> commit: 306c4ac9896b07b8872293eb224058ff83f81fac ("mm/slub: create kmalloc 96 and 192 caches regardless cache size order")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
Well, that's great news, but it's also highly unlikely that the commit would
cause such an improvement, as it only optimizes a once-per-boot operation in
create_kmalloc_caches(). Maybe there are secondary effects from the different
order of slab cache creation resulting in a different CPU cache layout, but
such an improvement would be machine- and compiler-specific and fragile
overall.