* [linux-next:master] [mm] 94962b2628: will-it-scale.per_process_ops 4.8% improvement
From: kernel test robot @ 2026-01-31 12:56 UTC
To: Ankur Arora
Cc: oe-lkp, lkp, Andrew Morton, David Hildenbrand, Andy Lutomirski,
Borislav Petkov (AMD),
Boris Ostrovsky, H. Peter Anvin, Ingo Molnar,
Konrad Rzeszutek Wilk, Lance Yang, Liam R. Howlett, Li Zhe,
Lorenzo Stoakes, Mateusz Guzik, Matthew Wilcox, Michal Hocko,
Mike Rapoport, Peter Zijlstra, Raghavendra K T,
Suren Baghdasaryan, Thomas Gleixner, Vlastimil Babka, linux-mm,
oliver.sang
Hello,
kernel test robot noticed a 4.8% improvement of will-it-scale.per_process_ops on:
commit: 94962b2628e6af2c48be6ebdf9f76add28d60ecc ("mm: folio_zero_user: clear page ranges")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
parameters:
nr_task: 100%
mode: process
test: page_fault1
cpufreq_governor: performance
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260131/202601312034.df465f26-lkp@intel.com
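For reference, the usual lkp-tests workflow for reproducing such a result looks roughly like the following. This is a sketch only: the job file name (job.yaml) stands in for the job file shipped in the archive linked above, the split-job suite filter is an assumption based on this report's testcase, and `lkp install`/`lkp run` need root and network access.

```shell
# Fetch the 0-day test harness (same project as the wiki linked in the footer).
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests

# job.yaml: placeholder for the job file from the reproduction archive above.
# Install the benchmark and its dependencies for that job (requires root).
sudo bin/lkp install job.yaml

# Split the combined job into per-test yaml files, keeping will-it-scale only
# (suite name assumed from the "testcase: will-it-scale" field of this report).
bin/lkp split-job --include-suites will-it-scale job.yaml

# Run the generated yaml for this test (requires root).
sudo bin/lkp run generated-yaml-file
```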
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-14/performance/x86_64-rhel-9.4/process/100%/debian-13-x86_64-20250902.cgz/lkp-ivb-2ep2/page_fault1/will-it-scale
commit:
9890ecab6a ("mm: folio_zero_user: clear pages sequentially")
94962b2628 ("mm: folio_zero_user: clear page ranges")
9890ecab6ad9c0d3 94962b2628e6af2c48be6ebdf9f
---------------- ---------------------------
%stddev %change %stddev
\ | \
188907 -20.7% 149831 meminfo.Shmem
2571826 ± 4% -9.8% 2320837 ± 6% numa-meminfo.node1.AnonPages.max
55533 -5.8% 52308 vmstat.system.in
0.05 +0.0 0.06 mpstat.cpu.all.soft%
7.97 +1.2 9.13 mpstat.cpu.all.usr%
0.50 ± 9% +66.5% 0.83 ± 11% perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
0.50 ± 9% +66.5% 0.83 ± 11% perf-sched.total_sch_delay.average.ms
16857808 -5.8% 15880074 turbostat.IRQ
1047246 ± 3% -43.1% 596149 ± 2% turbostat.NMI
13326899 +4.8% 13968231 will-it-scale.48.processes
277643 +4.8% 291004 will-it-scale.per_process_ops
13326899 +4.8% 13968231 will-it-scale.workload
7632 ± 4% +53.0% 11676 perf-c2c.DRAM.local
336.17 ± 15% +172.4% 915.83 ± 12% perf-c2c.DRAM.remote
55.17 ± 28% +109.1% 115.33 ± 9% perf-c2c.HIT.remote
548.33 ± 7% +119.2% 1201 ± 4% perf-c2c.HITM.local
158.83 ± 6% +183.7% 450.67 ± 3% perf-c2c.HITM.remote
980465 -1.1% 969931 proc-vmstat.nr_active_anon
966522 -1.0% 956717 proc-vmstat.nr_file_pages
47257 -20.8% 37450 proc-vmstat.nr_shmem
980461 -1.1% 969926 proc-vmstat.nr_zone_active_anon
16479858 +4.5% 17219093 proc-vmstat.numa_hit
16430403 +4.5% 17165377 proc-vmstat.numa_local
4.02e+09 +4.8% 4.213e+09 proc-vmstat.pgalloc_normal
8603457 +4.3% 8969427 proc-vmstat.pgfault
4.02e+09 +4.8% 4.213e+09 proc-vmstat.pgfree
7834289 +4.8% 8210993 proc-vmstat.thp_fault_alloc
6455 ±141% +750.4% 54895 ± 70% sched_debug.cfs_rq:/.left_deadline.avg
309861 ±141% +750.4% 2634973 ± 70% sched_debug.cfs_rq:/.left_deadline.max
44256 ±141% +750.4% 376343 ± 70% sched_debug.cfs_rq:/.left_deadline.stddev
6455 ±141% +750.4% 54895 ± 70% sched_debug.cfs_rq:/.left_vruntime.avg
309855 ±141% +750.4% 2634967 ± 70% sched_debug.cfs_rq:/.left_vruntime.max
44255 ±141% +750.4% 376342 ± 70% sched_debug.cfs_rq:/.left_vruntime.stddev
219.53 +873.1% 2136 ±185% sched_debug.cfs_rq:/.load_avg.max
60.71 ± 10% +444.0% 330.25 ±167% sched_debug.cfs_rq:/.load_avg.stddev
6455 ±141% +750.4% 54895 ± 70% sched_debug.cfs_rq:/.right_vruntime.avg
309855 ±141% +750.4% 2634967 ± 70% sched_debug.cfs_rq:/.right_vruntime.max
44255 ±141% +750.4% 376342 ± 70% sched_debug.cfs_rq:/.right_vruntime.stddev
8.51 ± 12% +67.6% 14.25 ± 14% sched_debug.cpu.clock.stddev
148335 ± 37% -79.4% 30520 ± 8% sched_debug.cpu.nr_switches.max
22646 ± 34% -73.7% 5946 ± 8% sched_debug.cpu.nr_switches.stddev
586.24 +45.2% 851.22 perf-stat.i.MPKI
3.795e+08 -23.6% 2.901e+08 perf-stat.i.branch-instructions
1.19 +0.4 1.55 perf-stat.i.branch-miss-rate%
8.737e+08 +4.8% 9.154e+08 perf-stat.i.cache-misses
9.041e+08 +5.1% 9.506e+08 perf-stat.i.cache-references
94.80 +38.7% 131.45 perf-stat.i.cpi
83.43 ± 2% -10.6% 74.57 perf-stat.i.cpu-migrations
162.14 -4.5% 154.78 perf-stat.i.cycles-between-cache-misses
1.806e+09 -22.0% 1.409e+09 perf-stat.i.instructions
0.01 -20.5% 0.01 perf-stat.i.ipc
0.10 ± 36% -59.3% 0.04 ± 38% perf-stat.i.major-faults
28196 +4.2% 29394 perf-stat.i.minor-faults
28196 +4.2% 29394 perf-stat.i.page-faults
483.60 +34.6% 650.80 perf-stat.overall.MPKI
1.92 +0.6 2.51 perf-stat.overall.branch-miss-rate%
78.25 +28.5% 100.54 perf-stat.overall.cpi
161.80 -4.5% 154.48 perf-stat.overall.cycles-between-cache-misses
0.01 -22.2% 0.01 perf-stat.overall.ipc
40822 -25.7% 30316 perf-stat.overall.path-length
3.784e+08 -23.7% 2.886e+08 perf-stat.ps.branch-instructions
8.706e+08 +4.8% 9.121e+08 perf-stat.ps.cache-misses
9.009e+08 +5.1% 9.472e+08 perf-stat.ps.cache-references
83.08 ± 2% -10.6% 74.23 perf-stat.ps.cpu-migrations
1.8e+09 -22.2% 1.402e+09 perf-stat.ps.instructions
0.10 ± 36% -59.5% 0.04 ± 39% perf-stat.ps.major-faults
28093 +4.2% 29283 perf-stat.ps.minor-faults
28093 +4.2% 29283 perf-stat.ps.page-faults
5.44e+11 -22.2% 4.235e+11 perf-stat.total.instructions
86.63 -58.6 28.03 perf-profile.calltrace.cycles-pp.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
97.92 -3.3 94.64 perf-profile.calltrace.cycles-pp.testcase
88.40 -2.3 86.08 perf-profile.calltrace.cycles-pp.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
89.13 -2.3 86.86 perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
89.28 -2.3 87.02 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
89.18 -2.3 86.92 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
89.49 -2.3 87.24 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
89.48 -2.2 87.23 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
89.56 -2.2 87.32 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
1.32 +0.3 1.60 perf-profile.calltrace.cycles-pp.prep_new_page.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof
1.43 +0.3 1.72 perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
1.43 +0.3 1.71 perf-profile.calltrace.cycles-pp.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page
1.41 +0.3 1.70 perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd
1.45 +0.3 1.74 perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
1.45 +2.4 3.84 perf-profile.calltrace.cycles-pp.free_unref_folios.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
1.54 +2.6 4.16 perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes
1.54 +2.6 4.19 perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.vms_complete_munmap_vmas
1.54 +2.7 4.20 perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap
1.55 +2.7 4.22 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
1.64 +2.9 4.52 perf-profile.calltrace.cycles-pp.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
1.67 +2.9 4.56 perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
1.65 +2.9 4.54 perf-profile.calltrace.cycles-pp.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
1.67 +2.9 4.56 perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.67 +2.9 4.56 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
1.67 +2.9 4.56 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
1.67 +2.9 4.57 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
1.67 +2.9 4.57 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
1.68 +2.9 4.58 perf-profile.calltrace.cycles-pp.__munmap
0.84 ± 5% +111.8 112.61 perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
97.97 -3.3 94.68 perf-profile.children.cycles-pp.testcase
86.84 -2.6 84.21 perf-profile.children.cycles-pp.folio_zero_user
88.40 -2.3 86.08 perf-profile.children.cycles-pp.vma_alloc_anon_folio_pmd
89.13 -2.3 86.86 perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
89.21 -2.2 86.98 perf-profile.children.cycles-pp.__handle_mm_fault
89.31 -2.2 87.09 perf-profile.children.cycles-pp.handle_mm_fault
89.52 -2.2 87.30 perf-profile.children.cycles-pp.exc_page_fault
89.51 -2.2 87.30 perf-profile.children.cycles-pp.do_user_addr_fault
89.60 -2.2 87.40 perf-profile.children.cycles-pp.asm_exc_page_fault
0.75 ± 5% -0.4 0.39 ± 4% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.66 ± 5% -0.3 0.34 ± 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.66 ± 5% -0.3 0.34 ± 3% perf-profile.children.cycles-pp.hrtimer_interrupt
0.56 ± 6% -0.3 0.28 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.44 ± 6% -0.2 0.22 ± 5% perf-profile.children.cycles-pp.tick_nohz_handler
0.41 ± 7% -0.2 0.20 ± 4% perf-profile.children.cycles-pp.update_process_times
0.27 ± 6% -0.1 0.14 ± 3% perf-profile.children.cycles-pp.sched_tick
0.18 ± 7% -0.1 0.09 ± 7% perf-profile.children.cycles-pp.task_tick_fair
0.10 ± 3% -0.0 0.08 ± 5% perf-profile.children.cycles-pp.__irqentry_text_end
0.10 ± 4% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.lock_vma_under_rcu
0.12 ± 3% +0.0 0.14 ± 5% perf-profile.children.cycles-pp.___perf_sw_event
0.07 ± 8% +0.0 0.09 ± 5% perf-profile.children.cycles-pp.try_charge_memcg
0.04 ± 44% +0.0 0.06 perf-profile.children.cycles-pp.mod_memcg_lruvec_state
0.05 ± 8% +0.0 0.07 ± 5% perf-profile.children.cycles-pp.mod_node_page_state
0.37 +0.0 0.39 ± 2% perf-profile.children.cycles-pp.pte_alloc_one
0.06 ± 9% +0.0 0.08 ± 4% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.36 +0.0 0.38 perf-profile.children.cycles-pp.alloc_pages_noprof
0.05 +0.0 0.08 ± 7% perf-profile.children.cycles-pp.free_tail_page_prepare
0.06 ± 9% +0.0 0.08 ± 5% perf-profile.children.cycles-pp.__mem_cgroup_charge
0.00 +0.1 0.06 ± 9% perf-profile.children.cycles-pp.x64_sys_call
0.00 +0.1 0.06 ± 8% perf-profile.children.cycles-pp.load_elf_binary
0.00 +0.1 0.06 ± 6% perf-profile.children.cycles-pp.exec_binprm
0.01 ±223% +0.1 0.07 ± 7% perf-profile.children.cycles-pp.charge_memcg
0.00 +0.1 0.06 ± 15% perf-profile.children.cycles-pp.asm_sysvec_call_function
0.00 +0.1 0.06 ± 6% perf-profile.children.cycles-pp.bprm_execve
0.00 +0.1 0.06 ± 6% perf-profile.children.cycles-pp.folio_remove_rmap_pmd
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
0.00 +0.1 0.07 ± 5% perf-profile.children.cycles-pp.free_pgtables
0.05 +0.1 0.14 ± 3% perf-profile.children.cycles-pp.__mmap
0.00 +0.1 0.09 ± 7% perf-profile.children.cycles-pp.__folio_unqueue_deferred_split
0.00 +0.1 0.09 ± 4% perf-profile.children.cycles-pp.execve
0.00 +0.1 0.09 ± 5% perf-profile.children.cycles-pp.__x64_sys_execve
0.00 +0.1 0.09 ± 5% perf-profile.children.cycles-pp.do_execveat_common
0.06 +0.1 0.16 ± 3% perf-profile.children.cycles-pp.vm_mmap_pgoff
0.05 ± 8% +0.1 0.15 ± 5% perf-profile.children.cycles-pp.do_mmap
0.00 +0.1 0.11 ± 4% perf-profile.children.cycles-pp.__mmap_region
0.00 +0.1 0.13 ± 2% perf-profile.children.cycles-pp.lru_gen_del_folio
0.00 +0.1 0.14 ± 2% perf-profile.children.cycles-pp.__page_cache_release
0.08 +0.1 0.22 ± 3% perf-profile.children.cycles-pp.zap_huge_pmd
0.09 ± 4% +0.2 0.25 ± 3% perf-profile.children.cycles-pp.unmap_page_range
0.08 ± 5% +0.2 0.24 ± 3% perf-profile.children.cycles-pp.zap_pmd_range
0.09 ± 6% +0.2 0.25 perf-profile.children.cycles-pp.unmap_vmas
1.47 +0.3 1.75 perf-profile.children.cycles-pp.prep_new_page
1.64 +0.3 1.93 perf-profile.children.cycles-pp.get_page_from_freelist
1.46 +0.3 1.75 perf-profile.children.cycles-pp.vma_alloc_folio_noprof
1.79 +0.3 2.10 perf-profile.children.cycles-pp.alloc_pages_mpol
1.78 +0.3 2.09 perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
1.46 +2.4 3.86 perf-profile.children.cycles-pp.free_unref_folios
1.54 +2.6 4.17 perf-profile.children.cycles-pp.folios_put_refs
1.54 +2.7 4.20 perf-profile.children.cycles-pp.free_pages_and_swap_cache
1.55 +2.7 4.20 perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
1.55 +2.7 4.23 perf-profile.children.cycles-pp.tlb_finish_mmu
1.64 +2.9 4.52 perf-profile.children.cycles-pp.vms_clear_ptes
1.65 +2.9 4.54 perf-profile.children.cycles-pp.vms_complete_munmap_vmas
1.67 +2.9 4.56 perf-profile.children.cycles-pp.do_vmi_align_munmap
1.67 +2.9 4.57 perf-profile.children.cycles-pp.__x64_sys_munmap
1.67 +2.9 4.57 perf-profile.children.cycles-pp.__vm_munmap
1.67 +2.9 4.56 perf-profile.children.cycles-pp.do_vmi_munmap
1.68 +2.9 4.58 perf-profile.children.cycles-pp.__munmap
1.93 +3.2 5.12 perf-profile.children.cycles-pp.do_syscall_64
1.93 +3.2 5.12 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.96 ± 5% +55.6 56.60 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
84.87 -1.2 83.67 perf-profile.self.cycles-pp.folio_zero_user
8.24 -1.0 7.28 perf-profile.self.cycles-pp.testcase
0.10 ± 3% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.__irqentry_text_end
0.08 ± 5% +0.0 0.10 ± 4% perf-profile.self.cycles-pp.mas_walk
0.05 ± 8% +0.0 0.07 ± 5% perf-profile.self.cycles-pp.mod_node_page_state
0.00 +0.1 0.05 perf-profile.self.cycles-pp.__alloc_frozen_pages_noprof
0.00 +0.1 0.06 ± 6% perf-profile.self.cycles-pp.free_tail_page_prepare
0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.zap_huge_pmd
0.00 +0.1 0.08 ± 8% perf-profile.self.cycles-pp.__folio_unqueue_deferred_split
0.00 +0.1 0.10 ± 3% perf-profile.self.cycles-pp.lru_gen_del_folio
0.00 +0.2 0.17 ± 2% perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
1.32 +0.3 1.60 perf-profile.self.cycles-pp.prep_new_page
1.40 +2.3 3.75 perf-profile.self.cycles-pp.free_unref_folios
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki