linux-mm.kvack.org archive mirror
* [linux-next:master] [mm]  94962b2628: will-it-scale.per_process_ops 4.8% improvement
@ 2026-01-31 12:56 kernel test robot
From: kernel test robot @ 2026-01-31 12:56 UTC
  To: Ankur Arora
  Cc: oe-lkp, lkp, Andrew Morton, David Hildenbrand, Andy Lutomirski,
	Borislav Petkov (AMD),
	Boris Ostrovsky, H. Peter Anvin, Ingo Molnar,
	Konrad Rzeszutek Wilk, Lance Yang, Liam R. Howlett, Li Zhe,
	Lorenzo Stoakes, Mateusz Guzik, Matthew Wilcox, Michal Hocko,
	Mike Rapoport, Peter Zijlstra, Raghavendra K T,
	Suren Baghdasaryan, Thomas Gleixner, Vlastimil Babka, linux-mm,
	oliver.sang



Hello,

kernel test robot noticed a 4.8% improvement of will-it-scale.per_process_ops on:


commit: 94962b2628e6af2c48be6ebdf9f76add28d60ecc ("mm: folio_zero_user: clear page ranges")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
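For context, the commit replaces sequential per-page clearing of a huge folio with clearing contiguous page ranges in one go. A rough userspace analogue of the two strategies (illustrative only: the kernel uses clear_page() and architecture-specific string ops rather than plain memset, and these helper names are made up here):

```c
#include <stddef.h>
#include <string.h>

#define SZ_PAGE 4096UL  /* base page size assumed for illustration */

/* Before (9890ecab6a, sketch): clear a huge folio one base page at a
 * time, restarting the clearing primitive for every 4 KiB subpage. */
static void clear_pages_sequentially(unsigned char *buf, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		memset(buf + i * SZ_PAGE, 0, SZ_PAGE);
}

/* After (94962b2628, sketch): hand the whole contiguous range to a
 * single call, so optimized string stores (e.g. x86 REP STOSB) can
 * stream through the full extent without per-page call overhead. */
static void clear_page_range(unsigned char *buf, size_t npages)
{
	memset(buf, 0, npages * SZ_PAGE);
}
```

Both produce the same zeroed memory; the range form simply gives the CPU a larger contiguous extent per clearing call, which is consistent with the lower instruction count and path-length figures below.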


testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory
parameters:

	nr_task: 100%
	mode: process
	test: page_fault1
	cpufreq_governor: performance



Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260131/202601312034.df465f26-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-14/performance/x86_64-rhel-9.4/process/100%/debian-13-x86_64-20250902.cgz/lkp-ivb-2ep2/page_fault1/will-it-scale

commit: 
  9890ecab6a ("mm: folio_zero_user: clear pages sequentially")
  94962b2628 ("mm: folio_zero_user: clear page ranges")

9890ecab6ad9c0d3 94962b2628e6af2c48be6ebdf9f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    188907           -20.7%     149831        meminfo.Shmem
   2571826 ±  4%      -9.8%    2320837 ±  6%  numa-meminfo.node1.AnonPages.max
     55533            -5.8%      52308        vmstat.system.in
      0.05            +0.0        0.06        mpstat.cpu.all.soft%
      7.97            +1.2        9.13        mpstat.cpu.all.usr%
      0.50 ±  9%     +66.5%       0.83 ± 11%  perf-sched.sch_delay.avg.ms.[unknown].[unknown].[unknown].[unknown].[unknown]
      0.50 ±  9%     +66.5%       0.83 ± 11%  perf-sched.total_sch_delay.average.ms
  16857808            -5.8%   15880074        turbostat.IRQ
   1047246 ±  3%     -43.1%     596149 ±  2%  turbostat.NMI
  13326899            +4.8%   13968231        will-it-scale.48.processes
    277643            +4.8%     291004        will-it-scale.per_process_ops
  13326899            +4.8%   13968231        will-it-scale.workload
      7632 ±  4%     +53.0%      11676        perf-c2c.DRAM.local
    336.17 ± 15%    +172.4%     915.83 ± 12%  perf-c2c.DRAM.remote
     55.17 ± 28%    +109.1%     115.33 ±  9%  perf-c2c.HIT.remote
    548.33 ±  7%    +119.2%       1201 ±  4%  perf-c2c.HITM.local
    158.83 ±  6%    +183.7%     450.67 ±  3%  perf-c2c.HITM.remote
    980465            -1.1%     969931        proc-vmstat.nr_active_anon
    966522            -1.0%     956717        proc-vmstat.nr_file_pages
     47257           -20.8%      37450        proc-vmstat.nr_shmem
    980461            -1.1%     969926        proc-vmstat.nr_zone_active_anon
  16479858            +4.5%   17219093        proc-vmstat.numa_hit
  16430403            +4.5%   17165377        proc-vmstat.numa_local
  4.02e+09            +4.8%  4.213e+09        proc-vmstat.pgalloc_normal
   8603457            +4.3%    8969427        proc-vmstat.pgfault
  4.02e+09            +4.8%  4.213e+09        proc-vmstat.pgfree
   7834289            +4.8%    8210993        proc-vmstat.thp_fault_alloc
      6455 ±141%    +750.4%      54895 ± 70%  sched_debug.cfs_rq:/.left_deadline.avg
    309861 ±141%    +750.4%    2634973 ± 70%  sched_debug.cfs_rq:/.left_deadline.max
     44256 ±141%    +750.4%     376343 ± 70%  sched_debug.cfs_rq:/.left_deadline.stddev
      6455 ±141%    +750.4%      54895 ± 70%  sched_debug.cfs_rq:/.left_vruntime.avg
    309855 ±141%    +750.4%    2634967 ± 70%  sched_debug.cfs_rq:/.left_vruntime.max
     44255 ±141%    +750.4%     376342 ± 70%  sched_debug.cfs_rq:/.left_vruntime.stddev
    219.53          +873.1%       2136 ±185%  sched_debug.cfs_rq:/.load_avg.max
     60.71 ± 10%    +444.0%     330.25 ±167%  sched_debug.cfs_rq:/.load_avg.stddev
      6455 ±141%    +750.4%      54895 ± 70%  sched_debug.cfs_rq:/.right_vruntime.avg
    309855 ±141%    +750.4%    2634967 ± 70%  sched_debug.cfs_rq:/.right_vruntime.max
     44255 ±141%    +750.4%     376342 ± 70%  sched_debug.cfs_rq:/.right_vruntime.stddev
      8.51 ± 12%     +67.6%      14.25 ± 14%  sched_debug.cpu.clock.stddev
    148335 ± 37%     -79.4%      30520 ±  8%  sched_debug.cpu.nr_switches.max
     22646 ± 34%     -73.7%       5946 ±  8%  sched_debug.cpu.nr_switches.stddev
    586.24           +45.2%     851.22        perf-stat.i.MPKI
 3.795e+08           -23.6%  2.901e+08        perf-stat.i.branch-instructions
      1.19            +0.4        1.55        perf-stat.i.branch-miss-rate%
 8.737e+08            +4.8%  9.154e+08        perf-stat.i.cache-misses
 9.041e+08            +5.1%  9.506e+08        perf-stat.i.cache-references
     94.80           +38.7%     131.45        perf-stat.i.cpi
     83.43 ±  2%     -10.6%      74.57        perf-stat.i.cpu-migrations
    162.14            -4.5%     154.78        perf-stat.i.cycles-between-cache-misses
 1.806e+09           -22.0%  1.409e+09        perf-stat.i.instructions
      0.01           -20.5%       0.01        perf-stat.i.ipc
      0.10 ± 36%     -59.3%       0.04 ± 38%  perf-stat.i.major-faults
     28196            +4.2%      29394        perf-stat.i.minor-faults
     28196            +4.2%      29394        perf-stat.i.page-faults
    483.60           +34.6%     650.80        perf-stat.overall.MPKI
      1.92            +0.6        2.51        perf-stat.overall.branch-miss-rate%
     78.25           +28.5%     100.54        perf-stat.overall.cpi
    161.80            -4.5%     154.48        perf-stat.overall.cycles-between-cache-misses
      0.01           -22.2%       0.01        perf-stat.overall.ipc
     40822           -25.7%      30316        perf-stat.overall.path-length
 3.784e+08           -23.7%  2.886e+08        perf-stat.ps.branch-instructions
 8.706e+08            +4.8%  9.121e+08        perf-stat.ps.cache-misses
 9.009e+08            +5.1%  9.472e+08        perf-stat.ps.cache-references
     83.08 ±  2%     -10.6%      74.23        perf-stat.ps.cpu-migrations
   1.8e+09           -22.2%  1.402e+09        perf-stat.ps.instructions
      0.10 ± 36%     -59.5%       0.04 ± 39%  perf-stat.ps.major-faults
     28093            +4.2%      29283        perf-stat.ps.minor-faults
     28093            +4.2%      29283        perf-stat.ps.page-faults
  5.44e+11           -22.2%  4.235e+11        perf-stat.total.instructions
     86.63           -58.6       28.03        perf-profile.calltrace.cycles-pp.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
     97.92            -3.3       94.64        perf-profile.calltrace.cycles-pp.testcase
     88.40            -2.3       86.08        perf-profile.calltrace.cycles-pp.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     89.13            -2.3       86.86        perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     89.28            -2.3       87.02        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     89.18            -2.3       86.92        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     89.49            -2.3       87.24        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
     89.48            -2.2       87.23        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
     89.56            -2.2       87.32        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
      1.32            +0.3        1.60        perf-profile.calltrace.cycles-pp.prep_new_page.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof
      1.43            +0.3        1.72        perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
      1.43            +0.3        1.71        perf-profile.calltrace.cycles-pp.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page
      1.41            +0.3        1.70        perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_frozen_pages_noprof.alloc_pages_mpol.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd
      1.45            +0.3        1.74        perf-profile.calltrace.cycles-pp.vma_alloc_folio_noprof.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
      1.45            +2.4        3.84        perf-profile.calltrace.cycles-pp.free_unref_folios.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu
      1.54            +2.6        4.16        perf-profile.calltrace.cycles-pp.folios_put_refs.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes
      1.54            +2.6        4.19        perf-profile.calltrace.cycles-pp.free_pages_and_swap_cache.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.vms_complete_munmap_vmas
      1.54            +2.7        4.20        perf-profile.calltrace.cycles-pp.__tlb_batch_free_encoded_pages.tlb_finish_mmu.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap
      1.55            +2.7        4.22        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap
      1.64            +2.9        4.52        perf-profile.calltrace.cycles-pp.vms_clear_ptes.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      1.67            +2.9        4.56        perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      1.65            +2.9        4.54        perf-profile.calltrace.cycles-pp.vms_complete_munmap_vmas.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
      1.67            +2.9        4.56        perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.67            +2.9        4.56        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      1.67            +2.9        4.56        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      1.67            +2.9        4.57        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      1.67            +2.9        4.57        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
      1.68            +2.9        4.58        perf-profile.calltrace.cycles-pp.__munmap
      0.84 ±  5%    +111.8      112.61        perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.folio_zero_user.vma_alloc_anon_folio_pmd.__do_huge_pmd_anonymous_page.__handle_mm_fault
     97.97            -3.3       94.68        perf-profile.children.cycles-pp.testcase
     86.84            -2.6       84.21        perf-profile.children.cycles-pp.folio_zero_user
     88.40            -2.3       86.08        perf-profile.children.cycles-pp.vma_alloc_anon_folio_pmd
     89.13            -2.3       86.86        perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
     89.21            -2.2       86.98        perf-profile.children.cycles-pp.__handle_mm_fault
     89.31            -2.2       87.09        perf-profile.children.cycles-pp.handle_mm_fault
     89.52            -2.2       87.30        perf-profile.children.cycles-pp.exc_page_fault
     89.51            -2.2       87.30        perf-profile.children.cycles-pp.do_user_addr_fault
     89.60            -2.2       87.40        perf-profile.children.cycles-pp.asm_exc_page_fault
      0.75 ±  5%      -0.4        0.39 ±  4%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.66 ±  5%      -0.3        0.34 ±  3%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.66 ±  5%      -0.3        0.34 ±  3%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.56 ±  6%      -0.3        0.28 ±  4%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.44 ±  6%      -0.2        0.22 ±  5%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.41 ±  7%      -0.2        0.20 ±  4%  perf-profile.children.cycles-pp.update_process_times
      0.27 ±  6%      -0.1        0.14 ±  3%  perf-profile.children.cycles-pp.sched_tick
      0.18 ±  7%      -0.1        0.09 ±  7%  perf-profile.children.cycles-pp.task_tick_fair
      0.10 ±  3%      -0.0        0.08 ±  5%  perf-profile.children.cycles-pp.__irqentry_text_end
      0.10 ±  4%      +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.lock_vma_under_rcu
      0.12 ±  3%      +0.0        0.14 ±  5%  perf-profile.children.cycles-pp.___perf_sw_event
      0.07 ±  8%      +0.0        0.09 ±  5%  perf-profile.children.cycles-pp.try_charge_memcg
      0.04 ± 44%      +0.0        0.06        perf-profile.children.cycles-pp.mod_memcg_lruvec_state
      0.05 ±  8%      +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.mod_node_page_state
      0.37            +0.0        0.39 ±  2%  perf-profile.children.cycles-pp.pte_alloc_one
      0.06 ±  9%      +0.0        0.08 ±  4%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.36            +0.0        0.38        perf-profile.children.cycles-pp.alloc_pages_noprof
      0.05            +0.0        0.08 ±  7%  perf-profile.children.cycles-pp.free_tail_page_prepare
      0.06 ±  9%      +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.__mem_cgroup_charge
      0.00            +0.1        0.06 ±  9%  perf-profile.children.cycles-pp.x64_sys_call
      0.00            +0.1        0.06 ±  8%  perf-profile.children.cycles-pp.load_elf_binary
      0.00            +0.1        0.06 ±  6%  perf-profile.children.cycles-pp.exec_binprm
      0.01 ±223%      +0.1        0.07 ±  7%  perf-profile.children.cycles-pp.charge_memcg
      0.00            +0.1        0.06 ± 15%  perf-profile.children.cycles-pp.asm_sysvec_call_function
      0.00            +0.1        0.06 ±  6%  perf-profile.children.cycles-pp.bprm_execve
      0.00            +0.1        0.06 ±  6%  perf-profile.children.cycles-pp.folio_remove_rmap_pmd
      0.00            +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
      0.00            +0.1        0.07 ±  5%  perf-profile.children.cycles-pp.free_pgtables
      0.05            +0.1        0.14 ±  3%  perf-profile.children.cycles-pp.__mmap
      0.00            +0.1        0.09 ±  7%  perf-profile.children.cycles-pp.__folio_unqueue_deferred_split
      0.00            +0.1        0.09 ±  4%  perf-profile.children.cycles-pp.execve
      0.00            +0.1        0.09 ±  5%  perf-profile.children.cycles-pp.__x64_sys_execve
      0.00            +0.1        0.09 ±  5%  perf-profile.children.cycles-pp.do_execveat_common
      0.06            +0.1        0.16 ±  3%  perf-profile.children.cycles-pp.vm_mmap_pgoff
      0.05 ±  8%      +0.1        0.15 ±  5%  perf-profile.children.cycles-pp.do_mmap
      0.00            +0.1        0.11 ±  4%  perf-profile.children.cycles-pp.__mmap_region
      0.00            +0.1        0.13 ±  2%  perf-profile.children.cycles-pp.lru_gen_del_folio
      0.00            +0.1        0.14 ±  2%  perf-profile.children.cycles-pp.__page_cache_release
      0.08            +0.1        0.22 ±  3%  perf-profile.children.cycles-pp.zap_huge_pmd
      0.09 ±  4%      +0.2        0.25 ±  3%  perf-profile.children.cycles-pp.unmap_page_range
      0.08 ±  5%      +0.2        0.24 ±  3%  perf-profile.children.cycles-pp.zap_pmd_range
      0.09 ±  6%      +0.2        0.25        perf-profile.children.cycles-pp.unmap_vmas
      1.47            +0.3        1.75        perf-profile.children.cycles-pp.prep_new_page
      1.64            +0.3        1.93        perf-profile.children.cycles-pp.get_page_from_freelist
      1.46            +0.3        1.75        perf-profile.children.cycles-pp.vma_alloc_folio_noprof
      1.79            +0.3        2.10        perf-profile.children.cycles-pp.alloc_pages_mpol
      1.78            +0.3        2.09        perf-profile.children.cycles-pp.__alloc_frozen_pages_noprof
      1.46            +2.4        3.86        perf-profile.children.cycles-pp.free_unref_folios
      1.54            +2.6        4.17        perf-profile.children.cycles-pp.folios_put_refs
      1.54            +2.7        4.20        perf-profile.children.cycles-pp.free_pages_and_swap_cache
      1.55            +2.7        4.20        perf-profile.children.cycles-pp.__tlb_batch_free_encoded_pages
      1.55            +2.7        4.23        perf-profile.children.cycles-pp.tlb_finish_mmu
      1.64            +2.9        4.52        perf-profile.children.cycles-pp.vms_clear_ptes
      1.65            +2.9        4.54        perf-profile.children.cycles-pp.vms_complete_munmap_vmas
      1.67            +2.9        4.56        perf-profile.children.cycles-pp.do_vmi_align_munmap
      1.67            +2.9        4.57        perf-profile.children.cycles-pp.__x64_sys_munmap
      1.67            +2.9        4.57        perf-profile.children.cycles-pp.__vm_munmap
      1.67            +2.9        4.56        perf-profile.children.cycles-pp.do_vmi_munmap
      1.68            +2.9        4.58        perf-profile.children.cycles-pp.__munmap
      1.93            +3.2        5.12        perf-profile.children.cycles-pp.do_syscall_64
      1.93            +3.2        5.12        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.96 ±  5%     +55.6       56.60        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
     84.87            -1.2       83.67        perf-profile.self.cycles-pp.folio_zero_user
      8.24            -1.0        7.28        perf-profile.self.cycles-pp.testcase
      0.10 ±  3%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.__irqentry_text_end
      0.08 ±  5%      +0.0        0.10 ±  4%  perf-profile.self.cycles-pp.mas_walk
      0.05 ±  8%      +0.0        0.07 ±  5%  perf-profile.self.cycles-pp.mod_node_page_state
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.__alloc_frozen_pages_noprof
      0.00            +0.1        0.06 ±  6%  perf-profile.self.cycles-pp.free_tail_page_prepare
      0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.zap_huge_pmd
      0.00            +0.1        0.08 ±  8%  perf-profile.self.cycles-pp.__folio_unqueue_deferred_split
      0.00            +0.1        0.10 ±  3%  perf-profile.self.cycles-pp.lru_gen_del_folio
      0.00            +0.2        0.17 ±  2%  perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
      1.32            +0.3        1.60        perf-profile.self.cycles-pp.prep_new_page
      1.40            +2.3        3.75        perf-profile.self.cycles-pp.free_unref_folios




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


