Greetings,

FYI, we noticed a -1.2% regression of will-it-scale.per_process_ops due to commit:

commit: 0f5b256b2c35bf7d0faf874ed01227b4b7cb0118 ("[PATCH v10 3/6] mm: Introduce Reported pages")
url: https://github.com/0day-ci/linux/commits/Alexander-Duyck/mm-virtio-Provide-support-for-unused-page-reporting/20190919-015544

in testcase: will-it-scale
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory
with following parameters:

	nr_task: 100%
	mode: process
	test: page_fault2
	cpufreq_governor: performance
	ucode: 0xb000036

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and a threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	bin/lkp install job.yaml  # job file is attached in this email
	bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-2019-05-14.cgz/lkp-bdw-ep6/page_fault2/will-it-scale/0xb000036

commit:
  e10e2ab29d ("mm: Use zone and order instead of free area in free_list manipulators")
  0f5b256b2c ("mm: Introduce Reported pages")

e10e2ab29d6d4ee2 0f5b256b2c35bf7d0faf874ed01
---------------- ---------------------------
       fail:runs  %reproduction  fail:runs
           |            |            |
          1:4         -25%           :4    dmesg.WARNING:at_ip___perf_sw_event/0x
          1:4         -25%           :4    dmesg.WARNING:at_ip__fsnotify_parent/0x
          3:4           1%          3:4    perf-profile.calltrace.cycles-pp.error_entry.testcase
          3:4           1%          3:4    perf-profile.children.cycles-pp.error_entry
          2:4           1%          2:4    perf-profile.self.cycles-pp.error_entry

         %stddev      %change        %stddev
             \           |               \
     83249             -1.2%      82239        will-it-scale.per_process_ops
   7325970             -1.2%    7237126        will-it-scale.workload
      1137 ±  2%       -2.8%       1105 ±  2%  vmstat.system.cs
      9785            +11.8%      10942 ±  5%  softirqs.CPU0.SCHED
      7282 ± 30%      +30.5%       9504 ±  9%  softirqs.CPU2.RCU
 2.211e+09             -1.2%  2.185e+09        proc-vmstat.numa_hit
 2.211e+09             -1.2%  2.185e+09        proc-vmstat.numa_local
 2.213e+09             -1.2%  2.186e+09        proc-vmstat.pgalloc_normal
 2.204e+09             -1.2%  2.178e+09        proc-vmstat.pgfault
 2.212e+09             -1.2%  2.185e+09        proc-vmstat.pgfree
    232.75 ± 21%     +359.8%       1070 ± 66%  interrupts.37:IR-PCI-MSI.1572868-edge.eth0-TxRx-3
    232.75 ± 21%     +359.8%       1070 ± 66%  interrupts.CPU16.37:IR-PCI-MSI.1572868-edge.eth0-TxRx-3
     34.00 ± 76%     +447.1%     186.00 ±114%  interrupts.CPU18.RES:Rescheduling_interrupts
    318.00 ± 42%      -65.8%     108.75 ± 77%  interrupts.CPU28.RES:Rescheduling_interrupts
    173.50 ± 32%     +143.7%     422.75 ± 28%  interrupts.CPU36.RES:Rescheduling_interrupts
     70.75 ± 73%     +726.1%     584.50 ± 60%  interrupts.CPU39.RES:Rescheduling_interrupts
     66.75 ± 38%      +78.7%     119.25 ± 19%  interrupts.CPU83.RES:Rescheduling_interrupts
    286.00 ± 93%      -88.0%      34.25 ± 97%  interrupts.CPU84.RES:Rescheduling_interrupts
  41205135             -3.7%   39666469        perf-stat.i.branch-misses
      1096 ±  2%       -2.9%       1064 ±  2%  perf-stat.i.context-switches
      3.60             +3.8%       3.74 ±  3%  perf-stat.i.cpi
     34.38 ±  2%       -4.6%      32.80 ±  2%  perf-stat.i.cpu-migrations
    547.67           +323.8%       2321 ± 80%  perf-stat.i.cycles-between-cache-misses
  71391394             -3.5%   68859821 ±  2%  perf-stat.i.dTLB-store-misses
  14877534             -3.1%   14415629 ±  2%  perf-stat.i.iTLB-load-misses
   7256992             -3.0%    7036423 ±  2%  perf-stat.i.minor-faults
 1.272e+08             -3.3%  1.231e+08        perf-stat.i.node-loads
      2.62             +1.0        3.64 ± 25%  perf-stat.i.node-store-miss-rate%
    847863             +4.6%     887118        perf-stat.i.node-store-misses
  31585215             -3.4%   30501990 ±  2%  perf-stat.i.node-stores
   7256096             -3.0%    7035925 ±  2%  perf-stat.i.page-faults
      0.33 ±  3%       +0.0        0.34 ±  2%  perf-stat.overall.node-load-miss-rate%
      2.61             +0.2        2.83 ±  2%  perf-stat.overall.node-store-miss-rate%
   2791987             +1.1%    2822374        perf-stat.overall.path-length
  41058236             -3.7%   39547998        perf-stat.ps.branch-misses
     34.24 ±  2%       -4.5%      32.69        perf-stat.ps.cpu-migrations
  71150671             -3.5%   68679628 ±  2%  perf-stat.ps.dTLB-store-misses
  14827264             -3.0%   14377324        perf-stat.ps.iTLB-load-misses
   7230962             -3.0%    7016784 ±  2%  perf-stat.ps.minor-faults
 1.268e+08             -3.2%  1.228e+08        perf-stat.ps.node-loads
    844990             +4.7%     884796        perf-stat.ps.node-store-misses
  31478503             -3.4%   30422235 ±  2%  perf-stat.ps.node-stores
   7230628             -3.0%    7016690 ±  2%  perf-stat.ps.page-faults
      4.10             -0.7        3.43        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.alloc_set_pte
      4.13             -0.7        3.47        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.alloc_set_pte.finish_fault
      5.35             -0.7        4.69        perf-profile.calltrace.cycles-pp.pagevec_lru_move_fn.__lru_cache_add.alloc_set_pte.finish_fault.__handle_mm_fault
      5.44             -0.7        4.78        perf-profile.calltrace.cycles-pp.__lru_cache_add.alloc_set_pte.finish_fault.__handle_mm_fault.handle_mm_fault
      7.50             -0.6        6.87        perf-profile.calltrace.cycles-pp.finish_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
      7.41             -0.6        6.78        perf-profile.calltrace.cycles-pp.alloc_set_pte.finish_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault
     53.51             -0.3       53.17        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
     53.93             -0.3       53.62        perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault.testcase
      1.90             -0.3        1.60 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.unmap_page_range
      1.92             -0.3        1.62 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.unmap_page_range.unmap_vmas
     54.98             -0.3       54.69        perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault.testcase
     55.33             -0.3       55.04        perf-profile.calltrace.cycles-pp.do_page_fault.page_fault.testcase
     61.72             -0.3       61.44        perf-profile.calltrace.cycles-pp.testcase
     59.26             -0.2       59.03        perf-profile.calltrace.cycles-pp.page_fault.testcase
      0.74 ±  2%       -0.2        0.52 ±  3%  perf-profile.calltrace.cycles-pp.__list_del_entry_valid.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma.__handle_mm_fault
      4.05             +0.0        4.09        perf-profile.calltrace.cycles-pp.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap
      4.08             +0.0        4.12        perf-profile.calltrace.cycles-pp.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap
      4.10             +0.0        4.14        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
      1.59             +0.1        1.64        perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault
      1.73             +0.1        1.78        perf-profile.calltrace.cycles-pp.__do_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
      1.21             +0.1        1.28        perf-profile.calltrace.cycles-pp.find_lock_entry.shmem_getpage_gfp.shmem_fault.__do_fault.__handle_mm_fault
      0.97             +0.1        1.03        perf-profile.calltrace.cycles-pp.find_get_entry.find_lock_entry.shmem_getpage_gfp.shmem_fault.__do_fault
      1.38             +0.1        1.44        perf-profile.calltrace.cycles-pp.shmem_getpage_gfp.shmem_fault.__do_fault.__handle_mm_fault.handle_mm_fault
      3.71             +0.1        3.79        perf-profile.calltrace.cycles-pp.free_unref_page_list.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region
      3.64             +0.1        3.73        perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page_list.release_pages.tlb_flush_mmu.tlb_finish_mmu
     33.13             +0.2       33.33        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
     33.12             +0.2       33.32        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
     31.65             +0.2       31.87        perf-profile.calltrace.cycles-pp.release_pages.tlb_flush_mmu.unmap_page_range.unmap_vmas.unmap_region
     31.88             +0.2       32.11        perf-profile.calltrace.cycles-pp.tlb_flush_mmu.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
     37.24             +0.2       37.48        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.munmap
     37.24             +0.2       37.48        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
     37.24             +0.2       37.49        perf-profile.calltrace.cycles-pp.munmap
     37.23             +0.2       37.48        perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     37.23             +0.2       37.48        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
     37.23             +0.2       37.48        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
     37.23             +0.2       37.48        perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     28.96             +0.6       29.55        perf-profile.calltrace.cycles-pp.free_unref_page_list.release_pages.tlb_flush_mmu.unmap_page_range.unmap_vmas
     28.48             +0.6       29.07        perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page_list.release_pages.tlb_flush_mmu.unmap_page_range
     30.94             +0.7       31.65        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_pcppages_bulk.free_unref_page_list.release_pages
     31.01             +0.7       31.72        perf-profile.calltrace.cycles-pp._raw_spin_lock.free_pcppages_bulk.free_unref_page_list.release_pages.tlb_flush_mmu
      6.32             -1.0        5.30        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      5.45             -0.7        4.79        perf-profile.children.cycles-pp.__lru_cache_add
      5.36             -0.7        4.70        perf-profile.children.cycles-pp.pagevec_lru_move_fn
      7.46             -0.6        6.82        perf-profile.children.cycles-pp.alloc_set_pte
      7.51             -0.6        6.88        perf-profile.children.cycles-pp.finish_fault
     53.54             -0.3       53.20        perf-profile.children.cycles-pp.__handle_mm_fault
     53.96             -0.3       53.65        perf-profile.children.cycles-pp.handle_mm_fault
     55.00             -0.3       54.71        perf-profile.children.cycles-pp.__do_page_fault
     55.34             -0.3       55.05        perf-profile.children.cycles-pp.do_page_fault
     57.38             -0.3       57.12        perf-profile.children.cycles-pp.page_fault
     62.64             -0.3       62.38        perf-profile.children.cycles-pp.testcase
      0.99             -0.2        0.76        perf-profile.children.cycles-pp.__list_del_entry_valid
      0.39             -0.0        0.36 ±  2%  perf-profile.children.cycles-pp.__mod_lruvec_state
      4.11             +0.0        4.14        perf-profile.children.cycles-pp.tlb_finish_mmu
      1.60             +0.0        1.65        perf-profile.children.cycles-pp.shmem_fault
      1.73             +0.1        1.79        perf-profile.children.cycles-pp.__do_fault
      1.40             +0.1        1.46        perf-profile.children.cycles-pp.shmem_getpage_gfp
      1.23             +0.1        1.29        perf-profile.children.cycles-pp.find_lock_entry
      0.97             +0.1        1.03        perf-profile.children.cycles-pp.find_get_entry
     33.13             +0.2       33.33        perf-profile.children.cycles-pp.unmap_vmas
     33.13             +0.2       33.33        perf-profile.children.cycles-pp.unmap_page_range
     37.24             +0.2       37.49        perf-profile.children.cycles-pp.munmap
     37.23             +0.2       37.48        perf-profile.children.cycles-pp.__do_munmap
     37.23             +0.2       37.48        perf-profile.children.cycles-pp.__x64_sys_munmap
     37.23             +0.2       37.48        perf-profile.children.cycles-pp.__vm_munmap
     37.23             +0.2       37.48        perf-profile.children.cycles-pp.unmap_region
     37.33             +0.2       37.57        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     37.33             +0.2       37.57        perf-profile.children.cycles-pp.do_syscall_64
     35.97             +0.3       36.22        perf-profile.children.cycles-pp.tlb_flush_mmu
     35.83             +0.3       36.09        perf-profile.children.cycles-pp.release_pages
     32.70             +0.7       33.38        perf-profile.children.cycles-pp.free_unref_page_list
     32.15             +0.7       32.84        perf-profile.children.cycles-pp.free_pcppages_bulk
     64.33             +0.9       65.19        perf-profile.children.cycles-pp._raw_spin_lock
      0.98             -0.2        0.75        perf-profile.self.cycles-pp.__list_del_entry_valid
      0.95             -0.0        0.92        perf-profile.self.cycles-pp.free_pcppages_bulk
      0.14 ±  3%       -0.0        0.13 ±  3%  perf-profile.self.cycles-pp.__mod_lruvec_state
      0.26             +0.0        0.27        perf-profile.self.cycles-pp.handle_mm_fault
      0.17 ±  4%       +0.0        0.18 ±  2%  perf-profile.self.cycles-pp.__count_memcg_events
      0.62             +0.1        0.68        perf-profile.self.cycles-pp.find_get_entry
      0.89             +0.2        1.13        perf-profile.self.cycles-pp.get_page_from_freelist
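The headline %change column is just the relative delta between the two run means. As a quick sanity-check sketch (not part of the robot's tooling), it can be recomputed from the two will-it-scale.per_process_ops values quoted in the table above:

```python
# Recompute the headline regression from the two per_process_ops means
# in the comparison table above (illustrative check, not LKP code).
base = 83249     # mean ops at parent commit e10e2ab29d
patched = 82239  # mean ops at 0f5b256b2c ("mm: Introduce Reported pages")

pct_change = (patched - base) / base * 100
print(f"will-it-scale.per_process_ops: {pct_change:+.1f}%")  # -1.2%
```

The same formula applied to will-it-scale.workload (7237126 vs 7325970) also gives -1.2%, consistent with the table.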
                       will-it-scale.per_process_ops

  [ASCII trend plot not reproduced: y-axis spans 74000-86000 ops. Bisect-good
  samples hold steady at roughly 84000-86000 ops across the commit range,
  while bisect-bad samples cluster near 82000 ops, with several outliers
  dropping to about 75000-78000 ops.]

	[*] bisect-good sample
	[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Rong Chen