Greeting,

FYI, we noticed a 10.7% improvement of vm-scalability.throughput due to commit:


commit: 1ebbb21811b76c3b932959787f37985af36f62fa ("mm/page_alloc: explicitly define how __GFP_HIGH non-blocking allocations accesses reserves")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: vm-scalability
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
with following parameters:

        runtime: 300s
        test: lru-file-mmap-read
        cpufreq_governor: performance

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/

Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # If you come across any failure that blocks the test,
        # please remove ~/.lkp and the /lkp dir to run from a clean state.
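For a manual A/B comparison of the two kernels in this report, a minimal sketch is below. It assumes you build, install, and boot each kernel with your usual workflow before rerunning the generated job; the short hashes are the two commits listed in the comparison table that follows.

        # mainline tree; the report compares the parent commit against the patched commit
        git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
        cd linux
        git checkout ab35088543   # baseline: "mm/page_alloc: explicitly define what alloc flags deplete min reserves"
        # build, boot this kernel, then: sudo bin/lkp run generated-yaml-file
        git checkout 1ebbb21811   # patched:  "mm/page_alloc: explicitly define how __GFP_HIGH non-blocking allocations accesses reserves"
        # rebuild, reboot, and rerun the same generated job to reproduce the comparison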
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-11/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/lkp-csl-2sp3/lru-file-mmap-read/vm-scalability

commit:
  ab35088543 ("mm/page_alloc: explicitly define what alloc flags deplete min reserves")
  1ebbb21811 ("mm/page_alloc: explicitly define how __GFP_HIGH non-blocking allocations accesses reserves")

ab3508854353793c 1ebbb21811b76c3b932959787f3
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      0.21 ±  7%     -36.7%       0.13 ±  5%  vm-scalability.free_time
    341615           +10.1%     375971        vm-scalability.median
    281.39 ± 17%     -143.5      137.92 ± 16%  vm-scalability.stddev%
  32701056           +10.7%   36197694        vm-scalability.throughput
    160.97           -10.6%     143.88        vm-scalability.time.elapsed_time
    160.97           -10.6%     143.88        vm-scalability.time.elapsed_time.max
    352302 ±  2%      -8.8%     321372        vm-scalability.time.involuntary_context_switches
   2788352 ±  7%     +30.6%    3640704 ±  6%  vm-scalability.time.maximum_resident_set_size
      8477            +1.3%       8584        vm-scalability.time.percent_of_cpu_this_job_got
      9907           -14.8%       8441 ±  2%  vm-scalability.time.system_time
      3739            +4.5%       3909        vm-scalability.time.user_time
  1.35e+09 ±  4%     -18.1%  1.105e+09 ±  2%  cpuidle..time
 1.184e+08           -14.7%  1.009e+08        turbostat.IRQ
     59.12            +1.8%      60.17        turbostat.RAMWatt
   5219286 ± 13%     -27.2%    3801249 ± 12%  meminfo.Active
   5211468 ± 13%     -27.2%    3793679 ± 12%  meminfo.Active(file)
   7776815 ±  3%     +13.8%    8847221 ±  4%  meminfo.MemFree
      1.85 ±  2%      +0.3        2.10        mpstat.cpu.all.irq%
      0.13 ±  3%      +0.0        0.16        mpstat.cpu.all.soft%
     24.36            +4.0       28.38 ±  2%  mpstat.cpu.all.usr%
     23.33 ±  2%     +17.1%      27.33 ±  3%  vmstat.cpu.us
   7945791 ±  2%     +17.4%    9324811 ±  3%  vmstat.memory.free
    441308            -3.2%     427108        vmstat.system.in
   2851432 ±  6%     -26.3%    2102118 ± 12%  numa-meminfo.node0.Active
   2848009 ±  6%     -26.3%    2099497 ± 12%  numa-meminfo.node0.Active(file)
   4013196 ±  2%     +13.8%    4568204 ±  2%  numa-meminfo.node0.MemFree
   3625142 ±  5%     +21.8%    4415591 ±  5%  numa-meminfo.node1.MemFree
    316.87 ± 18%     -34.4%     207.99 ± 12%  sched_debug.cfs_rq:/.util_est_enqueued.avg
      1189 ± 12%     -23.5%     910.39 ±  6%  sched_debug.cfs_rq:/.util_est_enqueued.max
    324.18 ±  6%     -19.2%     261.97 ±  2%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
      3005 ±  6%     +10.6%       3325 ±  2%  sched_debug.cpu.nr_switches.min
   3072920 ±  8%     -26.4%    2261101 ±  7%  numa-numastat.node0.numa_foreign
   5528896 ±  9%     -60.5%    2186586 ± 10%  numa-numastat.node0.numa_miss
   5590018 ±  9%     -59.9%    2239518 ± 10%  numa-numastat.node0.other_node
   5527881 ±  9%     -60.4%    2186984 ± 10%  numa-numastat.node1.numa_foreign
   3072649 ±  8%     -26.4%    2261499 ±  7%  numa-numastat.node1.numa_miss
   3100633 ±  8%     -25.9%    2297621 ±  8%  numa-numastat.node1.other_node
    714902 ±  6%     -26.3%     526979 ± 12%  numa-vmstat.node0.nr_active_file
   1025543 ±  3%     +10.5%    1132734 ±  4%  numa-vmstat.node0.nr_free_pages
    282.83 ±  7%     -23.2%     217.33 ± 12%  numa-vmstat.node0.nr_isolated_file
    714904 ±  6%     -26.3%     526980 ± 12%  numa-vmstat.node0.nr_zone_active_file
   3072920 ±  8%     -26.4%    2261101 ±  7%  numa-vmstat.node0.numa_foreign
   5528896 ±  9%     -60.5%    2186586 ± 10%  numa-vmstat.node0.numa_miss
   5590018 ±  9%     -59.9%    2239518 ± 10%  numa-vmstat.node0.numa_other
   3239860 ±  4%     -18.4%    2642385 ±  6%  numa-vmstat.node0.workingset_nodereclaim
    931672 ±  6%     +18.4%    1103239 ±  5%  numa-vmstat.node1.nr_free_pages
   5527881 ±  9%     -60.4%    2186984 ± 10%  numa-vmstat.node1.numa_foreign
   3072649 ±  8%     -26.4%    2261499 ±  7%  numa-vmstat.node1.numa_miss
   3100633 ±  8%     -25.9%    2297621 ±  8%  numa-vmstat.node1.numa_other
    228122 ±  6%     -13.3%     197862        proc-vmstat.allocstall_movable
      6249 ±  5%    +144.5%      15278 ±  2%  proc-vmstat.allocstall_normal
  39122902 ± 21%     -81.3%    7326649 ± 23%  proc-vmstat.compact_daemon_free_scanned
 1.885e+08 ± 17%     -74.6%   47875280 ± 26%  proc-vmstat.compact_daemon_migrate_scanned
      3493 ±  8%     -56.6%       1515 ±  6%  proc-vmstat.compact_daemon_wake
    140286 ± 15%     -92.5%      10548 ± 38%  proc-vmstat.compact_fail
  57183012 ± 15%     -83.7%    9325762 ± 23%  proc-vmstat.compact_free_scanned
   2623027 ± 14%     -52.3%    1250898 ± 10%  proc-vmstat.compact_isolated
 5.211e+08 ± 20%     -88.0%   62772564 ± 29%  proc-vmstat.compact_migrate_scanned
    446104 ± 15%     -90.9%      40548 ± 25%  proc-vmstat.compact_stall
    305818 ± 16%     -90.2%      30000 ± 21%  proc-vmstat.compact_success
      9202 ± 10%     -40.0%       5520 ±  7%  proc-vmstat.kswapd_low_wmark_hit_quickly
   1305504 ± 13%     -26.7%     957580 ± 12%  proc-vmstat.nr_active_file
   1985908 ±  4%     +12.9%    2241333 ±  3%  proc-vmstat.nr_free_pages
   1015907            -2.7%     988196        proc-vmstat.nr_page_table_pages
    549360            -2.5%     535474        proc-vmstat.nr_slab_reclaimable
     69677            -3.0%      67616        proc-vmstat.nr_slab_unreclaimable
   1305512 ± 13%     -26.7%     957587 ± 12%  proc-vmstat.nr_zone_active_file
   8600801 ±  8%     -48.3%    4448085 ±  7%  proc-vmstat.numa_foreign
      4400 ± 50%     -81.1%     833.83 ± 80%  proc-vmstat.numa_hint_faults_local
  51279456 ±  3%      -5.2%   48609440 ±  2%  proc-vmstat.numa_hit
  51194728 ±  3%      -5.2%   48521975 ±  2%  proc-vmstat.numa_local
   8601545 ±  8%     -48.3%    4448085 ±  7%  proc-vmstat.numa_miss
   8690652 ±  8%     -47.8%    4537139 ±  7%  proc-vmstat.numa_other
      9311 ± 11%     -39.5%       5629 ±  7%  proc-vmstat.pageoutrun
  14184591 ±  2%     -12.8%   12364026        proc-vmstat.pgalloc_dma32
     23108 ±  6%     -46.7%      12308 ± 16%  proc-vmstat.pgmajfault
   1264562 ± 14%     -52.0%     606868 ± 10%  proc-vmstat.pgmigrate_success
     28373 ±  2%      -5.4%      26846 ±  2%  proc-vmstat.pgreuse
 1.831e+09            +3.5%  1.896e+09        proc-vmstat.pgscan_direct
 2.641e+08 ± 10%     -25.1%  1.978e+08 ± 11%  proc-vmstat.pgscan_kswapd
 9.765e+08            +3.0%  1.006e+09        proc-vmstat.pgsteal_direct
  68399366 ±  4%     -45.6%   37214972        proc-vmstat.pgsteal_kswapd
  16621148            -5.3%   15745211        proc-vmstat.slabs_scanned
    358.33 ± 55%    +307.9%       1461 ± 47%  proc-vmstat.unevictable_pgs_culled
    292.00 ± 65%    +377.5%       1394 ± 47%  proc-vmstat.unevictable_pgs_rescued
   1899136            -7.0%    1765248 ±  2%  proc-vmstat.unevictable_pgs_scanned
   5904732            -5.8%    5563946        proc-vmstat.workingset_nodereclaim
   3099373            -1.6%    3049129        proc-vmstat.workingset_nodes
 3.734e+10            +8.0%  4.031e+10        perf-stat.i.branch-instructions
  29422732            -5.2%   27892505        perf-stat.i.branch-misses
 2.064e+08            +9.2%  2.255e+08        perf-stat.i.cache-misses
 7.005e+08            +7.1%    7.5e+08        perf-stat.i.cache-references
      2.41 ±  2%      -7.7%       2.23        perf-stat.i.cpi
    177.46            -6.8%     165.33        perf-stat.i.cpu-migrations
      1391 ±  2%      -9.7%       1255        perf-stat.i.cycles-between-cache-misses
 3.476e+10            +7.9%  3.752e+10        perf-stat.i.dTLB-loads
      0.02 ±  3%      -0.0        0.02 ±  3%  perf-stat.i.dTLB-store-miss-rate%
 5.566e+09            +7.8%  5.998e+09        perf-stat.i.dTLB-stores
 1.222e+11            +6.8%  1.305e+11        perf-stat.i.instructions
     39940 ±  2%      +6.9%      42712        perf-stat.i.instructions-per-iTLB-miss
      0.49            +6.6%       0.52        perf-stat.i.ipc
    210782           +11.7%     235498        perf-stat.i.major-faults
    471.82            -7.4%     436.87        perf-stat.i.metric.K/sec
    813.09            +7.9%     877.53        perf-stat.i.metric.M/sec
    213062           +11.6%     237881        perf-stat.i.minor-faults
  17079151 ±  2%     -11.9%   15046137        perf-stat.i.node-load-misses
  11057787 ±  2%     -13.3%    9585653        perf-stat.i.node-loads
     57.95 ±  2%      -8.2       49.73        perf-stat.i.node-store-miss-rate%
   7350966           -14.7%    6266723        perf-stat.i.node-store-misses
   5862252 ±  3%     +17.9%    6910693 ±  2%  perf-stat.i.node-stores
    423844           +11.7%     473380        perf-stat.i.page-faults
      0.08            -0.0        0.07        perf-stat.overall.branch-miss-rate%
     29.46            +0.6       30.07        perf-stat.overall.cache-miss-rate%
      2.21            -6.2%       2.07        perf-stat.overall.cpi
      1309            -8.3%       1200        perf-stat.overall.cycles-between-cache-misses
      0.03 ±  2%      -0.0        0.03 ±  2%  perf-stat.overall.dTLB-load-miss-rate%
     40576 ±  2%      +8.3%      43947        perf-stat.overall.instructions-per-iTLB-miss
      0.45            +6.6%       0.48        perf-stat.overall.ipc
     55.96 ±  2%      -8.2       47.75        perf-stat.overall.node-store-miss-rate%
      4037            -4.5%       3854        perf-stat.overall.path-length
 3.687e+10            +8.0%  3.982e+10        perf-stat.ps.branch-instructions
  29098195            -5.3%   27565060        perf-stat.ps.branch-misses
 2.038e+08            +9.4%  2.229e+08        perf-stat.ps.cache-misses
 6.917e+08            +7.2%  7.412e+08        perf-stat.ps.cache-references
    175.77            -6.8%     163.78        perf-stat.ps.cpu-migrations
 3.433e+10            +8.0%  3.706e+10        perf-stat.ps.dTLB-loads
 5.495e+09            +7.8%  5.925e+09        perf-stat.ps.dTLB-stores
 1.207e+11            +6.8%  1.289e+11        perf-stat.ps.instructions
    207772           +11.9%     232413        perf-stat.ps.major-faults
    210008           +11.8%     234752        perf-stat.ps.minor-faults
  17010095 ±  2%     -12.1%   14954361        perf-stat.ps.node-load-misses
  10989225 ±  2%     -13.4%    9514094        perf-stat.ps.node-loads
   7316949           -14.9%    6227549        perf-stat.ps.node-store-misses
   5760441 ±  3%     +18.3%    6816855 ±  2%  perf-stat.ps.node-stores
    417780           +11.8%     467166        perf-stat.ps.page-faults
 1.951e+13            -4.5%  1.862e+13        perf-stat.total.instructions
     20.60 ± 86%     -20.6        0.00        perf-profile.calltrace.cycles-pp.do_access
     17.03 ± 88%     -17.0        0.00        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
     16.93 ± 88%     -16.9        0.00        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
     16.93 ± 88%     -16.9        0.00        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
     16.86 ± 88%     -16.9        0.00        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
     16.24 ± 89%     -16.2        0.00        perf-profile.calltrace.cycles-pp.filemap_fault.__do_fault.do_read_fault.do_fault.__handle_mm_fault
     16.20 ± 89%     -16.2        0.00        perf-profile.calltrace.cycles-pp.page_cache_ra_order.filemap_fault.__do_fault.do_read_fault.do_fault
     16.25 ± 89%     -15.1        1.15 ±223%  perf-profile.calltrace.cycles-pp.__do_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
     16.76 ± 88%     -15.1        1.69 ±151%  perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     14.87 ± 74%     -14.9        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlinkat
     14.87 ± 74%     -14.9        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
     14.87 ± 74%     -14.9        0.00        perf-profile.calltrace.cycles-pp.unlinkat
     14.86 ± 74%     -14.9        0.00        perf-profile.calltrace.cycles-pp.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
     14.86 ± 74%     -14.9        0.00        perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
     14.86 ± 74%     -14.9        0.00        perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
     14.84 ± 74%     -14.8        0.00        perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64
     16.77 ± 88%     -14.5        2.26 ±141%  perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
      9.75 ± 95%      -9.7        0.00        perf-profile.calltrace.cycles-pp.folio_alloc.page_cache_ra_order.filemap_fault.__do_fault.do_read_fault
      9.74 ± 95%      -9.7        0.00        perf-profile.calltrace.cycles-pp.__alloc_pages.folio_alloc.page_cache_ra_order.filemap_fault.__do_fault
      9.11 ± 97%      -9.1        0.00        perf-profile.calltrace.cycles-pp.__alloc_pages_slowpath.__alloc_pages.folio_alloc.page_cache_ra_order.filemap_fault
      7.66 ± 70%      -7.7        0.00        perf-profile.calltrace.cycles-pp.do_rw_once
      7.15 ± 77%      -7.2        0.00        perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      5.99 ± 72%      -6.0        0.00        perf-profile.calltrace.cycles-pp.truncate_folio_batch_exceptionals.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat
      5.92 ± 76%      -5.9        0.00        perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      5.77 ± 80%      -5.8        0.00        perf-profile.calltrace.cycles-pp.read_pages.page_cache_ra_order.filemap_fault.__do_fault.do_read_fault
      5.77 ± 80%      -5.8        0.00        perf-profile.calltrace.cycles-pp.iomap_readahead.read_pages.page_cache_ra_order.filemap_fault.__do_fault
      5.72 ± 80%      -5.7        0.00        perf-profile.calltrace.cycles-pp.iomap_readpage_iter.iomap_readahead.read_pages.page_cache_ra_order.filemap_fault
      5.58 ± 80%      -5.6        0.00        perf-profile.calltrace.cycles-pp.zero_user_segments.iomap_readpage_iter.iomap_readahead.read_pages.page_cache_ra_order
      5.06 ± 72%      -5.1        0.00        perf-profile.calltrace.cycles-pp.xas_store.truncate_folio_batch_exceptionals.truncate_inode_pages_range.evict.do_unlinkat
      4.85 ± 79%      -4.9        0.00        perf-profile.calltrace.cycles-pp.memset_erms.zero_user_segments.iomap_readpage_iter.iomap_readahead.read_pages
      4.75 ± 75%      -4.8        0.00        perf-profile.calltrace.cycles-pp.find_lock_entries.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat
      6.56 ± 33%      -4.7        1.86 ±156%  perf-profile.calltrace.cycles-pp.ret_from_fork
      6.56 ± 33%      -4.7        1.86 ±156%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
      0.00           +13.8       13.82 ± 73%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00           +14.4       14.40 ± 70%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     22.17 ± 83%     -22.2        0.00        perf-profile.children.cycles-pp.do_access
     16.27 ± 88%     -16.3        0.00        perf-profile.children.cycles-pp.filemap_fault
     16.23 ± 88%     -16.2        0.00        perf-profile.children.cycles-pp.page_cache_ra_order
     16.28 ± 88%     -15.7        0.58 ±223%  perf-profile.children.cycles-pp.__do_fault
     14.87 ± 74%     -14.9        0.00        perf-profile.children.cycles-pp.unlinkat
     14.86 ± 74%     -14.9        0.00        perf-profile.children.cycles-pp.__x64_sys_unlinkat
     14.86 ± 74%     -14.9        0.00        perf-profile.children.cycles-pp.do_unlinkat
     14.86 ± 74%     -14.9        0.00        perf-profile.children.cycles-pp.evict
     14.84 ± 74%     -14.8        0.00        perf-profile.children.cycles-pp.truncate_inode_pages_range
     16.84 ± 87%     -14.6        2.26 ±141%  perf-profile.children.cycles-pp.do_fault
     16.83 ± 87%     -14.6        2.26 ±141%  perf-profile.children.cycles-pp.do_read_fault
      9.75 ± 95%      -9.8        0.00        perf-profile.children.cycles-pp.folio_alloc
      9.11 ± 97%      -9.1        0.00        perf-profile.children.cycles-pp.__alloc_pages_slowpath
      7.66 ± 96%      -7.1        0.58 ±223%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      7.76 ± 61%      -6.6        1.15 ±223%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      6.54 ±104%      -6.5        0.00        perf-profile.children.cycles-pp.shrink_node
      6.45 ±104%      -6.4        0.00        perf-profile.children.cycles-pp.shrink_node_memcgs
      6.24 ± 69%      -6.2        0.00        perf-profile.children.cycles-pp.do_rw_once
      6.02 ± 72%      -6.0        0.00        perf-profile.children.cycles-pp.truncate_folio_batch_exceptionals
      6.56 ± 71%      -6.0        0.58 ±223%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      5.83 ± 65%      -5.8        0.00        perf-profile.children.cycles-pp.xas_store
      5.80 ± 79%      -5.8        0.00        perf-profile.children.cycles-pp.read_pages
      5.80 ± 79%      -5.8        0.00        perf-profile.children.cycles-pp.iomap_readahead
      5.75 ± 79%      -5.7        0.00        perf-profile.children.cycles-pp.iomap_readpage_iter
      5.56 ±103%      -5.6        0.00        perf-profile.children.cycles-pp.shrink_lruvec
      5.54 ±103%      -5.5        0.00        perf-profile.children.cycles-pp.shrink_inactive_list
      5.09 ± 69%      -5.1        0.00        perf-profile.children.cycles-pp.compact_zone
      4.76 ± 75%      -4.8        0.00        perf-profile.children.cycles-pp.find_lock_entries
      6.56 ± 33%      -4.7        1.86 ±156%  perf-profile.children.cycles-pp.ret_from_fork
      6.56 ± 33%      -4.7        1.86 ±156%  perf-profile.children.cycles-pp.kthread
      4.42 ± 73%      -4.4        0.00        perf-profile.children.cycles-pp.isolate_migratepages
      5.61 ± 80%      -4.3        1.28 ±223%  perf-profile.children.cycles-pp.zero_user_segments
      5.56 ± 79%      -4.3        1.28 ±223%  perf-profile.children.cycles-pp.memset_erms
      4.08 ± 69%      -2.3        1.76 ±153%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.02 ±142%     +13.1       13.17 ± 47%  perf-profile.children.cycles-pp.__x64_sys_openat
      0.02 ±142%     +13.1       13.17 ± 47%  perf-profile.children.cycles-pp.do_sys_openat2
      0.00           +16.2       16.19 ± 30%  perf-profile.children.cycles-pp.cmd_record
      0.00           +16.2       16.19 ± 30%  perf-profile.children.cycles-pp.__cmd_record
     18.04 ± 62%     +33.0       51.05 ± 21%  perf-profile.children.cycles-pp.do_syscall_64
     18.04 ± 62%     +33.6       51.63 ± 22%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      7.66 ± 96%      -7.1        0.58 ±223%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      4.27 ± 73%      -4.3        0.00        perf-profile.self.cycles-pp.do_access
      5.49 ± 79%      -4.2        1.28 ±223%  perf-profile.self.cycles-pp.memset_erms
      3.92 ± 69%      -3.9        0.00        perf-profile.self.cycles-pp.do_rw_once


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided for
informational purposes only. Any difference in system hardware or software design
or configuration may affect actual performance.

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests