hi, Yang Shi,

On Thu, Jan 04, 2024 at 04:39:50PM +0800, Oliver Sang wrote:
> hi, Fengwei, hi, Yang Shi,
>
> On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote:
> >
> > On 2024/1/4 09:32, Yang Shi wrote:
>
> ...
>
> > > Can you please help test the below patch?
> > I can't access the testing box now. Oliver will help to test your patch.
> >
>
> since now the commit-id of
>   'mm: align larger anonymous mappings on THP boundaries'
> in linux-next/master is efa7df3e3bb5d
> I applied the patch like below:
>
> * d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> * efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries
> * 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi
>
> our auto-bisect captured new efa7df3e3b as fbc for quite a number of regression
> so far, I will test d8d7b1dae6f03 for all these tests. Thanks
>

we got 12 regressions and 1 improvement for efa7df3e3b so far.
(4 of the regressions are similar to what we reported for 1111d46b5c).
with your patch, 6 of those regressions are fixed; the others are not impacted.

below is a summary:

No.   testsuite       test                            status-on-efa7df3e3b  fix-by-d8d7b1dae6 ?
===   =========       ====                            ====================  ===================
(1)   stress-ng       numa                            regression            NO
(2)                   pthread                         regression            yes (on an Ice Lake server)
(3)                   pthread                         regression            yes (on a Cascade Lake desktop)
(4)   will-it-scale   malloc1                         regression            NO
(5)                   page_fault1                     improvement           no (so still improvement)
(6)   vm-scalability  anon-w-seq-mt                   regression            yes
(7)   stream          nr_threads=25%                  regression            yes
(8)                   nr_threads=50%                  regression            yes
(9)   phoronix        osbench.CreateThreads           regression            yes (on a Cascade Lake server)
(10)                  ramspeed.Add.Integer            regression            NO (and below 3, on a Coffee Lake desktop)
(11)                  ramspeed.Average.FloatingPoint  regression            NO
(12)                  ramspeed.Triad.Integer          regression            NO
(13)                  ramspeed.Average.Integer        regression            NO

below are details; for the regressions not fixed by d8d7b1dae6, the full
comparison is attached.

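a note on reading the numbers below: the %change columns appear to be plain
relative changes against the base commit 1803d0c5ee1a3, while rows whose metric
is itself a percentage (such as vm-scalability.median_stddev%) appear to show a
simple difference instead. a minimal sketch of that arithmetic, where
pct_change() is a hypothetical helper and not something taken from lkp-tests:

```c
#include <assert.h>

/* Hypothetical helper, not part of lkp-tests: relative change of `value`
 * against `base`, in percent. Assumes the %change columns are plain
 * relative changes versus the base commit. */
static double pct_change(double base, double value)
{
    assert(base != 0.0); /* a zero base has no defined relative change */
    return (value - base) / base * 100.0;
}
```

for example, pct_change(251.12, 130.00) comes out at roughly -48.2, which
matches the stress-ng.numa.ops row for efa7df3e3b in (1).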
(1) detail comparison is attached as 'stress-ng-regression'

Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  cpu/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/numa/stress-ng/60s

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev    %change          %stddev    %change          %stddev
             \          |                \          |                \
    251.12           -48.2%     130.00           -47.9%     130.75        stress-ng.numa.ops
      4.10           -49.4%       2.08           -49.2%       2.09        stress-ng.numa.ops_per_sec


(2)

Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/pthread/stress-ng/60s

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev    %change          %stddev    %change          %stddev
             \          |                \          |                \
   3272223           -87.8%     400430            +0.5%    3287322        stress-ng.pthread.ops
     54516           -87.8%       6664            +0.5%      54772        stress-ng.pthread.ops_per_sec


(3)

Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with memory: 128G

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev    %change          %stddev    %change          %stddev
             \          |                \          |                \
   2250845           -85.2%     332370 ±  6%      -0.8%    2232820        stress-ng.pthread.ops
     37510           -85.2%       5538 ±  6%      -0.8%      37209        stress-ng.pthread.ops_per_sec


(4) full comparison attached as 'will-it-scale-regression'

Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/malloc1/will-it-scale

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev    %change          %stddev    %change          %stddev
             \          |                \          |                \
     10994           -86.7%       1466           -86.7%       1460        will-it-scale.per_process_ops
   1231431           -86.7%     164315           -86.7%     163624        will-it-scale.workload


(5)

Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/page_fault1/will-it-scale

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev    %change          %stddev    %change          %stddev
             \          |                \          |                \
  18858970           +44.8%   27298921           +44.9%   27330479        will-it-scale.224.threads
     56.06           +13.3%      63.53           +13.8%      63.81        will-it-scale.224.threads_idle
     84191           +44.8%     121869           +44.9%     122010        will-it-scale.per_thread_ops
  18858970           +44.8%   27298921           +44.9%   27330479        will-it-scale.workload


(6)

Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/8T/lkp-cpl-4sp2/anon-w-seq-mt/vm-scalability

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev    %change          %stddev    %change          %stddev
             \          |                \          |                \
    345968            -6.5%     323566            +0.1%     346304        vm-scalability.median
      1.91 ± 10%      -0.5        1.38 ± 20%      -0.2        1.75 ± 13%  vm-scalability.median_stddev%
  79708409            -7.4%   73839640            -0.1%   79613742        vm-scalability.throughput


(7)

Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G

=========================================================================================
array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
  50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev    %change          %stddev    %change          %stddev
             \          |                \          |                \
    349414           -16.2%     292854 ±  2%      -0.4%     348048        stream.add_bandwidth_MBps
    347727 ±  2%     -16.5%     290470 ±  2%      -0.6%     345750 ±  2%  stream.add_bandwidth_MBps_harmonicMean
    332206           -21.6%     260428 ±  3%      -0.4%     330838        stream.copy_bandwidth_MBps
    330746 ±  2%     -22.6%     255915 ±  3%      -0.6%     328725 ±  2%  stream.copy_bandwidth_MBps_harmonicMean
    301178           -16.9%     250209 ±  2%      -0.4%     299920        stream.scale_bandwidth_MBps
    300262           -17.7%     247151 ±  2%      -0.6%     298586 ±  2%  stream.scale_bandwidth_MBps_harmonicMean
    337408           -12.5%     295287 ±  2%      -0.3%     336304        stream.triad_bandwidth_MBps
    336153           -12.7%     293621            -0.5%     334624 ±  2%  stream.triad_bandwidth_MBps_harmonicMean


(8)

Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G

=========================================================================================
array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
  50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/50%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev    %change          %stddev    %change          %stddev
             \          |                \          |                \
    345632           -19.7%     277550 ±  3%      +0.4%     347067 ±  2%  stream.add_bandwidth_MBps
    342263 ±  2%     -19.7%     274704 ±  2%      +0.4%     343609 ±  2%  stream.add_bandwidth_MBps_harmonicMean
    343820           -17.3%     284428 ±  3%      +0.1%     344248        stream.copy_bandwidth_MBps
    341759 ±  2%     -17.8%     280934 ±  3%      +0.1%     342025 ±  2%  stream.copy_bandwidth_MBps_harmonicMean
    343270           -17.8%     282330 ±  3%      +0.3%     344276 ±  2%  stream.scale_bandwidth_MBps
    340812 ±  2%     -18.3%     278284 ±  3%      +0.3%     341672 ±  2%  stream.scale_bandwidth_MBps_harmonicMean
    364596           -19.7%     292831 ±  3%      +0.4%     366145 ±  2%  stream.triad_bandwidth_MBps
    360643 ±  2%     -19.9%     289034 ±  3%      +0.4%     362004 ±  2%  stream.triad_bandwidth_MBps_harmonicMean


(9)

Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with memory: 512G

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Create Threads/debian-x86_64-phoronix/lkp-csl-2sp7/osbench-1.0.2/phoronix-test-suite

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev    %change          %stddev    %change          %stddev
             \          |                \          |                \
     26.82         +1348.4%     388.43            +4.0%      27.88        phoronix-test-suite.osbench.CreateThreads.us_per_event


****

for below (10) - (13), full comparison is attached as phoronix-regressions
(they all happen on a Coffee Lake desktop)

(10)

Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Add/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev    %change          %stddev    %change          %stddev
             \          |                \          |                \
     20115            -4.5%      19211            -4.5%      19217        phoronix-test-suite.ramspeed.Add.Integer.mb_s


(11)

Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev    %change          %stddev    %change          %stddev
             \          |                \          |                \
     19960            -2.9%      19378            -3.0%      19366        phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s


(12)

Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Triad/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev    %change          %stddev    %change          %stddev
             \          |                \          |                \
     19667            -6.4%      18399            -6.4%      18413        phoronix-test-suite.ramspeed.Triad.Integer.mb_s


(13)

Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite

1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
---------------- --------------------------- ---------------------------
         %stddev    %change          %stddev    %change          %stddev
             \          |                \          |                \
     19799            -3.5%      19106            -3.4%      19117        phoronix-test-suite.ramspeed.Average.Integer.mb_s


>
>
> commit d8d7b1dae6f0311d528b289cda7b317520f9a984
> Author: 0day robot
> Date:   Thu Jan 4 12:51:10 2024 +0800
>
>     fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
>
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 40d94411d4920..91197bd387730 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
>         return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
>                _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
>                _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> +              _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
>                arch_calc_vm_flag_bits(flags);
>  }
>
>
> >
> > Regards
> > Yin, Fengwei
> >
> > >
> > > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > index 40d94411d492..dc7048824be8 100644
> > > --- a/include/linux/mman.h
> > > +++ b/include/linux/mman.h
> > > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > >         return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> > >                _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> > >                _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> > > +              _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
> > >                arch_calc_vm_flag_bits(flags);
> > >  }
> > >
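
for context on what the quoted one-liner keys off: it translates the MAP_STACK
mmap() flag into VM_NOHUGEPAGE, so only mappings requested with that flag are
affected. below is a minimal userspace sketch of such a mapping. the helper
names are hypothetical, and the idea that a threading library requests its
thread stacks this way is an assumption of the sketch, not something stated in
this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper (not from the patch): map an anonymous, private region
 * with MAP_STACK set, i.e. the flag the quoted calc_vm_flag_bits() change
 * would translate to VM_NOHUGEPAGE inside the kernel. */
static void *map_stack_like(size_t len)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
}

/* Map a region, touch every byte so pages are actually faulted in, then
 * unmap it. Returns 0 on success, -1 if mmap() or munmap() failed. */
static int exercise_stack_like(size_t len)
{
    assert(len > 0); /* mmap() rejects zero-length mappings */

    void *p = map_stack_like(len);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0, len);
    return munmap(p, len);
}
```

calling exercise_stack_like(8UL << 20) should return 0 on a typical Linux
system; whether the region is actually kept off THP depends on the running
kernel carrying the change above.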