From: Yang Shi <shy828301@gmail.com>
To: Oliver Sang <oliver.sang@intel.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>,
Rik van Riel <riel@surriel.com>,
oe-lkp@lists.linux.dev, lkp@intel.com,
Linux Memory Management List <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Christopher Lameter <cl@linux.com>,
ying.huang@intel.com, feng.tang@intel.com
Subject: Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
Date: Fri, 5 Jan 2024 10:49:56 -0800
Message-ID: <CAHbLzkpp0d=hPcKxprYnVJL=g0dxNcTN5vmg8AHueEXYMvoCCw@mail.gmail.com>
In-Reply-To: <ZZfL6APUYZ3VuUTv@xsang-OptiPlex-9020>
On Fri, Jan 5, 2024 at 1:29 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Yang Shi,
>
> On Thu, Jan 04, 2024 at 04:39:50PM +0800, Oliver Sang wrote:
> > hi, Fengwei, hi, Yang Shi,
> >
> > On Thu, Jan 04, 2024 at 04:18:00PM +0800, Yin Fengwei wrote:
> > >
> > > On 2024/1/4 09:32, Yang Shi wrote:
> >
> > ...
> >
> > > > Can you please help test the below patch?
> > > I can't access the testing box now. Oliver will help to test your patch.
> > >
> >
> > since now the commit-id of
> > 'mm: align larger anonymous mappings on THP boundaries'
> > in linux-next/master is efa7df3e3bb5d
> > I applied the patch like below:
> >
> > * d8d7b1dae6f03 fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> > * efa7df3e3bb5d mm: align larger anonymous mappings on THP boundaries
> > * 1803d0c5ee1a3 mailmap: add an old address for Naoya Horiguchi
> >
> > our auto-bisect captured new efa7df3e3b as fbc for quite a number of regression
> > so far, I will test d8d7b1dae6f03 for all these tests. Thanks
> >
>
Hi Oliver,
Thanks for running the test. Please see the inline comments.
> we got 12 regressions and 1 improvement results for efa7df3e3b so far.
> (4 regressions are just similar to what we reported for 1111d46b5c).
> by your patch, 6 of those regressions are fixed, others are not impacted.
>
> below is a summary:
>
> No.  testsuite       test                            status-on-efa7df3e3b  fix-by-d8d7b1dae6 ?
> ===  =========       ====                            ====================  ===================
> (1)  stress-ng       numa                            regression            NO
> (2)                  pthread                         regression            yes (on a Ice Lake server)
> (3)                  pthread                         regression            yes (on a Cascade Lake desktop)
> (4)  will-it-scale   malloc1                         regression            NO
I think this was reported earlier, when Rik first submitted the
patch. IIRC, Huang Ying did some analysis on this one and thought it
can be ignored.
> (5)                  page_fault1                     improvement           no (so still improvement)
> (6)  vm-scalability  anon-w-seq-mt                   regression            yes
> (7)  stream          nr_threads=25%                  regression            yes
> (8)                  nr_threads=50%                  regression            yes
> (9)  phoronix        osbench.CreateThreads           regression            yes (on a Cascade Lake server)
> (10)                 ramspeed.Add.Integer            regression            NO (and below 3, on a Coffee Lake desktop)
> (11)                 ramspeed.Average.FloatingPoint  regression            NO
> (12)                 ramspeed.Triad.Integer          regression            NO
> (13)                 ramspeed.Average.Integer        regression            NO
Not fixing the ramspeed regressions is expected. But it seems that
neither Fengwei nor I can reproduce the regression when running
ramspeed alone.
>
>
> below are details, for those regressions not fixed by d8d7b1dae6, attached
> full comparison.
>
>
> (1) detail comparison is attached as 'stress-ng-regression'
>
> Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
> =========================================================================================
> class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> cpu/gcc-12/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/numa/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 251.12 -48.2% 130.00 -47.9% 130.75 stress-ng.numa.ops
> 4.10 -49.4% 2.08 -49.2% 2.09 stress-ng.numa.ops_per_sec
This is a new one. I did some analysis, and it does not seem to be
related to the THP patch, since I can reproduce it on a kernel (on an
aarch64 VM) w/o the THP patch if I set THP to always.
The profiling showed the regression was caused by the move_pages()
syscall. The test actually calls a bunch of NUMA syscalls, for
example, set_mempolicy(), mbind(), move_pages(), migrate_pages(), etc.,
with different parameters. When calling move_pages(), it tries to move
pages (at base page granularity) to different nodes in a circular
list. On my 2-node NUMA VM, it actually moves:
0th page to node #1
1st page to node #0
2nd page to node #1
3rd page to node #0
....
1023rd page to node #0
But with THP, it actually bounces the THP between the two nodes 512 times.
The pgmigrate_success counter in /proc/vmstat reflects this too:
for base pages the delta is 1928431, but for the THP case the delta is 218466402.
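As a toy model of the bouncing described above (a sketch, not the
stress-ng code or the kernel's migration path), the following C snippet
counts how many moves one pass over 1024 pages triggers. The names
simulate_pass and move_stats, and the 512-pages-per-THP constant (2M THP
on 4K base pages), are assumptions made for illustration:

```c
#define NR_PAGES      1024  /* pages walked in one pass, as in the list above */
#define NR_NODES      2     /* 2-node NUMA VM */
#define PAGES_PER_THP 512   /* assumption: 2M THP with 4K base pages */

struct move_stats {
    long migrations;        /* moves that were not skipped */
    long base_pages_moved;  /* the same moves, counted in base pages */
};

/*
 * Toy model only: walk the pages in order, request page i on node
 * (i + 1) % NR_NODES, and move the whole folio backing that page unless
 * it is already on the target node (the skip check in the kernel).
 * pages_per_folio is 1 for base pages and PAGES_PER_THP for THP.
 */
struct move_stats simulate_pass(int pages_per_folio)
{
    static int node_of_folio[NR_PAGES];
    struct move_stats st = { 0, 0 };
    int nr_folios = NR_PAGES / pages_per_folio;

    for (int f = 0; f < nr_folios; f++)
        node_of_folio[f] = 0;           /* everything starts on node #0 */

    for (int page = 0; page < NR_PAGES; page++) {
        int target = (page + 1) % NR_NODES;
        int folio = page / pages_per_folio;

        if (node_of_folio[folio] == target)
            continue;                   /* already there, nothing to do */

        node_of_folio[folio] = target;  /* the whole folio moves */
        st.migrations++;
        st.base_pages_moved += pages_per_folio;
    }
    return st;
}
```

Under these assumptions, simulate_pass(1) reports 512 moves (every other
page is already on its target node), while simulate_pass(PAGES_PER_THP)
reports 1024 moves, i.e. each of the two THPs is moved 512 times.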
The kernel already does the node check to skip the move if the page is
already on the target node, but the test case just does the bounce on
purpose since it assumes base pages. So I think this case should
be run with THP disabled.
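One possible way to run a single test with THP disabled, without
touching the system-wide sysfs setting, is prctl(PR_SET_THP_DISABLE).
This is only a sketch under that assumption; the helper names are
hypothetical, and it is not necessarily how stress-ng or lkp would do it:

```c
#include <sys/prctl.h>

/*
 * Set the per-process "THP disable" flag for the calling process.
 * Returns 0 on success, -1 with errno set on failure.
 */
int thp_disable_self(void)
{
    return prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
}

/*
 * Query the flag: returns 1 if set, 0 if clear, -1 with errno set
 * on failure.
 */
int thp_is_disabled_self(void)
{
    return prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0);
}
```

Per prctl(2), the flag is inherited across fork() and preserved across
execve(), so a small wrapper could call thp_disable_self() and then exec
the test binary.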
>
>
> (2)
> Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with memory: 256G
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp7/pthread/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 3272223 -87.8% 400430 +0.5% 3287322 stress-ng.pthread.ops
> 54516 -87.8% 6664 +0.5% 54772 stress-ng.pthread.ops_per_sec
>
>
> (3)
> Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with memory: 128G
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
> os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/pthread/stress-ng/60s
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 2250845 -85.2% 332370 ± 6% -0.8% 2232820 stress-ng.pthread.ops
> 37510 -85.2% 5538 ± 6% -0.8% 37209 stress-ng.pthread.ops_per_sec
>
>
> (4) full comparison attached as 'will-it-scale-regression'
>
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/malloc1/will-it-scale
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 10994 -86.7% 1466 -86.7% 1460 will-it-scale.per_process_ops
> 1231431 -86.7% 164315 -86.7% 163624 will-it-scale.workload
>
>
> (5)
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/page_fault1/will-it-scale
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 18858970 +44.8% 27298921 +44.9% 27330479 will-it-scale.224.threads
> 56.06 +13.3% 63.53 +13.8% 63.81 will-it-scale.224.threads_idle
> 84191 +44.8% 121869 +44.9% 122010 will-it-scale.per_thread_ops
> 18858970 +44.8% 27298921 +44.9% 27330479 will-it-scale.workload
>
>
> (6)
> Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with memory: 192G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/8T/lkp-cpl-4sp2/anon-w-seq-mt/vm-scalability
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 345968 -6.5% 323566 +0.1% 346304 vm-scalability.median
> 1.91 ± 10% -0.5 1.38 ± 20% -0.2 1.75 ± 13% vm-scalability.median_stddev%
> 79708409 -7.4% 73839640 -0.1% 79613742 vm-scalability.throughput
>
>
> (7)
> Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G
> =========================================================================================
> array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
> 50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/25%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 349414 -16.2% 292854 ± 2% -0.4% 348048 stream.add_bandwidth_MBps
> 347727 ± 2% -16.5% 290470 ± 2% -0.6% 345750 ± 2% stream.add_bandwidth_MBps_harmonicMean
> 332206 -21.6% 260428 ± 3% -0.4% 330838 stream.copy_bandwidth_MBps
> 330746 ± 2% -22.6% 255915 ± 3% -0.6% 328725 ± 2% stream.copy_bandwidth_MBps_harmonicMean
> 301178 -16.9% 250209 ± 2% -0.4% 299920 stream.scale_bandwidth_MBps
> 300262 -17.7% 247151 ± 2% -0.6% 298586 ± 2% stream.scale_bandwidth_MBps_harmonicMean
> 337408 -12.5% 295287 ± 2% -0.3% 336304 stream.triad_bandwidth_MBps
> 336153 -12.7% 293621 -0.5% 334624 ± 2% stream.triad_bandwidth_MBps_harmonicMean
>
>
> (8)
> Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with memory: 512G
> =========================================================================================
> array_size/compiler/cpufreq_governor/iterations/kconfig/loop/nr_threads/omp/rootfs/tbox_group/testcase:
> 50000000/gcc-12/performance/10x/x86_64-rhel-8.3/100/50%/true/debian-11.1-x86_64-20220510.cgz/lkp-spr-2sp4/stream
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 345632 -19.7% 277550 ± 3% +0.4% 347067 ± 2% stream.add_bandwidth_MBps
> 342263 ± 2% -19.7% 274704 ± 2% +0.4% 343609 ± 2% stream.add_bandwidth_MBps_harmonicMean
> 343820 -17.3% 284428 ± 3% +0.1% 344248 stream.copy_bandwidth_MBps
> 341759 ± 2% -17.8% 280934 ± 3% +0.1% 342025 ± 2% stream.copy_bandwidth_MBps_harmonicMean
> 343270 -17.8% 282330 ± 3% +0.3% 344276 ± 2% stream.scale_bandwidth_MBps
> 340812 ± 2% -18.3% 278284 ± 3% +0.3% 341672 ± 2% stream.scale_bandwidth_MBps_harmonicMean
> 364596 -19.7% 292831 ± 3% +0.4% 366145 ± 2% stream.triad_bandwidth_MBps
> 360643 ± 2% -19.9% 289034 ± 3% +0.4% 362004 ± 2% stream.triad_bandwidth_MBps_harmonicMean
>
>
> (9)
> Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with memory: 512G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/Create Threads/debian-x86_64-phoronix/lkp-csl-2sp7/osbench-1.0.2/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 26.82 +1348.4% 388.43 +4.0% 27.88 phoronix-test-suite.osbench.CreateThreads.us_per_event
>
>
> **** for below (10) - (13), full comparison is attached as phoronix-regressions
> (they all happen on a Coffee Lake desktop)
> (10)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/Add/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 20115 -4.5% 19211 -4.5% 19217 phoronix-test-suite.ramspeed.Add.Integer.mb_s
>
>
> (11)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/Average/Floating Point/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 19960 -2.9% 19378 -3.0% 19366 phoronix-test-suite.ramspeed.Average.FloatingPoint.mb_s
>
>
> (12)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/Triad/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 19667 -6.4% 18399 -6.4% 18413 phoronix-test-suite.ramspeed.Triad.Integer.mb_s
>
>
> (13)
> Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (Coffee Lake) with memory: 16G
> =========================================================================================
> compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
> gcc-12/performance/x86_64-rhel-8.3/Average/Integer/debian-x86_64-phoronix/lkp-cfl-d1/ramspeed-1.4.3/phoronix-test-suite
>
> 1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274 d8d7b1dae6f0311d528b289cda7
> ---------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev
> \ | \ | \
> 19799 -3.5% 19106 -3.4% 19117 phoronix-test-suite.ramspeed.Average.Integer.mb_s
>
>
>
> >
> >
> > commit d8d7b1dae6f0311d528b289cda7b317520f9a984
> > Author: 0day robot <lkp@intel.com>
> > Date: Thu Jan 4 12:51:10 2024 +0800
> >
> > fix for 'mm: align larger anonymous mappings on THP boundaries' from Yang Shi
> >
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 40d94411d4920..91197bd387730 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
> > _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
> > _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
> > + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
> > arch_calc_vm_flag_bits(flags);
> > }
> >
> >
> > >
> > > Regards
> > > Yin, Fengwei
> > >
> > > >
> > > > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > > index 40d94411d492..dc7048824be8 100644
> > > > --- a/include/linux/mman.h
> > > > +++ b/include/linux/mman.h
> > > > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > > > return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
> > > > _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
> > > > _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
> > > > + _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
> > > > arch_calc_vm_flag_bits(flags);
> > > > }
> > > >