* Re: [4.17 regression] Performance drop on kernel-4.17 visible on Stream, Linpack and NAS parallel benchmarks
       [not found] <20180606122731.GB27707@jra-laptop.brq.redhat.com>
From: Michal Hocko @ 2018-06-07 11:07 UTC
To: Jakub Racek
Cc: linux-kernel, Rafael J. Wysocki, Len Brown, linux-acpi, Mel Gorman, linux-mm

[CCing Mel and MM mailing list]

On Wed 06-06-18 14:27:32, Jakub Racek wrote:
> Hi,
>
> There is a huge performance regression on 2 and 4 NUMA node systems on the
> stream benchmark with the 4.17 kernel compared to the 4.16 kernel. Stream,
> Linpack and NAS parallel benchmarks show up to a 50% performance drop.
>
> When running, for example, 20 stream processes in parallel, we see the
> following behavior:
>
> * all processes are started on NODE #1
> * memory is also allocated on NODE #1
> * roughly half of the processes are moved to NODE #0 very quickly
> * however, memory is not moved to NODE #0 and stays allocated on NODE #1
>
> As a result, half of the processes are running on NODE #0 with their memory
> still allocated on NODE #1. This leads to non-local memory accesses and a
> high Remote-To-Local Memory Access Ratio on the numatop charts.
>
> So it seems that 4.17 is not doing a good job of moving the memory to the
> right NUMA node after the process has been moved.
>
> ----8<----
>
> The above is an excerpt from performance testing on 4.16 and 4.17 kernels.
>
> For now I'm merely making sure the problem is reported.

Do you have numa balancing enabled?

-- 
Michal Hocko
SUSE Labs
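The placement described in the report (tasks running on NODE #0 while their pages stay on NODE #1) can be checked per process from /proc/<pid>/numa_maps, where each mapping line carries per-node page counts such as N0=100 N1=900. What follows is a minimal sketch, not something posted in this thread: the helper names pages_per_node and remote_fraction are hypothetical, and SAMPLE is made-up illustrative text, not measured data.

```python
import re
from collections import Counter

# Matches per-node page counts such as "N0=100" or "N1=900" in a numa_maps line.
NODE_PAGES = re.compile(r"\bN(\d+)=(\d+)\b")


def pages_per_node(numa_maps_text):
    """Sum the pages reported for each NUMA node across all mappings."""
    totals = Counter()
    for line in numa_maps_text.splitlines():
        for node, pages in NODE_PAGES.findall(line):
            totals[int(node)] += int(pages)
    return totals


def remote_fraction(totals, cpu_node):
    """Fraction of pages that live on a node other than cpu_node."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return (total - totals[cpu_node]) / total


# Hypothetical numa_maps content for a task whose memory sits mostly on node 1.
SAMPLE = """\
00400000 default file=/usr/bin/stream mapped=2 N1=2 kernelpagesize_kB=4
7f0000000000 default anon=1000 dirty=1000 N0=100 N1=900 kernelpagesize_kB=4
"""

if __name__ == "__main__":
    totals = pages_per_node(SAMPLE)
    print(dict(totals))
    print(remote_fraction(totals, cpu_node=0))
```

On a live system the text would come from reading /proc/<pid>/numa_maps for each stream process, and cpu_node would be the node the task currently runs on. A remote fraction that stays high for the tasks moved to NODE #0 would match the behavior reported above.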
* Re: [4.17 regression] Performance drop on kernel-4.17 visible on Stream, Linpack and NAS parallel benchmarks
From: Jakub Raček @ 2018-06-07 11:19 UTC
To: Michal Hocko
Cc: linux-kernel, Rafael J. Wysocki, Len Brown, linux-acpi, Mel Gorman, linux-mm

Hi,

On 06/07/2018 01:07 PM, Michal Hocko wrote:
> [CCing Mel and MM mailing list]
>
> Do you have numa balancing enabled?

Yes. The relevant settings are:

kernel.numa_balancing = 1
kernel.numa_balancing_scan_delay_ms = 1000
kernel.numa_balancing_scan_period_max_ms = 60000
kernel.numa_balancing_scan_period_min_ms = 1000
kernel.numa_balancing_scan_size_mb = 256

-- 
Best regards,
Jakub Racek
FMK
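For comparing such settings between the 4.16 and 4.17 test machines, the sysctl output can be parsed into a dictionary. This is only a sketch: parse_numa_balancing is a hypothetical helper name, and the REPORTED text is copied from the message above rather than read from a live system.

```python
def parse_numa_balancing(text):
    """Parse 'key = value' sysctl lines, keeping only kernel.numa_balancing* keys."""
    settings = {}
    for line in text.splitlines():
        # Tolerate email quote markers ("> ") in front of a setting.
        line = line.strip().lstrip(">").strip()
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key.startswith("kernel.numa_balancing"):
            settings[key] = int(value.strip())
    return settings


# Values as reported in this thread.
REPORTED = """\
kernel.numa_balancing = 1
kernel.numa_balancing_scan_delay_ms = 1000
kernel.numa_balancing_scan_period_max_ms = 60000
kernel.numa_balancing_scan_period_min_ms = 1000
kernel.numa_balancing_scan_size_mb = 256
"""

if __name__ == "__main__":
    settings = parse_numa_balancing(REPORTED)
    enabled = settings.get("kernel.numa_balancing") == 1
    print("NUMA balancing enabled:", enabled)
    for key in sorted(settings):
        print(key, "=", settings[key])
```

Running the same parser over the output captured on both kernels and comparing the two dictionaries would show quickly whether the configuration differs or only the kernel does.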
* Re: [4.17 regression] Performance drop on kernel-4.17 visible on Stream, Linpack and NAS parallel benchmarks
From: Jirka Hladky @ 2018-06-07 11:56 UTC
To: Jakub Raček
Cc: Michal Hocko, linux-kernel, Rafael J. Wysocki, Len Brown, linux-acpi, Mel Gorman, linux-mm, jhladky

Adding myself to Cc.

On Thu, Jun 7, 2018 at 1:19 PM, Jakub Raček <jracek@redhat.com> wrote:
> On 06/07/2018 01:07 PM, Michal Hocko wrote:
>> Do you have numa balancing enabled?
>
> Yes. The relevant settings are:
>
> kernel.numa_balancing = 1
> kernel.numa_balancing_scan_delay_ms = 1000
> kernel.numa_balancing_scan_period_max_ms = 60000
> kernel.numa_balancing_scan_period_min_ms = 1000
> kernel.numa_balancing_scan_size_mb = 256
>
> --
> Best regards,
> Jakub Racek
> FMK