Re: [4.17 regression] Performance drop on kernel-4.17 visible on Stream, Linpack and NAS parallel benchmarks


* Re: [4.17 regression] Performance drop on kernel-4.17 visible on Stream, Linpack and NAS parallel benchmarks
       [not found] <20180606122731.GB27707@jra-laptop.brq.redhat.com>
@ 2018-06-07 11:07 ` Michal Hocko
  2018-06-07 11:19   ` Jakub Raček
  0 siblings, 1 reply; 3+ messages in thread
From: Michal Hocko @ 2018-06-07 11:07 UTC (permalink / raw)
  To: Jakub Racek
  Cc: linux-kernel, Rafael J. Wysocki, Len Brown, linux-acpi,
	Mel Gorman, linux-mm

[CCing Mel and MM mailing list]

On Wed 06-06-18 14:27:32, Jakub Racek wrote:
> Hi,
> 
> There is a huge performance regression on 2- and 4-node NUMA systems on the
> stream benchmark with the 4.17 kernel compared to the 4.16 kernel. Stream, Linpack
> and NAS parallel benchmarks show up to a 50% performance drop.
> 
> When running, for example, 20 stream processes in parallel, we see the following behavior:
> 
> * all processes are started on NODE #1
> * memory is also allocated on NODE #1
> * roughly half of the processes are moved to NODE #0 very quickly
> * however, memory is not moved to NODE #0 and stays allocated on NODE #1
> 
> As a result, half of the processes are running on NODE #0 with their memory
> still allocated on NODE #1. This leads to non-local memory accesses
> and to a high Remote-To-Local Memory Access Ratio on the numatop charts.
> 
> So it seems that 4.17 is not doing a good job of moving the memory to the right NUMA
> node after the process has been moved.
> 
> ----8<----
> 
> The above is an excerpt from performance testing on 4.16 and 4.17 kernels.
> 
> For now I'm merely making sure the problem is reported.

Do you have numa balancing enabled?
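
For readers who want to answer this question on their own machine, here is a minimal Python sketch that reads the `kernel.numa_balancing` sysctl through procfs. The helper name `numa_balancing_enabled` is hypothetical and not part of this thread. The sketch assumes the usual `/proc/sys/kernel/numa_balancing` path and treats any non-zero value as "enabled".

```python
from pathlib import Path


def numa_balancing_enabled(path="/proc/sys/kernel/numa_balancing"):
    """Return True/False for the sysctl value, or None if the file is absent.

    Hypothetical helper: a missing file is taken to mean the kernel was
    built without automatic NUMA balancing support.
    """
    try:
        raw = Path(path).read_text().strip()
    except FileNotFoundError:
        return None
    # Assumption: any non-zero value means balancing is active in some mode.
    return int(raw) != 0


if __name__ == "__main__":
    print("numa_balancing enabled:", numa_balancing_enabled())
```

The path is a parameter only so that the function can be exercised against an ordinary file.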
-- 
Michal Hocko
SUSE Labs


* Re: [4.17 regression] Performance drop on kernel-4.17 visible on Stream, Linpack and NAS parallel benchmarks
  2018-06-07 11:07 ` [4.17 regression] Performance drop on kernel-4.17 visible on Stream, Linpack and NAS parallel benchmarks Michal Hocko
@ 2018-06-07 11:19   ` Jakub Raček
  2018-06-07 11:56     ` Jirka Hladky
  0 siblings, 1 reply; 3+ messages in thread
From: Jakub Raček @ 2018-06-07 11:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, Rafael J. Wysocki, Len Brown, linux-acpi,
	Mel Gorman, linux-mm

Hi,

On 06/07/2018 01:07 PM, Michal Hocko wrote:
> [CCing Mel and MM mailing list]
> 
> On Wed 06-06-18 14:27:32, Jakub Racek wrote:
>> Hi,
>>
>> There is a huge performance regression on 2- and 4-node NUMA systems on the
>> stream benchmark with the 4.17 kernel compared to the 4.16 kernel. Stream, Linpack
>> and NAS parallel benchmarks show up to a 50% performance drop.
>>
>> When running, for example, 20 stream processes in parallel, we see the following behavior:
>>
>> * all processes are started on NODE #1
>> * memory is also allocated on NODE #1
>> * roughly half of the processes are moved to NODE #0 very quickly
>> * however, memory is not moved to NODE #0 and stays allocated on NODE #1
>>
>> As a result, half of the processes are running on NODE #0 with their memory
>> still allocated on NODE #1. This leads to non-local memory accesses
>> and to a high Remote-To-Local Memory Access Ratio on the numatop charts.
>>
>> So it seems that 4.17 is not doing a good job of moving the memory to the right NUMA
>> node after the process has been moved.
>>
>> ----8<----
>>
>> The above is an excerpt from performance testing on 4.16 and 4.17 kernels.
>>
>> For now I'm merely making sure the problem is reported.
> 
> Do you have numa balancing enabled?
> 

Yes. The relevant settings are:

kernel.numa_balancing = 1
kernel.numa_balancing_scan_delay_ms = 1000
kernel.numa_balancing_scan_period_max_ms = 60000
kernel.numa_balancing_scan_period_min_ms = 1000
kernel.numa_balancing_scan_size_mb = 256
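
With balancing enabled, what remains to check is whether pages actually follow the tasks, as the original report describes. One way to look at this from userspace is to sum the per-node page counts (the `N<node>=<pages>` fields) in `/proc/<pid>/numa_maps`. The sketch below is illustrative only: the function names and the sample lines are hypothetical and do not come from the reported systems.

```python
import re
from collections import Counter

# Matches per-node fields such as "N0=100" in a numa_maps line.
_NODE_FIELD = re.compile(r"^N(\d+)=(\d+)$")


def pages_per_node(numa_maps_text):
    """Sum the N<node>=<pages> fields over all mappings in numa_maps text."""
    totals = Counter()
    for line in numa_maps_text.splitlines():
        for token in line.split():
            match = _NODE_FIELD.match(token)
            if match:
                totals[int(match.group(1))] += int(match.group(2))
    return dict(totals)


def read_pages_per_node(pid):
    """Read /proc/<pid>/numa_maps (Linux only) and summarise it."""
    with open(f"/proc/{pid}/numa_maps") as f:
        return pages_per_node(f.read())


# Made-up sample resembling numa_maps output, for illustration only.
SAMPLE = """\
00400000 default file=/usr/bin/stream mapped=12 N1=12 kernelpagesize_kB=4
7f1c2a000000 default anon=512 dirty=512 N0=100 N1=412 kernelpagesize_kB=4
"""

if __name__ == "__main__":
    print(pages_per_node(SAMPLE))
```

The counts are in pages of the size given by `kernelpagesize_kB` on each line, so mappings with different page sizes would need scaling before they are compared. A process running on node 0 whose pages are almost all under `N1` would match the behavior described in the report.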


-- 
Best regards,
Jakub Racek
FMK


* Re: [4.17 regression] Performance drop on kernel-4.17 visible on Stream, Linpack and NAS parallel benchmarks
  2018-06-07 11:19   ` Jakub Raček
@ 2018-06-07 11:56     ` Jirka Hladky
  0 siblings, 0 replies; 3+ messages in thread
From: Jirka Hladky @ 2018-06-07 11:56 UTC (permalink / raw)
  To: Jakub Raček
  Cc: Michal Hocko, linux-kernel, Rafael J. Wysocki, Len Brown,
	linux-acpi, Mel Gorman, linux-mm, jhladky


Adding myself to Cc.
