Re: darktable performance regression on AMD systems caused by "mm: align larger anonymous mappings on THP boundaries"

From: Vlastimil Babka <vbabka@suse.cz>
To: Thorsten Leemhuis <regressions@leemhuis.info>,
	Rik van Riel <riel@surriel.com>
Cc: Matthias <matthias@bodenbinder.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux kernel regressions list <regressions@lists.linux.dev>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	Yang Shi <yang@os.amperecomputing.com>,
	Petr Tesarik <ptesarik@suse.com>
Subject: Re: darktable performance regression on AMD systems caused by "mm: align larger anonymous mappings on THP boundaries"
Date: Thu, 24 Oct 2024 11:58:43 +0200
Message-ID: <f81ef5bd-e930-4982-a5a8-cd4aca272912@suse.cz>
In-Reply-To: <2050f0d4-57b0-481d-bab8-05e8d48fed0c@leemhuis.info>

On 10/24/24 09:45, Thorsten Leemhuis wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker.
> 
> Rik, I noticed a report about a regression in bugzilla.kernel.org that
> appears to be caused by the following change of yours:
> 
> efa7df3e3bb5da ("mm: align larger anonymous mappings on THP boundaries")
> [v6.7]
> 
> It might be one of those "some things got faster, a few things became
> slower" situations. Not sure. Felt odd that the reporter was able to
> reproduce it on two AMD systems, but not on a Intel system. Maybe there
> is a bug somewhere else that was exposed by this.

It seems very similar to what we've seen with SPEC benchmarks such as cactus,
which we bisected to the same commit:

https://bugzilla.suse.com/show_bug.cgi?id=1229012

The exact regression varies per system. Intel regresses too, but relatively
less. The theory is that there are many large-ish allocations whose
individual sizes are not aligned to 2MB and which would previously have been
merged. Commit efa7df3e3bb5da causes them to become separate areas, where
each aligns its start at a 2MB boundary and there are gaps in between. This
(gaps and vma fragmentation) is not great in itself, but most of the problem
seemed to come from the start alignment, which together with the access
pattern causes more TLB or cache misses due to limited associativity.
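
To make the placement effect concrete, here is a toy model in Python. It is
not the kernel's actual unmapped-area search, only a sketch of the theory
above: areas are placed top-down, and with the alignment enabled every area
of at least 2MB has its start rounded down to a 2MB boundary. The names
place_top_down(), count_vmas() and set_index(), the TOP address, the example
sizes and the cache geometry (64-byte lines, 1024 sets) are all assumptions
made for illustration. The set index is computed from virtual addresses,
which is a simplification.

```python
PAGE = 4096
THP = 2 * 1024 * 1024      # PMD/THP size on x86-64 with 4 KiB pages
TOP = 0x7F0000000000       # hypothetical top of the mmap region (2MB-aligned)

# Hypothetical cache geometry, for illustration only.
LINE = 64
SETS = 1024


def place_top_down(sizes, top=TOP, thp_align=False):
    """Toy top-down placement; returns (start, end) per area, newest last."""
    areas = []
    for size in sizes:
        size = -(-size // PAGE) * PAGE      # round up to whole pages
        start = top - size
        if thp_align and size >= THP:
            start -= start % THP            # round start down to a 2MB boundary
        areas.append((start, start + size))
        top = start                         # next area goes below this one
    return areas


def count_vmas(areas):
    """Number of areas left after merging those that touch each other."""
    gaps = sum(1 for upper, lower in zip(areas, areas[1:])
               if lower[1] != upper[0])
    return 1 + gaps


def set_index(addr):
    """Cache set an address maps to under the assumed geometry."""
    return (addr // LINE) % SETS


if __name__ == "__main__":
    sizes = [3 * 1024 * 1024 + 20 * 1024] * 4   # large-ish, not 2MB multiples
    for label, flag in (("packed", False), ("aligned", True)):
        areas = place_top_down(sizes, thp_align=flag)
        sets = sorted({set_index(start) for start, _ in areas})
        print(label, "vmas:", count_vmas(areas), "start sets:", sets)
```

In this model the packed layout collapses into one mergeable area whose
buffer starts are spread over different sets, while the aligned layout keeps
four separate areas with gaps, and every buffer start maps to the same set.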

So maybe darktable has a similar problem. A simple candidate fix could
change commit efa7df3e3bb5da so that the mapping size has to be a multiple
of the THP size (2MB) in order to become aligned. Right now it's enough if
the mapping is THP-sized or larger.
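
The difference between the current condition and the candidate one can be
sketched as two predicates. This is only an illustration of the idea in the
paragraph above, not kernel code; the function names are hypothetical.

```python
THP = 2 * 1024 * 1024  # THP (PMD) size on x86-64 with 4 KiB pages


def aligned_currently(length):
    """Condition as described for efa7df3e3bb5da: THP-sized or larger."""
    return length >= THP


def aligned_with_candidate_fix(length):
    """Candidate condition: the size must also be a multiple of THP size."""
    return length >= THP and length % THP == 0


if __name__ == "__main__":
    for length in (1 << 20, THP, 3 * (1 << 20) + 20 * 1024, 2 * THP):
        print(length, aligned_currently(length),
              aligned_with_candidate_fix(length))
```

Under the candidate condition, a mapping of 3MB plus a few pages would no
longer get its start aligned, while exact multiples of 2MB still would.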

> So in the end it felt worth forwarding by mail to me. Not tracking this
> yet, first waiting for feedback.
> 
> To quote from https://bugzilla.kernel.org/show_bug.cgi?id=219366 :
> 
>> Matthias 2024-10-09 05:37:51 UTC
>> 
>> I am using a darktable benchmark and I am finding that RAW-to-JPG
>> conversion is about 15-25 % slower with kernels 6.7-6.10. The last
>> fast kernel series is 6.6. I also tested kernel series 6.5 and it is
>> as fast as 6.6
>> 
>> I know this sounds weird. What has darktable to do with the kernel?
>> But the numbers are true. And the darktable devs tell me that this
>> is a kernel regression. The darktable github issue is:
>> https://github.com/darktable-org/darktable/issues/17397
>> You can find more details there.
>> 
>> What do I do to measure the performance?
>> 
>> I am executing darktable on the command line. opencl is disabled so
>> that all activities are only on the CPU:
>> 
>> darktable-cli bench.SRW /tmp/test.jpg --core --disable-opencl -d perf -d opencl --configdir /tmp
>> 
>> ( bench.SRW and the sidecar file can be found here:
>> https://drive.google.com/drive/folders/1cfV2b893JuobVwGiZXcaNv5-yszH6j-N )
>> 
>> This will show some debug output. The line to look for is
>> 
>> 4,2765 [dev_process_export] pixel pipeline processing took 3,811
>> secs (81,883 CPU)
>> 
>> This gives an exact number how much time darktable needed to convert
>> the image. The time darktable needs has a clear dependency on the
>> kernel version. It is fast with kernel 6.6. and older and slow with
>> kernel 6.7 and newer. Something must have happened from 6.6 to 6.7
>> which slows down darktable.
>> 
>> The darktable debug output shows that basically only one module is
>> responsible for the slow down: 'atrous'
>> 
>> with kernel 6.6.47:
>> 
>> 4,0548 [dev_pixelpipe] took 0,635 secs (14,597 CPU) [export]
>> processed 'atrous' on CPU, blended on CPU ... 4,2765
>> [dev_process_export] pixel pipeline processing took 3,811 secs
>> (81,883 CPU)
>> 
>> with kernel 6.10.6:
>> 
>> 4,9645 [dev_pixelpipe] took 1,489 secs (33,736 CPU) [export]
>> processed 'atrous' on CPU, blended on CPU ... 5,2151
>> [dev_process_export] pixel pipeline processing took 4,773 secs
>> (102,452 CPU)
>> 
>> 
>> This is also being discussed here:
>> https://discuss.pixls.us/t/darktable-performance-regression-with-kernel-6-7-and-newer/45945/1
>> And other users confirm the performance degradation.
> 
> [...]
> 
>> This seems to affect AMD only. I reproduced this performance
>> degradation on two different Ryzen Desktop PCs (Ryzen 5 and Ryzen
>> 9). But I can not reproduce it on my Intel PC (Lenovo X1 Carbon,
>> core i5).
> 
> [...]
> 
>> By the way, there is also a thread in the darktable forum on this topic:
>> https://discuss.pixls.us/t/darktable-performance-regression-with-kernel-6-7-and-newer/45945
>>  
>> Some users reproduced it there as well.
> 
> See the ticket for more details. The reporter is CCed. openZFS is in
> use, but the problem was reproduced on vanilla kernels.
> 
> Ciao, Thorsten
>
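
For reference, the slowdown implied by the log lines quoted above can be
checked with a small Python sketch. It assumes the "took N,NNN secs" format
with a decimal comma, as seen in the quoted output; the helper names are
hypothetical.

```python
import re

# Matches the wall-clock part of lines such as
# "[dev_pixelpipe] took 0,635 secs (14,597 CPU) [export]".
TOOK_RE = re.compile(r"took (\d+),(\d+) secs")


def wall_seconds(line):
    """Extract the wall-clock seconds from a darktable perf log line."""
    match = TOOK_RE.search(line)
    if match is None:
        raise ValueError("no 'took N,NNN secs' field in line")
    return float(f"{match.group(1)}.{match.group(2)}")


def slowdown_percent(before, after):
    """How much slower 'after' is than 'before', in percent."""
    return (after / before - 1.0) * 100.0


if __name__ == "__main__":
    fast = wall_seconds("pixel pipeline processing took 3,811 secs (81,883 CPU)")
    slow = wall_seconds("pixel pipeline processing took 4,773 secs (102,452 CPU)")
    print(f"whole pipeline: {slowdown_percent(fast, slow):.1f}% slower")
```

With the quoted numbers the whole pipeline comes out roughly 25% slower on
6.10.6 than on 6.6.47, and the 'atrous' step alone takes more than twice as
long.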
