From: Vlastimil Babka
Date: Thu, 24 Oct 2024 11:58:43 +0200
Subject: Re: darktable performance regression on AMD systems caused by "mm: align larger anonymous mappings on THP boundaries"
To: Thorsten Leemhuis, Rik van Riel
Cc: Matthias, Andrew Morton, Linux kernel regressions list, LKML, Linux-MM, Yang Shi, Petr Tesarik
In-Reply-To: <2050f0d4-57b0-481d-bab8-05e8d48fed0c@leemhuis.info>
References: <2050f0d4-57b0-481d-bab8-05e8d48fed0c@leemhuis.info>

On 10/24/24 09:45, Thorsten Leemhuis wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker.
>
> Rik, I noticed a report about a regression in bugzilla.kernel.org that
> appears to be caused by the following change of yours:
>
> efa7df3e3bb5da ("mm: align larger anonymous mappings on THP boundaries")
> [v6.7]
>
> It might be one of those "some things got faster, a few things became
> slower" situations. Not sure. Felt odd that the reporter was able to
> reproduce it on two AMD systems, but not on a Intel system. Maybe there
> is a bug somewhere else that was exposed by this.

It seems very similar to what we've seen with spec benchmarks such as cactus
and bisected to the same commit:

https://bugzilla.suse.com/show_bug.cgi?id=1229012

The exact regression varies per system. Intel regresses too but relatively
less. The theory is that there are many large-ish allocations that don't
have individual sizes aligned to 2MB and would have been merged; commit
efa7df3e3bb5da causes them to become separate areas where each aligns its
start at a 2MB boundary and there are gaps in between.
This (gaps and vma fragmentation) itself is not great, but most of the
problem seemed to be from the start alignment, which together with the
access pattern causes more TLB or cache misses due to limited associativity.

So maybe darktable has a similar problem. A simple candidate fix could
change commit efa7df3e3bb5da so that the mapping size has to be a multiple
of THP size (2MB) in order to become aligned; right now it's enough if it's
THP sized or larger.

> So in the end it felt worth forwarding by mail to me. Not tracking this
> yet, first waiting for feedback.
>
> To quote from https://bugzilla.kernel.org/show_bug.cgi?id=219366 :
>
>> Matthias 2024-10-09 05:37:51 UTC
>>
>> I am using a darktable benchmark and I am finding that RAW-to-JPG
>> conversion is about 15-25 % slower with kernels 6.7-6.10. The last
>> fast kernel series is 6.6. I also tested kernel series 6.5 and it is
>> as fast as 6.6
>>
>> I know this sounds weird. What has darktable to do with the kernel?
>> But the numbers are true. And the darktable devs tell me that this
>> is a kernel regression. The darktable github issue is:
>> https://github.com/darktable-org/darktable/issues/17397
>> You can find more details there.
>>
>> What do I do to measure the performance?
>>
>> I am executing darktable on the command line. opencl is disabled so
>> that all activities are only on the CPU:
>>
>> darktable-cli bench.SRW /tmp/test.jpg --core --disable-opencl -d perf -d opencl --configdir /tmp
>>
>> ( bench.SRW and the sidecar file can be found here:
>> https://drive.google.com/drive/folders/1cfV2b893JuobVwGiZXcaNv5-yszH6j-N )
>>
>> This will show some debug output. The line to look for is
>>
>> 4,2765 [dev_process_export] pixel pipeline processing took 3,811 secs (81,883 CPU)
>>
>> This gives an exact number how much time darktable needed to convert
>> the image. The time darktable needs has a clear dependency on the
>> kernel version. It is fast with kernel 6.6. and older and slow with
>> kernel 6.7 and newer. Something must have happened from 6.6 to 6.7
>> which slows down darktable.
>>
>> The darktable debug output shows that basically only one module is
>> responsible for the slow down: 'atrous'
>>
>> with kernel 6.6.47:
>>
>> 4,0548 [dev_pixelpipe] took 0,635 secs (14,597 CPU) [export] processed 'atrous' on CPU, blended on CPU
>> ...
>> 4,2765 [dev_process_export] pixel pipeline processing took 3,811 secs (81,883 CPU)
>>
>> with kernel 6.10.6:
>>
>> 4,9645 [dev_pixelpipe] took 1,489 secs (33,736 CPU) [export] processed 'atrous' on CPU, blended on CPU
>> ...
>> 5,2151 [dev_process_export] pixel pipeline processing took 4,773 secs (102,452 CPU)
>>
>>
>> This is also being discussed here:
>> https://discuss.pixls.us/t/darktable-performance-regression-with-kernel-6-7-and-newer/45945/1
>> And other users confirm the performance degradation.
>
> [...]
>
>> This seems to affect AMD only. I reproduced this performance
>> degradation on two different Ryzen Desktop PCs (Ryzen 5 and Ryzen
>> 9). But I can not reproduce it on my Intel PC (Lenovo X1 Carbon,
>> core i5).
>
> [...]
>
>> By the way, there is also a thread in the darktable forum on this topic:
>> https://discuss.pixls.us/t/darktable-performance-regression-with-kernel-6-7-and-newer/45945
>>
>> Some users reproduced it there as well.
>
> See the ticket for more details. The reporter is CCed. openZFS is in
> use, but the problem was reproduced on vanilla kernels.
>
> Ciao, Thorsten
>
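
To make the two alignment rules from the reply concrete (align when the
mapping is THP sized or larger, versus align only when the size is a multiple
of the THP size), here is a minimal userspace sketch in C. It is not the
kernel code: the function names, the simplified top-down placement loop and
the starting address are all made up for illustration, and the THP size is
assumed to be 2MB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define THP_SIZE (UINT64_C(2) << 20)	/* assumption: 2MB THP size */

/* Rule as described for commit efa7df3e3bb5da: THP sized or larger. */
static bool aligns_now(uint64_t len)
{
	return len >= THP_SIZE;
}

/* Candidate rule: only sizes that are a whole multiple of the THP size. */
static bool aligns_candidate(uint64_t len)
{
	return len >= THP_SIZE && len % THP_SIZE == 0;
}

/*
 * Hypothetical, heavily simplified top-down placement: each mapping goes
 * directly below the previous one. If the rule says so, its start is
 * rounded down to a THP boundary, which leaves a gap above it.
 * Returns the total number of gap bytes created.
 */
static uint64_t place_all(const uint64_t *lens, int n, bool (*rule)(uint64_t))
{
	uint64_t top = UINT64_C(1) << 40;	/* arbitrary THP-aligned ceiling */
	uint64_t gaps = 0;

	assert(top % THP_SIZE == 0);
	for (int i = 0; i < n; i++) {
		uint64_t start = top - lens[i];

		if (rule(lens[i])) {
			/* round the start down to a THP boundary */
			uint64_t aligned = start & ~(THP_SIZE - 1);

			gaps += start - aligned;
			start = aligned;
		}
		top = start;
	}
	return gaps;
}
```

For three mappings of 3MB, 3MB and 4MB, place_all() with aligns_now gives 2MB
of gaps (each 3MB mapping is pushed down by 1MB), while aligns_candidate gives
none, because the two 3MB mappings stay adjacent and could be merged.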