Date: Thu, 1 Dec 2022 14:35:31 -0700
From: Nathan Chancellor
To: Rik van Riel
Cc: Thorsten Leemhuis, Andrew Morton, Yang Shi, "Huang, Ying",
    kernel test robot, lkp@lists.01.org, lkp@intel.com, Matthew Wilcox,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, feng.tang@intel.com,
    zhengjun.xing@linux.intel.com, fengwei.yin@intel.com
Subject: Re: [mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression
In-Reply-To: <07adee081a70c2b4b44d9bf93a0ad3142e091086.camel@surriel.com>

On Thu, Dec 01, 2022 at 03:29:41PM -0500, Rik van Riel wrote:
> On Thu, 2022-12-01 at 19:33 +0100, Thorsten Leemhuis wrote:
> > Hi, this is your Linux kernel regression tracker.
> >
> > On 28.11.22 07:40, Nathan Chancellor wrote:
> > > Hi Rik,
> >
> > I wonder what we should do about below performance regression.
> > Is reverting the culprit now and reapplying it later together with a fix a
> > viable option? Or was anything done/is anybody doing something already
> > to address the problem and I just missed it?
>
> The changeset in question speeds up kernel compiles with
> GCC, as well as the runtime speed of other programs, due
> to being able to use THPs more. However, it slows down kernel
> compiles with clang, due to ... something clang does.
>
> I have not figured out what that something is yet.
>
> I don't know if I have the wrong version of clang here,
> but I have not seen any smoking gun at all when tracing
> clang system calls. I see predominantly small mmap and
> unmap calls, and nothing that even triggers 2MB alignment.

Sorry about that :/ What version of clang are you trying to reproduce
with? I was able to see this with 14.x and 16.x; it is possible that
older versions do not do the thing that is causing this.

I cannot really do much testing on my main workstation, but I will see if
I can reproduce this behavior on one of my other test systems or in a
virtual machine. Once I do that, if you are still unable to reproduce
it, I can potentially try to help you debug this, although I will
likely need some hand holding.

Cheers,
Nathan

> > Yang Shi, Andrew, what's your opinion on this? I ask you directly,
> > because it looks like Rik hasn't posted anything to lists archived on
> > lore during the last few weeks. :-/
> >
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker'
> > hat)
> >
> > P.S.: As the Linux kernel's regression tracker I deal with a lot of
> > reports and sometimes miss something important when writing mails like
> > this. If that's the case here, don't hesitate to tell me in a public
> > reply, it's in everyone's interest to set the public record straight.
> >
> > > On Thu, Oct 20, 2022 at 10:16:20AM -0700, Nathan Chancellor wrote:
> > > > On Thu, Oct 20, 2022 at 11:28:16AM -0400, Rik van Riel wrote:
> > > > > On Thu, 2022-10-20 at 13:07 +0800, Huang, Ying wrote:
> > > > > > Nathan Chancellor writes:
> > > > > > >
> > > > > > > For what it's worth, I just bisected a massive and visible performance
> > > > > > > regression on my Threadripper 3990X workstation to commit f35b5d7d676e
> > > > > > > ("mm: align larger anonymous mappings on THP boundaries"), which seems
> > > > > > > directly related to this report/analysis. I initially noticed this
> > > > > > > because my full set of kernel builds against mainline went from 2 hours
> > > > > > > and 20 minutes or so to over 3 hours. Zeroing in on x86_64 allmodconfig,
> > > > > > > which I used for the bisect:
> > > > > > >
> > > > > > > @ 7b5a0b664ebe ("mm/page_ext: remove unused variable in offline_page_ext"):
> > > > > > >
> > > > > > > Benchmark 1: make -skj128 LLVM=1 allmodconfig all
> > > > > > >   Time (mean ± σ):     318.172 s ±  0.730 s    [User: 31750.902 s, System: 4564.246 s]
> > > > > > >   Range (min … max):   317.332 s … 318.662 s    3 runs
> > > > > > >
> > > > > > > @ f35b5d7d676e ("mm: align larger anonymous mappings on THP boundaries"):
> > > > > > >
> > > > > > > Benchmark 1: make -skj128 LLVM=1 allmodconfig all
> > > > > > >   Time (mean ± σ):     406.688 s ±  0.676 s    [User: 31819.526 s, System: 16327.022 s]
> > > > > > >   Range (min … max):   405.954 s … 407.284 s    3 runs
> > > > > >
> > > > > > Have you tried to build with gcc?  Want to check whether is this clang
> > > > > > specific issue or not.
> > > > >
> > > > > This may indeed be something LLVM specific.
> > > > > In previous tests, GCC has generally seen a benefit from increased
> > > > > THP usage. Many other applications also benefit from getting more THPs.
> > > >
> > > > Indeed, GCC builds actually appear to be slightly faster on my system now,
> > > > apologies for not trying that before reporting :/
> > > >
> > > > 7b5a0b664ebe:
> > > >
> > > > Benchmark 1: make -skj128 allmodconfig all
> > > >   Time (mean ± σ):     355.294 s ±  0.931 s    [User: 33620.469 s, System: 6390.064 s]
> > > >   Range (min … max):   354.571 s … 356.344 s    3 runs
> > > >
> > > > f35b5d7d676e:
> > > >
> > > > Benchmark 1: make -skj128 allmodconfig all
> > > >   Time (mean ± σ):     347.400 s ±  2.029 s    [User: 34389.724 s, System: 4603.175 s]
> > > >   Range (min … max):   345.815 s … 349.686 s    3 runs
> > > >
> > > > > LLVM showing 10% system time before this change, and a whopping
> > > > > 30% system time after that change, suggests that LLVM is behaving
> > > > > quite differently from GCC in some ways.
> > > >
> > > > The above tests were done with GCC 12.2.0 from Arch Linux. The previous LLVM
> > > > tests were done with a self-compiled version of LLVM from the main branch
> > > > (16.0.0), optimized with BOLT [1].
> > > > To eliminate that as a source of issues, I used my distribution's
> > > > version of clang (14.0.6) and saw similar results as before:
> > > >
> > > > 7b5a0b664ebe:
> > > >
> > > > Benchmark 1: make -skj128 LLVM=/usr/bin/ allmodconfig all
> > > >   Time (mean ± σ):     462.517 s ±  1.214 s    [User: 48544.240 s, System: 5586.212 s]
> > > >   Range (min … max):   461.115 s … 463.245 s    3 runs
> > > >
> > > > f35b5d7d676e:
> > > >
> > > > Benchmark 1: make -skj128 LLVM=/usr/bin/ allmodconfig all
> > > >   Time (mean ± σ):     547.927 s ±  0.862 s    [User: 47913.709 s, System: 17682.514 s]
> > > >   Range (min … max):   547.429 s … 548.922 s    3 runs
> > > >
> > > > > If we can figure out what these differences are, maybe we can
> > > > > just fine tune the code to avoid this issue.
> > > > >
> > > > > I'll try to play around with LLVM compilation a little bit next
> > > > > week, to see if I can figure out what might be going on. I wonder
> > > > > if LLVM is doing lots of mremap calls or something...
> > > >
> > > > If there is any further information I can provide or patches I can test,
> > > > I am more than happy to do so.
> > > >
> > > > [1]: https://github.com/llvm/llvm-project/tree/96552e73900176d65ee6650facae8d669d6f9498/bolt
> > >
> > > Was there ever a follow up to this report that I missed? I just
> > > noticed that I am still reverting f35b5d7d676e in my mainline kernel.
> > >
> > > Cheers,
> > > Nathan
> > >
> >
> > #regzbot ignore-activity
>
> --
> All Rights Reversed.
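
As a rough cross-check of the numbers quoted in this thread, the sketch below recomputes the wall-clock change and the share of CPU time spent in the kernel from the mean, User, and System values in the benchmark output. The function names and the table layout are illustrative only. Taking the share as System / (User + System) is an assumption about how the "10%" and "30%" figures were estimated, not something stated in the thread.

```python
# Figures copied from the hyperfine output quoted in this thread.
# Each entry: (mean wall-clock s, user s, system s), before and after
# commit f35b5d7d676e. The labels are illustrative.
RUNS = {
    "LLVM main (16.0.0)": ((318.172, 31750.902, 4564.246),
                           (406.688, 31819.526, 16327.022)),
    "GCC 12.2.0":         ((355.294, 33620.469, 6390.064),
                           (347.400, 34389.724, 4603.175)),
    "clang 14.0.6":       ((462.517, 48544.240, 5586.212),
                           (547.927, 47913.709, 17682.514)),
}


def wall_clock_change(before_s, after_s):
    """Relative change in mean wall-clock time, in percent."""
    return (after_s - before_s) / before_s * 100.0


def system_share(user_s, system_s):
    """System time as a percentage of total CPU time (assumed definition)."""
    return system_s / (user_s + system_s) * 100.0


def report():
    """Print one summary row per toolchain."""
    for label, (before, after) in RUNS.items():
        change = wall_clock_change(before[0], after[0])
        share_before = system_share(before[1], before[2])
        share_after = system_share(after[1], after[2])
        print(f"{label:20s} wall {change:+6.1f}%  "
              f"system share {share_before:5.1f}% -> {share_after:5.1f}%")


if __name__ == "__main__":
    report()
```

Under that definition the LLVM builds come out somewhat above the rounded 10% and 30% figures, while the GCC build shows the system share going down, which matches the direction described in the thread.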