From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail.ptr1337.dev (mail.ptr1337.dev [202.61.224.105]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B513914A09A; Tue, 5 Nov 2024 17:19:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=202.61.224.105 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730827197; cv=none; b=TI7sWXak8APR6cy2LMQMtB9M+zQce1HvTZG0fnjizMZZ3EMOFHILnfOlsvPZMT/yI5jbg6g5dyUj29ofOT5+F8lo8Uy2A/0HYUfyLOlTcnnfgI1gK/zyx8Q3YCJ3krb/Oyax+COqQtvZ7bYIMwA/DexCQSvQYqxAvsjUVFYe7wA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730827197; c=relaxed/simple; bh=/bX30gzCn1WaGdN6UaNjhjh3nLrFDnZyAJu1tkyMiiE=; h=Message-ID:Date:MIME-Version:Subject:From:To:Cc:References: In-Reply-To:Content-Type; b=CI+zM8ONqIeDg7FAHD3V1iO+2NSbWMlGGjYtCe46RE+Cd5XX7CYbmCvfu9B46BdzkSuuPCZ22W1Ndf6qyVZWKLT0DB+Q9t3SlYhWLVlSkMWGoCASAInimHC6yF3TvS1ax0qMMb/iROvUastFtPtv4wau+PhVbxYyEGKzq2vsbqA= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=cachyos.org; spf=pass smtp.mailfrom=cachyos.org; dkim=pass (2048-bit key) header.d=cachyos.org header.i=@cachyos.org header.b=Gq+f2AxN; arc=none smtp.client-ip=202.61.224.105 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=cachyos.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cachyos.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cachyos.org header.i=@cachyos.org header.b="Gq+f2AxN" Received: from [127.0.0.1] (localhost [127.0.0.1]) by localhost (Mailerdaemon) with ESMTPSA id 282912805B7; Tue, 5 Nov 2024 18:19:41 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cachyos.org; s=dkim; t=1730827190; h=from:subject:date:message-id:to:cc:mime-version:content-type: content-transfer-encoding:content-language:in-reply-to:references; bh=vgfw7rDj5f/syr7E1CkO3oIgV84tN86uVeGmrlkmarQ=; b=Gq+f2AxNGuD3AduQOgJrrNil0VAYXQ04QWw1/94m/8D7pERLY9aqAipNvYA7dFE4UMQEMW XzPSHY/nep8AlD2jD6PYYFkAK8S8+9mmZe7eo70X8HvGBXZYLvYzPnTO9R8evFbQBUdWSY BeRWGUAPJ2OiH0wIDUuJLB7rXQELe3+zDaCUfUe0IH5HRc32EudTip5wBmqkcGjCtHsrjd oA9GxucR1qjMb0jXIaoS7hPX1xJZkM36Zm7qQN0ioJFQrNy+pMvt6KZShGVb3hdf+5ZQOU 83n95TeQ6pjdc+fyeFnGwNfHlKiP+YnJSfcEJg3GRO26RvTDHI3MJqVzf3dV7A== Message-ID: Date: Tue, 5 Nov 2024 18:19:40 +0100 Precedence: bulk X-Mailing-List: workflows@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v7 1/7] Add AutoFDO support for Clang build From: Peter Jung To: Rong Xu , Han Shen Cc: Alice Ryhl , Andrew Morton , Arnd Bergmann , Bill Wendling , Borislav Petkov , Breno Leitao , Brian Gerst , Dave Hansen , David Li , Heiko Carstens , "H. Peter Anvin" , Ingo Molnar , Jann Horn , Jonathan Corbet , Josh Poimboeuf , Juergen Gross , Justin Stitt , Kees Cook , Masahiro Yamada , "Mike Rapoport (IBM)" , Nathan Chancellor , Nick Desaulniers , Nicolas Schier , "Paul E. McKenney" , Peter Zijlstra , Sami Tolvanen , Thomas Gleixner , Wei Yang , workflows@vger.kernel.org, Miguel Ojeda , Maksim Panchenko , "David S. Miller" , Andreas Larsson , Yonghong Song , Yabin Cui , Krzysztof Pszeniczny , Sriraman Tallam , Stephane Eranian , x86@kernel.org, linux-arch@vger.kernel.org, sparclinux@vger.kernel.org, linux-doc@vger.kernel.org, linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, llvm@lists.linux.dev References: <20241102175115.1769468-1-xur@google.com> <20241102175115.1769468-2-xur@google.com> <09349180-027a-4b29-a40c-9dc3425e592c@cachyos.org> <3183ab86-8f1f-4624-9175-31e77d773699@cachyos.org> <67c07d2f-fb1f-4b7d-96e2-fb5ceb8fc692@cachyos.org> <449fddd2-342f-48cc-9a11-8a34814f1284@cachyos.org> Content-Language: en-US Organization: CachyOS In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Last-TLS-Session-Version: TLSv1.3 Here the bugreport, in case someone wants to track it: https://sourceware.org/bugzilla/show_bug.cgi?id=32340 On 05.11.24 15:56, Peter Jung wrote: > You were right - reverting commit: > https://github.com/bminor/binutils-gdb/commit/ > b20ab53f81db7eefa0db00d14f06c04527ac324c from the 2.43 branch does fix > the packaging. > > I will forward this to an issue at their bugzilla. > > On 05.11.24 15:33, Peter Jung wrote: >> Hi Rong, >> >> Glad that you were able to reproduce the issue! >> Thanks for finding the root cause as well as the part of the code. >> This really helps. >> >> I was able to do a successful packaging with binutils 2.42. >> Lets forward this to the binutils tracker and hope this will be soon >> solved. 🙂 >> >> I have tested this also on the latest commit >> (e1e4078ac59740a79cd709d61872abe15aba0087) and the issue is also >> reproducible there. >> >> Thanks for your time! I dont see this as blocker. 🙂 >> It gets time to get this series merged :P >> >> Best regards, >> >> Peter >> >> >> >> On 05.11.24 08:25, Rong Xu wrote: >>> We debugged this issue and we found the failure seems to only happen >>> with strip (version 2.43) in binutil. >>> >>> For a profile-use compilation, either with -fprofile-use (PGO or >>> iFDO), or -fprofile-sample-use (AutoFDO), >>> an ELF section of .llvm.call-graph-profile is created for the object. >>> For some reasons (like to save space?), >>> the relocations in this section are of type "rel', rather the more >>> common "rela" type. >>> >>> In this case, >>> $ readelf -r kvm.ko |grep llvm.call-graph-profile >>> Relocation section '.rel.llvm.call-graph-profile' at offset 0xf62a00 >>> contains 4 entries: >>> >>> strip (v2.43.0) has difficulty handling the relocations in >>> .rel.llvm.call-graph-profile -- it silently failed with --strip-debug. >>> But strip (v.2.42) has no issue with kvm.ko. The strip in llvm (i.e. >>> llvm-strip) also passes with kvm.ko >>> >>> I compared binutil/strip source code for version v2.43.0 and v2.42. >>> The different is around here: >>> In v2.42 of bfd/elfcode.h >>>     1618       if ((entsize == sizeof (Elf_External_Rela) >>>     1619            && ebd->elf_info_to_howto != NULL) >>>     1620           || ebd->elf_info_to_howto_rel == NULL) >>>     1621         res = ebd->elf_info_to_howto (abfd, relent, &rela); >>>     1622       else >>>     1623         res = ebd->elf_info_to_howto_rel (abfd, relent, &rela); >>> >>> In v2.43.0 of bfd/elfcode.h >>>     1618       if (entsize == sizeof (Elf_External_Rela) >>>     1619           && ebd->elf_info_to_howto != NULL) >>>     1620         res = ebd->elf_info_to_howto (abfd, relent, &rela); >>>     1621       else if (ebd->elf_info_to_howto_rel != NULL) >>>     1622         res = ebd->elf_info_to_howto_rel (abfd, relent, &rela); >>> >>> In the 2.43 strip, line 1618 is false and line 1621 is also false. >>> "res" is returned as false and the program exits with -1. >>> >>> While in 2.42, line 1620 is true and we get "res" from line 1621 and >>> program functions correctly. >>> >>> I'm not familiar with binutil code base and don't know the reason for >>> removing line 1620. >>> I can file a bug for binutil for people to further investigate this. >>> >>> It seems to me that this issue should not be a blocker for our patch. >>> >>> Regards, >>> >>> -Rong >>> >>> >>> >>> >>> >>> On Mon, Nov 4, 2024 at 12:24 PM Han Shen wrote: >>>> Hi Peter, >>>> Thanks for providing the detailed reproduce. >>>> Now I can see the error (after I synced to 6.12.0-rc6, I was using >>>> rc5). >>>> I'll look into that and report back. >>>> >>>>> I have tested your provided method, but the AutoFDO profile (lld does >>>> not get lto-sample-profile=$pathtoprofile passed) >>>> >>>> I see. You also turned on ThinLTO, which I didn't, so the profile was >>>> only used during compilation, not passed to lld. >>>> >>>> Thanks, >>>> Han >>>> >>>> On Mon, Nov 4, 2024 at 9:31 AM Peter Jung wrote: >>>>> Hi Han, >>>>> >>>>> I have tested your provided method, but the AutoFDO profile (lld does >>>>> not get lto-sample-profile=$pathtoprofile passed)  nor Clang as >>>>> compiler >>>>> gets used. >>>>> Please replace following PKGBUILD and config from linux-mainline with >>>>> the provided one in the gist. The patch is also included there. >>>>> >>>>> https://gist.github.com/ptr1337/c92728bb273f7dbc2817db75eedec9ed >>>>> >>>>> The main change I am doing here, is passing following to the build >>>>> array >>>>> and replacing "make all": >>>>> >>>>> make LLVM=1 LLVM_IAS=1 CLANG_AUTOFDO_PROFILE=${srcdir}/perf.afdo all >>>>> >>>>> When compiling the kernel with makepkg, this results at the >>>>> packaging to >>>>> following issue and can be reliable reproduced. >>>>> >>>>> Regards, >>>>> >>>>> Peter >>>>> >>>>> >>>>> On 04.11.24 05:50, Han Shen wrote: >>>>>> Hi Peter, thanks for reporting the issue. I am trying to reproduce it >>>>>> in the up-to-date archlinux environment. Below is what I have: >>>>>>     0. pacman -Syu >>>>>>     1. cloned archlinux build files from >>>>>> https://aur.archlinux.org/linux-mainline.git the newest mainline >>>>>> version is 6.12rc5-1. >>>>>>     2. changed the PKGBUILD file to include the patches series >>>>>>     3. changed the "config" to turn on clang autofdo >>>>>>     4. collected afdo profiles >>>>>>     5. MAKEFLAGS="-j48 V=1 LLVM=1 CLANG_AUTOFDO_PROFILE=$(pwd)/ >>>>>> perf.afdo" \ >>>>>>           makepkg -s --skipinteg --skippgp >>>>>>     6. install and reboot >>>>>> The above steps succeeded. >>>>>> You mentioned the error happens at "module_install", can you instruct >>>>>> me how to execute the "module_install" step? >>>>>> >>>>>> Thanks, >>>>>> Han >>>>>> >>>>>> On Sat, Nov 2, 2024 at 12:53 PM Peter Jung >>>>>> wrote: >>>>>>> >>>>>>> On 02.11.24 20:46, Peter Jung wrote: >>>>>>>> On 02.11.24 18:51, Rong Xu wrote: >>>>>>>>> Add the build support for using Clang's AutoFDO. Building the >>>>>>>>> kernel >>>>>>>>> with AutoFDO does not reduce the optimization level from the >>>>>>>>> compiler. AutoFDO uses hardware sampling to gather information >>>>>>>>> about >>>>>>>>> the frequency of execution of different code paths within a >>>>>>>>> binary. >>>>>>>>> This information is then used to guide the compiler's optimization >>>>>>>>> decisions, resulting in a more efficient binary. Experiments >>>>>>>>> showed that the kernel can improve up to 10% in latency. >>>>>>>>> >>>>>>>>> The support requires a Clang compiler after LLVM 17. This >>>>>>>>> submission >>>>>>>>> is limited to x86 platforms that support PMU features like LBR on >>>>>>>>> Intel machines and AMD Zen3 BRS. Support for SPE on ARM 1, >>>>>>>>>     and BRBE on ARM 1 is part of planned future work. >>>>>>>>> >>>>>>>>> Here is an example workflow for AutoFDO kernel: >>>>>>>>> >>>>>>>>> 1) Build the kernel on the host machine with LLVM enabled, for >>>>>>>>> example, >>>>>>>>>           $ make menuconfig LLVM=1 >>>>>>>>>        Turn on AutoFDO build config: >>>>>>>>>          CONFIG_AUTOFDO_CLANG=y >>>>>>>>>        With a configuration that has LLVM enabled, use the >>>>>>>>> following >>>>>>>>>        command: >>>>>>>>>           scripts/config -e AUTOFDO_CLANG >>>>>>>>>        After getting the config, build with >>>>>>>>>          $ make LLVM=1 >>>>>>>>> >>>>>>>>> 2) Install the kernel on the test machine. >>>>>>>>> >>>>>>>>> 3) Run the load tests. The '-c' option in perf specifies the >>>>>>>>> sample >>>>>>>>>       event period. We suggest     using a suitable prime number, >>>>>>>>>       like 500009, for this purpose. >>>>>>>>>       For Intel platforms: >>>>>>>>>          $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c >>>>>>>>> \ >>>>>>>>>            -o -- >>>>>>>>>       For AMD platforms: >>>>>>>>>          The supported system are: Zen3 with BRS, or Zen4 with >>>>>>>>> amd_lbr_v2 >>>>>>>>>         For Zen3: >>>>>>>>>          $ cat proc/cpuinfo | grep " brs" >>>>>>>>>          For Zen4: >>>>>>>>>          $ cat proc/cpuinfo | grep amd_lbr_v2 >>>>>>>>>          $ perf record --pfm-events >>>>>>>>> RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k >>>>>>>>> -a \ >>>>>>>>>            -N -b -c -o -- >>>>>>>>> >>>>>>>>> 4) (Optional) Download the raw perf file to the host machine. >>>>>>>>> >>>>>>>>> 5) To generate an AutoFDO profile, two offline tools are >>>>>>>>> available: >>>>>>>>>       create_llvm_prof and llvm_profgen. The create_llvm_prof >>>>>>>>> tool is part >>>>>>>>>       of the AutoFDO project and can be found on GitHub >>>>>>>>>       (https://github.com/google/autofdo), version v0.30.1 or >>>>>>>>> later. The >>>>>>>>>       llvm_profgen tool is included in the LLVM compiler >>>>>>>>> itself. It's >>>>>>>>>       important to note that the version of llvm_profgen >>>>>>>>> doesn't need to >>>>>>>>>       match the version of Clang. It needs to be the LLVM 19 >>>>>>>>> release or >>>>>>>>>       later, or from the LLVM trunk. >>>>>>>>>          $ llvm-profgen --kernel --binary= -- >>>>>>>>> perfdata= \ >>>>>>>>>            -o >>>>>>>>>       or >>>>>>>>>          $ create_llvm_prof --binary= -- >>>>>>>>> profile= \ >>>>>>>>>            --format=extbinary --out= >>>>>>>>> >>>>>>>>>       Note that multiple AutoFDO profile files can be merged >>>>>>>>> into one via: >>>>>>>>>          $ llvm-profdata merge -o   ... >>>>>>>>> >>>>>>>>> >>>>>>>>> 6) Rebuild the kernel using the AutoFDO profile file with the >>>>>>>>> same config >>>>>>>>>       as step 1, (Note CONFIG_AUTOFDO_CLANG needs to be enabled): >>>>>>>>>          $ make LLVM=1 CLANG_AUTOFDO_PROFILE= >>>>>>>>> >>>>>>>>> Co-developed-by: Han Shen >>>>>>>>> Signed-off-by: Han Shen >>>>>>>>> Signed-off-by: Rong Xu >>>>>>>>> Suggested-by: Sriraman Tallam >>>>>>>>> Suggested-by: Krzysztof Pszeniczny >>>>>>>>> Suggested-by: Nick Desaulniers >>>>>>>>> Suggested-by: Stephane Eranian >>>>>>>>> Tested-by: Yonghong Song >>>>>>>>> Tested-by: Yabin Cui >>>>>>>>> Tested-by: Nathan Chancellor >>>>>>>>> Reviewed-by: Kees Cook >>>>>>>> Tested-by: Peter Jung >>>>>>>> >>>>>>> The compilations and testing with the "make pacman-pkg" function >>>>>>> from >>>>>>> the kernel worked fine. >>>>>>> >>>>>>> One problem I do face: >>>>>>> When I apply a AutoFDO profile together with the PKGBUILD [1] from >>>>>>> archlinux im running into issues at "module_install" at the >>>>>>> packaging. >>>>>>> >>>>>>> See following log: >>>>>>> ``` >>>>>>> make[2]: *** [scripts/Makefile.modinst:125: >>>>>>> /tmp/makepkg/linux-cachyos-rc-autofdo/pkg/linux-cachyos-rc- >>>>>>> autofdo/usr/lib/modules/6.12.0-rc5-5-cachyos-rc-autofdo/kernel/ >>>>>>> arch/x86/kvm/kvm.ko] >>>>>>> Error 1 >>>>>>> make[2]: *** Deleting file >>>>>>> '/tmp/makepkg/linux-cachyos-rc-autofdo/pkg/linux-cachyos-rc- >>>>>>> autofdo/usr/lib/modules/6.12.0-rc5-5-cachyos-rc-autofdo/kernel/ >>>>>>> arch/x86/kvm/kvm.ko' >>>>>>>      INSTALL >>>>>>> /tmp/makepkg/linux-cachyos-rc-autofdo/pkg/linux-cachyos-rc- >>>>>>> autofdo/usr/lib/modules/6.12.0-rc5-5-cachyos-rc-autofdo/kernel/ >>>>>>> crypto/cryptd.ko >>>>>>> make[2]: *** Waiting for unfinished jobs.... >>>>>>> ``` >>>>>>> >>>>>>> >>>>>>> This can be fixed with removed "INSTALL_MOD_STRIP=1" to the passed >>>>>>> parameters of module_install. >>>>>>> >>>>>>> This explicitly only happens, if a profile is passed - otherwise the >>>>>>> packaging works without problems. >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> Peter Jung >>>>>>> >> >