Date: Tue, 5 Nov 2024 15:56:51 +0100
X-Mailing-List: workflows@vger.kernel.org
Subject: Re: [PATCH v7 1/7] Add AutoFDO support for Clang build
From: Peter Jung
Organization: CachyOS
To: Rong Xu, Han Shen
Cc: Alice Ryhl, Andrew Morton, Arnd Bergmann, Bill Wendling, Borislav Petkov, Breno Leitao, Brian Gerst, Dave Hansen, David Li, Heiko Carstens, "H. Peter Anvin", Ingo Molnar, Jann Horn, Jonathan Corbet, Josh Poimboeuf, Juergen Gross, Justin Stitt, Kees Cook, Masahiro Yamada, "Mike Rapoport (IBM)", Nathan Chancellor, Nick Desaulniers, Nicolas Schier, "Paul E. McKenney", Peter Zijlstra, Sami Tolvanen, Thomas Gleixner, Wei Yang, workflows@vger.kernel.org, Miguel Ojeda, Maksim Panchenko, "David S. Miller", Andreas Larsson, Yonghong Song, Yabin Cui, Krzysztof Pszeniczny, Sriraman Tallam, Stephane Eranian, x86@kernel.org, linux-arch@vger.kernel.org, sparclinux@vger.kernel.org, linux-doc@vger.kernel.org, linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org, llvm@lists.linux.dev
References: <20241102175115.1769468-1-xur@google.com> <20241102175115.1769468-2-xur@google.com> <09349180-027a-4b29-a40c-9dc3425e592c@cachyos.org> <3183ab86-8f1f-4624-9175-31e77d773699@cachyos.org> <67c07d2f-fb1f-4b7d-96e2-fb5ceb8fc692@cachyos.org> <449fddd2-342f-48cc-9a11-8a34814f1284@cachyos.org>
In-Reply-To: <449fddd2-342f-48cc-9a11-8a34814f1284@cachyos.org>

You were right: reverting commit
https://github.com/bminor/binutils-gdb/commit/b20ab53f81db7eefa0db00d14f06c04527ac324c
from the 2.43 branch does fix the packaging.

I will forward this as an issue to their Bugzilla.

On 05.11.24 15:33, Peter Jung wrote:
> Hi Rong,
>
> Glad that you were able to reproduce the issue!
> Thanks for finding the root cause as well as the relevant part of the
> code. This really helps.
>
> I was able to do a successful packaging with binutils 2.42.
> Let's forward this to the binutils tracker and hope this will be
> solved soon. 🙂
>
> I have also tested this on the latest commit
> (e1e4078ac59740a79cd709d61872abe15aba0087) and the issue is
> reproducible there as well.
>
> Thanks for your time! I don't see this as a blocker. 🙂
> It's time to get this series merged :P
>
> Best regards,
>
> Peter
>
>
> On 05.11.24 08:25, Rong Xu wrote:
>> We debugged this issue and found that the failure seems to happen only
>> with strip (version 2.43) from binutils.
>>
>> For a profile-use compilation, either with -fprofile-use (PGO or
>> iFDO) or -fprofile-sample-use (AutoFDO),
>> an ELF section .llvm.call-graph-profile is created for the object.
>> For some reason (perhaps to save space?),
>> the relocations in this section are of type "rel", rather than the more
>> common "rela" type.
>>
>> In this case,
>> $ readelf -r kvm.ko | grep llvm.call-graph-profile
>> Relocation section '.rel.llvm.call-graph-profile' at offset 0xf62a00
>> contains 4 entries:
>>
>> strip (v2.43.0) has difficulty handling the relocations in
>> .rel.llvm.call-graph-profile: it fails silently with --strip-debug.
>> But strip (v2.42) has no issue with kvm.ko. The strip in LLVM (i.e.
>> llvm-strip) also passes with kvm.ko.
>>
>> I compared the binutils strip source code for versions v2.43.0 and v2.42.
>> The difference is around here:
>> In v2.42 of bfd/elfcode.h
>>     1618       if ((entsize == sizeof (Elf_External_Rela)
>>     1619            && ebd->elf_info_to_howto != NULL)
>>     1620           || ebd->elf_info_to_howto_rel == NULL)
>>     1621         res = ebd->elf_info_to_howto (abfd, relent, &rela);
>>     1622       else
>>     1623         res = ebd->elf_info_to_howto_rel (abfd, relent, &rela);
>>
>> In v2.43.0 of bfd/elfcode.h
>>     1618       if (entsize == sizeof (Elf_External_Rela)
>>     1619           && ebd->elf_info_to_howto != NULL)
>>     1620         res = ebd->elf_info_to_howto (abfd, relent, &rela);
>>     1621       else if (ebd->elf_info_to_howto_rel != NULL)
>>     1622         res = ebd->elf_info_to_howto_rel (abfd, relent, &rela);
>>
>> In the 2.43 strip, the condition starting on line 1618 is false and the
>> one on line 1621 is also false. "res" is returned as false and the
>> program exits with -1.
>>
>> In 2.42, by contrast, line 1620 is true, so "res" comes from line 1621
>> and the program functions correctly.
>>
>> I'm not familiar with the binutils code base and don't know the reason
>> for removing line 1620.
>> I can file a bug against binutils for people to investigate this further.
>>
>> It seems to me that this issue should not be a blocker for our patch.
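To make the behavioral difference concrete, here is a minimal C sketch that models only the hook-selection logic quoted above. The struct backend_hooks, the enum howto_choice, and the pick_hook_v242/pick_hook_v243 functions are hypothetical stand-ins, not bfd's real elf_backend_data API. glibc's Elf64_Rela (24 bytes) is used in place of bfd's Elf_External_Rela, which has the same size for ELF64 objects such as kvm.ko; an Elf64_Rel entry is 16 bytes.

```c
#include <assert.h>
#include <elf.h>     /* Elf64_Rel (16 bytes), Elf64_Rela (24 bytes); glibc */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for bfd's backend data: it only
   records whether each relocation "howto" hook is provided. */
struct backend_hooks {
    bool has_info_to_howto;     /* models ebd->elf_info_to_howto     (RELA) */
    bool has_info_to_howto_rel; /* models ebd->elf_info_to_howto_rel (REL)  */
};

enum howto_choice { USE_RELA_HOOK, USE_REL_HOOK, NO_HOOK };

/* Models the 2.42 condition (quoted lines 1618-1623). Because of the
   "|| elf_info_to_howto_rel == NULL" clause, some hook is always chosen. */
static enum howto_choice pick_hook_v242(const struct backend_hooks *ebd,
                                        size_t entsize)
{
    assert(ebd != NULL);
    if ((entsize == sizeof(Elf64_Rela) && ebd->has_info_to_howto) ||
        !ebd->has_info_to_howto_rel)
        return USE_RELA_HOOK;
    return USE_REL_HOOK;
}

/* Models the 2.43.0 condition (quoted lines 1618-1622). There is no final
   else, so a REL entry on a backend without a REL hook gets no hook. */
static enum howto_choice pick_hook_v243(const struct backend_hooks *ebd,
                                        size_t entsize)
{
    assert(ebd != NULL);
    if (entsize == sizeof(Elf64_Rela) && ebd->has_info_to_howto)
        return USE_RELA_HOOK;
    if (ebd->has_info_to_howto_rel)
        return USE_REL_HOOK;
    return NO_HOOK;
}
```

For a backend that provides only the RELA hook (the configuration implied by the analysis that line 1621 is false), pick_hook_v242(&hooks, sizeof(Elf64_Rel)) yields USE_RELA_HOOK, while pick_hook_v243 yields NO_HOOK. That is consistent with "res" staying false in 2.43 for the 16-byte entries of .rel.llvm.call-graph-profile.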
>>
>> Regards,
>>
>> -Rong
>>
>>
>> On Mon, Nov 4, 2024 at 12:24 PM Han Shen wrote:
>>> Hi Peter,
>>> Thanks for providing the detailed reproduction steps.
>>> Now I can see the error (after I synced to 6.12.0-rc6; I was using rc5).
>>> I'll look into that and report back.
>>>
>>>> I have tested your provided method, but the AutoFDO profile (lld does
>>>> not get lto-sample-profile=$pathtoprofile passed)
>>>
>>> I see. You also turned on ThinLTO, which I didn't, so the profile was
>>> only used during compilation, not passed to lld.
>>>
>>> Thanks,
>>> Han
>>>
>>> On Mon, Nov 4, 2024 at 9:31 AM Peter Jung wrote:
>>>> Hi Han,
>>>>
>>>> I have tested your provided method, but neither the AutoFDO profile
>>>> (lld does not get lto-sample-profile=$pathtoprofile passed) nor Clang
>>>> as the compiler gets used.
>>>> Please replace the PKGBUILD and config from linux-mainline with the
>>>> ones provided in the gist. The patch is also included there.
>>>>
>>>> https://gist.github.com/ptr1337/c92728bb273f7dbc2817db75eedec9ed
>>>>
>>>> The main change I am making here is passing the following to the
>>>> build array, replacing "make all":
>>>>
>>>> make LLVM=1 LLVM_IAS=1 CLANG_AUTOFDO_PROFILE=${srcdir}/perf.afdo all
>>>>
>>>> When compiling the kernel with makepkg, this results in the following
>>>> issue at the packaging step, which can be reliably reproduced.
>>>>
>>>> Regards,
>>>>
>>>> Peter
>>>>
>>>>
>>>> On 04.11.24 05:50, Han Shen wrote:
>>>>> Hi Peter, thanks for reporting the issue. I am trying to reproduce it
>>>>> in an up-to-date Arch Linux environment. Below is what I have:
>>>>>     0. pacman -Syu
>>>>>     1. cloned the Arch Linux build files from
>>>>>        https://aur.archlinux.org/linux-mainline.git; the newest
>>>>>        mainline version is 6.12rc5-1.
>>>>>     2. changed the PKGBUILD file to include the patch series
>>>>>     3. changed the "config" to turn on Clang AutoFDO
>>>>>     4. collected AutoFDO profiles
>>>>>     5. MAKEFLAGS="-j48 V=1 LLVM=1 CLANG_AUTOFDO_PROFILE=$(pwd)/perf.afdo" \
>>>>>          makepkg -s --skipinteg --skippgp
>>>>>     6. install and reboot
>>>>> The above steps succeeded.
>>>>> You mentioned the error happens at "modules_install"; can you tell
>>>>> me how to execute the "modules_install" step?
>>>>>
>>>>> Thanks,
>>>>> Han
>>>>>
>>>>> On Sat, Nov 2, 2024 at 12:53 PM Peter Jung wrote:
>>>>>>
>>>>>> On 02.11.24 20:46, Peter Jung wrote:
>>>>>>> On 02.11.24 18:51, Rong Xu wrote:
>>>>>>>> Add build support for using Clang's AutoFDO. Building the kernel
>>>>>>>> with AutoFDO does not reduce the optimization level from the
>>>>>>>> compiler. AutoFDO uses hardware sampling to gather information
>>>>>>>> about the frequency of execution of different code paths within a
>>>>>>>> binary. This information is then used to guide the compiler's
>>>>>>>> optimization decisions, resulting in a more efficient binary.
>>>>>>>> Experiments showed that kernel latency can improve by up to 10%.
>>>>>>>>
>>>>>>>> The support requires Clang from LLVM 17 or later. This submission
>>>>>>>> is limited to x86 platforms that support PMU features like LBR on
>>>>>>>> Intel machines and AMD Zen3 BRS. Support for SPE on ARM 1,
>>>>>>>> and BRBE on ARM 1 is part of planned future work.
>>>>>>>>
>>>>>>>> Here is an example workflow for an AutoFDO kernel:
>>>>>>>>
>>>>>>>> 1) Build the kernel on the host machine with LLVM enabled, for
>>>>>>>>    example,
>>>>>>>>           $ make menuconfig LLVM=1
>>>>>>>>       Turn on AutoFDO build config:
>>>>>>>>          CONFIG_AUTOFDO_CLANG=y
>>>>>>>>       With a configuration that has LLVM enabled, use the
>>>>>>>>       following command:
>>>>>>>>           scripts/config -e AUTOFDO_CLANG
>>>>>>>>       After getting the config, build with
>>>>>>>>          $ make LLVM=1
>>>>>>>>
>>>>>>>> 2) Install the kernel on the test machine.
>>>>>>>>
>>>>>>>> 3) Run the load tests.
>>>>>>>>    The '-c' option in perf specifies the sample
>>>>>>>>       event period. We suggest using a suitable prime number,
>>>>>>>>       like 500009, for this purpose.
>>>>>>>>       For Intel platforms:
>>>>>>>>          $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c \
>>>>>>>>            -o --
>>>>>>>>       For AMD platforms:
>>>>>>>>          The supported systems are: Zen3 with BRS, or Zen4 with
>>>>>>>>          amd_lbr_v2
>>>>>>>>          For Zen3:
>>>>>>>>          $ cat /proc/cpuinfo | grep " brs"
>>>>>>>>          For Zen4:
>>>>>>>>          $ cat /proc/cpuinfo | grep amd_lbr_v2
>>>>>>>>          $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
>>>>>>>>            -N -b -c -o --
>>>>>>>>
>>>>>>>> 4) (Optional) Download the raw perf file to the host machine.
>>>>>>>>
>>>>>>>> 5) To generate an AutoFDO profile, two offline tools are available:
>>>>>>>>       create_llvm_prof and llvm-profgen. The create_llvm_prof tool is
>>>>>>>>       part of the AutoFDO project and can be found on GitHub
>>>>>>>>       (https://github.com/google/autofdo), version v0.30.1 or later.
>>>>>>>>       The llvm-profgen tool is included in the LLVM compiler itself.
>>>>>>>>       It's important to note that the version of llvm-profgen doesn't
>>>>>>>>       need to match the version of Clang. It needs to be the LLVM 19
>>>>>>>>       release or later, or from the LLVM trunk.
>>>>>>>>          $ llvm-profgen --kernel --binary= --perfdata= \
>>>>>>>>            -o
>>>>>>>>       or
>>>>>>>>          $ create_llvm_prof --binary= --profile= \
>>>>>>>>            --format=extbinary --out=
>>>>>>>>
>>>>>>>>       Note that multiple AutoFDO profile files can be merged into
>>>>>>>>       one via:
>>>>>>>>          $ llvm-profdata merge -o   ...
>>>>>>>>
>>>>>>>> 6) Rebuild the kernel using the AutoFDO profile file with the same
>>>>>>>>       config as in step 1 (note that CONFIG_AUTOFDO_CLANG needs to be
>>>>>>>>       enabled):
>>>>>>>>          $ make LLVM=1 CLANG_AUTOFDO_PROFILE=
>>>>>>>>
>>>>>>>> Co-developed-by: Han Shen
>>>>>>>> Signed-off-by: Han Shen
>>>>>>>> Signed-off-by: Rong Xu
>>>>>>>> Suggested-by: Sriraman Tallam
>>>>>>>> Suggested-by: Krzysztof Pszeniczny
>>>>>>>> Suggested-by: Nick Desaulniers
>>>>>>>> Suggested-by: Stephane Eranian
>>>>>>>> Tested-by: Yonghong Song
>>>>>>>> Tested-by: Yabin Cui
>>>>>>>> Tested-by: Nathan Chancellor
>>>>>>>> Reviewed-by: Kees Cook
>>>>>>> Tested-by: Peter Jung
>>>>>>>
>>>>>> The compilation and testing with the kernel's "make pacman-pkg"
>>>>>> target worked fine.
>>>>>>
>>>>>> One problem I do face:
>>>>>> When I apply an AutoFDO profile together with the PKGBUILD [1] from
>>>>>> Arch Linux, I run into issues at "modules_install" during packaging.
>>>>>>
>>>>>> See the following log:
>>>>>> ```
>>>>>> make[2]: *** [scripts/Makefile.modinst:125: /tmp/makepkg/linux-cachyos-rc-autofdo/pkg/linux-cachyos-rc-autofdo/usr/lib/modules/6.12.0-rc5-5-cachyos-rc-autofdo/kernel/arch/x86/kvm/kvm.ko] Error 1
>>>>>> make[2]: *** Deleting file '/tmp/makepkg/linux-cachyos-rc-autofdo/pkg/linux-cachyos-rc-autofdo/usr/lib/modules/6.12.0-rc5-5-cachyos-rc-autofdo/kernel/arch/x86/kvm/kvm.ko'
>>>>>>   INSTALL /tmp/makepkg/linux-cachyos-rc-autofdo/pkg/linux-cachyos-rc-autofdo/usr/lib/modules/6.12.0-rc5-5-cachyos-rc-autofdo/kernel/crypto/cryptd.ko
>>>>>> make[2]: *** Waiting for unfinished jobs....
>>>>>> ```
>>>>>>
>>>>>>
>>>>>> This can be fixed by removing "INSTALL_MOD_STRIP=1" from the
>>>>>> parameters passed to modules_install.
>>>>>>
>>>>>> This only happens if a profile is passed; otherwise the packaging
>>>>>> works without problems.
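The AMD prerequisite check in step 3 of the quoted patch description (grepping /proc/cpuinfo for "brs" on Zen3 or "amd_lbr_v2" on Zen4) can be sketched in C as a whole-token match on the "flags" line. This is a hypothetical helper, not part of the patch or of perf; cpuinfo_has_flag, read_whole_file and system_has_cpu_flag are illustrative names, and the sketch assumes the x86 layout "flags<TAB>: flag1 flag2 ...". Matching whole tokens presumably covers what the leading space in grep " brs" is meant to achieve, without also matching longer flags that merely start with "brs".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns true if `flag` is a whole, whitespace-separated token on any
   line starting with "flags" in text laid out like /proc/cpuinfo. */
static bool cpuinfo_has_flag(const char *cpuinfo, const char *flag)
{
    assert(cpuinfo != NULL && flag != NULL);
    size_t flag_len = strlen(flag);
    const char *line = cpuinfo;

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t line_len = eol ? (size_t)(eol - line) : strlen(line);

        if (strncmp(line, "flags", 5) == 0) {
            const char *end = line + line_len;
            const char *colon = memchr(line, ':', line_len);
            const char *p = colon ? colon + 1 : end;
            /* Walk the space/tab separated tokens after the colon. */
            while (p < end) {
                while (p < end && (*p == ' ' || *p == '\t'))
                    p++;
                const char *tok = p;
                while (p < end && *p != ' ' && *p != '\t')
                    p++;
                if (flag_len > 0 && (size_t)(p - tok) == flag_len &&
                    strncmp(tok, flag, flag_len) == 0)
                    return true;
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return false;
}

/* Reads a whole file into a NUL-terminated heap buffer (caller frees).
   /proc files report a size of 0, so this reads in chunks rather than
   relying on the file size. Returns NULL on failure. */
static char *read_whole_file(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return NULL;

    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    while (buf != NULL && (n = fread(buf + len, 1, cap - len - 1, f)) > 0) {
        len += n;
        if (len + 1 == cap) { /* keep room for the terminator */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                buf = NULL;
                break;
            }
            buf = bigger;
            cap *= 2;
        }
    }
    if (buf != NULL)
        buf[len] = '\0';
    fclose(f);
    return buf;
}

/* Convenience wrapper for the live system, e.g. "brs" (Zen3) or
   "amd_lbr_v2" (Zen4). Returns false if /proc/cpuinfo is unreadable. */
static bool system_has_cpu_flag(const char *flag)
{
    char *text = read_whole_file("/proc/cpuinfo");
    bool found = text != NULL && cpuinfo_has_flag(text, flag);
    free(text);
    return found;
}
```

Separating the parser from the file access keeps cpuinfo_has_flag testable on captured text; system_has_cpu_flag("brs") or system_has_cpu_flag("amd_lbr_v2") would then be the runtime check before choosing the AMD perf record command.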
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Peter Jung
>>>>>>
>
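Rong's readelf -r check in this thread (finding '.rel.llvm.call-graph-profile' in kvm.ko) boils down to listing relocation sections of type SHT_REL rather than SHT_RELA. Below is a minimal C sketch of that check using the structures from glibc's <elf.h>. list_rel_sections is a hypothetical helper, not a binutils or readelf API; it assumes a 64-bit ELF file in the host byte order (for example an x86-64 kvm.ko inspected on an x86-64 machine) and does not handle extended section numbering.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Prints each SHT_REL section with its entry count (sh_size / sh_entsize)
   and returns how many were found, or -1 on any read/format error. */
static int list_rel_sections(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Validate the ELF header: magic, 64-bit class, sane section table. */
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum) {
        fclose(f);
        return -1;
    }

    /* Read the whole section header table. */
    Elf64_Shdr *sh = calloc(eh.e_shnum, sizeof *sh);
    if (sh == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum) {
        free(sh);
        fclose(f);
        return -1;
    }

    /* Load the section-name string table so sh_name offsets resolve. */
    const Elf64_Shdr *shstr = &sh[eh.e_shstrndx];
    char *names = malloc(shstr->sh_size + 1);
    if (names == NULL || fseek(f, (long)shstr->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, shstr->sh_size, f) != shstr->sh_size) {
        free(names);
        free(sh);
        fclose(f);
        return -1;
    }
    names[shstr->sh_size] = '\0';

    int count = 0;
    for (int i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_type != SHT_REL)
            continue;
        const char *name =
            sh[i].sh_name < shstr->sh_size ? names + sh[i].sh_name : "?";
        unsigned long entries = sh[i].sh_entsize != 0
            ? (unsigned long)(sh[i].sh_size / sh[i].sh_entsize) : 0;
        printf("%s: %lu entries\n", name, entries);
        count++;
    }

    free(names);
    free(sh);
    fclose(f);
    return count;
}
```

Run against the kvm.ko from this report, list_rel_sections("kvm.ko") would be expected to print '.rel.llvm.call-graph-profile' among its results, mirroring the readelf output quoted above; a build without a profile would presumably not have that section.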