From: Rong Xu
Date: Tue, 5 Nov 2024 09:51:55 -0800
Subject: Re: [PATCH v7 1/7] Add AutoFDO support for Clang build
To: Peter Jung
Cc: Han Shen, Alice Ryhl, Andrew Morton, Arnd Bergmann, Bill Wendling,
    Borislav Petkov, Breno Leitao, Brian Gerst, Dave Hansen, David Li,
    Heiko Carstens, "H. Peter Anvin", Ingo Molnar, Jann Horn,
    Jonathan Corbet, Josh Poimboeuf, Juergen Gross, Justin Stitt,
    Kees Cook, Masahiro Yamada, "Mike Rapoport (IBM)", Nathan Chancellor,
    Nick Desaulniers, Nicolas Schier, "Paul E. McKenney", Peter Zijlstra,
    Sami Tolvanen, Thomas Gleixner, Wei Yang, workflows@vger.kernel.org,
    Miguel Ojeda, Maksim Panchenko, "David S. Miller", Andreas Larsson,
    Yonghong Song, Yabin Cui, Krzysztof Pszeniczny, Sriraman Tallam,
    Stephane Eranian, x86@kernel.org, linux-arch@vger.kernel.org,
    sparclinux@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
    llvm@lists.linux.dev

Hi Peter, thanks for verifying and filing the bug report!

-Rong

On Tue, Nov 5, 2024 at 9:19 AM Peter Jung wrote:
>
> Here is the bug report, in case someone wants to track it:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=32340
>
> On 05.11.24 15:56, Peter Jung wrote:
> > You were right - reverting commit:
> > https://github.com/bminor/binutils-gdb/commit/b20ab53f81db7eefa0db00d14f06c04527ac324c
> > from the 2.43 branch does fix the packaging.
> >
> > I will forward this to an issue at their bugzilla.
> >
> > On 05.11.24 15:33, Peter Jung wrote:
> >> Hi Rong,
> >>
> >> Glad that you were able to reproduce the issue!
> >> Thanks for finding the root cause as well as the part of the code.
> >> This really helps.
> >>
> >> I was able to do a successful packaging with binutils 2.42.
> >> Let's forward this to the binutils tracker and hope this will be
> >> solved soon. 🙂
> >>
> >> I have also tested this on the latest commit
> >> (e1e4078ac59740a79cd709d61872abe15aba0087) and the issue is also
> >> reproducible there.
> >>
> >> Thanks for your time! I don't see this as a blocker. 🙂
> >> It's about time to get this series merged :P
> >>
> >> Best regards,
> >>
> >> Peter
> >>
> >> On 05.11.24 08:25, Rong Xu wrote:
> >>> We debugged this issue and found that the failure seems to happen
> >>> only with strip (version 2.43) in binutils.
> >>>
> >>> For a profile-use compilation, either with -fprofile-use (PGO or
> >>> iFDO), or -fprofile-sample-use (AutoFDO),
> >>> an ELF section of .llvm.call-graph-profile is created for the object.
> >>> For some reason (like to save space?),
> >>> the relocations in this section are of type "rel", rather than the
> >>> more common "rela" type.
> >>>
> >>> In this case,
> >>> $ readelf -r kvm.ko | grep llvm.call-graph-profile
> >>> Relocation section '.rel.llvm.call-graph-profile' at offset 0xf62a00
> >>> contains 4 entries:
> >>>
> >>> strip (v2.43.0) has difficulty handling the relocations in
> >>> .rel.llvm.call-graph-profile -- it silently failed with --strip-debug.
> >>> But strip (v2.42) has no issue with kvm.ko. The strip in llvm (i.e.
> >>> llvm-strip) also passes with kvm.ko.
> >>>
> >>> I compared the binutils/strip source code for versions v2.43.0 and v2.42.
> >>> The difference is around here:
> >>> In v2.42 of bfd/elfcode.h
> >>> 1618       if ((entsize == sizeof (Elf_External_Rela)
> >>> 1619            && ebd->elf_info_to_howto != NULL)
> >>> 1620           || ebd->elf_info_to_howto_rel == NULL)
> >>> 1621         res = ebd->elf_info_to_howto (abfd, relent, &rela);
> >>> 1622       else
> >>> 1623         res = ebd->elf_info_to_howto_rel (abfd, relent, &rela);
> >>>
> >>> In v2.43.0 of bfd/elfcode.h
> >>> 1618       if (entsize == sizeof (Elf_External_Rela)
> >>> 1619           && ebd->elf_info_to_howto != NULL)
> >>> 1620         res = ebd->elf_info_to_howto (abfd, relent, &rela);
> >>> 1621       else if (ebd->elf_info_to_howto_rel != NULL)
> >>> 1622         res = ebd->elf_info_to_howto_rel (abfd, relent, &rela);
> >>>
> >>> In the 2.43 strip, line 1618 is false and line 1621 is also false.
> >>> "res" is returned as false and the program exits with -1.
> >>>
> >>> While in 2.42, line 1620 is true, so we get "res" from line 1621 and
> >>> the program functions correctly.
> >>>
> >>> I'm not familiar with the binutils code base and don't know the reason
> >>> for removing line 1620.
> >>> I can file a bug for binutils for people to further investigate this.
> >>>
> >>> It seems to me that this issue should not be a blocker for our patch.
> >>>
> >>> Regards,
> >>>
> >>> -Rong
> >>>
> >>> On Mon, Nov 4, 2024 at 12:24 PM Han Shen wrote:
> >>>> Hi Peter,
> >>>> Thanks for providing the detailed reproducer.
> >>>> Now I can see the error (after I synced to 6.12.0-rc6, I was using
> >>>> rc5).
> >>>> I'll look into that and report back.
> >>>>
> >>>>> I have tested your provided method, but the AutoFDO profile (lld does
> >>>>> not get lto-sample-profile=$pathtoprofile passed)
> >>>>
> >>>> I see. You also turned on ThinLTO, which I didn't, so the profile was
> >>>> only used during compilation, not passed to lld.
> >>>>
> >>>> Thanks,
> >>>> Han
> >>>>
> >>>> On Mon, Nov 4, 2024 at 9:31 AM Peter Jung wrote:
> >>>>> Hi Han,
> >>>>>
> >>>>> I have tested your provided method, but the AutoFDO profile (lld does
> >>>>> not get lto-sample-profile=$pathtoprofile passed) nor Clang as
> >>>>> compiler gets used.
> >>>>> Please replace the PKGBUILD and config from linux-mainline with
> >>>>> the ones provided in the gist. The patch is also included there.
> >>>>>
> >>>>> https://gist.github.com/ptr1337/c92728bb273f7dbc2817db75eedec9ed
> >>>>>
> >>>>> The main change I am making here is passing the following to the
> >>>>> build array and replacing "make all":
> >>>>>
> >>>>> make LLVM=1 LLVM_IAS=1 CLANG_AUTOFDO_PROFILE=${srcdir}/perf.afdo all
> >>>>>
> >>>>> When compiling the kernel with makepkg, this results in the
> >>>>> following issue at packaging and can be reliably reproduced.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Peter
> >>>>>
> >>>>> On 04.11.24 05:50, Han Shen wrote:
> >>>>>> Hi Peter, thanks for reporting the issue. I am trying to reproduce it
> >>>>>> in an up-to-date archlinux environment. Below is what I have:
> >>>>>>   0. pacman -Syu
> >>>>>>   1. cloned archlinux build files from
> >>>>>>      https://aur.archlinux.org/linux-mainline.git the newest mainline
> >>>>>>      version is 6.12rc5-1.
> >>>>>>   2. changed the PKGBUILD file to include the patch series
> >>>>>>   3. changed the "config" to turn on clang autofdo
> >>>>>>   4. collected afdo profiles
> >>>>>>   5. MAKEFLAGS="-j48 V=1 LLVM=1 CLANG_AUTOFDO_PROFILE=$(pwd)/perf.afdo" \
> >>>>>>        makepkg -s --skipinteg --skippgp
> >>>>>>   6. install and reboot
> >>>>>> The above steps succeeded.
> >>>>>> You mentioned the error happens at "module_install", can you instruct
> >>>>>> me how to execute the "module_install" step?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Han
> >>>>>>
> >>>>>> On Sat, Nov 2, 2024 at 12:53 PM Peter Jung wrote:
> >>>>>>>
> >>>>>>> On 02.11.24 20:46, Peter Jung wrote:
> >>>>>>>> On 02.11.24 18:51, Rong Xu wrote:
> >>>>>>>>> Add the build support for using Clang's AutoFDO. Building the kernel
> >>>>>>>>> with AutoFDO does not reduce the optimization level from the
> >>>>>>>>> compiler. AutoFDO uses hardware sampling to gather information about
> >>>>>>>>> the frequency of execution of different code paths within a binary.
> >>>>>>>>> This information is then used to guide the compiler's optimization
> >>>>>>>>> decisions, resulting in a more efficient binary. Experiments
> >>>>>>>>> showed that the kernel can improve up to 10% in latency.
> >>>>>>>>>
> >>>>>>>>> The support requires a Clang compiler after LLVM 17. This submission
> >>>>>>>>> is limited to x86 platforms that support PMU features like LBR on
> >>>>>>>>> Intel machines and AMD Zen3 BRS. Support for SPE on ARM 1,
> >>>>>>>>> and BRBE on ARM 1 is part of planned future work.
> >>>>>>>>>
> >>>>>>>>> Here is an example workflow for AutoFDO kernel:
> >>>>>>>>>
> >>>>>>>>> 1) Build the kernel on the host machine with LLVM enabled, for
> >>>>>>>>>    example,
> >>>>>>>>>       $ make menuconfig LLVM=1
> >>>>>>>>>    Turn on AutoFDO build config:
> >>>>>>>>>       CONFIG_AUTOFDO_CLANG=y
> >>>>>>>>>    With a configuration that has LLVM enabled, use the following
> >>>>>>>>>    command:
> >>>>>>>>>       scripts/config -e AUTOFDO_CLANG
> >>>>>>>>>    After getting the config, build with
> >>>>>>>>>       $ make LLVM=1
> >>>>>>>>>
> >>>>>>>>> 2) Install the kernel on the test machine.
> >>>>>>>>>
> >>>>>>>>> 3) Run the load tests. The '-c' option in perf specifies the sample
> >>>>>>>>>    event period. We suggest using a suitable prime number,
> >>>>>>>>>    like 500009, for this purpose.
> >>>>>>>>>    For Intel platforms:
> >>>>>>>>>       $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c \
> >>>>>>>>>         -o --
> >>>>>>>>>    For AMD platforms:
> >>>>>>>>>       The supported systems are: Zen3 with BRS, or Zen4 with
> >>>>>>>>>       amd_lbr_v2
> >>>>>>>>>       For Zen3:
> >>>>>>>>>        $ cat /proc/cpuinfo | grep " brs"
> >>>>>>>>>       For Zen4:
> >>>>>>>>>        $ cat /proc/cpuinfo | grep amd_lbr_v2
> >>>>>>>>>       $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
> >>>>>>>>>         -N -b -c -o --
> >>>>>>>>>
> >>>>>>>>> 4) (Optional) Download the raw perf file to the host machine.
> >>>>>>>>>
> >>>>>>>>> 5) To generate an AutoFDO profile, two offline tools are available:
> >>>>>>>>>    create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
> >>>>>>>>>    of the AutoFDO project and can be found on GitHub
> >>>>>>>>>    (https://github.com/google/autofdo), version v0.30.1 or later. The
> >>>>>>>>>    llvm-profgen tool is included in the LLVM compiler itself. It's
> >>>>>>>>>    important to note that the version of llvm-profgen doesn't need to
> >>>>>>>>>    match the version of Clang. It needs to be the LLVM 19 release or
> >>>>>>>>>    later, or from the LLVM trunk.
> >>>>>>>>>       $ llvm-profgen --kernel --binary= --perfdata= \
> >>>>>>>>>         -o
> >>>>>>>>>    or
> >>>>>>>>>       $ create_llvm_prof --binary= --profile= \
> >>>>>>>>>         --format=extbinary --out=
> >>>>>>>>>
> >>>>>>>>>    Note that multiple AutoFDO profile files can be merged into one via:
> >>>>>>>>>       $ llvm-profdata merge -o ...
> >>>>>>>>>
> >>>>>>>>> 6) Rebuild the kernel using the AutoFDO profile file with the same
> >>>>>>>>>    config as step 1 (note that CONFIG_AUTOFDO_CLANG needs to be enabled):
> >>>>>>>>>       $ make LLVM=1 CLANG_AUTOFDO_PROFILE=
> >>>>>>>>>
> >>>>>>>>> Co-developed-by: Han Shen
> >>>>>>>>> Signed-off-by: Han Shen
> >>>>>>>>> Signed-off-by: Rong Xu
> >>>>>>>>> Suggested-by: Sriraman Tallam
> >>>>>>>>> Suggested-by: Krzysztof Pszeniczny
> >>>>>>>>> Suggested-by: Nick Desaulniers
> >>>>>>>>> Suggested-by: Stephane Eranian
> >>>>>>>>> Tested-by: Yonghong Song
> >>>>>>>>> Tested-by: Yabin Cui
> >>>>>>>>> Tested-by: Nathan Chancellor
> >>>>>>>>> Reviewed-by: Kees Cook
> >>>>>>>> Tested-by: Peter Jung
> >>>>>>>>
> >>>>>>> The compilation and testing with the "make pacman-pkg" function
> >>>>>>> from the kernel worked fine.
> >>>>>>>
> >>>>>>> One problem I do face:
> >>>>>>> When I apply an AutoFDO profile together with the PKGBUILD [1] from
> >>>>>>> archlinux, I'm running into issues at "module_install" during
> >>>>>>> packaging.
> >>>>>>>
> >>>>>>> See the following log:
> >>>>>>> ```
> >>>>>>> make[2]: *** [scripts/Makefile.modinst:125: /tmp/makepkg/linux-cachyos-rc-autofdo/pkg/linux-cachyos-rc-autofdo/usr/lib/modules/6.12.0-rc5-5-cachyos-rc-autofdo/kernel/arch/x86/kvm/kvm.ko] Error 1
> >>>>>>> make[2]: *** Deleting file '/tmp/makepkg/linux-cachyos-rc-autofdo/pkg/linux-cachyos-rc-autofdo/usr/lib/modules/6.12.0-rc5-5-cachyos-rc-autofdo/kernel/arch/x86/kvm/kvm.ko'
> >>>>>>>   INSTALL /tmp/makepkg/linux-cachyos-rc-autofdo/pkg/linux-cachyos-rc-autofdo/usr/lib/modules/6.12.0-rc5-5-cachyos-rc-autofdo/kernel/crypto/cryptd.ko
> >>>>>>> make[2]: *** Waiting for unfinished jobs....
> >>>>>>> ```
> >>>>>>>
> >>>>>>> This can be fixed by removing "INSTALL_MOD_STRIP=1" from the
> >>>>>>> parameters passed to module_install.
> >>>>>>>
> >>>>>>> This explicitly only happens, if a profile is passed - otherwise the
> >>>>>>> packaging works without problems.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> Peter Jung
> >>>>>>>
> >>
> >
>
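
For readers following the strip analysis quoted above, here is a minimal C sketch of the two branch conditions from bfd/elfcode.h. It is a model under stated assumptions, not binutils code: the struct, the handler type, and the names dispatch_v242, dispatch_v243 and rela_only_backend are hypothetical, and the entry sizes are the standard ELF64 Rel/Rela sizes. Only the shape of the two conditions is taken from the thread, along with the report that "res" ends up false in 2.43.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ELF64 relocation entry sizes: Elf64_Rel has two 8-byte fields,
   Elf64_Rela has three. */
enum { REL_ENTSIZE = 16, RELA_ENTSIZE = 24 };

/* Hypothetical stand-in for the backend's relocation handlers. */
typedef bool (*howto_fn)(void);

struct backend {
    howto_fn info_to_howto;     /* RELA handler */
    howto_fn info_to_howto_rel; /* REL handler; NULL on a RELA-only backend */
};

/* Pretend handler that always succeeds. */
static bool handler_ok(void) { return true; }

/* Assumed model of the backend in the thread: no dedicated REL handler. */
struct backend rela_only_backend(void)
{
    struct backend b = { handler_ok, NULL };
    return b;
}

/* Shape of the v2.42 condition: a missing REL handler falls back to the
   RELA handler (this model assumes the RELA handler is non-NULL). */
bool dispatch_v242(const struct backend *ebd, size_t entsize)
{
    assert(ebd != NULL);
    if ((entsize == (size_t)RELA_ENTSIZE && ebd->info_to_howto != NULL)
        || ebd->info_to_howto_rel == NULL)
        return ebd->info_to_howto();
    return ebd->info_to_howto_rel();
}

/* Shape of the v2.43.0 condition: with a REL-sized entry and no REL
   handler, neither branch runs and the result stays false. */
bool dispatch_v243(const struct backend *ebd, size_t entsize)
{
    bool res = false; /* assumed initial value, per the thread's description */
    assert(ebd != NULL);
    if (entsize == (size_t)RELA_ENTSIZE && ebd->info_to_howto != NULL)
        res = ebd->info_to_howto();
    else if (ebd->info_to_howto_rel != NULL)
        res = ebd->info_to_howto_rel();
    return res;
}
```

With the backend returned by rela_only_backend(), dispatch_v242(&b, REL_ENTSIZE) falls back to the RELA handler and succeeds, while dispatch_v243(&b, REL_ENTSIZE) matches neither branch and returns false, which mirrors the behaviour reported for kvm.ko.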