From: Rong Xu
Date: Fri, 1 Nov 2024 13:08:38 -0700
Subject: Re: [PATCH v5 1/7] Add AutoFDO support for Clang build
To: Masahiro Yamada
Cc: Alice Ryhl, Andrew Morton, Arnd Bergmann, Bill Wendling, Borislav Petkov,
    Breno Leitao, Brian Gerst, Dave Hansen, David Li, Han Shen,
    Heiko Carstens, "H. Peter Anvin", Ingo Molnar, Jann Horn,
    Jonathan Corbet, Josh Poimboeuf, Juergen Gross, Justin Stitt,
    Kees Cook, "Mike Rapoport (IBM)", Nathan Chancellor, Nick Desaulniers,
    Nicolas Schier, "Paul E. McKenney", Peter Zijlstra, Sami Tolvanen,
    Thomas Gleixner, Wei Yang, workflows@vger.kernel.org, Miguel Ojeda,
    Maksim Panchenko, Yonghong Song, Yabin Cui, Krzysztof Pszeniczny,
    Sriraman Tallam, Stephane Eranian, x86@kernel.org,
    linux-arch@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kbuild@vger.kernel.org, linux-kernel@vger.kernel.org,
    llvm@lists.linux.dev
References: <20241023224409.201771-1-xur@google.com>
    <20241023224409.201771-2-xur@google.com>

On Fri, Nov 1, 2024 at 11:02 AM Masahiro Yamada wrote:
>
> On Thu, Oct 24, 2024 at 7:44 AM Rong Xu wrote:
> >
> > Add the build support for using Clang's AutoFDO. Building the kernel
> > with AutoFDO does not reduce the optimization level from the
> > compiler. AutoFDO uses hardware sampling to gather information about
> > the frequency of execution of different code paths within a binary.
> > This information is then used to guide the compiler's optimization
> > decisions, resulting in a more efficient binary. Experiments
> > showed that kernel latency can improve by up to 10%.
> >
> > The support requires a Clang compiler after LLVM 17. This submission
> > is limited to x86 platforms that support PMU features like LBR on
> > Intel machines and AMD Zen3 BRS.
> > Support for SPE on ARM 1, and BRBE on ARM 1 is part of planned future work.
> >
> > Here is an example workflow for an AutoFDO kernel:
> >
> > 1) Build the kernel on the host machine with LLVM enabled, for example,
> >       $ make menuconfig LLVM=1
> >    Turn on AutoFDO build config:
> >       CONFIG_AUTOFDO_CLANG=y
> >    With a configuration that has LLVM enabled, use the following
> >    command:
> >       $ scripts/config -e AUTOFDO_CLANG
> >    After getting the config, build with
> >       $ make LLVM=1
> >
> > 2) Install the kernel on the test machine.
> >
> > 3) Run the load tests. The '-c' option in perf specifies the sample
> >    event period. We suggest using a suitable prime number,
> >    like 500009, for this purpose.
> >    For Intel platforms:
> >       $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c \
> >         -o --
> >    For AMD platforms:
> >       The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2
> >       For Zen3:
> >       $ cat /proc/cpuinfo | grep " brs"
> >       For Zen4:
> >       $ cat /proc/cpuinfo | grep amd_lbr_v2
> >       $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
> >         -N -b -c -o --
> >
> > 4) (Optional) Download the raw perf file to the host machine.
> >
> > 5) To generate an AutoFDO profile, two offline tools are available:
> >    create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
> >    of the AutoFDO project and can be found on GitHub
> >    (https://github.com/google/autofdo), version v0.30.1 or later. The
> >    llvm-profgen tool is included in the LLVM compiler itself. It's
> >    important to note that the version of llvm-profgen doesn't need to
> >    match the version of Clang. It needs to be the LLVM 19 release or
> >    later, or from the LLVM trunk.
> >       $ llvm-profgen --kernel --binary= --perfdata= \
> >         -o
> >    or
> >       $ create_llvm_prof --binary= --profile= \
> >         --format=extbinary --out=
> >
> >    Note that multiple AutoFDO profile files can be merged into one via:
> >       $ llvm-profdata merge -o ...
> >
> > 6) Rebuild the kernel using the AutoFDO profile file with the same config
> >    as in step 1 (note that CONFIG_AUTOFDO_CLANG needs to be enabled):
> >       $ make LLVM=1 CLANG_AUTOFDO_PROFILE=
> >
> > Co-developed-by: Han Shen
> > Signed-off-by: Han Shen
> > Signed-off-by: Rong Xu
> > Suggested-by: Sriraman Tallam
> > Suggested-by: Krzysztof Pszeniczny
> > Suggested-by: Nick Desaulniers
> > Suggested-by: Stephane Eranian
> > Tested-by: Yonghong Song
> >
> >
> > +Workflow
> > +========
> > +
> > +Here is an example workflow for an AutoFDO kernel:
> > +
> > +1) Build the kernel on the host machine with LLVM enabled,
> > +   for example, ::
> > +
> > +      $ make menuconfig LLVM=1
> > +
> > +   Turn on AutoFDO build config::
> > +
> > +      CONFIG_AUTOFDO_CLANG=y
> > +
> > +   With a configuration that has LLVM enabled, use the following command::
> > +
> > +      $ scripts/config -e AUTOFDO_CLANG
> > +
> > +   After getting the config, build with ::
> > +
> > +      $ make LLVM=1
> > +
> > +2) Install the kernel on the test machine.
> > +
> > +3) Run the load tests. The '-c' option in perf specifies the sample
> > +   event period. We suggest using a suitable prime number, like 500009,
> > +   for this purpose.
> > +
> > +   - For Intel platforms::
> > +
> > +       $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c -o --
> > +
> > +   - For AMD platforms::
>
> I am not sure if this double-colon is needed
> when the next line is not code.

Thanks for catching this. We didn't intend to use "::" here. It should be
":", and there is supposed to be a blank line after it. There should also be
a blank line before "For Zen3::". I will fix this in the patch.

>
> >
> > +     The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2.
> > +     To check which feature is available:
> > +     For Zen3::
> > +
> > +       $ cat /proc/cpuinfo | grep " brs"
> > +
> > +     For Zen4::
> > +
> > +       $ cat /proc/cpuinfo | grep amd_lbr_v2
> > +
> > +     The following command generates the perf data file::
> > +
> > +       $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c -o --
> > +
> > +4) (Optional) Download the raw perf file to the host machine.
> > +
> > +5) To generate an AutoFDO profile, two offline tools are available:
> > +   create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
> > +   of the AutoFDO project and can be found on GitHub
> > +   (https://github.com/google/autofdo), version v0.30.1 or later.
> > +   The llvm-profgen tool is included in the LLVM compiler itself. It's
> > +   important to note that the version of llvm-profgen doesn't need to match
> > +   the version of Clang. It needs to be the LLVM 19 release of Clang
> > +   or later, or built from the LLVM trunk. ::
> > +
> > +      $ llvm-profgen --kernel --binary= --perfdata= -o
> > +
> > +   or ::
> > +
> > +      $ create_llvm_prof --binary= --profile= --format=extbinary --out=
> > +
> > +   Note that multiple AutoFDO profile files can be merged into one via::
> > +
> > +      $ llvm-profdata merge -o ...
> > +
> > +6) Rebuild the kernel using the AutoFDO profile file with the same config
> > +   as in step 1 (note that CONFIG_AUTOFDO_CLANG needs to be enabled)::
> > +
> > +      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=
> > +
>
> Trailing blank line.
>
> .git/rebase-apply/patch:187: new blank line at EOF.

Will remove the blank line.

>
> --
> Best Regards
> Masahiro Yamada
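For reference, the AMD capability check and perf invocation from step 3 could be scripted roughly as below. This is a minimal POSIX shell sketch, not part of the patch: the function names autofdo_amd_feature and autofdo_amd_perf_cmd are hypothetical, the cpuinfo path is a parameter (defaulting to /proc/cpuinfo) so the check can be tried on a saved copy, and the perf command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helpers (not from the patch) sketching step 3 on AMD.

# Print which AMD branch-sampling feature is advertised, using the same
# grep patterns as the workflow text. Prints "none" and returns 1 if
# neither flag is found.
# $1: optional cpuinfo file (defaults to /proc/cpuinfo)
autofdo_amd_feature() {
    cpuinfo=${1:-/proc/cpuinfo}
    if grep -q 'amd_lbr_v2' "$cpuinfo"; then
        echo amd_lbr_v2        # Zen4
    elif grep -q ' brs' "$cpuinfo"; then
        echo brs               # Zen3
    else
        echo none
        return 1
    fi
}

# Print (do not run) the AMD perf record command line from step 3.
# $1: sample period (the workflow suggests a prime such as 500009)
# $2: output perf data file
# $3...: load-test command to profile
autofdo_amd_perf_cmd() {
    period=$1
    out=$2
    shift 2
    echo perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k \
        -a -N -b -c "$period" -o "$out" -- "$@"
}
```

For example, "autofdo_amd_perf_cmd 500009 perf.data ./loadtest" prints the full command line with those values filled in; perf.data and ./loadtest are placeholder names chosen for illustration.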