* [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
@ 2024-11-18 22:25 Yabin Cui
2024-11-18 23:49 ` Rong Xu
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Yabin Cui @ 2024-11-18 22:25 UTC
To: Rong Xu, Han Shen, Jonathan Corbet, Catalin Marinas, Will Deacon,
Masahiro Yamada, Kees Cook, Nick Desaulniers, workflows,
linux-doc, linux-kernel, linux-arm-kernel
Cc: Yabin Cui
Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
selected.
On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
Experiments on Android show 4% improvement in cold app startup time
and 13% improvement in binder benchmarks.
Signed-off-by: Yabin Cui <yabinc@google.com>
---
Change-Logs in V2:
1. Use "For ARM platforms with ETM trace" in autofdo.rst.
2. Create an issue and a change to use extbinary format in instructions:
https://github.com/Linaro/OpenCSD/issues/65
https://android-review.googlesource.com/c/platform/system/extras/+/3362107
Documentation/dev-tools/autofdo.rst | 18 +++++++++++++++++-
arch/arm64/Kconfig | 1 +
2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/Documentation/dev-tools/autofdo.rst b/Documentation/dev-tools/autofdo.rst
index 1f0a451e9ccd..a890e84a2fdd 100644
--- a/Documentation/dev-tools/autofdo.rst
+++ b/Documentation/dev-tools/autofdo.rst
@@ -55,7 +55,7 @@ process consists of the following steps:
workload to gather execution frequency data. This data is
collected using hardware sampling, via perf. AutoFDO is most
effective on platforms supporting advanced PMU features like
- LBR on Intel machines.
+ LBR on Intel machines, ETM traces on ARM machines.
#. AutoFDO profile generation: Perf output file is converted to
the AutoFDO profile via offline tools.
@@ -141,6 +141,22 @@ Here is an example workflow for AutoFDO kernel:
$ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
+ - For ARM platforms with ETM trace:
+
+ Follow the instructions in the `Linaro OpenCSD document
+ <https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md>`_
+ to record ETM traces for AutoFDO::
+
+ $ perf record -e cs_etm/@tmc_etr0/k -a -o <etm_perf_file> -- <loadtest>
+ $ perf inject -i <etm_perf_file> -o <perf_file> --itrace=i500009il
+
+ For ARM platforms running Android, follow the instructions in the
+ `Android simpleperf document
+ <https://android.googlesource.com/platform/system/extras/+/main/simpleperf/doc/collect_etm_data_for_autofdo.md>`_
+ to record ETM traces for AutoFDO::
+
+ $ simpleperf record -e cs-etm:k -a -o <perf_file> -- <loadtest>
+
4) (Optional) Download the raw perf file to the host machine.
5) To generate an AutoFDO profile, two offline tools are available:
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fd9df6dcc593..c3814df5e391 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -103,6 +103,7 @@ config ARM64
select ARCH_SUPPORTS_PER_VMA_LOCK
select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
select ARCH_SUPPORTS_RT
+ select ARCH_SUPPORTS_AUTOFDO_CLANG
select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
select ARCH_WANT_DEFAULT_BPF_JIT
--
2.47.0.338.g60cca15819-goog
* Re: [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
2024-11-18 22:25 [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected Yabin Cui
@ 2024-11-18 23:49 ` Rong Xu
2024-11-20 0:04 ` Yabin Cui
2024-11-20 17:59 ` Kees Cook
2024-12-09 16:20 ` Will Deacon
2 siblings, 1 reply; 10+ messages in thread
From: Rong Xu @ 2024-11-18 23:49 UTC
To: Yabin Cui
Cc: Han Shen, Jonathan Corbet, Catalin Marinas, Will Deacon,
Masahiro Yamada, Kees Cook, Nick Desaulniers, workflows,
linux-doc, linux-kernel, linux-arm-kernel
This patch looks good to me.
I assume the profile format change in the Android doc will be submitted soon.
Since "extbinary" is a superset of "binary", using the "extbinary"
format profile
in Android shouldn't cause any compatibility issues.
Reviewed-by: Rong Xu <xur@google.com>
-Rong
* Re: [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
2024-11-18 23:49 ` Rong Xu
@ 2024-11-20 0:04 ` Yabin Cui
2024-11-20 15:54 ` George Burgess
0 siblings, 1 reply; 10+ messages in thread
From: Yabin Cui @ 2024-11-20 0:04 UTC
To: Rong Xu
Cc: Han Shen, Jonathan Corbet, Catalin Marinas, Will Deacon,
Masahiro Yamada, Kees Cook, Nick Desaulniers, workflows,
linux-doc, linux-kernel, linux-arm-kernel, George Burgess
Add George from ChromeOS.
* Re: [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
2024-11-20 0:04 ` Yabin Cui
@ 2024-11-20 15:54 ` George Burgess
0 siblings, 0 replies; 10+ messages in thread
From: George Burgess @ 2024-11-20 15:54 UTC
To: Yabin Cui
Cc: Rong Xu, Han Shen, Jonathan Corbet, Catalin Marinas, Will Deacon,
Masahiro Yamada, Kees Cook, Nick Desaulniers, workflows,
linux-doc, linux-kernel, linux-arm-kernel
We've used ETM in ChromeOS for a while now. Hardware
requirements make it unfortunately less ubiquitous than LBR, but:
- we first launched it on 5.15,
- it's still humming along nicely today on 6.6, so:
Tested-by: George Burgess IV <gbiv@google.com>
IIRC, with a baseline of "using x86_64 AFDO profiles on ARM kernels,"
we saw a perf win on the order of a few (3? 4?) percentage points when
we made the switch.
On Tue, Nov 19, 2024 at 5:04 PM Yabin Cui <yabinc@google.com> wrote:
>
> Add George from ChromeOS.
>
> On Mon, Nov 18, 2024 at 3:49 PM Rong Xu <xur@google.com> wrote:
> >
> > This patch looks good to me.
> >
> > I assume the profile format change in the Android doc will be submitted soon.
> > Since "extbinary" is a superset of "binary", using the "extbinary"
> > format profile
> > in Android shouldn't cause any compatibility issues.
> >
> > > Reviewed-by: Rong Xu <xur@google.com>
> >
> > -Rong
> >
> > On Mon, Nov 18, 2024 at 2:25 PM Yabin Cui <yabinc@google.com> wrote:
> > >
> > > Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
> > > selected.
> > >
> > > On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
> > > Experiments on Android show 4% improvement in cold app startup time
> > > and 13% improvement in binder benchmarks.
> > >
> > > Signed-off-by: Yabin Cui <yabinc@google.com>
> > > ---
> > >
> > > Change-Logs in V2:
> > >
> > > 1. Use "For ARM platforms with ETM trace" in autofdo.rst.
> > > 2. Create an issue and a change to use extbinary format in instructions:
> > > https://github.com/Linaro/OpenCSD/issues/65
> > > https://android-review.googlesource.com/c/platform/system/extras/+/3362107
> > >
> > > Documentation/dev-tools/autofdo.rst | 18 +++++++++++++++++-
> > > arch/arm64/Kconfig | 1 +
> > > 2 files changed, 18 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/Documentation/dev-tools/autofdo.rst b/Documentation/dev-tools/autofdo.rst
> > > index 1f0a451e9ccd..a890e84a2fdd 100644
> > > --- a/Documentation/dev-tools/autofdo.rst
> > > +++ b/Documentation/dev-tools/autofdo.rst
> > > @@ -55,7 +55,7 @@ process consists of the following steps:
> > > workload to gather execution frequency data. This data is
> > > collected using hardware sampling, via perf. AutoFDO is most
> > > effective on platforms supporting advanced PMU features like
> > > - LBR on Intel machines.
> > > + LBR on Intel machines, ETM traces on ARM machines.
> > >
> > > #. AutoFDO profile generation: Perf output file is converted to
> > > the AutoFDO profile via offline tools.
> > > @@ -141,6 +141,22 @@ Here is an example workflow for AutoFDO kernel:
> > >
> > > $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
> > >
> > > + - For ARM platforms with ETM trace:
> > > +
> > > + Follow the instructions in the `Linaro OpenCSD document
> > > > + <https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md>`_
> > > + to record ETM traces for AutoFDO::
> > > +
> > > + $ perf record -e cs_etm/@tmc_etr0/k -a -o <etm_perf_file> -- <loadtest>
FWIW, CrOS spells the event 'cs_etm/autofdo/u'.
I'm not familiar enough with perf event syntax (or downstream patches
that CrOS has to its kernel) to say whether that should motivate a
change here. Happy to find out more if there's interest.
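For readers comparing the two spellings, here is a minimal sketch of how such an event string breaks down, assuming the usual `pmu/terms/modifiers` layout. The `parse_event` helper is hypothetical and is not perf's actual parser; it only splits the string and does not interpret the terms.

```python
def parse_event(spec):
    """Split 'pmu/term1,term2/modifiers' into its parts (illustrative only)."""
    pmu, sep, rest = spec.partition("/")
    if not sep:
        raise ValueError(f"no '/' found in event spec: {spec!r}")
    terms, sep, modifiers = rest.rpartition("/")
    if not sep:
        raise ValueError(f"no closing '/' found in event spec: {spec!r}")
    return {
        "pmu": pmu,
        # Comma-separated terms between the slashes; empty entries are dropped.
        "terms": [t for t in terms.split(",") if t],
        # Trailing modifier letters: 'k' restricts to kernel, 'u' to user space.
        "modifiers": modifiers,
    }


upstream = parse_event("cs_etm/@tmc_etr0/k")  # spelling used in this patch
cros = parse_event("cs_etm/autofdo/u")        # spelling mentioned for CrOS
print(upstream)
print(cros)
```

Read this way, the patch's example names a sink (`@tmc_etr0`) and restricts tracing to the kernel (`k`), while the CrOS spelling carries a different term and the user-space modifier (`u`). That suggests the two target different profiling scopes rather than being alternative spellings of the same event.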
> > > + $ perf inject -i <etm_perf_file> -o <perf_file> --itrace=i500009il
> > > +
> > > + For ARM platforms running Android, follow the instructions in the
> > > + `Android simpleperf document
> > > + <https://android.googlesource.com/platform/system/extras/+/main/simpleperf/doc/collect_etm_data_for_autofdo.md>`_
> > > + to record ETM traces for AutoFDO::
> > > +
> > > + $ simpleperf record -e cs-etm:k -a -o <perf_file> -- <loadtest>
> > > +
> > > 4) (Optional) Download the raw perf file to the host machine.
> > >
> > > 5) To generate an AutoFDO profile, two offline tools are available:
> > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > index fd9df6dcc593..c3814df5e391 100644
> > > --- a/arch/arm64/Kconfig
> > > +++ b/arch/arm64/Kconfig
> > > @@ -103,6 +103,7 @@ config ARM64
> > > select ARCH_SUPPORTS_PER_VMA_LOCK
> > > select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
> > > select ARCH_SUPPORTS_RT
> > > + select ARCH_SUPPORTS_AUTOFDO_CLANG
> > > select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> > > select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
> > > select ARCH_WANT_DEFAULT_BPF_JIT
> > > --
> > > 2.47.0.338.g60cca15819-goog
> > >
* Re: [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
2024-11-18 22:25 [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected Yabin Cui
2024-11-18 23:49 ` Rong Xu
@ 2024-11-20 17:59 ` Kees Cook
2024-12-09 16:20 ` Will Deacon
2 siblings, 0 replies; 10+ messages in thread
From: Kees Cook @ 2024-11-20 17:59 UTC
To: Yabin Cui
Cc: Rong Xu, Han Shen, Jonathan Corbet, Catalin Marinas, Will Deacon,
Masahiro Yamada, Nick Desaulniers, workflows, linux-doc,
linux-kernel, linux-arm-kernel
On Mon, Nov 18, 2024 at 02:25:40PM -0800, Yabin Cui wrote:
> Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
> selected.
>
> On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
> Experiments on Android show 4% improvement in cold app startup time
> and 13% improvement in binder benchmarks.
>
> Signed-off-by: Yabin Cui <yabinc@google.com>
This looks trivial enough to enable. ;) I expect this could go via the
kbuild tree (Masahiro) with an arm64 maintainer Ack.
FWIW:
Reviewed-by: Kees Cook <kees@kernel.org>
--
Kees Cook
* Re: [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
2024-11-18 22:25 [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected Yabin Cui
2024-11-18 23:49 ` Rong Xu
2024-11-20 17:59 ` Kees Cook
@ 2024-12-09 16:20 ` Will Deacon
2024-12-09 17:30 ` Rong Xu
2 siblings, 1 reply; 10+ messages in thread
From: Will Deacon @ 2024-12-09 16:20 UTC
To: Yabin Cui
Cc: Rong Xu, Han Shen, Jonathan Corbet, Catalin Marinas,
Masahiro Yamada, Kees Cook, Nick Desaulniers, workflows,
linux-doc, linux-kernel, linux-arm-kernel
On Mon, Nov 18, 2024 at 02:25:40PM -0800, Yabin Cui wrote:
> Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
> selected.
>
> On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
> Experiments on Android show 4% improvement in cold app startup time
> and 13% improvement in binder benchmarks.
>
> Signed-off-by: Yabin Cui <yabinc@google.com>
> ---
>
> Change-Logs in V2:
>
> 1. Use "For ARM platforms with ETM trace" in autofdo.rst.
> 2. Create an issue and a change to use extbinary format in instructions:
> https://github.com/Linaro/OpenCSD/issues/65
> https://android-review.googlesource.com/c/platform/system/extras/+/3362107
>
> Documentation/dev-tools/autofdo.rst | 18 +++++++++++++++++-
> arch/arm64/Kconfig | 1 +
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/dev-tools/autofdo.rst b/Documentation/dev-tools/autofdo.rst
> index 1f0a451e9ccd..a890e84a2fdd 100644
> --- a/Documentation/dev-tools/autofdo.rst
> +++ b/Documentation/dev-tools/autofdo.rst
> @@ -55,7 +55,7 @@ process consists of the following steps:
> workload to gather execution frequency data. This data is
> collected using hardware sampling, via perf. AutoFDO is most
> effective on platforms supporting advanced PMU features like
> - LBR on Intel machines.
> + LBR on Intel machines, ETM traces on ARM machines.
>
> #. AutoFDO profile generation: Perf output file is converted to
> the AutoFDO profile via offline tools.
> @@ -141,6 +141,22 @@ Here is an example workflow for AutoFDO kernel:
>
> $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
>
> + - For ARM platforms with ETM trace:
> +
> + Follow the instructions in the `Linaro OpenCSD document
> + <https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md>`_
> + to record ETM traces for AutoFDO::
> +
> + $ perf record -e cs_etm/@tmc_etr0/k -a -o <etm_perf_file> -- <loadtest>
> + $ perf inject -i <etm_perf_file> -o <perf_file> --itrace=i500009il
> +
> + For ARM platforms running Android, follow the instructions in the
> + `Android simpleperf document
> + <https://android.googlesource.com/platform/system/extras/+/main/simpleperf/doc/collect_etm_data_for_autofdo.md>`_
> + to record ETM traces for AutoFDO::
> +
> + $ simpleperf record -e cs-etm:k -a -o <perf_file> -- <loadtest>
> +
> 4) (Optional) Download the raw perf file to the host machine.
>
> 5) To generate an AutoFDO profile, two offline tools are available:
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index fd9df6dcc593..c3814df5e391 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -103,6 +103,7 @@ config ARM64
> select ARCH_SUPPORTS_PER_VMA_LOCK
> select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
> select ARCH_SUPPORTS_RT
> + select ARCH_SUPPORTS_AUTOFDO_CLANG
> select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
> select ARCH_WANT_DEFAULT_BPF_JIT
After this change, both arm64 and x86 select this option unconditionally
and with no apparent support code being added. So what is actually
required in order to select ARCH_SUPPORTS_AUTOFDO_CLANG and why isn't
it just available for all architectures instead?
Will
* Re: [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
2024-12-09 16:20 ` Will Deacon
@ 2024-12-09 17:30 ` Rong Xu
2024-12-09 18:56 ` Will Deacon
0 siblings, 1 reply; 10+ messages in thread
From: Rong Xu @ 2024-12-09 17:30 UTC
To: Will Deacon
Cc: Yabin Cui, Han Shen, Jonathan Corbet, Catalin Marinas,
Masahiro Yamada, Kees Cook, Nick Desaulniers, workflows,
linux-doc, linux-kernel, linux-arm-kernel
Enabling an AutoFDO build requires users to explicitly set CONFIG_AUTOFDO_CLANG.
The support code is in Commit 315ad8780a129e82 (kbuild: Add AutoFDO
support for Clang build).
The CONFIG_AUTOFDO_CLANG config, even if selected by the user, will
not be enabled
unless ARCH_SUPPORTS_AUTOFDO_CLANG is present.
We are not enabling this for all architectures because AutoFDO's optimized build
relies on Last Branch Records (LBR) which aren't available on all architectures.
-Rong
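The gating described above can be sketched as a small model. This is only an illustration of the dependency, assuming AUTOFDO_CLANG depends on nothing but ARCH_SUPPORTS_AUTOFDO_CLANG (the real Kconfig entry may carry further dependencies). The `ARCH_SELECTS` table, the `other_arch` entry, and the `autofdo_enabled` helper are hypothetical names.

```python
# Whether each architecture selects ARCH_SUPPORTS_AUTOFDO_CLANG.
# x86 and arm64 follow the thread (with this patch applied);
# "other_arch" is a made-up architecture that does not opt in.
ARCH_SELECTS = {"x86": True, "arm64": True, "other_arch": False}


def autofdo_enabled(arch, user_sets_option):
    """The option takes effect only if the user asks for it AND the arch opts in."""
    return user_sets_option and ARCH_SELECTS.get(arch, False)


for arch in ARCH_SELECTS:
    print(arch, autofdo_enabled(arch, user_sets_option=True))
```

In this model the architecture flag never turns the feature on by itself. It only decides whether the user's explicit choice is honored.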
* Re: [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
2024-12-09 17:30 ` Rong Xu
@ 2024-12-09 18:56 ` Will Deacon
2024-12-09 23:51 ` Yabin Cui
0 siblings, 1 reply; 10+ messages in thread
From: Will Deacon @ 2024-12-09 18:56 UTC
To: Rong Xu
Cc: Yabin Cui, Han Shen, Jonathan Corbet, Catalin Marinas,
Masahiro Yamada, Kees Cook, Nick Desaulniers, workflows,
linux-doc, linux-kernel, linux-arm-kernel
(Aside: please try to avoid top-posting on the public lists as it messes up
the flow of conversation; I'll try to piece this back together.)
On Mon, Dec 09, 2024 at 09:30:50AM -0800, Rong Xu wrote:
> On Mon, Dec 9, 2024 at 8:20 AM Will Deacon <will@kernel.org> wrote:
> > On Mon, Nov 18, 2024 at 02:25:40PM -0800, Yabin Cui wrote:
> > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > index fd9df6dcc593..c3814df5e391 100644
> > > --- a/arch/arm64/Kconfig
> > > +++ b/arch/arm64/Kconfig
> > > @@ -103,6 +103,7 @@ config ARM64
> > > select ARCH_SUPPORTS_PER_VMA_LOCK
> > > select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
> > > select ARCH_SUPPORTS_RT
> > > + select ARCH_SUPPORTS_AUTOFDO_CLANG
> > > select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> > > select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
> > > select ARCH_WANT_DEFAULT_BPF_JIT
> >
> > After this change, both arm64 and x86 select this option unconditionally
> > and with no apparent support code being added. So what is actually
> > required in order to select ARCH_SUPPORTS_AUTOFDO_CLANG and why isn't
> > it just available for all architectures instead?
> Enabling an AutoFDO build requires users to explicitly set CONFIG_AUTOFDO_CLANG.
> The support code is in Commit 315ad8780a129e82 (kbuild: Add AutoFDO
> support for Clang build).
Yes, that is precisely my point. The user has to enable
CONFIG_AUTOFDO_CLANG anyway, so what is the point in having
ARCH_SUPPORTS_AUTOFDO_CLANG? Why would an architecture _not_ want to
select that?
> We are not enabling this for all architectures because AutoFDO's optimized build
> relies on Last Branch Records (LBR) which aren't available on all architectures.
So? ETM isn't available on all arm64 machines and I doubt whether LBR is
available on _all_ x86 machines either. So there's a runtime failure
mode that needs to be handled anyway and I don't think the arch-specific
Kconfig option is really doing anything useful.
Will
* Re: [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
2024-12-09 18:56 ` Will Deacon
@ 2024-12-09 23:51 ` Yabin Cui
2024-12-10 11:31 ` Will Deacon
0 siblings, 1 reply; 10+ messages in thread
From: Yabin Cui @ 2024-12-09 23:51 UTC
To: Will Deacon
Cc: Rong Xu, Han Shen, Jonathan Corbet, Catalin Marinas,
Masahiro Yamada, Kees Cook, Nick Desaulniers, workflows,
linux-doc, linux-kernel, linux-arm-kernel
On Mon, Dec 9, 2024 at 10:56 AM Will Deacon <will@kernel.org> wrote:
>
> (Aside: please try to avoid top-posting on the public lists as it messes up
> the flow of conversation; I'll try to piece this back together.)
>
> On Mon, Dec 09, 2024 at 09:30:50AM -0800, Rong Xu wrote:
> > On Mon, Dec 9, 2024 at 8:20 AM Will Deacon <will@kernel.org> wrote:
> > > On Mon, Nov 18, 2024 at 02:25:40PM -0800, Yabin Cui wrote:
> > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > index fd9df6dcc593..c3814df5e391 100644
> > > > --- a/arch/arm64/Kconfig
> > > > +++ b/arch/arm64/Kconfig
> > > > @@ -103,6 +103,7 @@ config ARM64
> > > > select ARCH_SUPPORTS_PER_VMA_LOCK
> > > > select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
> > > > select ARCH_SUPPORTS_RT
> > > > + select ARCH_SUPPORTS_AUTOFDO_CLANG
> > > > select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> > > > select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
> > > > select ARCH_WANT_DEFAULT_BPF_JIT
> > >
> > > After this change, both arm64 and x86 select this option unconditionally
> > > and with no apparent support code being added. So what is actually
> > > required in order to select ARCH_SUPPORTS_AUTOFDO_CLANG and why isn't
> > > it just available for all architectures instead?
I think it's similar to ARCH_SUPPORTS_LTO_CLANG, which also doesn't need any
support code but requires testing to ensure it works on a specific architecture.
>
> > Enabling an AutoFDO build requires users to explicitly set CONFIG_AUTOFDO_CLANG.
> > The support code is in Commit 315ad8780a129e82 (kbuild: Add AutoFDO
> > support for Clang build).
>
> Yes, that is precisely my point. The user has to enable
> CONFIG_AUTOFDO_CLANG anyway, so what is the point in having
> ARCH_SUPPORTS_AUTOFDO_CLANG? Why would an architecture _not_ want to
> select that?
>
> > We are not enabling this for all architectures because AutoFDO's optimized build
> > relies on Last Branch Records (LBR) which aren't available on all architectures.
>
> So? ETM isn't available on all arm64 machines and I doubt whether LBR is
> available on _all_ x86 machines either. So there's a runtime failure
> mode that needs to be handled anyway and I don't think the arch-specific
> Kconfig option is really doing anything useful.
My understanding of the benefits of ARCH_SUPPORTS_AUTOFDO_CLANG is:
1. Generally, we prefer not to collect an AutoFDO profile on one
architecture and use it to build the kernel for another architecture.
This is because the profile misses data for architecture-dependent
code. ARCH_SUPPORTS_AUTOFDO_CLANG can partially prevent this from
happening.
2. Building a kernel with an AutoFDO profile involves using new
optimization flags for clang. Having ARCH_SUPPORTS_AUTOFDO_CLANG=y
for one architecture means someone has tested building a kernel with
an AutoFDO profile on this architecture.
>
> Will
* Re: [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected
2024-12-09 23:51 ` Yabin Cui
@ 2024-12-10 11:31 ` Will Deacon
0 siblings, 0 replies; 10+ messages in thread
From: Will Deacon @ 2024-12-10 11:31 UTC
To: Yabin Cui
Cc: Rong Xu, Han Shen, Jonathan Corbet, Catalin Marinas,
Masahiro Yamada, Kees Cook, Nick Desaulniers, workflows,
linux-doc, linux-kernel, linux-arm-kernel
On Mon, Dec 09, 2024 at 03:51:34PM -0800, Yabin Cui wrote:
> On Mon, Dec 9, 2024 at 10:56 AM Will Deacon <will@kernel.org> wrote:
> >
> > (Aside: please try to avoid top-posting on the public lists as it messes up
> > the flow of conversation; I'll try to piece this back together.)
> >
> > On Mon, Dec 09, 2024 at 09:30:50AM -0800, Rong Xu wrote:
> > > On Mon, Dec 9, 2024 at 8:20 AM Will Deacon <will@kernel.org> wrote:
> > > > On Mon, Nov 18, 2024 at 02:25:40PM -0800, Yabin Cui wrote:
> > > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > > index fd9df6dcc593..c3814df5e391 100644
> > > > > --- a/arch/arm64/Kconfig
> > > > > +++ b/arch/arm64/Kconfig
> > > > > @@ -103,6 +103,7 @@ config ARM64
> > > > > select ARCH_SUPPORTS_PER_VMA_LOCK
> > > > > select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
> > > > > select ARCH_SUPPORTS_RT
> > > > > + select ARCH_SUPPORTS_AUTOFDO_CLANG
> > > > > select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> > > > > select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
> > > > > select ARCH_WANT_DEFAULT_BPF_JIT
> > > >
> > > > After this change, both arm64 and x86 select this option unconditionally
> > > > and with no apparent support code being added. So what is actually
> > > > required in order to select ARCH_SUPPORTS_AUTOFDO_CLANG and why isn't
> > > > it just available for all architectures instead?
>
> I think it's similar to ARCH_SUPPORTS_LTO_CLANG, which also doesn't need any
> support code but requires testing to ensure it works on a specific architecture.
>
> >
> > > Enabling an AutoFDO build requires users to explicitly set CONFIG_AUTOFDO_CLANG.
> > > The support code is in Commit 315ad8780a129e82 (kbuild: Add AutoFDO
> > > support for Clang build).
> >
> > Yes, that is precisely my point. The user has to enable
> > CONFIG_AUTOFDO_CLANG anyway, so what is the point in having
> > ARCH_SUPPORTS_AUTOFDO_CLANG? Why would an architecture _not_ want to
> > select that?
> >
> > > We are not enabling this for all architectures because AutoFDO's optimized build
> > > relies on Last Branch Records (LBR) which aren't available on all architectures.
> >
> > So? ETM isn't available on all arm64 machines and I doubt whether LBR is
> > available on _all_ x86 machines either. So there's a runtime failure
> > mode that needs to be handled anyway and I don't think the arch-specific
> > Kconfig option is really doing anything useful.
>
> My understanding of the benefits of ARCH_SUPPORTS_AUTOFDO_CLANG is:
> 1. Generally, we prefer not to collect an AutoFDO profile on one
> architecture and use it to build the kernel for another architecture.
> This is because the profile misses data for architecture-dependent
> code. ARCH_SUPPORTS_AUTOFDO_CLANG can partially prevent this from
> happening.
Hmm, not really. Once more than one architecture selects the option, you
have the possibility of the mismatch you're trying to avoid.
> 2. Building a kernel with an AutoFDO profile involves using new
> optimization flags for clang. Having ARCH_SUPPORTS_AUTOFDO_CLANG=y
> for one architecture means someone has tested building a kernel with
> an AutoFDO profile on this architecture.
On the flip side, allowing all architectures to select the option
actually increases your test coverage.
Will
Thread overview: 10 messages
2024-11-18 22:25 [PATCH v2] arm64: Allow CONFIG_AUTOFDO_CLANG to be selected Yabin Cui
2024-11-18 23:49 ` Rong Xu
2024-11-20 0:04 ` Yabin Cui
2024-11-20 15:54 ` George Burgess
2024-11-20 17:59 ` Kees Cook
2024-12-09 16:20 ` Will Deacon
2024-12-09 17:30 ` Rong Xu
2024-12-09 18:56 ` Will Deacon
2024-12-09 23:51 ` Yabin Cui
2024-12-10 11:31 ` Will Deacon