From: Ian Rogers <irogers@google.com>
To: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: kernel test robot <oliver.sang@intel.com>,
oe-lkp@lists.linux.dev, lkp@intel.com,
Linux Memory Management List <linux-mm@kvack.org>,
Namhyung Kim <namhyung@kernel.org>,
Weilin Wang <weilin.wang@intel.com>,
Caleb Biggers <caleb.biggers@intel.com>,
Alexandre Torgue <alexandre.torgue@foss.st.com>,
Maxime Coquelin <mcoquelin.stm32@gmail.com>,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail
Date: Mon, 15 Jul 2024 13:11:01 -0700
Message-ID: <CAP-5=fUqGcnGvB71jHHTecLqcky6+TrFo+hWb=eBxZjxfe_m-g@mail.gmail.com>
In-Reply-To: <ec744c86-b73e-417a-8e3a-c07142bf37d1@linux.intel.com>

On Mon, Jul 15, 2024 at 1:05 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
> Hi Ian,
>
> On 2024-07-10 12:59 a.m., kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed "perf-sanity-tests.perf_all_PMU_test.fail" on:
> >
> > commit: e2641db83f18782f57a0e107c50d2d1731960fb8 ("perf vendor events: Add/update skylake events/metrics")
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >
> > [test failed on linux-next/master 82d01fe6ee52086035b201cfa1410a3b04384257]
> >
> > in testcase: perf-sanity-tests
> > version:
> > with following parameters:
> >
> > perf_compiler: gcc
> >
> >
> >
> > compiler: gcc-13
> > test machine: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
> >
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> >
> >
> > We also observed two cases that failed on the parent but pass on this
> > commit. FYI.
> >
> >
> > caccae3ce7b988b6 e2641db83f18782f57a0e107c50
> > ---------------- ---------------------------
> >        fail:runs  %reproduction    fail:runs
> >            |             |             |
> >            :6          100%           6:6     perf-sanity-tests.perf_all_PMU_test.fail
> >            :6          100%           6:6     perf-sanity-tests.perf_all_metricgroups_test.pass
> >            :6          100%           6:6     perf-sanity-tests.perf_all_metrics_test.pass
> >
> >
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202407101021.2c8baddb-oliver.sang@intel.com
> >
> >
> >
> > 2024-07-09 07:09:53 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 105
> > 105: perf all metricgroups test : Ok
> > 2024-07-09 07:10:11 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 106
> > 106: perf all metrics test : Ok
> > 2024-07-09 07:10:23 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 107
> > 107: perf all libpfm4 events test : Ok
> > 2024-07-09 07:10:47 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 108
> > 108: perf all PMU test : FAILED!
> >
>
> The failure is caused by the change below in e2641db83f18.
>
> + {
> + "BriefDescription": "This 48-bit fixed counter counts the UCLK cycles",
> + "Counter": "FIXED",
> + "EventCode": "0xff",
> + "EventName": "UNC_CLOCK.SOCKET",
> + "PerPkg": "1",
> + "PublicDescription": "This 48-bit fixed counter counts the UCLK cycles.",
> + "Unit": "cbox_0"
> }
>
> The other cbox events have the unit name "CBOX", while the fixed counter
> has a unit name "cbox_0". So the events_table will maintain separate
> entries for cbox and cbox_0.
>
> perf_pmus__print_pmu_events() calculates the total number of events,
> allocates an aliases buffer, stores all the events into the buffer, sorts
> them, and prints the aliases one by one.
>
> The problem is that the calculated total number of events doesn't match
> the stored events on the SKL machine.
>
> perf_pmu__num_events() is used to calculate the number of events. It
> invokes pmu_events_table__num_events() to go through the entire
> events_table and find all events. Because pmu_uncore_alias_match()
> ignores the suffix of an uncore PMU, the events for both cbox and
> cbox_0 are counted.
>
> When storing events into the aliases buffer,
> perf_pmu__for_each_event() only processes the events for cbox.
>
> Since a bigger buffer was allocated than was filled, the last entry is
> all zeros. When printing the aliases, "(null)" is output.
>
> $ perf list pmu
>
> List of pre-defined events (to be used in -e or -M):
>
> (null) [Kernel PMU event]
> branch-instructions OR cpu/branch-instructions/ [Kernel PMU event]
> branch-misses OR cpu/branch-misses/ [Kernel PMU event]
>
>
> I'm thinking of two ways to address it.
> One is to only print all the stored events. The below patch can fix it.
>
> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
> index 3fcabfd8fca1..2b2f5117ff84 100644
> --- a/tools/perf/util/pmus.c
> +++ b/tools/perf/util/pmus.c
> @@ -485,6 +485,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
> perf_pmu__for_each_event(pmu, skip_duplicate_pmus, &state,
> perf_pmus__print_pmu_events__callback);
> }
> + len = state.index;
> qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
> for (int j = 0; j < len; j++) {
> /* Skip duplicates */
>
> The only drawback is that perf list will not show the new cbox_0 event.
> (But the event name still works. Users can still apply perf stat -e
> unc_clock.socket.)
>
> Since the cbox_0 event is only available on old machines (SKL and
> earlier), people should already be using the equivalent kernel event. It
> doesn't sound like a big issue to me. I prefer this simple fix.
>
> I think the other way would be to modify perf_pmu__for_each_event()
> to go through all the possible PMUs.
> It seems complicated and may impact other architectures (e.g., s390).
> I haven't tried it yet.
>
> What do you think?
> Do you see any other ways to address the issue?

Ugh. The size-first-then-iterate approach seems prone to breaking over
and over. Perhaps we can switch to realloc-ed arrays, which would remove
the need for perf_pmu__num_events(), the apparent source of the
problems.

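To make the realloc idea concrete, here is a rough, standalone sketch of
an append-as-you-iterate array. It is not perf code: struct event_entry,
struct event_vec and the event_vec_*() helpers are hypothetical names
made up for illustration, and the real struct sevent and callback
plumbing would differ. The point is only that the length used for
qsort() and printing is the number of entries actually stored, so it
cannot disagree with a separately computed count.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the entries perf list collects (not struct sevent). */
struct event_entry {
	const char *name;
};

/* Growable array: len counts stored entries, cap counts allocated slots. */
struct event_vec {
	struct event_entry *data;
	size_t len;
	size_t cap;
};

/* Append one entry, doubling the capacity with realloc() when full. */
static int event_vec_push(struct event_vec *v, const char *name)
{
	if (v->len == v->cap) {
		size_t new_cap = v->cap ? v->cap * 2 : 16;
		struct event_entry *tmp = realloc(v->data, new_cap * sizeof(*tmp));

		if (!tmp)
			return -1; /* v->data is untouched and still valid */
		v->data = tmp;
		v->cap = new_cap;
	}
	v->data[v->len++].name = name;
	return 0;
}

/* qsort() comparator ordering entries by name. */
static int cmp_entry(const void *a, const void *b)
{
	const struct event_entry *ea = a, *eb = b;

	return strcmp(ea->name, eb->name);
}

/* Sort exactly the entries that were stored, never the unused tail. */
static void event_vec_sort(struct event_vec *v)
{
	if (v->len > 1)
		qsort(v->data, v->len, sizeof(*v->data), cmp_entry);
}

/* Release the storage and reset the vector to its empty state. */
static void event_vec_free(struct event_vec *v)
{
	free(v->data);
	v->data = NULL;
	v->len = 0;
	v->cap = 0;
}
```

With something along these lines, the per-event callback would call the
push helper for each event it is handed, and the print loop would run
from 0 to v.len, so a trailing zeroed entry could not show up as
"(null)".
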
Thanks,
Ian