linux-mm.kvack.org archive mirror
From: Ian Rogers <irogers@google.com>
To: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: kernel test robot <oliver.sang@intel.com>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	 Linux Memory Management List <linux-mm@kvack.org>,
	Namhyung Kim <namhyung@kernel.org>,
	 Weilin Wang <weilin.wang@intel.com>,
	Caleb Biggers <caleb.biggers@intel.com>,
	 Alexandre Torgue <alexandre.torgue@foss.st.com>,
	Maxime Coquelin <mcoquelin.stm32@gmail.com>,
	 linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [linux-next:master] [perf vendor events] e2641db83f: perf-sanity-tests.perf_all_PMU_test.fail
Date: Mon, 15 Jul 2024 13:11:01 -0700
Message-ID: <CAP-5=fUqGcnGvB71jHHTecLqcky6+TrFo+hWb=eBxZjxfe_m-g@mail.gmail.com>
In-Reply-To: <ec744c86-b73e-417a-8e3a-c07142bf37d1@linux.intel.com>

On Mon, Jul 15, 2024 at 1:05 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
> Hi Ian,
>
> On 2024-07-10 12:59 a.m., kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed "perf-sanity-tests.perf_all_PMU_test.fail" on:
> >
> > commit: e2641db83f18782f57a0e107c50d2d1731960fb8 ("perf vendor events: Add/update skylake events/metrics")
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >
> > [test failed on linux-next/master 82d01fe6ee52086035b201cfa1410a3b04384257]
> >
> > in testcase: perf-sanity-tests
> > version:
> > with following parameters:
> >
> >       perf_compiler: gcc
> >
> >
> >
> > compiler: gcc-13
> > test machine: 16 threads 1 sockets Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz (Coffee Lake) with 32G memory
> >
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> >
> >
> > We also observed two cases which failed on the parent but pass on this commit.
> > FYI.
> >
> >
> > caccae3ce7b988b6 e2641db83f18782f57a0e107c50
> > ---------------- ---------------------------
> >        fail:runs  %reproduction    fail:runs
> >            |             |             |
> >            :6          100%           6:6     perf-sanity-tests.perf_all_PMU_test.fail
> >            :6          100%           6:6     perf-sanity-tests.perf_all_metricgroups_test.pass
> >            :6          100%           6:6     perf-sanity-tests.perf_all_metrics_test.pass
> >
> >
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202407101021.2c8baddb-oliver.sang@intel.com
> >
> >
> >
> > 2024-07-09 07:09:53 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 105
> > 105: perf all metricgroups test                                      : Ok
> > 2024-07-09 07:10:11 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 106
> > 106: perf all metrics test                                           : Ok
> > 2024-07-09 07:10:23 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 107
> > 107: perf all libpfm4 events test                                    : Ok
> > 2024-07-09 07:10:47 sudo /usr/src/linux-perf-x86_64-rhel-8.3-bpf-e2641db83f18782f57a0e107c50d2d1731960fb8/tools/perf/perf test 108
> > 108: perf all PMU test                                               : FAILED!
> >
>
> The failure is caused by the following change in e2641db83f18.
>
> +    {
> +        "BriefDescription": "This 48-bit fixed counter counts the UCLK cycles",
> +        "Counter": "FIXED",
> +        "EventCode": "0xff",
> +        "EventName": "UNC_CLOCK.SOCKET",
> +        "PerPkg": "1",
> +        "PublicDescription": "This 48-bit fixed counter counts the UCLK cycles.",
> +        "Unit": "cbox_0"
>      }
>
> The other cbox events have the unit name "CBOX", while the fixed counter
> has the unit name "cbox_0", so the events_table maintains separate
> entries for cbox and cbox_0.
>
> perf_pmus__print_pmu_events() calculates the total number of events,
> allocates an aliases buffer, stores all the events into the buffer,
> sorts them, and prints the aliases one by one.
>
> The problem is that the calculated total number of events doesn't match
> the number of events actually stored on the SKL machine.
>
> perf_pmu__num_events() is used to calculate the number of events. It
> invokes pmu_events_table__num_events() to walk the entire events_table
> and find all matching events. Because pmu_uncore_alias_match() ignores
> the suffix of the uncore PMU name, the events for both cbox and cbox_0
> are counted.
>
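A simplified, hypothetical model of the suffix-ignoring match described above (not perf's actual pmu_uncore_alias_match(), which is more involved): unit names are compared case-insensitively after dropping a trailing "_<digits>" suffix, so both the "CBOX" and the "cbox_0" table entries match a cbox PMU instance. The helper names strip_suffix_len(), equal_nocase() and unit_matches_ignoring_suffix() are illustrative only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: length of a unit name once a trailing "_<digits>"
 * suffix is dropped, e.g. "cbox_0" -> 4 and "cbox" -> 4.
 * Names without such a suffix keep their full length.
 */
static size_t strip_suffix_len(const char *name)
{
	size_t len = strlen(name);
	size_t i = len;

	/* Walk back over trailing digits. */
	while (i > 0 && isdigit((unsigned char)name[i - 1]))
		i--;
	/* Drop them only when they follow an underscore, as in "_0". */
	if (i < len && i > 0 && name[i - 1] == '_')
		return i - 1;
	return len;
}

/* Case-insensitive comparison of the first n characters of a and b. */
static bool equal_nocase(const char *a, const char *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return false;
	return true;
}

/*
 * Lenient match (hypothetical model): compare unit names with any
 * "_<digits>" suffix ignored, so "CBOX" and "cbox_0" both match "cbox_1".
 */
static bool unit_matches_ignoring_suffix(const char *table_unit,
					 const char *pmu_unit)
{
	size_t a = strip_suffix_len(table_unit);
	size_t b = strip_suffix_len(pmu_unit);

	return a == b && equal_nocase(table_unit, pmu_unit, a);
}
```

Under a rule like this, a counting pass that sums events over every matching table picks up the events from both tables, which is how the count ends up larger than what a narrower storing pass visits.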
> When storing events into the aliases buffer,
> perf_pmu__for_each_event() only processes the events for cbox.
>
> Since the buffer allocated is bigger than what gets filled, the trailing
> entries are all 0. When all the aliases are printed, "(null)" is output.
>
> $ perf list pmu
>
> List of pre-defined events (to be used in -e or -M):
>
>   (null)                                             [Kernel PMU event]
>   branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
>   branch-misses OR cpu/branch-misses/                [Kernel PMU event]
>
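To make the failure mode concrete, a minimal, hypothetical sketch (not perf's code; struct alias_entry, alloc_and_fill() and count_unfilled() are illustrative names): the buffer is sized by a count that includes entries the fill pass never visits, so calloc() leaves the trailing slots zeroed and, on common platforms, their name pointers NULL. glibc's printf() renders a NULL "%s" argument as "(null)", which would explain the stray line above; in standard C, passing NULL for "%s" is undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one alias entry. */
struct alias_entry {
	const char *name; /* stays NULL in slots that were never filled */
};

/*
 * Allocate `counted` zeroed slots but store only the `stored` names that
 * the fill pass actually visits. Returns the buffer, or NULL on failure.
 */
static struct alias_entry *alloc_and_fill(size_t counted,
					  const char *const *names,
					  size_t stored)
{
	struct alias_entry *aliases = calloc(counted, sizeof(*aliases));

	if (!aliases)
		return NULL;
	for (size_t i = 0; i < stored && i < counted; i++)
		aliases[i].name = names[i];
	return aliases;
}

/* Count the slots a print loop over `counted` entries would hit with name == NULL. */
static size_t count_unfilled(const struct alias_entry *aliases, size_t counted)
{
	size_t unfilled = 0;

	for (size_t i = 0; i < counted; i++)
		if (!aliases[i].name)
			unfilled++;
	return unfilled;
}
```

With a count of 3 but only 2 names stored, one trailing slot is left with a NULL name, matching the single "(null)" line in the output above.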
>
> I can think of two ways to address it.
> One is to print only the events that were actually stored. The patch
> below fixes it that way.
>
> diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
> index 3fcabfd8fca1..2b2f5117ff84 100644
> --- a/tools/perf/util/pmus.c
> +++ b/tools/perf/util/pmus.c
> @@ -485,6 +485,7 @@ void perf_pmus__print_pmu_events(const struct print_callbacks *print_cb, void *p
>                 perf_pmu__for_each_event(pmu, skip_duplicate_pmus, &state,
>                                         perf_pmus__print_pmu_events__callback);
>         }
> +       len = state.index;
>         qsort(aliases, len, sizeof(struct sevent), cmp_sevent);
>         for (int j = 0; j < len; j++) {
>                 /* Skip duplicates */
>
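The patch above bounds the sort and the print loop by state.index, the number of entries actually stored, instead of the precomputed total. A minimal sketch of the same idea, assuming hypothetical names: fill_aliases() stands in for the perf_pmu__for_each_event() callback loop, and the duplicate check is simplified to a name comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one alias entry. */
struct alias_entry {
	const char *name;
};

/* qsort() comparator: order entries by name. */
static int cmp_alias(const void *a, const void *b)
{
	const struct alias_entry *x = a, *y = b;

	return strcmp(x->name, y->name);
}

/* Stand-in for the callback loop: fills at most cap slots, returns how many. */
static size_t fill_aliases(struct alias_entry *aliases, size_t cap,
			   const char *const *names, size_t n)
{
	size_t index = 0;

	for (size_t i = 0; i < n && index < cap; i++)
		aliases[index++].name = names[i];
	return index;
}

/*
 * Sort and print using the stored count, never the precomputed total,
 * so zeroed trailing slots are never compared or printed.
 * Returns the number of lines printed.
 */
static size_t print_aliases(const char *const *names, size_t n, size_t counted)
{
	struct alias_entry *aliases = calloc(counted ? counted : 1, sizeof(*aliases));
	size_t len, printed = 0;

	if (!aliases)
		return 0;
	len = fill_aliases(aliases, counted, names, n); /* like len = state.index */
	qsort(aliases, len, sizeof(*aliases), cmp_alias);
	for (size_t j = 0; j < len; j++) {
		/* Skip duplicates, which sort next to each other. */
		if (j > 0 && !strcmp(aliases[j].name, aliases[j - 1].name))
			continue;
		printf("  %s\n", aliases[j].name);
		printed++;
	}
	free(aliases);
	return printed;
}
```

Even when `counted` overestimates what the fill pass stores, only the filled prefix reaches qsort() and printf(), so no NULL name is ever dereferenced.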
> The only drawback is that perf list will not show the new cbox_0 event.
> (The event name still works, though; users can still run perf stat -e
> unc_clock.socket.)
>
> Since the cbox_0 event is only available on older machines (SKL and
> earlier), people should already be using the equivalent kernel event.
> It doesn't sound like a big issue to me, so I prefer this simple fix.
>
> The other way would be to modify perf_pmu__for_each_event() to go
> through all the possible PMUs. That seems complicated and may impact
> other architectures (e.g., S390). I haven't tried it yet.
>
> What do you think?
> Do you see any other ways to address the issue?

Ugh. It seems the size-then-iterate approach just keeps breaking.
Perhaps we can switch to realloc-ed arrays and avoid the need for
perf_pmu__num_events, which seems to be the source of the problems.
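
A minimal sketch of the realloc-ed array idea, assuming a hypothetical alias_list type and helper names (nothing here is existing perf code): entries are appended as the iteration produces them and the capacity doubles when full, so the final length is exactly what was stored and no separate counting pass is needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one alias entry. */
struct alias_entry {
	const char *name;
};

/* Hypothetical growable array; start zero-initialized: { NULL, 0, 0 }. */
struct alias_list {
	struct alias_entry *entries;
	size_t len;
	size_t cap;
};

/* Append one entry, doubling the capacity when full. Returns 0 or -1. */
static int alias_list_push(struct alias_list *list, const char *name)
{
	if (list->len == list->cap) {
		size_t new_cap = list->cap ? list->cap * 2 : 8;
		struct alias_entry *tmp =
			realloc(list->entries, new_cap * sizeof(*tmp));

		if (!tmp)
			return -1; /* old buffer is still valid and owned by list */
		list->entries = tmp;
		list->cap = new_cap;
	}
	list->entries[list->len++].name = name;
	return 0;
}

/* Release the buffer and reset the list to its empty state. */
static void alias_list_free(struct alias_list *list)
{
	free(list->entries);
	list->entries = NULL;
	list->len = list->cap = 0;
}
```

A caller would then qsort(list.entries, list.len, ...) directly; because len only ever counts entries that were actually appended, the sizing and storing steps cannot disagree.
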

Thanks,
Ian


Thread overview: 9+ messages
2024-07-10  4:59 kernel test robot
2024-07-10 13:15 ` Liang, Kan
2024-07-11  8:04   ` Oliver Sang
2024-07-11 13:07     ` Liang, Kan
2024-07-15 20:05 ` Liang, Kan
2024-07-15 20:11   ` Ian Rogers [this message]
2024-07-15 21:41     ` Liang, Kan
2024-07-15 21:48       ` Ian Rogers
2024-07-16 12:52         ` Liang, Kan
