From: Ian Rogers
Date: Wed, 13 Apr 2022 09:03:05 -0700
Subject: Re: [LKP] Re: [perf vendor events] 3f5f0df7bf: perf-sanity-tests.perf_all_metrics_test.fail
To: Carel Si
Cc: acme@redhat.com, kan.liang@linux.intel.com, alexander.shishkin@linux.intel.com, alexandre.torgue@foss.st.com, ak@linux.intel.com, mingo@redhat.com, james.clark@arm.com, jolsa@kernel.org, john.garry@huawei.com, mark.rutland@arm.com, mcoquelin.stm32@gmail.com, namhyung@kernel.org, peterz@infradead.org, eranian@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, lkp@lists.01.org, lkp@intel.com
References: <20220304083329.GC20556@xsang-OptiPlex-9020> <20220413070529.GA1320@linux.intel.com>
In-Reply-To: <20220413070529.GA1320@linux.intel.com>

On Wed, Apr 13, 2022 at 12:06 AM Carel Si wrote:
>
> Hi,
>
> On Fri, Mar 04, 2022 at 10:10:53AM -0800, Ian Rogers wrote:
> > On Fri, Mar 4, 2022 at 12:33 AM kernel test robot wrote:
> > >
> > >
> > >
> > > Greeting,
> > >
> > > FYI, we noticed the following commit (built with gcc-9):
> > >
> > > commit: 3f5f0df7bf0f8c48d33d43454fc0b7d0f3ab9537 ("perf vendor events: Update metrics for Skylake")
> > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > >
> > > in testcase: perf-sanity-tests
> > > version: perf-x86_64-fb184c4af9b9-1_20220302
> > > with following parameters:
> > >
> > >         perf_compiler: clang
> > >         ucode: 0xec
> > >
> > >
> > >
> > > on test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz with 32G memory
> > >
> > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> > Hi,
> >
> > Thanks for the report! There is no information in the test output that
> > I can diagnose the issue with, could you add the -v option to perf
> > test so that I can see what the cause is, rather than just pass/fail.
>
> We Added '-v' option, found out that 3f5f0df7bf failed at testing
> 'Branching_Overhead' [1] and 'IpArith_Scalar_SP' [2], details attached
> in perf-sanity-tests.xz
>
> [1]
>
> Testing Branching_Overhead
> Metric 'Branching_Overhead' not printed in:
> # Running 'internals/synthesize' benchmark:
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
> Average synthesis took: 459.468 usec (+- 0.265 usec)
> Average num.
events: 44.000 (+- 0.000)
> Average time per event 10.442 usec
> Average data synthesis took: 486.181 usec (+- 0.272 usec)
> Average num. events: 296.000 (+- 0.000)
> Average time per event 1.643 usec
>
> Performance counter stats for 'perf bench internals synthesize':
>
> BR_INST_RETIRED.NEAR_CALL (0.00%)
> BR_INST_RETIRED.NEAR_TAKEN (0.00%)
> BR_INST_RETIRED.NOT_TAKEN (0.00%)
> BR_INST_RETIRED.CONDITIONAL (0.00%)
> CPU_CLK_UNHALTED.THREAD (0.00%)
> 9772951660 ns duration_time
>
> 9.772951660 seconds time elapsed
>
> 4.343887000 seconds user
> 5.248839000 seconds sys
>
>
> Some events weren't counted. Try disabling the NMI watchdog:
> echo 0 > /proc/sys/kernel/nmi_watchdog
> perf stat ...
> echo 1 > /proc/sys/kernel/nmi_watchdog

So the failure here is that the nmi_watchdog on your machine uses a
performance counter, which means the group of events doesn't have
sufficient counters to compute the metric. There are a couple of known
issues here:

1) We create metric groups as weak groups, so perf_event_open should
fail for the group of events above and we should then fall back to not
grouping the events. Something is wrong in the kernel PMU code, meaning
this isn't happening. Perhaps Kan can take a look? I'll provide more
details below.

2) Ideally we wouldn't use a performance counter for the NMI watchdog:
https://lore.kernel.org/lkml/1558660583-28561-1-git-send-email-ricardo.neri-calderon@linux.intel.com/

We could expand the test here:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/tests/shell/stat_all_metrics.sh?h=perf/core#n18
so that NMI watchdog failures are reported as a skip rather than a fail.
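
As a rough sketch of what such a check could look like: the helper name
classify_metric_output is made up for illustration and this is not the
actual change to stat_all_metrics.sh. It assumes the test has already
captured the perf stat output for a metric, and it only relies on the
"Try disabling the NMI watchdog" hint that perf prints in the logs above:

```shell
# Hypothetical helper (not part of stat_all_metrics.sh): classify the
# captured output of a perf stat run for one metric.
#   $1 = metric name, $2 = captured perf stat output
# Prints "pass", "skip" or "fail".
classify_metric_output() {
  metric=$1
  output=$2
  case $output in
    # The metric name shows up in the output when perf printed it.
    *"$metric"*) echo pass ;;
    # perf's own hint that the NMI watchdog took a counter: treat as skip.
    *"Try disabling the NMI watchdog"*) echo skip ;;
    # Anything else is still a real failure.
    *) echo fail ;;
  esac
}

# Possible use inside a test loop (assumes perf is installed):
#   result=$(perf stat -M "$m" perf bench internals synthesize 2>&1)
#   verdict=$(classify_metric_output "$m" "$result")
```

The order of the patterns matters: a metric that was printed counts as a
pass even if the hint is also present somewhere in the output.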

Skylake group failures not breaking weak group (tested on a SkylakeX):

1) No group works:

$ perf stat -e 'BR_INST_RETIRED.NEAR_CALL,BR_INST_RETIRED.NEAR_TAKEN,BR_INST_RETIRED.NOT_TAKEN,BR_INST_RETIRED.CONDITIONAL,CPU_CLK_UNHALTED.THREAD' -a sleep 1

 Performance counter stats for 'system wide':

         7,979,997      BR_INST_RETIRED.NEAR_CALL        (79.98%)
        45,462,860      BR_INST_RETIRED.NEAR_TAKEN       (80.04%)
        54,698,502      BR_INST_RETIRED.NOT_TAKEN        (80.05%)
        78,865,520      BR_INST_RETIRED.CONDITIONAL      (80.04%)
     1,104,280,963      CPU_CLK_UNHALTED.THREAD          (79.89%)

       1.001761717 seconds time elapsed

2) Hard group fails:

$ perf stat -e '{BR_INST_RETIRED.NEAR_CALL,BR_INST_RETIRED.NEAR_TAKEN,BR_INST_RETIRED.NOT_TAKEN,BR_INST_RETIRED.CONDITIONAL,CPU_CLK_UNHALTED.THREAD}' -a sleep 1

 Performance counter stats for 'system wide':

                        BR_INST_RETIRED.NEAR_CALL        (0.00%)
                        BR_INST_RETIRED.NEAR_TAKEN       (0.00%)
                        BR_INST_RETIRED.NOT_TAKEN        (0.00%)
                        BR_INST_RETIRED.CONDITIONAL      (0.00%)
                        CPU_CLK_UNHALTED.THREAD          (0.00%)

       1.001565418 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
        echo 0 > /proc/sys/kernel/nmi_watchdog
        perf stat ...
        echo 1 > /proc/sys/kernel/nmi_watchdog

3) Weak group doesn't fall back to no group:

$ perf stat -e '{BR_INST_RETIRED.NEAR_CALL,BR_INST_RETIRED.NEAR_TAKEN,BR_INST_RETIRED.NOT_TAKEN,BR_INST_RETIRED.CONDITIONAL,CPU_CLK_UNHALTED.THREAD}:W' -a sleep 1

 Performance counter stats for 'system wide':

                        BR_INST_RETIRED.NEAR_CALL        (0.00%)
                        BR_INST_RETIRED.NEAR_TAKEN       (0.00%)
                        BR_INST_RETIRED.NOT_TAKEN        (0.00%)
                        BR_INST_RETIRED.CONDITIONAL      (0.00%)
                        CPU_CLK_UNHALTED.THREAD          (0.00%)

       1.001690318 seconds time elapsed

Some events weren't counted. Try disabling the NMI watchdog:
        echo 0 > /proc/sys/kernel/nmi_watchdog
        perf stat ...
        echo 1 > /proc/sys/kernel/nmi_watchdog

> [2]
>
> Testing IpArith_Scalar_SP
> Metric 'IpArith_Scalar_SP' not printed in:
> # Running 'internals/synthesize' benchmark:
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
> Average synthesis took: 458.601 usec (+- 0.257 usec)
> Average num. events: 44.000 (+- 0.000)
> Average time per event 10.423 usec
> Average data synthesis took: 486.297 usec (+- 0.306 usec)
> Average num. events: 296.000 (+- 0.000)
> Average time per event 1.643 usec
>
> Performance counter stats for 'perf bench internals synthesize':
>
> 108854260048 INST_RETIRED.ANY
> 0 FP_ARITH_INST_RETIRED.SCALAR_SINGLE
> 9750270760 ns duration_time
>
> 9.750270760 seconds time elapsed
>
> 4.288438000 seconds user
> 5.323337000 seconds sys

I believe this fail case is now a skip. The relevant fix was:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/tools/perf/tests/shell/stat_all_metrics.sh?h=perf/core&id=00236a2dc8a3768fdc689380d2e93b96cc971bd7

Thanks,
Ian

> Thanks
>
> > At the time of filing the update I didn't have access to a Skylake
> > machine (just SkylakeX) but this test was ran as detailed in the
> > commit message:
> > https://lore.kernel.org/lkml/20220201015858.1226914-21-irogers@google.com/
> > Knowing the test, I suspect there may be a bad event on Skylake, but
> > can't confirm this because I lack the hardware and/or the test output.
> > The issue may also be how the test was run, such as not as root, not
> > in a container. There is a further issue with this test that metrics
> > (e.g. number of vector ops) that measure things that a simple
> > benchmark doesn't cause counts for can fail the test, as the test is
> > checking if the metric is reported - for example, there may be no
> > vector ops within the simple benchmark.
> >
> > Thanks,
> > Ian
> >
> > > If you fix the issue, kindly add following tag
> > > Reported-by: kernel test robot
> > >
> > >
> > >
> > > 2022-03-02 19:01:56 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-func-3f5f0df7bf0f8c48d33d43454fc0b7d0f3ab9537/tools/perf/perf test 89
> > > 89: perf all metricgroups test : Ok
> > > 2022-03-02 19:02:05 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-func-3f5f0df7bf0f8c48d33d43454fc0b7d0f3ab9537/tools/perf/perf test 90
> > > 90: perf all metrics test : FAILED!
> > > 2022-03-02 19:07:00 sudo /usr/src/perf_selftests-x86_64-rhel-8.3-func-3f5f0df7bf0f8c48d33d43454fc0b7d0f3ab9537/tools/perf/perf test 91
> > > 91: perf all PMU test : Ok
> > >
> > >
> > >
> > > To reproduce:
> > >
> > > git clone https://github.com/intel/lkp-tests.git
> > > cd lkp-tests
> > > sudo bin/lkp install job.yaml # job file is attached in this email
> > > bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> > > sudo bin/lkp run generated-yaml-file
> > >
> > > # if come across any failure that blocks the test,
> > > # please remove ~/.lkp and /lkp dir to run from a clean state.
> > >
> > >
> > >
> > > ---
> > > 0DAY/LKP+ Test Infrastructure Open Source Technology Center
> > > https://lists.01.org/hyperkitty/list/lkp@lists.01.org Intel Corporation
> > >
> > > Thanks,
> > > Oliver Sang
> > >
> > _______________________________________________
> > LKP mailing list -- lkp@lists.01.org
> > To unsubscribe send an email to lkp-leave@lists.01.org