From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Bharata B Rao <bharata@amd.com>
Cc: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
<dave.hansen@intel.com>, <gourry@gourry.net>,
<hannes@cmpxchg.org>, <mgorman@techsingularity.net>,
<mingo@redhat.com>, <peterz@infradead.org>,
<raghavendra.kt@amd.com>, <riel@surriel.com>,
<rientjes@google.com>, <sj@kernel.org>, <weixugc@google.com>,
<willy@infradead.org>, <ying.huang@linux.alibaba.com>,
<ziy@nvidia.com>, <dave@stgolabs.net>, <nifan.cxl@gmail.com>,
<xuezhengchu@huawei.com>, <yiannis@zptcorp.com>,
<akpm@linux-foundation.org>, <david@redhat.com>,
<byungchul@sk.com>, <kinseyho@google.com>,
<joshua.hahnjy@gmail.com>, <yuanchu@google.com>,
<balbirs@nvidia.com>, <alok.rathore@samsung.com>
Subject: Re: [RFC PATCH v2 4/8] x86: ibs: In-kernel IBS driver for memory access profiling
Date: Fri, 3 Oct 2025 13:19:26 +0100
Message-ID: <20251003131926.0000363f@huawei.com>
In-Reply-To: <20250910144653.212066-5-bharata@amd.com>

On Wed, 10 Sep 2025 20:16:49 +0530
Bharata B Rao <bharata@amd.com> wrote:

> Use IBS (Instruction Based Sampling) feature present
> in AMD processors for memory access tracking. The access
> information obtained from IBS via NMI is fed to kpromoted
> daemon for further action.
>
> In addition to many other information related to the memory
> access, IBS provides physical (and virtual) address of the access
> and indicates if the access came from slower tier. Only memory
> accesses originating from slower tiers are further acted upon
> by this driver.
>
> The samples are initially accumulated in percpu buffers which
> are flushed to pghot hot page tracking mechanism using irq_work.
>
> TODO: Many counters are added to vmstat just as debugging aid
> for now.
>
> About IBS
> ---------
> IBS can be programmed to provide data about instruction
> execution periodically. This is done by programming a desired
> sample count (number of ops) in a control register. When the
> programmed number of ops are dispatched, a micro-op gets tagged,
> various information about the tagged micro-op's execution is
> populated in IBS execution MSRs and an interrupt is raised.
> While IBS provides a lot of data for each sample, for the
> purpose of memory access profiling, we are interested in
> linear and physical address of the memory access that reached
> DRAM. Recent AMD processors provide further filtering where
> it is possible to limit the sampling to those ops that had
> an L3 miss which greatly reduces the non-useful samples.
>
> While IBS provides capability to sample instruction fetch
> and execution, only IBS execution sampling is used here
> to collect data about memory accesses that occur during
> the instruction execution.
>
> More information about IBS is available in Sec 13.3 of
> AMD64 Architecture Programmer's Manual, Volume 2:System
> Programming which is present at:
> https://bugzilla.kernel.org/attachment.cgi?id=288923
>
> Information about MSRs used for programming IBS can be
> found in Sec 2.1.14.4 of PPR Vol 1 for AMD Family 19h
> Model 11h B1 which is currently present at:
> https://www.amd.com/system/files/TechDocs/55901_0.25.zip
>
> Signed-off-by: Bharata B Rao <bharata@amd.com>
> ---
> arch/x86/events/amd/ibs.c | 11 ++
> arch/x86/include/asm/ibs.h | 7 +
> arch/x86/include/asm/msr-index.h | 16 ++
> arch/x86/mm/Makefile | 3 +-
> arch/x86/mm/ibs.c | 311 +++++++++++++++++++++++++++++++
> include/linux/vm_event_item.h | 17 ++
> mm/vmstat.c | 17 ++
> 7 files changed, 381 insertions(+), 1 deletion(-)
> create mode 100644 arch/x86/include/asm/ibs.h
> create mode 100644 arch/x86/mm/ibs.c
>
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index 112f43b23ebf..1498dc9caeb2 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -13,9 +13,11 @@
> #include <linux/ptrace.h>
> #include <linux/syscore_ops.h>
> #include <linux/sched/clock.h>
> +#include <linux/pghot.h>
>
> #include <asm/apic.h>
> #include <asm/msr.h>
> +#include <asm/ibs.h>
>
> #include "../perf_event.h"
>
> @@ -1756,6 +1758,15 @@ static __init int amd_ibs_init(void)
> {
> u32 caps;
>
> + /*
> + * TODO: Find a clean way to disable perf IBS so that IBS
> + * can be used for memory access profiling.
Agreed that this is key. The same problem applies to quite a few
other sources of data, so a generally acceptable solution would be
great. Davidlohr mentioned on the CXL sync that he has something
tackling this for the CHMU driver.
> + */
> + if (arch_hw_access_profiling) {
> + pr_info("IBS isn't available for perf use\n");
> + return 0;
> + }
> +
> caps = __get_ibs_caps();
> if (!caps)
> return -ENODEV; /* ibs not supported by the cpu */
> diff --git a/arch/x86/mm/ibs.c b/arch/x86/mm/ibs.c
> new file mode 100644
> index 000000000000..6669710dd35b
> --- /dev/null
> +++ b/arch/x86/mm/ibs.c
> @@ -0,0 +1,311 @@
...
> +
> +static int ibs_pop_sample(struct ibs_sample *s)
> +{
> + struct ibs_sample_pcpu *ibs_pcpu = raw_cpu_ptr(ibs_s);
> +
> + int next = ibs_pcpu->tail + 1;
> +
> + if (ibs_pcpu->head == ibs_pcpu->tail)
> + return 0;
> +
> + if (next >= IBS_NR_SAMPLES)
== seems more appropriate to me. If next is ever > IBS_NR_SAMPLES then
something went wrong and we have already lost data.
> + next = 0;
> +
> + *s = ibs_pcpu->samples[ibs_pcpu->tail];
> + ibs_pcpu->tail = next;
> + return 1;
> +}
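To illustrate the == point, here is a minimal user-space sketch of a ring with an exact-match wrap. It is not taken from the patch: NR_SAMPLES, struct sample_ring, ring_push() and ring_pop() are made-up names, and the push side in particular is an assumption since it is not quoted here. Because tail is only ever assigned a value in [0, NR_SAMPLES), tail + 1 can reach NR_SAMPLES but never exceed it.

```c
#include <stdbool.h>

#define NR_SAMPLES 4	/* hypothetical; stands in for IBS_NR_SAMPLES */

struct sample {
	unsigned long paddr;
};

struct sample_ring {
	int head;	/* next slot to write */
	int tail;	/* next slot to read */
	struct sample samples[NR_SAMPLES];
};

/* Assumed producer side: returns false when the ring is full. */
static bool ring_push(struct sample_ring *r, struct sample s)
{
	int next = r->head + 1;

	if (next == NR_SAMPLES)
		next = 0;
	if (next == r->tail)
		return false;	/* full: one slot is always left empty */

	r->samples[r->head] = s;
	r->head = next;
	return true;
}

/* Consumer side: returns true and fills *s if a sample was available. */
static bool ring_pop(struct sample_ring *r, struct sample *s)
{
	int next = r->tail + 1;

	if (r->head == r->tail)
		return false;	/* empty */

	/* tail < NR_SAMPLES always holds, so an exact match is enough */
	if (next == NR_SAMPLES)
		next = 0;

	*s = r->samples[r->tail];
	r->tail = next;
	return true;
}
```

With == the wrap documents the invariant instead of silently papering over an index that has already run off the end.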
> +static void setup_APIC_ibs(void)
> +{
> + int offset;
> +
> + offset = get_ibs_lvt_offset();
> + if (offset < 0)
> + goto failed;
> +
> + if (!setup_APIC_eilvt(offset, 0, APIC_EILVT_MSG_NMI, 0))
> + return;
> +failed:
> + pr_warn("IBS APIC setup failed on cpu #%d\n",
> + smp_processor_id());
Unless this is going to get more complex, move the pr_warn() up into the
if () block above and return directly from there.
> +}
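One possible reading of that suggestion, as a sketch that builds outside the kernel. Everything prefixed stub_ is a hypothetical stand-in for the real helper of the same name, STUB_EILVT_MSG_NMI is a placeholder value, and warn_setup_failed() stands in for the pr_warn(). The goto and label disappear; each failure path reports where it happens.

```c
#include <stdio.h>

#define STUB_EILVT_MSG_NMI 0	/* placeholder, not the real APIC_EILVT_MSG_NMI */

static int stub_lvt_offset = 1;	/* what the stubbed offset lookup returns */
static int stub_eilvt_ret;	/* 0 means success, as in the quoted code */
static int nr_warnings;		/* counts how often the warning fired */

static int stub_get_ibs_lvt_offset(void)
{
	return stub_lvt_offset;
}

static int stub_setup_APIC_eilvt(int offset, int vector, int msg_type, int mask)
{
	(void)offset;
	(void)vector;
	(void)msg_type;
	(void)mask;
	return stub_eilvt_ret;
}

/* Stand-in for the pr_warn() in the quoted function. */
static void warn_setup_failed(void)
{
	nr_warnings++;
	fprintf(stderr, "IBS APIC setup failed\n");
}

static void setup_APIC_ibs_sketch(void)
{
	int offset = stub_get_ibs_lvt_offset();

	if (offset < 0) {
		warn_setup_failed();
		return;
	}

	/* Non-zero return means the EILVT entry could not be set up. */
	if (stub_setup_APIC_eilvt(offset, 0, STUB_EILVT_MSG_NMI, 0))
		warn_setup_failed();
}
```

This does duplicate the warning call, so if more failure paths are expected the existing label may still be the better trade-off.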
> +static int __init ibs_access_profiling_init(void)
> +{
> + if (!boot_cpu_has(X86_FEATURE_IBS)) {
> + pr_info("IBS capability is unavailable for access profiling\n");
> + return 0;
> + }
> +
> + ibs_s = alloc_percpu_gfp(struct ibs_sample_pcpu, GFP_KERNEL | __GFP_ZERO);
sizeof(*ibs_s).
Same as in other cases. It's nice to avoid having to check types when reviewing code.
> + if (!ibs_s)
> + return 0;
> +
> + INIT_WORK(&ibs_work, ibs_work_handler);
> + init_irq_work(&ibs_irq_work, ibs_irq_handler);
> +
> + /* Uses IBS Op sampling */
> + ibs_config = IBS_OP_CNT_CTL | IBS_OP_ENABLE;
> + ibs_caps = cpuid_eax(IBS_CPUID_FEATURES);
> + if (ibs_caps & IBS_CAPS_ZEN4)
> + ibs_config |= IBS_OP_L3MISSONLY;
ibs_config seems to only be used locally so the global seems unnecessary.
You'll need to pass it in to the one user in the next patch though.
> +
> + register_nmi_handler(NMI_LOCAL, ibs_overflow_handler, 0, "ibs");
> +
> + cpuhp_setup_state(CPUHP_AP_PERF_X86_AMD_IBS_STARTING,
> + "x86/amd/ibs_access_profile:starting",
> + x86_amd_ibs_access_profile_startup,
> + x86_amd_ibs_access_profile_teardown);
> +
> + pr_info("IBS setup for memory access profiling\n");
> + return 0;
> +}
> +
> +arch_initcall(ibs_access_profiling_init);
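A rough user-space sketch of the two points above (sizing from the pointer, and keeping the config local). All names and bit values here are invented for illustration: the SK_* macros are placeholders, not the real IBS_* definitions, and profiler_start() is a hypothetical stand-in for the one user in the next patch. As far as I can tell alloc_percpu_gfp() takes a type rather than a size, so the kernel equivalent of the sizeof(*ptr) idiom would presumably be typeof(*ibs_s).

```c
#include <stdint.h>
#include <stdlib.h>

/* Placeholder bits, chosen arbitrarily for this sketch. */
#define SK_OP_CNT_CTL		(1ULL << 0)
#define SK_OP_ENABLE		(1ULL << 1)
#define SK_OP_L3MISSONLY	(1ULL << 2)
#define SK_CAPS_ZEN4		(1U << 0)

struct profiler_state {
	uint64_t config;		/* control value to program when enabling */
	unsigned long nr_samples;
};

/* Derive the config from the capability bits; no file-scope variable. */
static uint64_t build_config(uint32_t caps)
{
	uint64_t config = SK_OP_CNT_CTL | SK_OP_ENABLE;

	if (caps & SK_CAPS_ZEN4)
		config |= SK_OP_L3MISSONLY;

	return config;
}

/* Hypothetical consumer: receives the config as a parameter. */
static struct profiler_state *profiler_start(uint64_t config)
{
	struct profiler_state *st;

	/* sizeof(*st) follows the pointer, so the type is not repeated */
	st = calloc(1, sizeof(*st));
	if (!st)
		return NULL;

	st->config = config;
	return st;
}
```

The caller would then do something like profiler_start(build_config(caps)), which keeps the value's lifetime obvious to a reader.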