From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Bharata B Rao <bharata@amd.com>
Cc: <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<dave.hansen@intel.com>, <gourry@gourry.net>,
	<hannes@cmpxchg.org>, <mgorman@techsingularity.net>,
	<mingo@redhat.com>, <peterz@infradead.org>,
	<raghavendra.kt@amd.com>, <riel@surriel.com>,
	<rientjes@google.com>, <sj@kernel.org>, <weixugc@google.com>,
	<willy@infradead.org>, <ying.huang@linux.alibaba.com>,
	<ziy@nvidia.com>, <dave@stgolabs.net>, <nifan.cxl@gmail.com>,
	<xuezhengchu@huawei.com>, <yiannis@zptcorp.com>,
	<akpm@linux-foundation.org>, <david@redhat.com>,
	<byungchul@sk.com>, <kinseyho@google.com>,
	<joshua.hahnjy@gmail.com>, <yuanchu@google.com>,
	<balbirs@nvidia.com>, <alok.rathore@samsung.com>
Subject: Re: [RFC PATCH v2 4/8] x86: ibs: In-kernel IBS driver for memory access profiling
Date: Fri, 3 Oct 2025 13:19:26 +0100
Message-ID: <20251003131926.0000363f@huawei.com>
In-Reply-To: <20250910144653.212066-5-bharata@amd.com>

On Wed, 10 Sep 2025 20:16:49 +0530
Bharata B Rao <bharata@amd.com> wrote:

> Use IBS (Instruction Based Sampling) feature present
> in AMD processors for memory access tracking. The access
> information obtained from IBS via NMI is fed to kpromoted
> daemon for further action.
> 
> In addition to many other information related to the memory
> access, IBS provides physical (and virtual) address of the access
> and indicates if the access came from slower tier. Only memory
> accesses originating from slower tiers are further acted upon
> by this driver.
> 
> The samples are initially accumulated in percpu buffers which
> are flushed to pghot hot page tracking mechanism using irq_work.
> 
> TODO: Many counters are added to vmstat just as debugging aid
> for now.
> 
> About IBS
> ---------
> IBS can be programmed to provide data about instruction
> execution periodically. This is done by programming a desired
> sample count (number of ops) in a control register. When the
> programmed number of ops are dispatched, a micro-op gets tagged,
> various information about the tagged micro-op's execution is
> populated in IBS execution MSRs and an interrupt is raised.
> While IBS provides a lot of data for each sample, for the
> purpose of memory access profiling, we are interested in
> linear and physical address of the memory access that reached
> DRAM. Recent AMD processors provide further filtering where
> it is possible to limit the sampling to those ops that had
> an L3 miss which greatly reduces the non-useful samples.
> 
> While IBS provides capability to sample instruction fetch
> and execution, only IBS execution sampling is used here
> to collect data about memory accesses that occur during
> the instruction execution.
> 
> More information about IBS is available in Sec 13.3 of
> AMD64 Architecture Programmer's Manual, Volume 2:System
> Programming which is present at:
> https://bugzilla.kernel.org/attachment.cgi?id=288923
> 
> Information about MSRs used for programming IBS can be
> found in Sec 2.1.14.4 of PPR Vol 1 for AMD Family 19h
> Model 11h B1 which is currently present at:
> https://www.amd.com/system/files/TechDocs/55901_0.25.zip
> 
> Signed-off-by: Bharata B Rao <bharata@amd.com>
> ---
>  arch/x86/events/amd/ibs.c        |  11 ++
>  arch/x86/include/asm/ibs.h       |   7 +
>  arch/x86/include/asm/msr-index.h |  16 ++
>  arch/x86/mm/Makefile             |   3 +-
>  arch/x86/mm/ibs.c                | 311 +++++++++++++++++++++++++++++++
>  include/linux/vm_event_item.h    |  17 ++
>  mm/vmstat.c                      |  17 ++
>  7 files changed, 381 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/include/asm/ibs.h
>  create mode 100644 arch/x86/mm/ibs.c
> 
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index 112f43b23ebf..1498dc9caeb2 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -13,9 +13,11 @@
>  #include <linux/ptrace.h>
>  #include <linux/syscore_ops.h>
>  #include <linux/sched/clock.h>
> +#include <linux/pghot.h>
>  
>  #include <asm/apic.h>
>  #include <asm/msr.h>
> +#include <asm/ibs.h>
>  
>  #include "../perf_event.h"
>  
> @@ -1756,6 +1758,15 @@ static __init int amd_ibs_init(void)
>  {
>  	u32 caps;
>  
> +	/*
> +	 * TODO: Find a clean way to disable perf IBS so that IBS
> +	 * can be used for memory access profiling.

Agreed that this is key.  The same problem applies to quite a few
other sources of data, so finding a generally acceptable solution
would be great.  Davidlohr mentioned on the CXL sync that he has
something tackling this for the CHMU driver.


> +	 */
> +	if (arch_hw_access_profiling) {
> +		pr_info("IBS isn't available for perf use\n");
> +		return 0;
> +	}
> +
>  	caps = __get_ibs_caps();
>  	if (!caps)
>  		return -ENODEV;	/* ibs not supported by the cpu */

> diff --git a/arch/x86/mm/ibs.c b/arch/x86/mm/ibs.c
> new file mode 100644
> index 000000000000..6669710dd35b
> --- /dev/null
> +++ b/arch/x86/mm/ibs.c
> @@ -0,0 +1,311 @@

...

> +
> +static int ibs_pop_sample(struct ibs_sample *s)
> +{
> +	struct ibs_sample_pcpu *ibs_pcpu = raw_cpu_ptr(ibs_s);
> +
> +	int next = ibs_pcpu->tail + 1;
> +
> +	if (ibs_pcpu->head == ibs_pcpu->tail)
> +		return 0;
> +
> +	if (next >= IBS_NR_SAMPLES)

== seems more appropriate to me.  If it's > then something went wrong
and we lost data.

> +		next = 0;
> +
> +	*s = ibs_pcpu->samples[ibs_pcpu->tail];
> +	ibs_pcpu->tail = next;
> +	return 1;
> +}
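
To illustrate the == point above, here is a minimal userspace sketch of a
head/tail ring buffer. It is not the patch's code: NR_SAMPLES, struct ring,
ring_next(), ring_push() and ring_pop() are hypothetical names, and the push
side (not shown in the quoted hunk) is assumed to keep one slot empty. Since
an index is only ever advanced by one from a value inside the array, the
incremented value can equal the capacity but never exceed it, so the wrap
test can be == and anything larger can be treated as a bug.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SAMPLES 4 /* hypothetical capacity; one slot is kept empty */

struct sample {
	unsigned long addr;
};

struct ring {
	struct sample samples[NR_SAMPLES];
	int head; /* next slot to write */
	int tail; /* next slot to read */
};

/* Advance an index by one slot, wrapping at the end of the array. */
static int ring_next(int idx)
{
	int next = idx + 1;

	/* idx is always in [0, NR_SAMPLES), so next > NR_SAMPLES is a bug. */
	assert(next <= NR_SAMPLES);
	if (next == NR_SAMPLES)
		next = 0;
	return next;
}

/* Returns false when the buffer is full and the sample is dropped. */
static bool ring_push(struct ring *r, struct sample s)
{
	int next = ring_next(r->head);

	if (next == r->tail)
		return false;
	r->samples[r->head] = s;
	r->head = next;
	return true;
}

/* Returns false when the buffer is empty. */
static bool ring_pop(struct ring *r, struct sample *s)
{
	if (r->head == r->tail)
		return false;
	*s = r->samples[r->tail];
	r->tail = ring_next(r->tail);
	return true;
}
```

In kernel code the assert() would more likely be a WARN_ON_ONCE() or be
dropped entirely; it is only here to make the invariant explicit.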


> +static void setup_APIC_ibs(void)
> +{
> +	int offset;
> +
> +	offset = get_ibs_lvt_offset();
> +	if (offset < 0)
> +		goto failed;
> +
> +	if (!setup_APIC_eilvt(offset, 0, APIC_EILVT_MSG_NMI, 0))
> +		return;
> +failed:
> +	pr_warn("IBS APIC setup failed on cpu #%d\n",
> +		smp_processor_id());

Unless this is going to get more complex, move the pr_warn() up into the
if () block above and return directly there.

> +}
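
One possible shape for the suggestion above, as a userspace sketch rather
than the actual kernel function. The stubs get_offset_stub() and
setup_eilvt_stub() and the fake_* variables are hypothetical stand-ins for
get_ibs_lvt_offset() and setup_APIC_eilvt(); the sketch returns an int only
so the outcome can be observed, whereas the quoted function is void. It folds
both failure paths into one condition so that the warning is still printed
when either step fails, which is an assumption about the intended behaviour.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical knobs standing in for hardware/kernel state. */
static int fake_lvt_offset;   /* negative simulates a failed offset lookup */
static int fake_eilvt_result; /* non-zero simulates a failed EILVT setup */

static int get_offset_stub(void)
{
	return fake_lvt_offset;
}

static int setup_eilvt_stub(int offset)
{
	/* Short-circuit evaluation below keeps negative offsets away. */
	assert(offset >= 0);
	return fake_eilvt_result;
}

/* Returns 0 on success, -1 on failure; no goto label needed. */
static int setup_apic_sketch(void)
{
	int offset = get_offset_stub();

	if (offset < 0 || setup_eilvt_stub(offset)) {
		fprintf(stderr, "IBS APIC setup failed\n");
		return -1;
	}
	return 0;
}
```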

> +static int __init ibs_access_profiling_init(void)
> +{
> +	if (!boot_cpu_has(X86_FEATURE_IBS)) {
> +		pr_info("IBS capability is unavailable for access profiling\n");
> +		return 0;
> +	}
> +
> +	ibs_s = alloc_percpu_gfp(struct ibs_sample_pcpu, GFP_KERNEL | __GFP_ZERO);

sizeof(*ibs_s).
Same as in other cases. It's nice to avoid having to check types when reviewing code.

> +	if (!ibs_s)
> +		return 0;
> +
> +	INIT_WORK(&ibs_work, ibs_work_handler);
> +	init_irq_work(&ibs_irq_work, ibs_irq_handler);
> +
> +	/* Uses IBS Op sampling */
> +	ibs_config = IBS_OP_CNT_CTL | IBS_OP_ENABLE;
> +	ibs_caps = cpuid_eax(IBS_CPUID_FEATURES);
> +	if (ibs_caps & IBS_CAPS_ZEN4)
> +		ibs_config |= IBS_OP_L3MISSONLY;

ibs_config seems to only be used locally so the global seems unnecessary.
You'll need to pass it in to the one user in the next patch though.


> +
> +	register_nmi_handler(NMI_LOCAL, ibs_overflow_handler, 0, "ibs");
> +
> +	cpuhp_setup_state(CPUHP_AP_PERF_X86_AMD_IBS_STARTING,
> +			  "x86/amd/ibs_access_profile:starting",
> +			  x86_amd_ibs_access_profile_startup,
> +			  x86_amd_ibs_access_profile_teardown);
> +
> +	pr_info("IBS setup for memory access profiling\n");
> +	return 0;
> +}
> +
> +arch_initcall(ibs_access_profiling_init);
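
As a general illustration of the sizeof(*ibs_s) comment above, a small
userspace analogue using calloc(). struct sample_buf and alloc_buf() are
hypothetical names. The quoted call passes a type name to
alloc_percpu_gfp(), so how the idiom would map onto that helper is not shown
here; the sketch only demonstrates why sizing an allocation from the pointer
being assigned keeps the two in sync without the reader checking the type.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-CPU-like buffer; the fields are illustrative only. */
struct sample_buf {
	unsigned long addrs[8];
	int head;
	int tail;
};

/* Allocate one zeroed buffer; the caller frees it with free(). */
static struct sample_buf *alloc_buf(void)
{
	struct sample_buf *buf;

	/*
	 * sizeof(*buf) follows the pointer's type automatically, so the
	 * size stays correct even if the declaration above is changed.
	 */
	buf = calloc(1, sizeof(*buf));
	return buf;
}
```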



