From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 14 Mar 2025 15:38:41 +0000
From: Jonathan Cameron
To: Bharata B Rao
Subject: Re: [RFC PATCH 3/4] x86: ibs: In-kernel IBS driver for memory access profiling
Message-ID: <20250314153841.00006978@huawei.com>
In-Reply-To: <20250306054532.221138-4-bharata@amd.com>
References: <20250306054532.221138-1-bharata@amd.com> <20250306054532.221138-4-bharata@amd.com>
Sender: owner-linux-mm@kvack.org

On Thu, 6 Mar 2025 11:15:31 +0530
Bharata B Rao wrote:

> Use IBS (Instruction Based Sampling) feature present
> in AMD processors for memory access tracking. The access
> information obtained from IBS via NMI is fed to kpromoted
> daemon for further action.
>
> In addition to many other information related to the memory
> access, IBS provides physical (and virtual) address of the access
> and indicates if the access came from slower tier. Only memory
> accesses originating from slower tiers are further acted upon
> by this driver.
>
> The samples are initially accumulated in percpu buffers which
> are flushed to kpromoted using irq_work.
>
> About IBS
> ---------
> IBS can be programmed to provide data about instruction
> execution periodically. This is done by programming a desired
> sample count (number of ops) in a control register. When the
> programmed number of ops are dispatched, a micro-op gets tagged,
> various information about the tagged micro-op's execution is
> populated in IBS execution MSRs and an interrupt is raised.
> While IBS provides a lot of data for each sample, for the
> purpose of memory access profiling, we are interested in
> linear and physical address of the memory access that reached
> DRAM. Recent AMD processors provide further filtering where
> it is possible to limit the sampling to those ops that had
> an L3 miss which greatly reduces the non-useful samples.
>
> While IBS provides capability to sample instruction fetch
> and execution, only IBS execution sampling is used here
> to collect data about memory accesses that occur during
> the instruction execution.
>
> More information about IBS is available in Sec 13.3 of
> AMD64 Architecture Programmer's Manual, Volume 2:System
> Programming which is present at:
> https://bugzilla.kernel.org/attachment.cgi?id=288923
>
> Information about MSRs used for programming IBS can be
> found in Sec 2.1.14.4 of PPR Vol 1 for AMD Family 19h
> Model 11h B1 which is currently present at:
> https://www.amd.com/system/files/TechDocs/55901_0.25.zip
>
> Signed-off-by: Bharata B Rao
> ---

Trivial comments inline. I'd love to find a clean way to steal stuff perf is
using though.
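
As a reading aid for the buffering described in the commit message above
(samples collected per CPU in NMI context, only slow-tier accesses kept,
buffers drained later via irq_work), here is a rough userspace sketch of that
flow. Everything in it is an assumption made for illustration: the names, the
buffer size and the drop-when-full policy are hypothetical and are not taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes and names; none of these come from the patch. */
#define NR_CPUS_SKETCH  4
#define SAMPLES_PER_CPU 8

struct sample {
	uint64_t paddr;     /* physical address of the sampled access */
	bool     slow_tier; /* true if the access was served by the slower tier */
};

struct sample_buf {
	struct sample s[SAMPLES_PER_CPU];
	size_t        count;
};

static struct sample_buf bufs[NR_CPUS_SKETCH];
static size_t flushed_total; /* stands in for samples handed to kpromoted */
static size_t dropped_total; /* samples discarded because a buffer was full */

/* Simulated NMI path: cheap, never blocks, keeps slow-tier samples only. */
static bool record_sample(unsigned int cpu, uint64_t paddr, bool slow_tier)
{
	struct sample_buf *b;

	assert(cpu < NR_CPUS_SKETCH);
	b = &bufs[cpu];

	if (!slow_tier)
		return false;

	if (b->count == SAMPLES_PER_CPU) {
		/* Assumed policy: drop the sample when the buffer is full. */
		dropped_total++;
		return false;
	}

	b->s[b->count].paddr = paddr;
	b->s[b->count].slow_tier = true;
	b->count++;
	return true;
}

/* Simulated irq_work path: drain one CPU's buffer, return how many were drained. */
static size_t flush_cpu(unsigned int cpu)
{
	size_t n;

	assert(cpu < NR_CPUS_SKETCH);
	n = bufs[cpu].count;
	flushed_total += n;
	bufs[cpu].count = 0;
	return n;
}
```

The point of the split is that the sampling path only touches its own CPU's
buffer, while anything heavier is deferred to the flush step.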
>  arch/x86/events/amd/ibs.c        |  11 ++
>  arch/x86/include/asm/ibs.h       |   7 +
>  arch/x86/include/asm/msr-index.h |  16 ++
>  arch/x86/mm/Makefile             |   3 +-
>  arch/x86/mm/ibs.c                | 312 +++++++++++++++++++++++++++++++
>  include/linux/vm_event_item.h    |  17 ++
>  mm/vmstat.c                      |  17 ++
>  7 files changed, 382 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/include/asm/ibs.h
>  create mode 100644 arch/x86/mm/ibs.c
>
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index e7a8b8758e08..35497e8c0846 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -13,8 +13,10 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
> +#include
>
>  #include "../perf_event.h"
>
> @@ -1539,6 +1541,15 @@ static __init int amd_ibs_init(void)
>  {
>  	u32 caps;
>
> +	/*
> +	 * TODO: Find a clean way to disable perf IBS so that IBS
> +	 * can be used for memory access profiling.

Yeah. That bit us in a number of similar cases. Does anyone have a good
solution for this? For my hammer (CXL HMU) the perf case is probably the
niche one so I'm less worried, but for SPE, IBS, PEBS etc. we need to figure
out how to elegantly back off on promotion if a user wants to use tracing.

> +	 */
> +	if (arch_hw_access_profiling) {
> +		pr_info("IBS isn't available for perf use\n");
> +		return 0;
> +	}
> +
>  	caps = __get_ibs_caps();
>  	if (!caps)
>  		return -ENODEV;	/* ibs not supported by the cpu */
> +
> +static void clear_APIC_ibs(void)
> +{
> +	int offset;
> +
> +	offset = get_ibs_lvt_offset();

Trivial, but I'd flip the condition and deal with the error out of line.
Ah, I see this is cut and paste from existing code, so I'll stop pointing
this stuff out!
	if (offset < 0)
		return;

	setup_APIC_eilvt(offset, 0, APIC_EILVT_MSG_FIX, 1);

> +	if (offset >= 0)
> +		setup_APIC_eilvt(offset, 0, APIC_EILVT_MSG_FIX, 1);
> +}
> +
> +static int __init ibs_access_profiling_init(void)
> +{
> +	if (!boot_cpu_has(X86_FEATURE_IBS)) {
> +		pr_info("IBS capability is unavailable for access profiling\n");

Probably worth saying that is because the chip doesn't have it!
This reads too similar to the perf case above, where we just pinched it for
other use cases.

> +		return 0;
> +	}
> +
> +	ibs_s = alloc_percpu_gfp(struct ibs_sample_pcpu, __GFP_ZERO);
> +	if (!ibs_s)
> +		return 0;
> +
> +	INIT_WORK(&ibs_work, ibs_work_handler);
> +	init_irq_work(&ibs_irq_work, ibs_irq_handler);
> +
> +	/* Uses IBS Op sampling */
> +	ibs_config = IBS_OP_CNT_CTL | IBS_OP_ENABLE;
> +	ibs_caps = cpuid_eax(IBS_CPUID_FEATURES);
> +	if (ibs_caps & IBS_CAPS_ZEN4)
> +		ibs_config |= IBS_OP_L3MISSONLY;
> +
> +	register_nmi_handler(NMI_LOCAL, ibs_overflow_handler, 0, "ibs");
> +
> +	cpuhp_setup_state(CPUHP_AP_PERF_X86_AMD_IBS_STARTING,
> +			  "x86/amd/ibs_access_profile:starting",
> +			  x86_amd_ibs_access_profile_startup,
> +			  x86_amd_ibs_access_profile_teardown);
> +
> +	pr_info("IBS setup for memory access profiling\n");
> +	return 0;
> +}
> +
> +arch_initcall(ibs_access_profiling_init);
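
For reference, the "flip the condition" shape suggested for clear_APIC_ibs()
above can be tried outside the kernel with something like the following. This
is only a sketch: the *_stub helpers and the counters are hypothetical
stand-ins for get_ibs_lvt_offset() and setup_APIC_eilvt(), so that the control
flow can be built and exercised in userspace.

```c
#include <assert.h>

/* Hypothetical stand-ins so the sketch builds outside the kernel. */
static int fake_lvt_offset = -1; /* value the stubbed offset lookup returns */
static int eilvt_calls;          /* how many times the LVT entry was programmed */
static int last_offset = -1;     /* offset used by the most recent programming */

static int get_ibs_lvt_offset_stub(void)
{
	return fake_lvt_offset;
}

static void setup_eilvt_stub(int offset)
{
	/* The early return in the caller guarantees a valid offset here. */
	assert(offset >= 0);
	eilvt_calls++;
	last_offset = offset;
}

/* Error case handled first and out of line; the main path stays unindented. */
static void clear_apic_ibs_sketch(void)
{
	int offset = get_ibs_lvt_offset_stub();

	if (offset < 0)
		return;

	setup_eilvt_stub(offset);
}
```

Behaviour is the same as the quoted version; the only difference is that the
failure check sits at the top and the normal path reads straight down.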