From: Neeraj Kumar <s.neeraj@samsung.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: linux-cxl@vger.kernel.org, linux-mm@kvack.org,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
linuxarm@huawei.com, tongtiangen@huawei.com,
Yicong Yang <yangyicong@huawei.com>,
Niyas Sait <niyas.sait@huawei.com>,
ajayjoshi@micron.com, Vandana Salve <vsalve@micron.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Dave Jiang <dave.jiang@intel.com>,
Alison Schofield <alison.schofield@intel.com>,
Ira Weiny <ira.weiny@intel.com>,
Dan Williams <dan.j.williams@intel.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Gregory Price <gourry@gourry.net>,
Huang Ying <ying.huang@intel.com>,
Vishak G <vishak.g@samsung.com>,
Krishna Kanth Reddy <krish.reddy@samsung.com>,
Alok Rathore <alok.rathore@samsung.com>,
gost.dev@samsung.com
Subject: Re: [RFC PATCH 4/4] hwtrace: Document CXL Hotness Monitoring Unit driver
Date: Fri, 3 Jan 2025 10:49:02 +0530
Message-ID: <1983025922.01735899902414.JavaMail.epsvc@epcpadp1new>
In-Reply-To: <20241121101845.1815660-5-Jonathan.Cameron@huawei.com>
On 21/11/24 10:18AM, Jonathan Cameron wrote:
>Add basic documentation to describe the CXL HMU and the
>perf AUX buffer based interfaces.
>
>Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>---
> Documentation/trace/cxl-hmu.rst | 197 ++++++++++++++++++++++++++++++++
> Documentation/trace/index.rst | 1 +
> 2 files changed, 198 insertions(+)
>
>diff --git a/Documentation/trace/cxl-hmu.rst b/Documentation/trace/cxl-hmu.rst
>new file mode 100644
>index 000000000000..f07a50ba608c
>--- /dev/null
>+++ b/Documentation/trace/cxl-hmu.rst
>@@ -0,0 +1,197 @@
>+.. SPDX-License-Identifier: GPL-2.0
>+
>+==================================
>+CXL Hotness Monitoring Unit Driver
>+==================================
>+
>+CXL r3.2 introduced the CXL Hotness Monitoring Unit (CHMU). A CHMU allows
>+software running on a CXL Host to identify hot memory ranges, that is those with
>+higher access frequency relative to other memory ranges.
>+
>+A given Logical Device (presentation of a CXL memory device seen by a particular
>+host) can provide 1 or more CHMU each of which supports 1 or more separately
>+programmable CHMU Instances (CHMUI). These CHMUI are mostly independent with
>+the exception that there can be restrictions on them tracking the same memory
>+regions. The CHMUs are always completely independent.
>+The naming of the units is cxl_hmu_memX.Y.Z where memX matches the naming
>+of the memory device in /sys/bus/cxl/devices/, Y is the CHMU index and
>+Z is the CHMUI index within the CHMU.
>+
>+Each CHMUI provides a ring buffer structure known as the Hot List from which the
>+host can read back entries that describe the hotness of a particular region of
>+memory (Hot List Units). The Hot List Unit combines a Unit Address and an access
>+count for the particular address. Unit address to DPA requires multiplication
>+by the unit size. Thus, for large unit sizes the device may support higher
>+counts. It is these Hot List Units that the driver provides via a perf AUX
>+buffer by copying them from PCI BAR space.
>+
>+The unit size at which hotness is measured is configurable for each CHMUI and
>+all measurement is done in Device Physical Address space. To relate this to
>+Host Physical Address space the HDM (Host-Managed Device Memory) decoder
>+configuration must be taken into account to reflect the placement in a
>+CXL Fixed Memory Window and any interleaving.
>+
>+The CHMUI can support interrupts on fills above a watermark, or on overflow
>+of the hotlist.
>+
>+A CHMUI can support two different basic modes of operation: Epoch and
>+Always On. These affect what is placed on the hotlist. Note that the actual
>+implementation of tracking is implementation defined and likely to be
>+inherently imprecise in that the hottest pages may not be discovered due to
>+resource exhaustion and the hotness counts may not represent accurately how
>+hot they are. The specification allows for a very high degree of flexibility
>+in implementation, important as it is likely that a number of different
>+hardware implementations will be chosen to suit particular silicon and accuracy
>+budgets.
>+
>+Operation and configuration
>+===========================
>+
>+An example command line is::
>+
>+ $perf record -a -e cxl_hmu_mem0.0.0/epoch_type=0,access_type=6,\
>+ hotness_threshold=1024,epoch_multiplier=4,epoch_scale=4,range_base=0,\
>+ range_size=1024,randomized_downsampling=0,downsampling_factor=32,\
>+ hotness_granual=12
>+
>+ $perf report --dump-raw-traces
Typo: the option is --dump-raw-trace (singular), see perf-report(1).
>+
>+which will produce a list of hotlist entries, one per line with a short header
>+to provide sufficient information to interpret the entries::
>+
>+ . ... CXL_HMU data: size 33512 bytes
>+ Header 0: units: 29c counter_width 10
>+ Header 1 : deadbeef
>+ 0000000000000283
>+ 0000000000010364
>+ 0000000000020366
>+ 000000000003033c
>+ 0000000000040343
>+ 00000000000502ff
>+ 000000000006030d
>+ 000000000007031a
>+ ...
>+
>+The least significant counter_width bits (here 16, hex 10) are the counter
>+value, all higher bits are the unit index. Multiply by the unit size
>+to get a Device Physical Address.
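A worked example here might help readers. Below is a minimal sketch of the
decode described above, not taken from the driver or the perf tool. The
function name and the granual argument are my own, and I am assuming the unit
size is 2^hotness_granual bytes as stated in the hotness_granual section.

```python
def decode_hotlist_entry(entry: int, counter_width: int, granual: int) -> tuple[int, int, int]:
    """Split a raw hotlist entry into (unit_index, count, dpa).

    counter_width: number of low bits holding the counter (from "Header 0").
    granual: log2 of the unit size in bytes (assumed: the hotness_granual value).
    """
    count = entry & ((1 << counter_width) - 1)  # least significant counter_width bits
    unit_index = entry >> counter_width         # all higher bits
    dpa = unit_index << granual                 # unit index multiplied by the unit size
    return unit_index, count, dpa


# counter_width 0x10 is taken from the example header, granual 12 from the
# example command line.
for raw in (0x0000000000000283, 0x0000000000010364, 0x0000000000020366):
    unit, count, dpa = decode_hotlist_entry(raw, counter_width=0x10, granual=12)
    print(f"unit={unit} count={count} dpa={dpa:#x}")
```

With the values from the dump above, the second entry (0000000000010364)
decodes to unit index 1, count 868 and DPA 0x1000.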
>+
>+The parameters are as follows:
>+
>+epoch_type
>+----------
>+
>+Two values may be supported::
>+
>+ 0 - Epoch based operation
>+ 1 - Always on operation
>+
>+
>+0. Epoch Based Operation
>+~~~~~~~~~~~~~~~~~~~~~~~~
>+
>+An Epoch is a period of time after which a counter is assessed for hotness.
>+
>+The device may have a global sense of an Epoch but it may also operate them on
>+a per counter, or per region of device basis. This is a function of the
>+implementation and is not controllable, but is discoverable. In a global Epoch
>+scheme at start of each Epoch all counters are zeroed / deallocated. Counters
>+are then allocated in a hardware specific manner and accesses counted. At the
>+completion of the Epoch the counters are compared with a threshold and entries
>+with a count above a configurable threshold are added to the hotlist. A new
>+Epoch is then begun with all counters cleared.
>+
>+In a non-global Epoch scheme, when the Epoch of a given counter begins is not
>+specified. An example might be an Epoch for a counter only starting on first
>+touch to the relevant memory region. When a local Epoch ends the counter is
>+compared to the threshold and if appropriate added to the hotlist.
>+
>+Note, in Epoch Based Operation, the counter in the hotlist entry provides
>+information on how hot the memory is as the counter for the full Epoch is
>+provided.
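To check my understanding of the global Epoch scheme, here is a toy software
model, not a description of any hardware. It assumes an unlimited number of
counters (real devices will run out, as the text notes) and treats the
threshold as the minimum count that qualifies, following the
hotness_threshold section. All names are made up for illustration.

```python
from collections import Counter


def run_global_epochs(epochs, threshold):
    """Model global Epoch operation.

    epochs: a sequence where each item lists the unit indices accessed during
            one Epoch.
    Returns hotlist entries as (epoch_number, unit_index, count).
    """
    hotlist = []
    for epoch_number, accesses in enumerate(epochs):
        # A fresh Counter per Epoch models "all counters are zeroed".
        counters = Counter(accesses)
        # At the end of the Epoch, compare every counter with the threshold.
        for unit_index, count in sorted(counters.items()):
            if count >= threshold:  # assumption: reaching the threshold is enough
                hotlist.append((epoch_number, unit_index, count))
    return hotlist


trace = [[0, 0, 0, 1, 2, 2], [1, 1, 1, 1, 0]]
print(run_global_epochs(trace, threshold=2))
```

The count in each entry is the full-Epoch count, which is why it is meaningful
in this mode.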
>+
>+1. Always on Operation
>+~~~~~~~~~~~~~~~~~~~~~~
>+
>+In this mode, counters may all be reset before enabling the CHMUI. Then
>+counters are allocated to particular memory units via a hardware specific
>+method, perhaps on first touch. When a counter passes the configurable
>+hotness threshold an entry is added to the hotlist and that counter is freed
>+for reuse.
>+
>+In this scheme the count provided in the hotlist entry is not useful as it will
>+depend only on the configured threshold.
>+
>+access_type
>+-----------
>+
>+This parameter controls which accesses are counted::
>+
>+ 1 - Non-TEE read only
>+ 2 - Non-TEE write only
>+ 3 - Non-TEE read and write
>+ 4 - TEE and Non-TEE read only
>+ 5 - TEE and Non-TEE write only
>+ 6 - TEE and Non-TEE read and write
>+
>+
>+TEE here refers to a trusted execution environment, specifically one that
>+results in the T bit being set in the CXL transactions.
>+
>+
>+hotness_granual
>+---------------
>+
>+Unit size at which tracking is performed. Must be at least 256 bytes but
>+hardware may only support some sizes. Expressed as a power of 2, e.g. 12 = 4KiB.
>+
>+hotness_threshold
>+-----------------
>+
>+This is the minimum counter value that must be reached for the unit to count as
>+hot and be added to the hotlist.
>+
>+The possible range may be dependent on the unit size as a larger unit size
>+requires more bits on the hotlist entry leaving fewer available for the hotness
>+counter.
>+
>+epoch_multiplier and epoch_scale
>+--------------------------------
>+
>+The length of an epoch (in epoch mode) is controlled by these two parameters
>+with the decoded epoch_scale multiplied by the epoch_multiplier to give the
>+overall epoch length.
>+
>+epoch_scale::
>+
>+ 1 - 100 usecs
>+ 2 - 1 msec
>+ 3 - 10 msecs
>+ 4 - 100 msecs
>+ 5 - 1 second
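A short worked example of the epoch length calculation might be useful here.
This sketch only encodes the table above; the dictionary and function names
are mine.

```python
# Decoded epoch_scale values in microseconds, copied from the table above.
EPOCH_SCALE_USEC = {
    1: 100,        # 100 usecs
    2: 1_000,      # 1 msec
    3: 10_000,     # 10 msecs
    4: 100_000,    # 100 msecs
    5: 1_000_000,  # 1 second
}


def epoch_length_usec(epoch_scale: int, epoch_multiplier: int) -> int:
    """Return the overall epoch length in microseconds."""
    if epoch_scale not in EPOCH_SCALE_USEC:
        raise ValueError(f"unsupported epoch_scale {epoch_scale}")
    return EPOCH_SCALE_USEC[epoch_scale] * epoch_multiplier


# Values from the example command line: epoch_scale=4, epoch_multiplier=4.
print(epoch_length_usec(4, 4))
```

For the example command line this gives 400000 usecs, i.e. a 400 msec epoch.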
>+
>+range_base and range_size
>+-------------------------
>+
>+Expressed in terms of the unit size set via hotness_granual. Each CHMUI has a
>+bitmap that controls which Device Physical Address space is tracked. Each bit
>+represents 256MiB of DPA space.
>+
>+This interface provides a simple base and size in units of 256MiB to configure
>+this bitmap. All bits in the specified range will be set.
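The first sentence of this section (units of hotness_granual) and this
paragraph (units of 256MiB) seem to disagree. Going by the 256MiB reading,
this is how I understand the mapping. It is a sketch only: the helper names
are mine, and I am assuming bit N covers DPA [N * 256MiB, (N + 1) * 256MiB).

```python
BITMAP_GRANULE = 256 * 1024 * 1024  # each bitmap bit represents 256MiB of DPA space


def range_bitmap(range_base: int, range_size: int) -> int:
    """Return the bitmap as an integer with bits [base, base + size) set."""
    if range_base < 0 or range_size < 0:
        raise ValueError("range_base and range_size must be non-negative")
    return ((1 << range_size) - 1) << range_base


def tracked_dpa_span(range_base: int, range_size: int) -> tuple[int, int]:
    """Return (start, end) of the tracked DPA range, end exclusive."""
    start = range_base * BITMAP_GRANULE
    return start, start + range_size * BITMAP_GRANULE


# Values from the example command line: range_base=0, range_size=1024.
start, end = tracked_dpa_span(0, 1024)
print(f"{start:#x}-{end:#x}")
print(bin(range_bitmap(2, 3)))
```

Under that reading, the example command line (range_base=0, range_size=1024)
would track the first 256GiB of DPA space.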
>+
>+downsampling_factor
>+-------------------
>+
>+Hardware may be incapable of counting accesses at full speed or it may be
>+desirable to count over a longer period during which the counters would
>+overflow. This control allows selection of a down sampling factor expressed
>+as a power of 2 between 1 and 32768. The default is the minimum supported
>+downsampling factor.
>+
>+randomized_downsampling
>+-----------------------
>+
>+To avoid problems with downsampling when accesses are periodic this option
>+allows for an implementation defined randomization of the sampling interval,
>+whilst remaining close to the specified downsampling_factor.
>diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
>index 0b300901fd75..b35ed8e9dfa9 100644
>--- a/Documentation/trace/index.rst
>+++ b/Documentation/trace/index.rst
>@@ -36,3 +36,4 @@ Linux Tracing Technologies
> user_events
> rv/index
> hisi-ptt
>+ cxl-hmu
>--
>2.43.0
>