linux-mm.kvack.org archive mirror
From: Neeraj Kumar <s.neeraj@samsung.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: linux-cxl@vger.kernel.org, linux-mm@kvack.org,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxarm@huawei.com, tongtiangen@huawei.com,
	Yicong Yang <yangyicong@huawei.com>,
	Niyas Sait <niyas.sait@huawei.com>,
	ajayjoshi@micron.com, Vandana Salve <vsalve@micron.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Dave Jiang <dave.jiang@intel.com>,
	Alison Schofield <alison.schofield@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Gregory Price <gourry@gourry.net>,
	Huang Ying <ying.huang@intel.com>,
	Vishak G <vishak.g@samsung.com>,
	Krishna Kanth Reddy <krish.reddy@samsung.com>,
	Alok Rathore <alok.rathore@samsung.com>,
	gost.dev@samsung.com
Subject: Re: [RFC PATCH 4/4] hwtrace: Document CXL Hotness Monitoring Unit driver
Date: Fri, 3 Jan 2025 10:49:02 +0530
Message-ID: <1983025922.01735899902414.JavaMail.epsvc@epcpadp1new>
In-Reply-To: <20241121101845.1815660-5-Jonathan.Cameron@huawei.com>

On 21/11/24 10:18AM, Jonathan Cameron wrote:
>Add basic documentation to describe the CXL HMU and the
>perf AUX buffer based interfaces.
>
>Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>---
> Documentation/trace/cxl-hmu.rst | 197 ++++++++++++++++++++++++++++++++
> Documentation/trace/index.rst   |   1 +
> 2 files changed, 198 insertions(+)
>
>diff --git a/Documentation/trace/cxl-hmu.rst b/Documentation/trace/cxl-hmu.rst
>new file mode 100644
>index 000000000000..f07a50ba608c
>--- /dev/null
>+++ b/Documentation/trace/cxl-hmu.rst
>@@ -0,0 +1,197 @@
>+.. SPDX-License-Identifier: GPL-2.0
>+
>+==================================
>+CXL Hotness Monitoring Unit Driver
>+==================================
>+
>+CXL r3.2 introduced the CXL Hotness Monitoring Unit (CHMU). A CHMU allows
>+software running on a CXL Host to identify hot memory ranges, that is those with
>+higher access frequency relative to other memory ranges.
>+
>+A given Logical Device (presentation of a CXL memory device seen by a particular
>+host) can provide 1 or more CHMU each of which supports 1 or more separately
>+programmable CHMU Instances (CHMUI). These CHMUI are mostly independent with
>+the exception that there can be restrictions on them tracking the same memory
>+regions. The CHMUs are always completely independent.
>+The naming of the units is cxl_hmu_memX.Y.Z where memX matches the naming
>+of the memory device in /sys/bus/cxl/devices/, Y is the CHMU index and
>+Z is the CHMUI index within the CHMU.
>+
>+Each CHMUI provides a ring buffer structure known as the Hot List from which the
>+host can read back entries that describe the hotness of a particular region of
>+memory (Hot List Units). The Hot List Unit combines a Unit Address and an access
>+count for the particular address. Unit address to DPA requires multiplication
>+by the unit size. Thus, for large unit sizes the device may support higher
>+counts. It is these Hot List Units that the driver provides via a perf AUX
>+buffer by copying them from PCI BAR space.
>+
>+The unit size at which hotness is measured is configurable for each CHMUI and
>+all measurement is done in Device Physical Address space. To relate this to
>+Host Physical Address space the HDM (Host-Managed Device Memory) decoder
>+configuration must be taken into account to reflect the placement in a
>+CXL Fixed Memory Window and any interleaving.
>+
>+The CHMUI can support interrupts on fills above a watermark, or on overflow
>+of the hotlist.
>+
>+A CHMUI can support two different basic modes of operation: Epoch and
>+Always On. These affect what is placed on the hotlist. Note that the actual
>+implementation of tracking is implementation defined and likely to be
>+inherently imprecise in that the hottest pages may not be discovered due to
>+resource exhaustion and the hotness counts may not represent accurately how
>+hot they are. The specification allows for a very high degree of flexibility
>+in implementation, important as it is likely that a number of different
>+hardware implementations will be chosen to suit particular silicon and accuracy
>+budgets.
>+
>+Operation and configuration
>+===========================
>+
>+An example command line is::
>+
>+  $perf record -a  -e cxl_hmu_mem0.0.0/epoch_type=0,access_type=6,\
>+  hotness_threshold=1024,epoch_multiplier=4,epoch_scale=4,range_base=0,\
>+  range_size=1024,randomized_downsampling=0,downsampling_factor=32,\
>+  hotness_granual=12
>+
>+  $perf report --dump-raw-traces

Typo: the perf report option is --dump-raw-trace (no trailing 's').

>+
>+which will produce a list of hotlist entries, one per line with a short header
>+to provide sufficient information to interpret the entries::
>+
>+  . ... CXL_HMU data: size 33512 bytes
>+  Header 0: units: 29c counter_width 10
>+  Header 1 : deadbeef
>+  0000000000000283
>+  0000000000010364
>+  0000000000020366
>+  000000000003033c
>+  0000000000040343
>+  00000000000502ff
>+  000000000006030d
>+  000000000007031a
>+  ...
>+
>+The least significant counter_width bits (here 16, hex 10) are the counter
>+value, all higher bits are the unit index.  Multiply by the unit size
>+to get a Device Physical Address.
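
For readers following along, the decode described above could be sketched
roughly as below. This is only an illustration and is not taken from the
driver: the function name is made up, and it assumes the unit size is
2**hotness_granual (12 on the example command line).

```python
def decode_hotlist_entry(entry, counter_width, granule_shift):
    """Split a raw hot list entry into (unit_index, count, dpa).

    Hypothetical helper: granule_shift is assumed to be the
    hotness_granual value, i.e. log2 of the unit size.
    """
    count = entry & ((1 << counter_width) - 1)  # low counter_width bits
    unit_index = entry >> counter_width         # all higher bits
    dpa = unit_index << granule_shift           # multiply by the unit size
    return unit_index, count, dpa


# Entries from the example dump: counter_width is hex 10 (16 bits).
for raw in (0x0000000000000283, 0x0000000000010364, 0x0000000000020366):
    unit, count, dpa = decode_hotlist_entry(raw, 16, 12)
    print(f"unit {unit:#x} count {count:#x} dpa {dpa:#x}")
```

With these inputs the second entry decodes to unit 0x1, count 0x364 and
DPA 0x1000.
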
>+
>+The parameters are as follows:
>+
>+epoch_type
>+----------
>+
>+Two values may be supported::
>+
>+  0 - Epoch based operation
>+  1 - Always on operation
>+
>+
>+0. Epoch Based Operation
>+~~~~~~~~~~~~~~~~~~~~~~~~
>+
>+An Epoch is a period of time after which a counter is assessed for hotness.
>+
>+The device may have a global sense of an Epoch but it may also operate them on
>+a per counter, or per region of device basis. This is a function of the
>+implementation and is not controllable, but is discoverable. In a global Epoch
>+scheme, at the start of each Epoch all counters are zeroed / deallocated. Counters
>+are then allocated in a hardware specific manner and accesses counted. At the
>+completion of the Epoch the counters are compared with a threshold and entries
>+with a count above a configurable threshold are added to the hotlist. A new
>+Epoch is then begun with all counters cleared.
>+
>+In a non-global Epoch scheme, when the Epoch of a given counter begins is not
>+specified. An example might be an Epoch for a counter only starting on first
>+touch to the relevant memory region.  When a local Epoch ends the counter is
>+compared to the threshold and if appropriate added to the hotlist.
>+
>+Note, in Epoch Based Operation, the counter in the hotlist entry provides
>+information on how hot the memory is as the counter for the full Epoch is
>+provided.
>+
>+1. Always on Operation
>+~~~~~~~~~~~~~~~~~~~~~~
>+
>+In this mode, counters may all be reset before enabling the CHMUI. Then
>+counters are allocated to particular memory units via a hardware specific
>+method, perhaps on first touch.  When a counter passes the configurable
>+hotness threshold an entry is added to the hotlist and that counter is freed
>+for reuse.
>+
>+In this scheme the count provided in the hotlist entry is not useful as it will
>+depend only on the configured threshold.
>+
>+access_type
>+-----------
>+
>+The parameter controls which accesses are counted::
>+
>+  1 - Non-TEE read only
>+  2 - Non-TEE write only
>+  3 - Non-TEE read and write
>+  4 - TEE and Non-TEE read only
>+  5 - TEE and Non-TEE write only
>+  6 - TEE and Non-TEE read and write
>+
>+
>+TEE here refers to a trusted execution environment, specifically one that
>+results in the T bit being set in the CXL transactions.
>+
>+
>+hotness_granual
>+---------------
>+
>+Unit size at which tracking is performed.  Must be at least 256 bytes but
>+hardware may only support some sizes. Expressed as a power of 2. e.g. 12 = 4KiB.
>+
>+hotness_threshold
>+-----------------
>+
>+This is the minimum counter value that must be reached for the unit to count as
>+hot and be added to the hotlist.
>+
>+The possible range may be dependent on the unit size as a larger unit size
>+requires more bits on the hotlist entry leaving fewer available for the hotness
>+counter.
>+
>+epoch_multiplier and epoch_scale
>+--------------------------------
>+
>+The length of an epoch (in epoch mode) is controlled by these two parameters
>+with the decoded epoch_scale multiplied by the epoch_multiplier to give the
>+overall epoch length.
>+
>+epoch_scale::
>+
>+  1 - 100 usecs
>+  2 - 1 msec
>+  3 - 10 msecs
>+  4 - 100 msecs
>+  5 - 1 second
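
As a worked example of the calculation above (a sketch only; the table is
copied from the list just quoted and the function name is hypothetical), the
example command line's epoch_multiplier=4,epoch_scale=4 would give a
400 msec epoch:

```python
# epoch_scale encodings as listed in the patch, expressed in microseconds.
EPOCH_SCALE_USECS = {
    1: 100,        # 100 usecs
    2: 1_000,      # 1 msec
    3: 10_000,     # 10 msecs
    4: 100_000,    # 100 msecs
    5: 1_000_000,  # 1 second
}


def epoch_length_usecs(epoch_multiplier, epoch_scale):
    """Return the epoch length in microseconds (hypothetical helper)."""
    if epoch_scale not in EPOCH_SCALE_USECS:
        raise ValueError(f"unsupported epoch_scale {epoch_scale}")
    return epoch_multiplier * EPOCH_SCALE_USECS[epoch_scale]


print(epoch_length_usecs(4, 4))  # 400000 usecs, i.e. 400 msecs
```
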
>+
>+range_base and range_size
>+-------------------------
>+
>+Expressed in terms of the unit size set via hotness_granual. Each CHMUI has a
>+bitmap that controls which Device Physical Address space is tracked. Each bit
>+represents 256MiB of DPA space.
>+
>+This interface provides a simple base and size in units of 256MiB to configure
>+this bitmap. All bits in the specified range will be set.
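
To check my reading of this section, here is a rough sketch of how a base and
size could map onto the bitmap. It assumes both values are in units of 256MiB
(as the second paragraph says) and that bit N covers DPA
[N * 256MiB, (N + 1) * 256MiB). The helper names are made up and this is not
the driver's code.

```python
BIT_SPAN = 256 << 20  # each bitmap bit represents 256MiB of DPA space


def range_bitmap(range_base, range_size):
    """Bitmap with range_size consecutive bits set, starting at bit range_base."""
    return ((1 << range_size) - 1) << range_base


def tracked_dpa_range(range_base, range_size):
    """Return the half-open DPA interval [start, end) covered by the range."""
    start = range_base * BIT_SPAN
    return start, start + range_size * BIT_SPAN


# Example command line values: range_base=0, range_size=1024.
start, end = tracked_dpa_range(0, 1024)
print(f"tracking DPA {start:#x}..{end:#x}")  # covers 256GiB
print(bin(range_bitmap(2, 3)))               # 0b11100
```

If that reading is right, range_size=1024 on the example command line tracks
the first 256GiB of DPA space.
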
>+
>+downsampling_factor
>+-------------------
>+
>+Hardware may be incapable of counting accesses at full speed or it may be
>+desirable to count over a longer period during which the counters would
>+overflow.  This control allows selection of a down sampling factor expressed
>+as a power of 2 between 1 and 32768.  Default is minimum supported downsampling
>+factor.
>+
>+randomized_downsampling
>+-----------------------
>+
>+To avoid problems with downsampling when accesses are periodic this option
>+allows for an implementation defined randomization of the sampling interval,
>+whilst remaining close to the specified downsampling_factor.
>diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
>index 0b300901fd75..b35ed8e9dfa9 100644
>--- a/Documentation/trace/index.rst
>+++ b/Documentation/trace/index.rst
>@@ -36,3 +36,4 @@ Linux Tracing Technologies
>    user_events
>    rv/index
>    hisi-ptt
>+   cxl-hmu
>-- 
>2.43.0
>

