* [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers
@ 2024-09-11 9:04 shiju.jose
2024-09-11 9:04 ` [PATCH v12 01/17] EDAC: Add support for EDAC device features control shiju.jose
` (16 more replies)
0 siblings, 17 replies; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Previously known as "ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers".
Augmenting EDAC for controlling RAS features
============================================
This series proposes expanding EDAC to control RAS features and to
expose each feature's control attributes to userspace in sysfs.
Some examples:
- Scrub control
- Error Check Scrub (ECS) control
- ACPI RAS2 features
- ACPI Address Range Scrubbing (ARS)
- Post Package Repair (PPR) etc.
High level design is illustrated in the following diagram.
_______________________________________________
| Userspace - Rasdaemon |
| ____________ |
| | RAS CXL | _____________ |
| | Err Handler|----->| | |
| |____________| | RAS Dynamic | |
| ____________ | Scrub, PPR | |
| | RAS Memory |----->| Controller | |
| | Err Handler| |_____________| |
| |____________| | |
|__________________________|____________________|
|
|
_______________________________|______________________________
| Kernel EDAC based SubSystem | for RAS Features Control |
| ______________________________|____________________________ |
|| EDAC Core Sysfs EDAC| Bus | |
|| __________________________|_______ _____________ | |
|| |/sys/bus/edac/devices/<dev>/scrub/| | EDAC Device | | |
|| |/sys/bus/edac/devices/<dev>/ecs*/ |<->| EDAC MC | | |
|| |/sys/bus/edac/devices/<dev>/ars/ | | EDAC Sysfs | | |
|| |/sys/bus/edac/devices/<dev>/ppr/ | | EDAC Module | | |
|| |__________________________________| |_____________| | |
|| | EDAC Bus | |
|| Get | | |
|| __________ Feature's | __________ | |
|| | |Descs _________|______ | | | |
|| |EDAC Scrub|<-----| EDAC Device |---->| EDAC ARS | | |
|| |__________| | Driver- RAS | |__________| | |
|| __________ | Feature Control| __________ | |
|| | |<-----|________________|---->| | | |
|| |EDAC ECS | Register RAS | Features | EDAC PPR | | |
|| |__________| | |__________| | |
|| ______________________|___________________ | |
||_________|_____________|_____________|____________|________| |
| _______|____ _____|______ ____|______ ___|_____ |
| | | | CXL Mem | | | | Client | |
| | ACPI RAS2 | | Driver PPR,| | ACPI ARS | | PPR | |
| | Driver | | Scrub,ECS | | Driver | | Driver | |
| |____________| |____________| |___________| |_________| |
| | | | | |
|________|______________|______________|___________|___________|
| | | |
_______|______________|______________|___________|___________
| __|______________|_ ____________|___________|_____ |
| | | |
| | Platform HW and Firmware | |
| |__________________________________________________| |
|_____________________________________________________________|
1. EDAC RAS feature components - create feature specific descriptors,
for example EDAC scrub, EDAC ECS, EDAC PPR and EDAC ARS in the above
diagram.
2. EDAC RAS feature control driver - gets the feature's attribute
descriptors from the EDAC RAS feature component, registers the device's
RAS features with the EDAC bus and exposes the feature's sysfs
attributes under the EDAC bus.
3. RAS dynamic scrub controller - userspace sample module added to
rasdaemon to start scrubbing when an excessive number of related errors
is reported in a short span of time.
The added EDAC feature specific components (e.g. EDAC scrub, EDAC ECS,
EDAC PPR etc.) call back into the parent driver (e.g. CXL driver,
ACPI RAS driver etc.) for the controls, rather than letting the
caller deal with it directly, for the following reasons.
1. It enforces a common API across multiple implementations. That could
also be attempted via review, but that has generally not gone well in
the long run for subsystems that tried it (several later moved to
callback and feature list based approaches).
2. It gives a path for 'intercepting' in the EDAC feature driver.
An example is that we could intercept PPR repair calls
and sanity check that the memory in question is offline before
passing back to the underlying code. Sure, we could rely on doing
that via some additional calls from the parent driver, but the
ABI would get messier.
3. (Speculative) we may get in kernel users of some features in the
long run.
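As a plain-C illustration of this callback pattern (all names below are
illustrative mocks, not the actual kernel API): the core holds an ops
table filled in by the parent driver, so every control request funnels
through one place where it can be intercepted or sanity checked.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative ops table: the EDAC core side calls back into the
 * parent driver (CXL, ACPI RAS2, ...) via these hooks. */
struct scrub_ops {
	int (*get_cycle)(void *drv_data, unsigned int *seconds);
	int (*set_cycle)(void *drv_data, unsigned int seconds);
};

/* Mock parent driver state and callbacks. */
struct mock_drv {
	unsigned int cycle;
};

static int mock_get_cycle(void *data, unsigned int *seconds)
{
	*seconds = ((struct mock_drv *)data)->cycle;
	return 0;
}

static int mock_set_cycle(void *data, unsigned int seconds)
{
	((struct mock_drv *)data)->cycle = seconds;
	return 0;
}

/* The "core" side: a sysfs store would land here first, giving one
 * place to intercept or sanity check before reaching the driver. */
static int core_set_cycle(const struct scrub_ops *ops, void *drv_data,
			  unsigned int seconds)
{
	if (!ops || !ops->set_cycle)
		return -1;	/* feature not provided by this device */
	return ops->set_cycle(drv_data, seconds);
}
```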
More details of the common RAS features are described in the following
sections.
Memory Scrubbing
================
Increasing DRAM size and cost has made memory subsystem reliability
an important concern. Memory modules are used where potentially
corrupted data could cause expensive or fatal issues. Memory errors are
among the top hardware failures that cause server and workload crashes.
Memory scrub is a feature where an ECC engine reads data from
each memory media location, corrects it with ECC if necessary and
writes the corrected data back to the same memory media location.
The memory DIMMs can be scrubbed at a configurable rate to detect
uncorrected memory errors and to attempt recovery from detected memory
errors, providing the following benefits.
- Proactively scrubbing memory DIMMs reduces the chance of a correctable
error becoming uncorrectable.
- Once detected, uncorrected errors caught in unallocated memory pages are
isolated and prevented from being allocated to an application or the OS.
- The probability of software/hardware products encountering memory
errors is reduced.
Some background details can be found in Reference [8].
There are two types of memory scrubbing:
1. Background (patrol) scrubbing of the RAM while the RAM is otherwise
idle.
2. On-demand scrubbing for a specific address range/region of memory.
Several types of interfaces to HW memory scrubbers have been
identified, such as ACPI NVDIMM ARS (Address Range Scrub), CXL memory
device patrol scrub, CXL DDR5 ECS and ACPI RAS2 memory scrubbing.
The scrub controls vary between the different memory scrubbers. To
allow for standard userspace tooling, these controls need to be
presented with a standard ABI.
Introduce a generic EDAC memory scrub control which allows the user to
control the underlying scrubbers in the system via a generic sysfs
scrub control interface.
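A userspace consumer of such a sysfs interface could look like the
sketch below. Both the directory layout and the attribute name used in
the usage note are assumptions for illustration, not the final ABI.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a value to a scrub control attribute, e.g. under
 * /sys/bus/edac/devices/<dev>/scrub/. The attribute names and layout
 * are illustrative here. */
static int write_scrub_attr(const char *scrub_dir, const char *attr,
			    const char *val)
{
	char path[512];
	FILE *f;

	if (snprintf(path, sizeof(path), "%s/%s", scrub_dir, attr) >=
	    (int)sizeof(path))
		return -1;	/* path truncated */
	f = fopen(path, "w");
	if (!f)
		return -1;	/* attribute missing or no permission */
	if (fputs(val, f) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) ? -1 : 0;
}
```

For example, write_scrub_attr("/sys/bus/edac/devices/cxl_mem0/scrub",
"current_cycle_duration", "36000") would request a 10-hour cycle,
assuming a device named cxl_mem0 and an attribute taking seconds.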
Use case of common scrub control feature
========================================
1. Several types of interfaces to HW memory scrubbers have been
identified, such as ACPI NVDIMM ARS (Address Range Scrub), CXL memory
device patrol scrub, CXL DDR5 ECS, ACPI RAS2 memory scrubbing and a
software based memory scrubber (discussed in the community,
Reference [8]). Some scrubbers support controlling (background) patrol
scrubbing (ACPI RAS2, CXL) and/or on-demand scrubbing (ACPI RAS2,
ACPI ARS). However, the scrub controls vary between memory scrubbers,
so standard generic sysfs scrub controls need to be exposed to
userspace for seamless control of the HW/SW scrubbers in the system by
admins/scripts/tools etc.
2. Scrub controls in user space allow the user to disable or tune the
scrubbing in case disabling background patrol scrubbing or changing the
scrub rate is needed for other purposes, such as performance-aware
operations which require the background operations to be turned off
or reduced.
3. Allows performing on-demand scrubbing for a specific address range
if supported by the scrubber.
4. User space tools can scrub the memory DIMMs regularly at a
configurable scrub rate using the sysfs scrub controls discussed above,
which helps
- to detect uncorrectable memory errors early, before userspace
accesses the memory, improving the chance of recovering from them.
- to reduce the chance of a correctable error becoming uncorrectable.
5. Policy control for hotplugged memory. There is not necessarily a
system wide BIOS or similar in the loop to control the scrub settings
on a CXL device that wasn't there at boot. What that setting should be
is a policy decision, as we are trading off reliability vs performance,
hence it should be under the control of userspace. As such, 'an'
interface is needed. It seems more sensible to try to unify it with
other similar interfaces than to spin yet another one.
A draft version of the userspace code for dynamic scrub control, based
on the frequency of memory errors reported to userspace, has been added
to rasdaemon and enabled and tested for the CXL device based patrol
scrub feature and the ACPI RAS2 based scrub feature.
https://github.com/shijujose4/rasdaemon/tree/ras_feature_control
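The core of such a dynamic scrub controller is a simple rate policy. A
toy sketch follows; the threshold/window values and the reaction are
made up for illustration and are not rasdaemon code:

```c
#include <assert.h>

/* Toy policy: if more than 'threshold' corrected errors arrive within
 * 'window' seconds, tell the caller to tighten the scrub cycle. */
struct scrub_policy {
	unsigned int threshold;
	long window;		/* seconds */
	long first_err;		/* start of the current window */
	unsigned int count;
};

/* Feed one error event (timestamped 'now'); returns 1 when the error
 * rate crossed the threshold and scrubbing should be sped up. */
static int scrub_policy_error(struct scrub_policy *p, long now)
{
	if (p->count == 0 || now - p->first_err > p->window) {
		p->first_err = now;	/* start a new window */
		p->count = 0;
	}
	if (++p->count > p->threshold) {
		p->count = 0;		/* reset after triggering */
		return 1;
	}
	return 0;
}
```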
ToDo: For PPR, rasdaemon will collate records and decide to replace a
row if there are lots of corrected errors, a single uncorrected error,
or an error record received with the maintenance request flag set, as
in the CXL DRAM error record.
Comparison of scrubbing features
================================
................................................................
. . ACPI . CXL patrol. CXL ECS . ARS .
. Name . RAS2 . scrub . . .
................................................................
. . . . . .
. On-demand . Supported . No . No . Supported .
. Scrubbing . . . . .
. . . . . .
................................................................
. . . . . .
. Background . Supported . Supported . Supported . No .
. scrubbing . . . . .
. . . . . .
................................................................
. . . . . .
. Mode of . Scrub ctrl. per device. per memory. Unknown .
. scrubbing . per NUMA . . media . .
. . domain. . . . .
................................................................
. . . . . .
. Query scrub . Supported . Supported . Supported . Supported .
. capabilities . . . . .
. . . . . .
................................................................
. . . . . .
. Setting . Supported . No . No . Supported .
. address range. . . . .
. . . . . .
................................................................
. . . . . .
. Setting . Supported . Supported . No . No .
. scrub rate . . . . .
. . . . . .
................................................................
. . . . . .
. Unit for . Not . in hours . No . No .
. scrub rate . Defined . . . .
. . . . . .
................................................................
. . Supported . . . .
. Scrub . on-demand . No . No . Supported .
. status/ . scrubbing . . . .
. Completion . only . . . .
................................................................
. UC error . .CXL general.CXL general. ACPI UCE .
. reporting . Exception .media/DRAM .media/DRAM . notify and.
. . .event/media.event/media. query .
. . .scan? .scan? . ARS status.
................................................................
. . . . . .
. Clear UC . No . No . No . Supported .
. error . . . . .
. . . . . .
................................................................
. . . . . .
. Translate . No . No . No . Supported .
. *(1)SPA to . . . . .
. *(2)DPA . . . . .
................................................................
*(1) - SPA - System Physical Address. See section 9.19.7.8
Function Index 5 - Translate SPA of ACPI spec r6.5.
*(2) - DPA - Device Physical Address. See section 9.19.7.8
Function Index 5 - Translate SPA of ACPI spec r6.5.
CXL Memory Scrubbing features
=============================
CXL spec r3.1 section 8.2.9.9.11.1 describes the memory device patrol
scrub control feature. The device patrol scrub proactively locates and
corrects errors on a regular cycle. The patrol scrub control allows the
requester to configure the patrol scrubber's input parameters.
The patrol scrub control allows the requester to specify the number of
hours in which the patrol scrub cycles must be completed, provided that
the requested number is not less than the minimum number of hours for the
patrol scrub cycle that the device is capable of. In addition, the patrol
scrub controls allow the host to disable and enable the feature in case
disabling of the feature is needed for other purposes such as
performance-aware operations which require the background operations to be
turned off.
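That control contract boils down to a small validation step, sketched
below with invented names; this is a model of the rule, not CXL
mailbox code:

```c
#include <assert.h>

/* Minimal model of the patrol scrub control contract: a requested
 * cycle is only accepted if it is not below the minimum cycle the
 * device is capable of; disabling is always allowed. */
struct patrol_scrub {
	unsigned int min_cycle_hours;	/* device capability */
	unsigned int cycle_hours;	/* current setting */
	int enabled;
};

static int patrol_scrub_config(struct patrol_scrub *ps, int enable,
			       unsigned int cycle_hours)
{
	if (enable && cycle_hours < ps->min_cycle_hours)
		return -1;	/* below device capability */
	ps->enabled = enable;
	if (enable)
		ps->cycle_hours = cycle_hours;
	return 0;
}
```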
The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM
Specification (JESD79-5) and allows the DRAM to internally read, correct
single-bit errors, and write back corrected data bits to the DRAM array
while providing transparency to error counts.
A DDR5 device contains a number of memory media FRUs. The
DDR5 ECS feature, and thus the ECS control driver, supports configuring
the ECS parameters per FRU.
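Per-FRU configuration then amounts to indexing a per-FRU parameter
array; a minimal sketch with invented structure and field names:

```c
#include <assert.h>

/* ECS parameters are kept per memory media FRU; num_media_frus is
 * reported by the device. Field names here are illustrative. */
struct ecs_fru_param {
	unsigned int log_entry_type;
	unsigned int threshold;
};

static int ecs_set_threshold(struct ecs_fru_param *frus,
			     int num_media_frus, int fru,
			     unsigned int threshold)
{
	if (fru < 0 || fru >= num_media_frus)
		return -1;	/* no such FRU on this device */
	frus[fru].threshold = threshold;
	return 0;
}
```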
ACPI RAS2 Hardware-based Memory Scrubbing
=========================================
ACPI spec r6.5 section 5.2.21 describes the ACPI RAS2 table, which
provides interfaces for platform RAS features and supports independent
RAS controls and capabilities for a given RAS feature for multiple
instances of the same component in a given system.
Memory RAS features apply to RAS capabilities, controls and operations
that are specific to memory. RAS2 PCC sub-spaces for memory-specific
RAS features have a Feature Type of 0x00 (Memory).
The platform can use the hardware-based memory scrubbing feature to
expose controls and capabilities associated with hardware-based memory
scrub engines. The RAS2 memory scrubbing feature supports the
following, as per the spec:
- Independent memory scrubbing controls for each NUMA domain,
identified using its proximity domain.
Note: AmpereComputing, however, has a single entry repeated, as they
have centralized controls.
- Provision for background (patrol) scrubbing of the entire memory system,
as well as on-demand scrubbing for a specific region of memory.
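With per-proximity-domain controls, a RAS2 driver ends up keeping one
control instance per domain and looking it up by domain id; a minimal
sketch with invented names:

```c
#include <assert.h>
#include <stddef.h>

/* One scrub control instance per NUMA proximity domain, as RAS2
 * describes them; a driver looks an instance up by domain id. */
struct ras2_scrub_inst {
	unsigned int prox_domain;
	int enabled;
};

static struct ras2_scrub_inst *
ras2_find_scrub(struct ras2_scrub_inst *tbl, int n,
		unsigned int prox_domain)
{
	for (int i = 0; i < n; i++)
		if (tbl[i].prox_domain == prox_domain)
			return &tbl[i];
	return NULL;	/* no scrub control for this domain */
}
```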
ACPI Address Range Scrubbing (ARS)
==================================
ARS allows the platform to communicate memory errors to system software.
This capability allows system software to prevent accesses to addresses
with uncorrectable errors in memory. ARS functions manage all NVDIMMs
present in the system. Only one scrub can be in progress system wide
at any given time.
The following functions are supported, as per the specification.
1. Query ARS Capabilities for a given address range, indicates platform
supports the ACPI NVDIMM Root Device Unconsumed Error Notification.
2. Start ARS triggers an Address Range Scrub for the given memory range.
Address scrubbing can be done for volatile memory, persistent memory,
or both.
3. Query ARS Status command allows software to get the status of ARS,
including the progress of ARS and ARS error record.
4. Clear Uncorrectable Error.
5. Translate SPA
6. ARS Error Inject etc.
Note: Support for ARS is not added in this series, to reduce the
amount of code for review; it could be added after the initial code is
merged. We'd like feedback on whether this is of interest to the ARS
community.
Post Package Repair (PPR)
=========================
A PPR (Post Package Repair) maintenance operation requests the memory
device to perform a repair operation on its media, if supported. A
memory device may support two types of PPR: Hard PPR (hPPR), for a
permanent row repair, and Soft PPR (sPPR), for a temporary row repair.
sPPR is much faster than hPPR, but the repair is lost with a power
cycle. During the execution of a PPR maintenance operation, a memory
device may or may not retain data and may or may not be able to
process memory requests correctly. An sPPR maintenance operation may
be executed at runtime, if data is retained and memory requests are
correctly processed. An hPPR maintenance operation may be executed
only at boot because data would not be retained.
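Those runtime constraints reduce to a small eligibility check; the
sketch below uses invented names and is a model of the rule, not
driver code:

```c
#include <assert.h>

enum ppr_type { PPR_SOFT, PPR_HARD };

/* sPPR may run at runtime only if the device retains data and keeps
 * processing memory requests during the operation; hPPR is limited to
 * boot time. Returns 0 if the repair may run now, -1 otherwise. */
static int ppr_may_run(enum ppr_type type, int at_boot,
		       int retains_data, int processes_requests)
{
	if (type == PPR_HARD)
		return at_boot ? 0 : -1;
	if (at_boot || (retains_data && processes_requests))
		return 0;
	return -1;
}
```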
Use cases of common PPR control feature
=======================================
1. Soft PPR (sPPR) and Hard PPR (hPPR) share similar control
interfaces, so standard generic sysfs PPR controls need to be exposed
to userspace for seamless control of the PPR features in the system by
admins/scripts/tools etc.
2. When a CXL device identifies a failure on a memory component, the
device may inform the host about the need for a PPR maintenance
operation by using an event record, with the maintenance needed flag
set. The event record specifies the DPA that should be repaired. The
kernel reports the corresponding CXL general media or DRAM trace event
to userspace. A userspace tool, e.g. rasdaemon, initiates a PPR
maintenance operation in response to the device request using the
sysfs PPR control.
3. User space tools, e.g. rasdaemon, request PPR on a memory region
when an uncorrected memory error or excess corrected memory errors are
reported on that memory.
4. Multiple PPR instances are likely present per memory device.
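The repair policy described in the use cases above can be condensed to
one predicate; the corrected-error threshold below is an arbitrary
placeholder, not a value from the series:

```c
#include <assert.h>

#define CE_REPAIR_THRESHOLD 16	/* placeholder policy value */

/* Request PPR for a row on a single uncorrected error, on an error
 * record carrying the maintenance needed flag, or once corrected
 * errors on the row pass a threshold. */
static int row_needs_ppr(unsigned int ce_count, int uncorrected,
			 int maint_needed)
{
	return uncorrected || maint_needed ||
	       ce_count >= CE_REPAIR_THRESHOLD;
}
```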
The series adds:
1. A generic EDAC RAS feature driver plus EDAC scrub, EDAC ECS and
EDAC PPR drivers, supporting memory scrub control, ECS control, PPR
control and other RAS features in the system.
2. Support for the CXL feature mailbox commands, which are used by
the CXL device scrub and PPR features.
3. A CXL scrub driver supporting patrol scrub control (device and
region based).
4. A CXL ECS driver supporting the ECS control feature.
5. An ACPI RAS2 driver, which adds an OS interface for RAS2
communication through the PCC mailbox, extracts the ACPI RAS2
feature table (RAS2) and creates platform devices for the RAS memory
features, which bind to the memory ACPI RAS2 driver.
6. A memory ACPI RAS2 driver, which gets the PCC subspace for
communicating with an ACPI compliant platform that supports ACPI
RAS2. It adds callback functions and registers with EDAC scrub to
allow the user to control the HW patrol scrubbers exposed to the
kernel via the ACPI RAS2 table.
7. Support for the CXL maintenance mailbox command, which is used by
the CXL device PPR feature.
8. A CXL PPR driver supporting the PPR control feature.
9. Note: there are other PPR drivers to come.
Open questions based on feedback from the community:
1. Leo: Standardize the unit for the scrub rate; for example, ACPI
RAS2 does not define a unit for the scrub rate. RAS2 clarification
needed.
2. Jonathan: Any need for discoverability of the capability to scan
different regions, such as global PA space, to userspace? Left as a
future extension.
3. Jiaqi:
- STOP_PATROL_SCRUBBER from RAS2 must be blocked and must not be
exposed to the OS/userspace. Stopping the patrol scrubber is
unacceptable on platforms where the OEM has enabled the patrol
scrubber, because the patrol scrubber is a key part of logging and is
repurposed for other RAS actions.
If the OEM does not want to expose this control, they should lock it
down so the interface is not exposed to the OS. These features are
optional after all.
- "Requested Address Range"/"Actual Address Range" (region to scrub)
is a similarly bad thing to expose in RAS2.
If the OEM does not want to expose this, they should lock it down so
the interface is not exposed to the OS. These features are optional
after all.
4. Borislav:
- How will the scrub control exposed to userspace be used?
A POC has been added to rasdaemon with dynamic scrub control for CXL
memory media errors and memory errors reported to userspace.
https://github.com/shijujose4/rasdaemon/tree/scrub_control_6_june_2024
- Is the scrub interface sufficient for the use cases?
- Who is going to use the scrub controls: tools/admin/scripts?
1) Rasdaemon for dynamic control.
2) Udev script for more static 'defaults' on hotplug etc.
5. PPR
- For PPR, rasdaemon collates records and decides to replace a row if
there are lots of corrected errors, a single uncorrected error, or an
error record received with the maintenance request flag set, as in the
CXL DRAM error record.
- Is sPPR more or less startup only (so faking hPPR), or actually
useful in a running system (if not the safe version that keeps
everything running whilst the replacement is ongoing)?
- Is future proofing for multiple PPR units useful, given we've mashed
together hPPR and sPPR for CXL?
Implementation
==============
1. Linux kernel
The latest version of the kernel implementation of RAS features control is available at,
https://github.com/shijujose4/linux/tree/edac-enhancement-ras-features
2. QEMU emulation
2.1. The QEMU series supporting the CXL specific scrub features is
available at,
https://gitlab.com/qemu-project/qemu.git
2.2. The QEMU patch supporting the CXL PPR feature is available at,
https://lore.kernel.org/all/20240730045722.71482-1-dave@stgolabs.net/
3. Userspace rasdaemon
A draft version of the userspace code for dynamic scrub control, based
on the frequency of memory errors reported to userspace, has been
added to rasdaemon and enabled and tested for the CXL device based
patrol scrub feature and the ACPI RAS2 based scrub feature.
https://github.com/shijujose4/rasdaemon/tree/ras_feature_control
ToDo: For PPR, rasdaemon collates records and decides to replace a row
if there are lots of corrected errors, a single uncorrected error, or
an error record received with the maintenance request flag set, as in
the CXL DRAM error record.
References:
1. ACPI spec r6.5 section 5.2.21 ACPI RAS2.
2. ACPI spec r6.5 section 9.19.7.2 ARS.
3. CXL spec r3.1 8.2.9.9.11.1 Device patrol scrub control feature
4. CXL spec r3.1 8.2.9.9.11.2 DDR5 ECS feature
5. CXL spec r3.1 8.2.9.7.1.1 PPR Maintenance Operations
6. CXL spec r3.1 8.2.9.7.2.1 sPPR Feature Discovery and Configuration
7. CXL spec r3.1 8.2.9.7.2.2 hPPR Feature Discovery and Configuration
8. Background information about kernel support for memory scan, memory
error detection and ACPI RASF.
https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@google.com/
9. Discussions on RASF:
https://lore.kernel.org/lkml/20230915172818.761-1-shiju.jose@huawei.com/#r
Changes
=======
v11 -> v12:
1. Changes and Fixes for feedback from Boris mainly for
patch "EDAC: Add support for EDAC device features control"
and other generic comments.
2. Took CXL patches from Dave Jiang for "Add Get Supported Features
command for kernel usage" and other related patches. Merged helper
functions from this series to the above patch. Modifications of
CXL code in this series due to refactoring of CXL mailbox in Dave's
patches.
3. Modified EDAC scrub control code to support multiple scrub instances
per device.
v10 -> v11:
1. Feedback from Borislav:
- Add generic EDAC code for control device features to
/drivers/edac/edac_device.c.
- Add common structure in edac for device feature's data.
2. Some more optimizations in generic EDAC code for control
device features.
3. Changes for feedback from Fan for ACPI RAS2 memory driver.
4. Add support for control memory PPR (Post Package Repair) features
in EDAC.
5. Add support for maintenance command in the CXL mailbox code,
which is needed for support PPR features in CXL driver.
6. Add support for control memory PPR (Post Package Repair) features
and do perform PPR maintenance operation in CXL driver.
7. Rename drivers/cxl/core/memscrub.c to drivers/cxl/core/memfeature.c
v9 -> v10:
1. Feedback from Mauro Carvalho Chehab:
- Changes suggested in EDAC RAS feature driver.
use uppercase for enums, if else to switch-case, documentation for
static scrub and ecs init functions etc.
- Changes suggested in EDAC scrub.
unit of scrub cycle hour to seconds.
attribute node cycle_in_hours_avaiable to min_cycle_duration and
max_cycle_duration.
attribute node cycle_in_hours to current_cycle_duration.
Use base 0 for kstrtou64() and kstrtol() functions.
etc.
- Changes suggested in EDAC ECS.
uppercase for enums
add ABI documentation. etc
2. Feedback from Fan:
- Changes suggested in EDAC RAS feature driver.
use uppercase for enums, change if...else to switch-case.
some optimization in edac_ras_dev_register() function
add missing goto free_ctx
- Changes suggested in the code for feature commands.
- CXL driver scrub and ECS code
use uppercase for enums, fix typo, use enum type for mode
fix long lines etc.
v8 -> v9:
1. Feedback from Borislav:
- Add scrub control driver to the EDAC on feedback from Borislav.
- Changed DEVICE_ATTR_..() static.
- Changed the write permissions for scrub control sysfs files as
root-only.
2. Feedback from Fan:
- Optimized cxl_get_feature() function by using min() and removed
feat_out_min_size.
- Removed unreached return from cxl_set_feature() function.
- Changed the term "rate" to "cycle_in_hours" in all the
scrub control code.
- Allow cxl_mem_probe() to continue if cxl_mem_patrol_scrub_init()
fails, with just a debug warning.
3. Feedback from Jonathan:
- Removed patch __free() based cleanup function for acpi_put_table.
and added fix in the acpi RAS2 driver.
4. Feedback from Dan Williams:
- Allow cxl_mem_probe() to continue if cxl_mem_patrol_scrub_init()
fails, with just a debug warning.
- Add support for CXL region based scrub control.
5. Feedback from Daniel Ferguson on RAS2 drivers:
In the ACPI RAS2 driver,
- Incorporated the changes given for clearing error reported.
- Incorporated the changes given for check the Set RAS Capability
status and return an appropriate error.
In the RAS2 memory driver,
- Added more checks for start/stop bg and on-demand scrubbing
so that addr range in cache do not get cleared and restrict
permitted operations during scrubbing.
History for v1 to v8 is available here.
https://lore.kernel.org/lkml/20240726160556.2079-1-shiju.jose@huawei.com/
Dave Jiang (4):
cxl: Move mailbox related bits to the same context
cxl: Fix comment regarding cxl_query_cmd() return data
cxl: Refactor user ioctl command path from mds to mailbox
cxl: Add Get Supported Features command for kernel usage
Jonathan Cameron (1):
platform: Add __free() based cleanup function for platform_device_put
Shiju Jose (12):
EDAC: Add support for EDAC device features control
EDAC: Add EDAC scrub control driver
EDAC: Add EDAC ECS control driver
cxl/mbox: Add GET_FEATURE mailbox command
cxl/mbox: Add SET_FEATURE mailbox command
cxl/memfeature: Add CXL memory device patrol scrub control feature
cxl/memfeature: Add CXL memory device ECS control feature
ACPI:RAS2: Add ACPI RAS2 driver
ras: mem: Add memory ACPI RAS2 driver
EDAC: Add EDAC PPR control driver
cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command
cxl/memfeature: Add CXL memory device PPR control feature
Documentation/ABI/testing/sysfs-edac-ecs | 78 ++
Documentation/ABI/testing/sysfs-edac-ppr | 69 ++
Documentation/ABI/testing/sysfs-edac-scrub | 69 ++
Documentation/edac/edac-scrub.rst | 115 +++
MAINTAINERS | 1 +
drivers/acpi/Kconfig | 10 +
drivers/acpi/Makefile | 1 +
drivers/acpi/ras2.c | 391 +++++++
drivers/cxl/Kconfig | 18 +
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/core.h | 6 +-
drivers/cxl/core/mbox.c | 485 +++++++--
drivers/cxl/core/memdev.c | 57 +-
drivers/cxl/core/memfeature.c | 1084 ++++++++++++++++++++
drivers/cxl/core/region.c | 6 +
drivers/cxl/cxlmem.h | 190 +++-
drivers/cxl/mem.c | 4 +
drivers/cxl/pci.c | 68 +-
drivers/cxl/pmem.c | 10 +-
drivers/cxl/security.c | 18 +-
drivers/edac/Makefile | 1 +
drivers/edac/edac_device.c | 213 ++++
drivers/edac/edac_ecs.c | 376 +++++++
drivers/edac/edac_ppr.c | 255 +++++
drivers/edac/edac_scrub.c | 377 +++++++
drivers/ras/Kconfig | 10 +
drivers/ras/Makefile | 1 +
drivers/ras/acpi_ras2.c | 412 ++++++++
include/acpi/ras2_acpi.h | 60 ++
include/linux/cxl/mailbox.h | 68 ++
include/linux/edac.h | 148 +++
include/linux/platform_device.h | 1 +
include/uapi/linux/cxl_mem.h | 1 +
tools/testing/cxl/test/mem.c | 27 +-
34 files changed, 4442 insertions(+), 189 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-edac-ecs
create mode 100644 Documentation/ABI/testing/sysfs-edac-ppr
create mode 100644 Documentation/ABI/testing/sysfs-edac-scrub
create mode 100644 Documentation/edac/edac-scrub.rst
create mode 100755 drivers/acpi/ras2.c
create mode 100644 drivers/cxl/core/memfeature.c
create mode 100755 drivers/edac/edac_ecs.c
create mode 100755 drivers/edac/edac_ppr.c
create mode 100755 drivers/edac/edac_scrub.c
create mode 100644 drivers/ras/acpi_ras2.c
create mode 100644 include/acpi/ras2_acpi.h
create mode 100644 include/linux/cxl/mailbox.h
--
2.34.1
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v12 01/17] EDAC: Add support for EDAC device features control
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-13 16:40 ` Borislav Petkov
2024-09-11 9:04 ` [PATCH v12 02/17] EDAC: Add EDAC scrub control driver shiju.jose
` (15 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add generic EDAC device features control, which supports registering
the RAS features supported in the system. The driver exposes the
feature control attributes to userspace in
/sys/bus/edac/devices/<dev-name>/<ras-feature>/
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/edac/edac_device.c | 202 +++++++++++++++++++++++++++++++++++++
include/linux/edac.h | 55 ++++++++++
2 files changed, 257 insertions(+)
diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
index 621dc2a5d034..e4a5d010ea2d 100644
--- a/drivers/edac/edac_device.c
+++ b/drivers/edac/edac_device.c
@@ -570,3 +570,205 @@ void edac_device_handle_ue_count(struct edac_device_ctl_info *edac_dev,
block ? block->name : "N/A", count, msg);
}
EXPORT_SYMBOL_GPL(edac_device_handle_ue_count);
+
+/* EDAC device feature */
+static void edac_dev_release(struct device *dev)
+{
+ struct edac_dev_feat_ctx *ctx = container_of(dev, struct edac_dev_feat_ctx, dev);
+
+ kfree(ctx->ppr);
+ kfree(ctx->scrub);
+ kfree(ctx->dev.groups);
+ kfree(ctx);
+}
+
+const struct device_type edac_dev_type = {
+ .name = "edac_dev",
+ .release = edac_dev_release,
+};
+
+static void edac_dev_unreg(void *data)
+{
+ device_unregister(data);
+}
+
+/**
+ * edac_dev_feat_init - Init a RAS feature
+ * @parent: client device.
+ * @dev_data: pointer to the edac_dev_data structure, which contains
+ * client device specific info.
+ * @ras_feat: pointer to struct edac_dev_feature.
+ * @attr_groups: pointer to attribute group's container.
+ *
+ * Returns the number of attribute groups for the feature on success,
+ * a negative error code otherwise.
+ */
+static int edac_dev_feat_init(struct device *parent,
+ struct edac_dev_data *dev_data,
+ const struct edac_dev_feature *ras_feat,
+ const struct attribute_group **attr_groups)
+{
+ int num;
+
+ switch (ras_feat->ft_type) {
+ case RAS_FEAT_SCRUB:
+ dev_data->scrub_ops = ras_feat->scrub_ops;
+ dev_data->private = ras_feat->ctx;
+ return 1;
+ case RAS_FEAT_ECS:
+ num = ras_feat->ecs_info.num_media_frus;
+ dev_data->ecs_ops = ras_feat->ecs_ops;
+ dev_data->private = ras_feat->ctx;
+ return num;
+ case RAS_FEAT_PPR:
+ dev_data->ppr_ops = ras_feat->ppr_ops;
+ dev_data->private = ras_feat->ctx;
+ return 1;
+ default:
+ return -EINVAL;
+ }
+}
+
+/**
+ * edac_dev_register - register device for RAS features with EDAC
+ * @parent: client device.
+ * @name: client device's name.
+ * @private: parent driver's data to store in the context if any.
+ * @num_features: number of RAS features to register.
+ * @ras_features: list of RAS features to register.
+ *
+ * Returns 0 on success, a negative error code otherwise.
+ * The allocated edac_dev_feat_ctx is freed automatically when the
+ * device is unregistered.
+ */
+int edac_dev_register(struct device *parent, char *name,
+ void *private, int num_features,
+ const struct edac_dev_feature *ras_features)
+{
+ const struct attribute_group **ras_attr_groups;
+ struct edac_dev_data *dev_data;
+ struct edac_dev_feat_ctx *ctx;
+ int scrub_cnt = 0, scrub_inst = 0;
+ int ppr_cnt = 0, ppr_inst = 0;
+ int attr_gcnt = 0;
+ int ret, feat;
+
+ if (!parent || !name || !num_features || !ras_features)
+ return -EINVAL;
+
+ /* Double parse: first count the attribute groups, then populate them */
+ for (feat = 0; feat < num_features; feat++) {
+ switch (ras_features[feat].ft_type) {
+ case RAS_FEAT_SCRUB:
+ attr_gcnt++;
+ scrub_cnt++;
+ break;
+ case RAS_FEAT_PPR:
+ attr_gcnt++;
+ ppr_cnt++;
+ break;
+ case RAS_FEAT_ECS:
+ attr_gcnt += ras_features[feat].ecs_info.num_media_frus;
+ break;
+ default:
+ return -EINVAL;
+ }
+ }
+
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+ if (!ctx)
+ return -ENOMEM;
+
+ ctx->dev.parent = parent;
+ ctx->private = private;
+
+ ras_attr_groups = kcalloc(attr_gcnt + 1, sizeof(*ras_attr_groups), GFP_KERNEL);
+ if (!ras_attr_groups) {
+ ret = -ENOMEM;
+ goto ctx_free;
+ }
+
+ if (scrub_cnt) {
+ ctx->scrub = kcalloc(scrub_cnt, sizeof(*(ctx->scrub)), GFP_KERNEL);
+ if (!ctx->scrub) {
+ ret = -ENOMEM;
+ goto groups_free;
+ }
+ }
+
+ if (ppr_cnt) {
+ ctx->ppr = kcalloc(ppr_cnt, sizeof(*(ctx->ppr)), GFP_KERNEL);
+ if (!ctx->ppr) {
+ ret = -ENOMEM;
+ goto groups_free;
+ }
+ }
+
+ attr_gcnt = 0;
+ for (feat = 0; feat < num_features; feat++, ras_features++) {
+ switch (ras_features->ft_type) {
+ case RAS_FEAT_SCRUB:
+ if (!ras_features->scrub_ops)
+ continue;
+ if (scrub_inst != ras_features->instance)
+ goto data_mem_free;
+ dev_data = &ctx->scrub[scrub_inst];
+ dev_data->instance = scrub_inst;
+ scrub_inst++;
+ break;
+ case RAS_FEAT_ECS:
+ if (!ras_features->ecs_ops)
+ continue;
+ dev_data = &ctx->ecs;
+ break;
+ case RAS_FEAT_PPR:
+ if (!ras_features->ppr_ops)
+ continue;
+ if (ppr_inst != ras_features->instance)
+ goto data_mem_free;
+ dev_data = &ctx->ppr[ppr_inst];
+ dev_data->instance = ppr_inst;
+ ppr_inst++;
+ break;
+ default:
+ ret = -EINVAL;
+ goto data_mem_free;
+ }
+ ret = edac_dev_feat_init(parent, dev_data, ras_features,
+ &ras_attr_groups[attr_gcnt]);
+ if (ret < 0)
+ goto data_mem_free;
+
+ attr_gcnt += ret;
+ }
+
+ ras_attr_groups[attr_gcnt] = NULL;
+ ctx->dev.bus = edac_get_sysfs_subsys();
+ ctx->dev.type = &edac_dev_type;
+ ctx->dev.groups = ras_attr_groups;
+ dev_set_drvdata(&ctx->dev, ctx);
+
+ ret = dev_set_name(&ctx->dev, "%s", name);
+ if (ret)
+ goto data_mem_free;
+
+ ret = device_register(&ctx->dev);
+ if (ret) {
+ /* edac_dev_release() frees the context on put_device() */
+ put_device(&ctx->dev);
+ return ret;
+ }
+
+ return devm_add_action_or_reset(parent, edac_dev_unreg, &ctx->dev);
+
+data_mem_free:
+ kfree(ctx->ppr);
+ kfree(ctx->scrub);
+groups_free:
+ kfree(ras_attr_groups);
+ctx_free:
+ kfree(ctx);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(edac_dev_register);
diff --git a/include/linux/edac.h b/include/linux/edac.h
index b4ee8961e623..b337254cf5b8 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -661,4 +661,59 @@ static inline struct dimm_info *edac_get_dimm(struct mem_ctl_info *mci,
return mci->dimms[index];
}
+
+/* EDAC device features */
+
+#define EDAC_FEAT_NAME_LEN 128
+
+/* RAS feature type */
+enum edac_dev_feat {
+ RAS_FEAT_SCRUB,
+ RAS_FEAT_ECS,
+ RAS_FEAT_PPR,
+ RAS_FEAT_MAX
+};
+
+struct edac_ecs_ex_info {
+ u16 num_media_frus;
+};
+
+/*
+ * EDAC device feature information structure
+ */
+struct edac_dev_data {
+ union {
+ const struct edac_scrub_ops *scrub_ops;
+ const struct edac_ecs_ops *ecs_ops;
+ const struct edac_ppr_ops *ppr_ops;
+ };
+ u8 instance;
+ void *private;
+};
+
+struct device;
+
+struct edac_dev_feat_ctx {
+ struct device dev;
+ void *private;
+ struct edac_dev_data *scrub;
+ struct edac_dev_data ecs;
+ struct edac_dev_data *ppr;
+};
+
+struct edac_dev_feature {
+ enum edac_dev_feat ft_type;
+ u8 instance;
+ union {
+ const struct edac_scrub_ops *scrub_ops;
+ const struct edac_ecs_ops *ecs_ops;
+ const struct edac_ppr_ops *ppr_ops;
+ };
+ void *ctx;
+ struct edac_ecs_ex_info ecs_info;
+};
+
+int edac_dev_register(struct device *parent, char *dev_name,
+ void *parent_pvt_data, int num_features,
+ const struct edac_dev_feature *ras_features);
#endif /* _LINUX_EDAC_H_ */
--
2.34.1
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v12 02/17] EDAC: Add EDAC scrub control driver
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
2024-09-11 9:04 ` [PATCH v12 01/17] EDAC: Add support for EDAC device features control shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-13 17:25 ` Borislav Petkov
2024-09-26 23:04 ` Fan Ni
2024-09-11 9:04 ` [PATCH v12 03/17] EDAC: Add EDAC ECS " shiju.jose
` (14 subsequent siblings)
16 siblings, 2 replies; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add a generic EDAC scrub control driver that supports configuring the memory
scrubbers in the system. A device with the scrub feature gets the scrub
descriptor from the EDAC scrub driver and registers with the EDAC RAS feature
driver, which adds the sysfs scrub control interface. The scrub control
attributes for a scrub instance are available to userspace in
/sys/bus/edac/devices/<dev-name>/scrub*/.
The generic EDAC scrub driver and the common sysfs scrub interface promote
unambiguous access from userspace irrespective of the underlying scrub
devices.
The sysfs scrub attribute nodes are present only if the client driver
implements the corresponding attribute callback functions and passes the ops
to the EDAC RAS feature driver during registration.
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
Documentation/ABI/testing/sysfs-edac-scrub | 69 ++++
drivers/edac/Makefile | 1 +
drivers/edac/edac_device.c | 6 +-
drivers/edac/edac_scrub.c | 377 +++++++++++++++++++++
include/linux/edac.h | 30 ++
5 files changed, 482 insertions(+), 1 deletion(-)
create mode 100644 Documentation/ABI/testing/sysfs-edac-scrub
create mode 100755 drivers/edac/edac_scrub.c
diff --git a/Documentation/ABI/testing/sysfs-edac-scrub b/Documentation/ABI/testing/sysfs-edac-scrub
new file mode 100644
index 000000000000..f465cc91423f
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-edac-scrub
@@ -0,0 +1,69 @@
+What: /sys/bus/edac/devices/<dev-name>/scrub*
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ The sysfs EDAC bus devices /<dev-name>/scrub* subdirectory
+ belongs to an instance of the memory scrub control feature,
+ where the <dev-name> directory corresponds to a device/memory
+ region registered with the EDAC scrub driver and thus with
+ the generic EDAC RAS driver.
+ The sysfs scrub attribute nodes are present only if the
+ client driver implements the corresponding attribute
+ callback functions and passes the ops to the EDAC RAS
+ feature driver during registration.
+
+What: /sys/bus/edac/devices/<dev-name>/scrub*/addr_range_base
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RW) The base of the address range of the memory region
+ to be scrubbed (on-demand scrubbing).
+
+What: /sys/bus/edac/devices/<dev-name>/scrub*/addr_range_size
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RW) The size of the address range of the memory region
+ to be scrubbed (on-demand scrubbing).
+
+What: /sys/bus/edac/devices/<dev-name>/scrub*/enable_background
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RW) Start/Stop background(patrol) scrubbing if supported.
+
+What: /sys/bus/edac/devices/<dev-name>/scrub*/enable_on_demand
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RW) Start/Stop on-demand scrubbing the memory region
+ if supported.
+
+What: /sys/bus/edac/devices/<dev-name>/scrub*/min_cycle_duration
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RO) Minimum scrub cycle duration in seconds supported by
+ the memory scrubber.
+
+What: /sys/bus/edac/devices/<dev-name>/scrub*/max_cycle_duration
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RO) Maximum scrub cycle duration in seconds supported by
+ the memory scrubber.
+
+What: /sys/bus/edac/devices/<dev-name>/scrub*/current_cycle_duration
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RW) The current scrub cycle duration in seconds. Must be
+ within the range supported by the memory scrubber.
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index 4edfb83ffbee..fbf0e39ec678 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o
edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o
edac_core-y += edac_module.o edac_device_sysfs.o wq.o
+edac_core-y += edac_scrub.o
edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o
diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
index e4a5d010ea2d..6381896b6424 100644
--- a/drivers/edac/edac_device.c
+++ b/drivers/edac/edac_device.c
@@ -608,12 +608,16 @@ static int edac_dev_feat_init(struct device *parent,
const struct edac_dev_feature *ras_feat,
const struct attribute_group **attr_groups)
{
- int num;
+ int num, ret;
switch (ras_feat->ft_type) {
case RAS_FEAT_SCRUB:
dev_data->scrub_ops = ras_feat->scrub_ops;
dev_data->private = ras_feat->ctx;
+ ret = edac_scrub_get_desc(parent, attr_groups,
+ ras_feat->instance);
+ if (ret)
+ return ret;
return 1;
case RAS_FEAT_ECS:
num = ras_feat->ecs_info.num_media_frus;
diff --git a/drivers/edac/edac_scrub.c b/drivers/edac/edac_scrub.c
new file mode 100755
index 000000000000..3f8f37629acf
--- /dev/null
+++ b/drivers/edac/edac_scrub.c
@@ -0,0 +1,377 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Generic EDAC scrub driver that controls the memory scrubbers
+ * in the system. The common sysfs scrub interface promotes
+ * unambiguous access from userspace.
+ *
+ * Copyright (c) 2024 HiSilicon Limited.
+ */
+
+#define pr_fmt(fmt) "EDAC SCRUB: " fmt
+
+#include <linux/edac.h>
+
+enum edac_scrub_attributes {
+ SCRUB_ADDR_RANGE_BASE,
+ SCRUB_ADDR_RANGE_SIZE,
+ SCRUB_ENABLE_BACKGROUND,
+ SCRUB_ENABLE_ON_DEMAND,
+ SCRUB_MIN_CYCLE_DURATION,
+ SCRUB_MAX_CYCLE_DURATION,
+ SCRUB_CURRENT_CYCLE_DURATION,
+ SCRUB_MAX_ATTRS
+};
+
+struct edac_scrub_dev_attr {
+ struct device_attribute dev_attr;
+ u8 instance;
+};
+
+struct edac_scrub_context {
+ char name[EDAC_FEAT_NAME_LEN];
+ struct edac_scrub_dev_attr scrub_dev_attr[SCRUB_MAX_ATTRS];
+ struct attribute *scrub_attrs[SCRUB_MAX_ATTRS + 1];
+ struct attribute_group group;
+};
+
+#define to_scrub_dev_attr(_dev_attr) \
+ container_of(_dev_attr, struct edac_scrub_dev_attr, dev_attr)
+
+static ssize_t addr_range_base_show(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ u8 inst = to_scrub_dev_attr(attr)->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
+ u64 base, size;
+ int ret;
+
+ ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "0x%llx\n", base);
+}
+
+static ssize_t addr_range_size_show(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ u8 inst = to_scrub_dev_attr(attr)->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
+ u64 base, size;
+ int ret;
+
+ ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "0x%llx\n", size);
+}
+
+static ssize_t addr_range_base_store(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ u8 inst = to_scrub_dev_attr(attr)->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
+ u64 base, size;
+ int ret;
+
+ ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
+ if (ret)
+ return ret;
+
+ ret = kstrtou64(buf, 0, &base);
+ if (ret < 0)
+ return ret;
+
+ ret = ops->write_range(ras_feat_dev->parent, ctx->scrub[inst].private, base, size);
+ if (ret)
+ return ret;
+
+ return len;
+}
+
+static ssize_t addr_range_size_store(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ const char *buf,
+ size_t len)
+{
+ u8 inst = to_scrub_dev_attr(attr)->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
+ u64 base, size;
+ int ret;
+
+ ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
+ if (ret)
+ return ret;
+
+ ret = kstrtou64(buf, 0, &size);
+ if (ret < 0)
+ return ret;
+
+ ret = ops->write_range(ras_feat_dev->parent, ctx->scrub[inst].private, base, size);
+ if (ret)
+ return ret;
+
+ return len;
+}
+
+static ssize_t enable_background_store(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ u8 inst = to_scrub_dev_attr(attr)->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
+ bool enable;
+ int ret;
+
+ ret = kstrtobool(buf, &enable);
+ if (ret < 0)
+ return ret;
+
+ ret = ops->set_enabled_bg(ras_feat_dev->parent, ctx->scrub[inst].private, enable);
+ if (ret)
+ return ret;
+
+ return len;
+}
+
+static ssize_t enable_background_show(struct device *ras_feat_dev,
+ struct device_attribute *attr, char *buf)
+{
+ u8 inst = to_scrub_dev_attr(attr)->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
+ bool enable;
+ int ret;
+
+ ret = ops->get_enabled_bg(ras_feat_dev->parent, ctx->scrub[inst].private, &enable);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%d\n", enable);
+}
+
+static ssize_t enable_on_demand_show(struct device *ras_feat_dev,
+ struct device_attribute *attr, char *buf)
+{
+ u8 inst = to_scrub_dev_attr(attr)->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
+ bool enable;
+ int ret;
+
+ ret = ops->get_enabled_od(ras_feat_dev->parent, ctx->scrub[inst].private, &enable);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%d\n", enable);
+}
+
+static ssize_t enable_on_demand_store(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ u8 inst = to_scrub_dev_attr(attr)->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
+ bool enable;
+ int ret;
+
+ ret = kstrtobool(buf, &enable);
+ if (ret < 0)
+ return ret;
+
+ ret = ops->set_enabled_od(ras_feat_dev->parent, ctx->scrub[inst].private, enable);
+ if (ret)
+ return ret;
+
+ return len;
+}
+
+static ssize_t min_cycle_duration_show(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ u8 inst = to_scrub_dev_attr(attr)->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
+ u32 val;
+ int ret;
+
+ ret = ops->min_cycle_read(ras_feat_dev->parent, ctx->scrub[inst].private, &val);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%u\n", val);
+}
+
+static ssize_t max_cycle_duration_show(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ u8 inst = to_scrub_dev_attr(attr)->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
+ u32 val;
+ int ret;
+
+ ret = ops->max_cycle_read(ras_feat_dev->parent, ctx->scrub[inst].private, &val);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%u\n", val);
+}
+
+static ssize_t current_cycle_duration_show(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ u8 inst = to_scrub_dev_attr(attr)->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
+ u32 val;
+ int ret;
+
+ ret = ops->cycle_duration_read(ras_feat_dev->parent, ctx->scrub[inst].private, &val);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%u\n", val);
+}
+
+static ssize_t current_cycle_duration_store(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ u8 inst = to_scrub_dev_attr(attr)->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
+ u32 val;
+ int ret;
+
+ ret = kstrtou32(buf, 0, &val);
+ if (ret < 0)
+ return ret;
+
+ ret = ops->cycle_duration_write(ras_feat_dev->parent, ctx->scrub[inst].private, val);
+ if (ret)
+ return ret;
+
+ return len;
+}
+
+static umode_t scrub_attr_visible(struct kobject *kobj,
+ struct attribute *a, int attr_id)
+{
+ struct device *ras_feat_dev = kobj_to_dev(kobj);
+ struct device_attribute *dev_attr =
+ container_of(a, struct device_attribute, attr);
+ u8 inst = to_scrub_dev_attr(dev_attr)->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
+
+ switch (attr_id) {
+ case SCRUB_ADDR_RANGE_BASE:
+ case SCRUB_ADDR_RANGE_SIZE:
+ if (ops->read_range && ops->write_range)
+ return a->mode;
+ if (ops->read_range)
+ return 0444;
+ return 0;
+ case SCRUB_ENABLE_BACKGROUND:
+ if (ops->get_enabled_bg && ops->set_enabled_bg)
+ return a->mode;
+ if (ops->get_enabled_bg)
+ return 0444;
+ return 0;
+ case SCRUB_ENABLE_ON_DEMAND:
+ if (ops->get_enabled_od && ops->set_enabled_od)
+ return a->mode;
+ if (ops->get_enabled_od)
+ return 0444;
+ return 0;
+ case SCRUB_MIN_CYCLE_DURATION:
+ return ops->min_cycle_read ? a->mode : 0;
+ case SCRUB_MAX_CYCLE_DURATION:
+ return ops->max_cycle_read ? a->mode : 0;
+ case SCRUB_CURRENT_CYCLE_DURATION:
+ if (ops->cycle_duration_read && ops->cycle_duration_write)
+ return a->mode;
+ if (ops->cycle_duration_read)
+ return 0444;
+ return 0;
+ default:
+ return 0;
+ }
+}
+
+#define EDAC_SCRUB_ATTR_RO(_name, _instance) \
+ ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_RO(_name), \
+ .instance = _instance })
+
+#define EDAC_SCRUB_ATTR_WO(_name, _instance) \
+ ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_WO(_name), \
+ .instance = _instance })
+
+#define EDAC_SCRUB_ATTR_RW(_name, _instance) \
+ ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_RW(_name), \
+ .instance = _instance })
+
+static int scrub_create_desc(struct device *scrub_dev,
+ const struct attribute_group **attr_groups,
+ u8 instance)
+{
+ struct edac_scrub_context *scrub_ctx;
+ struct attribute_group *group;
+ int i;
+
+ scrub_ctx = devm_kzalloc(scrub_dev, sizeof(*scrub_ctx), GFP_KERNEL);
+ if (!scrub_ctx)
+ return -ENOMEM;
+
+ group = &scrub_ctx->group;
+ scrub_ctx->scrub_dev_attr[0] = EDAC_SCRUB_ATTR_RW(addr_range_base, instance);
+ scrub_ctx->scrub_dev_attr[1] = EDAC_SCRUB_ATTR_RW(addr_range_size, instance);
+ scrub_ctx->scrub_dev_attr[2] = EDAC_SCRUB_ATTR_RW(enable_background, instance);
+ scrub_ctx->scrub_dev_attr[3] = EDAC_SCRUB_ATTR_RW(enable_on_demand, instance);
+ scrub_ctx->scrub_dev_attr[4] = EDAC_SCRUB_ATTR_RO(min_cycle_duration, instance);
+ scrub_ctx->scrub_dev_attr[5] = EDAC_SCRUB_ATTR_RO(max_cycle_duration, instance);
+ scrub_ctx->scrub_dev_attr[6] = EDAC_SCRUB_ATTR_RW(current_cycle_duration, instance);
+ for (i = 0; i < SCRUB_MAX_ATTRS; i++)
+ scrub_ctx->scrub_attrs[i] = &scrub_ctx->scrub_dev_attr[i].dev_attr.attr;
+
+ snprintf(scrub_ctx->name, sizeof(scrub_ctx->name), "scrub%d", instance);
+ group->name = scrub_ctx->name;
+ group->attrs = scrub_ctx->scrub_attrs;
+ group->is_visible = scrub_attr_visible;
+
+ attr_groups[0] = group;
+
+ return 0;
+}
+
+/**
+ * edac_scrub_get_desc - get EDAC scrub descriptors
+ * @scrub_dev: client device, with scrub support
+ * @attr_groups: pointer to the attribute group container
+ * @instance: device's scrub instance number.
+ *
+ * Returns 0 on success, error otherwise.
+ */
+int edac_scrub_get_desc(struct device *scrub_dev,
+ const struct attribute_group **attr_groups,
+ u8 instance)
+{
+ if (!scrub_dev || !attr_groups)
+ return -EINVAL;
+
+ return scrub_create_desc(scrub_dev, attr_groups, instance);
+}
diff --git a/include/linux/edac.h b/include/linux/edac.h
index b337254cf5b8..aae8262b9863 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -674,6 +674,36 @@ enum edac_dev_feat {
RAS_FEAT_MAX
};
+/**
+ * struct edac_scrub_ops - scrub device operations (all elements optional)
+ * @read_range: read the base and size of the scrubbing range.
+ * @write_range: set the base and size of the scrubbing range.
+ * @get_enabled_bg: check if currently performing background scrub.
+ * @set_enabled_bg: start or stop a bg-scrub.
+ * @get_enabled_od: check if currently performing on-demand scrub.
+ * @set_enabled_od: start or stop an on-demand scrub.
+ * @min_cycle_read: minimum supported scrub cycle duration in seconds.
+ * @max_cycle_read: maximum supported scrub cycle duration in seconds.
+ * @cycle_duration_read: get the scrub cycle duration in seconds.
+ * @cycle_duration_write: set the scrub cycle duration in seconds.
+ */
+struct edac_scrub_ops {
+ int (*read_range)(struct device *dev, void *drv_data, u64 *base, u64 *size);
+ int (*write_range)(struct device *dev, void *drv_data, u64 base, u64 size);
+ int (*get_enabled_bg)(struct device *dev, void *drv_data, bool *enable);
+ int (*set_enabled_bg)(struct device *dev, void *drv_data, bool enable);
+ int (*get_enabled_od)(struct device *dev, void *drv_data, bool *enable);
+ int (*set_enabled_od)(struct device *dev, void *drv_data, bool enable);
+ int (*min_cycle_read)(struct device *dev, void *drv_data, u32 *min);
+ int (*max_cycle_read)(struct device *dev, void *drv_data, u32 *max);
+ int (*cycle_duration_read)(struct device *dev, void *drv_data, u32 *cycle);
+ int (*cycle_duration_write)(struct device *dev, void *drv_data, u32 cycle);
+};
+
+int edac_scrub_get_desc(struct device *scrub_dev,
+ const struct attribute_group **attr_groups,
+ u8 instance);
+
struct edac_ecs_ex_info {
u16 num_media_frus;
};
--
2.34.1
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v12 03/17] EDAC: Add EDAC ECS control driver
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
2024-09-11 9:04 ` [PATCH v12 01/17] EDAC: Add support for EDAC device features control shiju.jose
2024-09-11 9:04 ` [PATCH v12 02/17] EDAC: Add EDAC scrub control driver shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-27 16:28 ` Fan Ni
2024-09-11 9:04 ` [PATCH v12 04/17] cxl: Move mailbox related bits to the same context shiju.jose
` (13 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add an EDAC ECS (Error Check Scrub) control driver that supports configuring
a memory device's ECS feature.
Error Check Scrub (ECS) is a feature defined in the JEDEC DDR5 SDRAM
specification (JESD79-5). It allows the DRAM to internally read data, correct
single-bit errors, and write the corrected data back to the DRAM array, while
making the error counts visible.
A DDR5 device contains a number of memory media FRUs (Field Replaceable
Units). The DDR5 ECS feature, and thus the ECS control driver, supports
configuring the ECS parameters per FRU.
Memory devices that support the ECS feature register with the EDAC ECS
driver and thus with the generic EDAC RAS feature driver, which adds the
sysfs ECS control interface. The ECS control attributes are exposed to
userspace in /sys/bus/edac/devices/<dev-name>/ecs_fruX/.
The generic EDAC ECS driver and the common sysfs ECS interface promote
unambiguous control from userspace irrespective of the underlying devices
that support the ECS feature.
Support for the ECS feature is added separately because the DDR5 ECS
feature's control attributes differ from those of the scrub feature.
The sysfs ECS attribute nodes are present only if the client driver
implements the corresponding attribute callback functions and passes the
ops to the EDAC RAS feature driver during registration.
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
Documentation/ABI/testing/sysfs-edac-ecs | 78 +++++
drivers/edac/Makefile | 2 +-
drivers/edac/edac_device.c | 3 +
drivers/edac/edac_ecs.c | 376 +++++++++++++++++++++++
include/linux/edac.h | 33 ++
5 files changed, 491 insertions(+), 1 deletion(-)
create mode 100644 Documentation/ABI/testing/sysfs-edac-ecs
create mode 100755 drivers/edac/edac_ecs.c
diff --git a/Documentation/ABI/testing/sysfs-edac-ecs b/Documentation/ABI/testing/sysfs-edac-ecs
new file mode 100644
index 000000000000..1eb35acd4e5e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-edac-ecs
@@ -0,0 +1,78 @@
+What: /sys/bus/edac/devices/<dev-name>/ecs_fru*
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ The sysfs EDAC bus devices /<dev-name>/ecs_fru* subdirectory
+ belongs to the memory media ECS (Error Check Scrub) control
+ feature, where the <dev-name> directory corresponds to a device
+ registered with the EDAC ECS driver and thus with the
+ generic EDAC RAS driver.
+ Each ecs_fru* subdirectory corresponds to a media FRU (Field
+ Replaceable Unit) under the memory device.
+ The sysfs ECS attribute nodes are present only if the client
+ driver implements the corresponding attribute callback
+ functions and passes the ops to the EDAC RAS feature driver
+ during registration.
+
+What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/log_entry_type
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RW) The log entry type of how the DDR5 ECS log is reported.
+ 00b - per DRAM.
+ 01b - per memory media FRU.
+
+What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/log_entry_type_per_dram
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RO) True if current log entry type is per DRAM.
+
+What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/log_entry_type_per_memory_media
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RO) True if current log entry type is per memory media FRU.
+
+What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/mode
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RW) The mode of how the DDR5 ECS counts the errors.
+ 0 - ECS counts rows with errors.
+ 1 - ECS counts codewords with errors.
+
+What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/mode_counts_rows
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RO) True if current mode is ECS counts rows with errors.
+
+What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/mode_counts_codewords
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RO) True if current mode is ECS counts codewords with errors.
+
+What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/reset
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (WO) ECS reset ECC counter.
+ 0 - normal, ECC counter running actively.
+ 1 - reset ECC counter to the default value.
+
+What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/threshold
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RW) ECS threshold count per GB of memory cells.
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index fbf0e39ec678..62115eff6a9a 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o
edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o
edac_core-y += edac_module.o edac_device_sysfs.o wq.o
-edac_core-y += edac_scrub.o
+edac_core-y += edac_scrub.o edac_ecs.o
edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o
diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
index 6381896b6424..9cac9ae75080 100644
--- a/drivers/edac/edac_device.c
+++ b/drivers/edac/edac_device.c
@@ -623,6 +623,9 @@ static int edac_dev_feat_init(struct device *parent,
num = ras_feat->ecs_info.num_media_frus;
dev_data->ecs_ops = ras_feat->ecs_ops;
dev_data->private = ras_feat->ctx;
+ ret = edac_ecs_get_desc(parent, attr_groups, num);
+ if (ret)
+ return ret;
return num;
case RAS_FEAT_PPR:
dev_data->ppr_ops = ras_feat->ppr_ops;
diff --git a/drivers/edac/edac_ecs.c b/drivers/edac/edac_ecs.c
new file mode 100755
index 000000000000..50915ab1e769
--- /dev/null
+++ b/drivers/edac/edac_ecs.c
@@ -0,0 +1,376 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Generic EDAC ECS driver for controlling on-die error check
+ * scrub (e.g. DDR5 ECS). The common sysfs ECS interface promotes
+ * unambiguous access from userspace.
+ *
+ * Copyright (c) 2024 HiSilicon Limited.
+ */
+
+#define pr_fmt(fmt) "EDAC ECS: " fmt
+
+#include <linux/edac.h>
+
+#define EDAC_ECS_FRU_NAME "ecs_fru"
+
+enum edac_ecs_attributes {
+ ECS_LOG_ENTRY_TYPE,
+ ECS_LOG_ENTRY_TYPE_PER_DRAM,
+ ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA,
+ ECS_MODE,
+ ECS_MODE_COUNTS_ROWS,
+ ECS_MODE_COUNTS_CODEWORDS,
+ ECS_RESET,
+ ECS_THRESHOLD,
+ ECS_MAX_ATTRS
+};
+
+struct edac_ecs_dev_attr {
+ struct device_attribute dev_attr;
+ int fru_id;
+};
+
+struct edac_ecs_fru_context {
+ char name[EDAC_FEAT_NAME_LEN];
+ struct edac_ecs_dev_attr ecs_dev_attr[ECS_MAX_ATTRS];
+ struct attribute *ecs_attrs[ECS_MAX_ATTRS + 1];
+ struct attribute_group group;
+};
+
+struct edac_ecs_context {
+ u16 num_media_frus;
+ struct edac_ecs_fru_context *fru_ctxs;
+};
+
+#define to_ecs_dev_attr(_dev_attr) \
+ container_of(_dev_attr, struct edac_ecs_dev_attr, dev_attr)
+
+static ssize_t log_entry_type_show(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
+ u32 val;
+ int ret;
+
+ ret = ops->get_log_entry_type(ras_feat_dev->parent, ctx->ecs.private,
+ ecs_dev_attr->fru_id, &val);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%u\n", val);
+}
+
+static ssize_t log_entry_type_store(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
+ long val;
+ int ret;
+
+ ret = kstrtol(buf, 0, &val);
+ if (ret < 0)
+ return ret;
+
+ ret = ops->set_log_entry_type(ras_feat_dev->parent, ctx->ecs.private,
+ ecs_dev_attr->fru_id, val);
+ if (ret)
+ return ret;
+
+ return len;
+}
+
+static ssize_t log_entry_type_per_dram_show(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
+ u32 val;
+ int ret;
+
+ ret = ops->get_log_entry_type_per_dram(ras_feat_dev->parent, ctx->ecs.private,
+ ecs_dev_attr->fru_id, &val);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%u\n", val);
+}
+
+static ssize_t log_entry_type_per_memory_media_show(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
+ u32 val;
+ int ret;
+
+ ret = ops->get_log_entry_type_per_memory_media(ras_feat_dev->parent,
+ ctx->ecs.private,
+ ecs_dev_attr->fru_id, &val);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%u\n", val);
+}
+
+static ssize_t mode_show(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
+ u32 val;
+ int ret;
+
+ ret = ops->get_mode(ras_feat_dev->parent, ctx->ecs.private,
+ ecs_dev_attr->fru_id, &val);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%u\n", val);
+}
+
+static ssize_t mode_store(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
+ long val;
+ int ret;
+
+ ret = kstrtol(buf, 0, &val);
+ if (ret < 0)
+ return ret;
+
+ ret = ops->set_mode(ras_feat_dev->parent, ctx->ecs.private,
+ ecs_dev_attr->fru_id, val);
+ if (ret)
+ return ret;
+
+ return len;
+}
+
+static ssize_t mode_counts_rows_show(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
+ u32 val;
+ int ret;
+
+ ret = ops->get_mode_counts_rows(ras_feat_dev->parent, ctx->ecs.private,
+ ecs_dev_attr->fru_id, &val);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%u\n", val);
+}
+
+static ssize_t mode_counts_codewords_show(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
+ u32 val;
+ int ret;
+
+ ret = ops->get_mode_counts_codewords(ras_feat_dev->parent, ctx->ecs.private,
+ ecs_dev_attr->fru_id, &val);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%u\n", val);
+}
+
+static ssize_t reset_store(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
+ long val;
+ int ret;
+
+ ret = kstrtol(buf, 0, &val);
+ if (ret < 0)
+ return ret;
+
+ ret = ops->reset(ras_feat_dev->parent, ctx->ecs.private,
+ ecs_dev_attr->fru_id, val);
+ if (ret)
+ return ret;
+
+ return len;
+}
+
+static ssize_t threshold_show(struct device *ras_feat_dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
+ int ret;
+ u32 val;
+
+ ret = ops->get_threshold(ras_feat_dev->parent, ctx->ecs.private,
+ ecs_dev_attr->fru_id, &val);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%u\n", val);
+}
+
+static ssize_t threshold_store(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
+ long val;
+ int ret;
+
+ ret = kstrtol(buf, 0, &val);
+ if (ret < 0)
+ return ret;
+
+ ret = ops->set_threshold(ras_feat_dev->parent, ctx->ecs.private,
+ ecs_dev_attr->fru_id, val);
+ if (ret)
+ return ret;
+
+ return len;
+}
+
+static umode_t ecs_attr_visible(struct kobject *kobj,
+ struct attribute *a, int attr_id)
+{
+ struct device *ras_feat_dev = kobj_to_dev(kobj);
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
+
+ switch (attr_id) {
+ case ECS_LOG_ENTRY_TYPE:
+ if (ops->get_log_entry_type && ops->set_log_entry_type)
+ return a->mode;
+ if (ops->get_log_entry_type)
+ return 0444;
+ return 0;
+ case ECS_LOG_ENTRY_TYPE_PER_DRAM:
+ return ops->get_log_entry_type_per_dram ? a->mode : 0;
+ case ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA:
+ return ops->get_log_entry_type_per_memory_media ? a->mode : 0;
+ case ECS_MODE:
+ if (ops->get_mode && ops->set_mode)
+ return a->mode;
+ if (ops->get_mode)
+ return 0444;
+ return 0;
+ case ECS_MODE_COUNTS_ROWS:
+ return ops->get_mode_counts_rows ? a->mode : 0;
+ case ECS_MODE_COUNTS_CODEWORDS:
+ return ops->get_mode_counts_codewords ? a->mode : 0;
+ case ECS_RESET:
+ return ops->reset ? a->mode : 0;
+ case ECS_THRESHOLD:
+ if (ops->get_threshold && ops->set_threshold)
+ return a->mode;
+ if (ops->get_threshold)
+ return 0444;
+ return 0;
+ default:
+ return 0;
+ }
+}
+
+#define EDAC_ECS_ATTR_RO(_name, _fru_id) \
+ ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RO(_name), \
+ .fru_id = _fru_id })
+
+#define EDAC_ECS_ATTR_WO(_name, _fru_id) \
+ ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_WO(_name), \
+ .fru_id = _fru_id })
+
+#define EDAC_ECS_ATTR_RW(_name, _fru_id) \
+ ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RW(_name), \
+ .fru_id = _fru_id })
+
+static int ecs_create_desc(struct device *ecs_dev,
+ const struct attribute_group **attr_groups,
+ u16 num_media_frus)
+{
+ struct edac_ecs_context *ecs_ctx;
+ u32 fru;
+
+ ecs_ctx = devm_kzalloc(ecs_dev, sizeof(*ecs_ctx), GFP_KERNEL);
+ if (!ecs_ctx)
+ return -ENOMEM;
+
+ ecs_ctx->num_media_frus = num_media_frus;
+ ecs_ctx->fru_ctxs = devm_kcalloc(ecs_dev, num_media_frus,
+ sizeof(*ecs_ctx->fru_ctxs),
+ GFP_KERNEL);
+ if (!ecs_ctx->fru_ctxs)
+ return -ENOMEM;
+
+ for (fru = 0; fru < num_media_frus; fru++) {
+ struct edac_ecs_fru_context *fru_ctx = &ecs_ctx->fru_ctxs[fru];
+ struct attribute_group *group = &fru_ctx->group;
+ int i;
+
+ fru_ctx->ecs_dev_attr[0] = EDAC_ECS_ATTR_RW(log_entry_type, fru);
+ fru_ctx->ecs_dev_attr[1] = EDAC_ECS_ATTR_RO(log_entry_type_per_dram, fru);
+ fru_ctx->ecs_dev_attr[2] = EDAC_ECS_ATTR_RO(log_entry_type_per_memory_media, fru);
+ fru_ctx->ecs_dev_attr[3] = EDAC_ECS_ATTR_RW(mode, fru);
+ fru_ctx->ecs_dev_attr[4] = EDAC_ECS_ATTR_RO(mode_counts_rows, fru);
+ fru_ctx->ecs_dev_attr[5] = EDAC_ECS_ATTR_RO(mode_counts_codewords, fru);
+ fru_ctx->ecs_dev_attr[6] = EDAC_ECS_ATTR_WO(reset, fru);
+ fru_ctx->ecs_dev_attr[7] = EDAC_ECS_ATTR_RW(threshold, fru);
+ for (i = 0; i < ECS_MAX_ATTRS; i++)
+ fru_ctx->ecs_attrs[i] = &fru_ctx->ecs_dev_attr[i].dev_attr.attr;
+
+ sprintf(fru_ctx->name, "%s%d", EDAC_ECS_FRU_NAME, fru);
+ group->name = fru_ctx->name;
+ group->attrs = fru_ctx->ecs_attrs;
+ group->is_visible = ecs_attr_visible;
+
+ attr_groups[fru] = group;
+ }
+
+ return 0;
+}
+
+/**
+ * edac_ecs_get_desc - get EDAC ECS descriptors
+ * @ecs_dev: client device, supports ECS feature
+ * @attr_groups: pointer to attribute group container
+ * @num_media_frus: number of media FRUs in the device
+ *
+ * Returns 0 on success, error otherwise.
+ */
+int edac_ecs_get_desc(struct device *ecs_dev,
+ const struct attribute_group **attr_groups,
+ u16 num_media_frus)
+{
+ if (!ecs_dev || !attr_groups || !num_media_frus)
+ return -EINVAL;
+
+ return ecs_create_desc(ecs_dev, attr_groups, num_media_frus);
+}
diff --git a/include/linux/edac.h b/include/linux/edac.h
index aae8262b9863..90cb90cf5272 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -704,10 +704,43 @@ int edac_scrub_get_desc(struct device *scrub_dev,
const struct attribute_group **attr_groups,
u8 instance);
+/**
+ * struct edac_ecs_ops - ECS device operations (all elements optional)
+ * @get_log_entry_type: read the log entry type value.
+ * @set_log_entry_type: set the log entry type value.
+ * @get_log_entry_type_per_dram: read the log entry type per DRAM value.
+ * @get_log_entry_type_per_memory_media: read the log entry type per memory media value.
+ * @get_mode: read the mode value.
+ * @set_mode: set the mode value.
+ * @get_mode_counts_rows: read the mode counts rows value.
+ * @get_mode_counts_codewords: read the mode counts codewords value.
+ * @reset: reset the ECS counter.
+ * @get_threshold: read the threshold value.
+ * @set_threshold: set the threshold value.
+ */
+struct edac_ecs_ops {
+ int (*get_log_entry_type)(struct device *dev, void *drv_data, int fru_id, u32 *val);
+ int (*set_log_entry_type)(struct device *dev, void *drv_data, int fru_id, u32 val);
+ int (*get_log_entry_type_per_dram)(struct device *dev, void *drv_data,
+ int fru_id, u32 *val);
+ int (*get_log_entry_type_per_memory_media)(struct device *dev, void *drv_data,
+ int fru_id, u32 *val);
+ int (*get_mode)(struct device *dev, void *drv_data, int fru_id, u32 *val);
+ int (*set_mode)(struct device *dev, void *drv_data, int fru_id, u32 val);
+ int (*get_mode_counts_rows)(struct device *dev, void *drv_data, int fru_id, u32 *val);
+ int (*get_mode_counts_codewords)(struct device *dev, void *drv_data, int fru_id, u32 *val);
+ int (*reset)(struct device *dev, void *drv_data, int fru_id, u32 val);
+ int (*get_threshold)(struct device *dev, void *drv_data, int fru_id, u32 *threshold);
+ int (*set_threshold)(struct device *dev, void *drv_data, int fru_id, u32 threshold);
+};
+
struct edac_ecs_ex_info {
u16 num_media_frus;
};
+int edac_ecs_get_desc(struct device *ecs_dev,
+ const struct attribute_group **attr_groups,
+ u16 num_media_frus);
/*
* EDAC device feature information structure
*/
--
2.34.1
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v12 04/17] cxl: Move mailbox related bits to the same context
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
` (2 preceding siblings ...)
2024-09-11 9:04 ` [PATCH v12 03/17] EDAC: Add EDAC ECS " shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-11 17:20 ` Dave Jiang
2024-09-11 9:04 ` [PATCH v12 05/17] cxl: Fix comment regarding cxl_query_cmd() return data shiju.jose
` (12 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Dave Jiang <dave.jiang@intel.com>
Create a new 'struct cxl_mailbox' and move all mailbox related bits to
it. This allows isolation of all CXL mailbox data in order to export
some of the calls to external callers (fwctl) and avoid exporting
CXL driver specific bits such as device states.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
MAINTAINERS | 1 +
drivers/cxl/core/mbox.c | 48 ++++++++++++++++++---------
drivers/cxl/core/memdev.c | 18 +++++++----
drivers/cxl/cxlmem.h | 49 ++--------------------------
drivers/cxl/pci.c | 58 +++++++++++++++++++--------------
drivers/cxl/pmem.c | 4 ++-
include/linux/cxl/mailbox.h | 63 ++++++++++++++++++++++++++++++++++++
tools/testing/cxl/test/mem.c | 27 ++++++++++------
8 files changed, 163 insertions(+), 105 deletions(-)
create mode 100644 include/linux/cxl/mailbox.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 878dcd23b331..227c2b214f00 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5619,6 +5619,7 @@ F: Documentation/driver-api/cxl
F: drivers/cxl/
F: include/linux/einj-cxl.h
F: include/linux/cxl-event.h
+F: include/linux/cxl/
F: include/uapi/linux/cxl_mem.h
F: tools/testing/cxl/
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index e5cdeafdf76e..216937ef9e07 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -244,16 +244,17 @@ static const char *cxl_mem_opcode_to_name(u16 opcode)
int cxl_internal_send_cmd(struct cxl_memdev_state *mds,
struct cxl_mbox_cmd *mbox_cmd)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
size_t out_size, min_out;
int rc;
- if (mbox_cmd->size_in > mds->payload_size ||
- mbox_cmd->size_out > mds->payload_size)
+ if (mbox_cmd->size_in > cxl_mbox->payload_size ||
+ mbox_cmd->size_out > cxl_mbox->payload_size)
return -E2BIG;
out_size = mbox_cmd->size_out;
min_out = mbox_cmd->min_out;
- rc = mds->mbox_send(mds, mbox_cmd);
+ rc = cxl_mbox->mbox_send(cxl_mbox, mbox_cmd);
/*
* EIO is reserved for a payload size mismatch and mbox_send()
* may not return this error.
@@ -353,6 +354,7 @@ static int cxl_mbox_cmd_ctor(struct cxl_mbox_cmd *mbox,
struct cxl_memdev_state *mds, u16 opcode,
size_t in_size, size_t out_size, u64 in_payload)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
*mbox = (struct cxl_mbox_cmd) {
.opcode = opcode,
.size_in = in_size,
@@ -374,7 +376,7 @@ static int cxl_mbox_cmd_ctor(struct cxl_mbox_cmd *mbox,
/* Prepare to handle a full payload for variable sized output */
if (out_size == CXL_VARIABLE_PAYLOAD)
- mbox->size_out = mds->payload_size;
+ mbox->size_out = cxl_mbox->payload_size;
else
mbox->size_out = out_size;
@@ -398,6 +400,8 @@ static int cxl_to_mem_cmd_raw(struct cxl_mem_command *mem_cmd,
const struct cxl_send_command *send_cmd,
struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+
if (send_cmd->raw.rsvd)
return -EINVAL;
@@ -406,7 +410,7 @@ static int cxl_to_mem_cmd_raw(struct cxl_mem_command *mem_cmd,
* gets passed along without further checking, so it must be
* validated here.
*/
- if (send_cmd->out.size > mds->payload_size)
+ if (send_cmd->out.size > cxl_mbox->payload_size)
return -EINVAL;
if (!cxl_mem_raw_command_allowed(send_cmd->raw.opcode))
@@ -494,6 +498,7 @@ static int cxl_validate_cmd_from_user(struct cxl_mbox_cmd *mbox_cmd,
struct cxl_memdev_state *mds,
const struct cxl_send_command *send_cmd)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mem_command mem_cmd;
int rc;
@@ -505,7 +510,7 @@ static int cxl_validate_cmd_from_user(struct cxl_mbox_cmd *mbox_cmd,
* supports, but output can be arbitrarily large (simply write out as
* much data as the hardware provides).
*/
- if (send_cmd->in.size > mds->payload_size)
+ if (send_cmd->in.size > cxl_mbox->payload_size)
return -EINVAL;
/* Sanitize and construct a cxl_mem_command */
@@ -591,6 +596,7 @@ static int handle_mailbox_cmd_from_user(struct cxl_memdev_state *mds,
u64 out_payload, s32 *size_out,
u32 *retval)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct device *dev = mds->cxlds.dev;
int rc;
@@ -601,7 +607,7 @@ static int handle_mailbox_cmd_from_user(struct cxl_memdev_state *mds,
cxl_mem_opcode_to_name(mbox_cmd->opcode),
mbox_cmd->opcode, mbox_cmd->size_in);
- rc = mds->mbox_send(mds, mbox_cmd);
+ rc = cxl_mbox->mbox_send(cxl_mbox, mbox_cmd);
if (rc)
goto out;
@@ -659,11 +665,12 @@ int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s)
static int cxl_xfer_log(struct cxl_memdev_state *mds, uuid_t *uuid,
u32 *size, u8 *out)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
u32 remaining = *size;
u32 offset = 0;
while (remaining) {
- u32 xfer_size = min_t(u32, remaining, mds->payload_size);
+ u32 xfer_size = min_t(u32, remaining, cxl_mbox->payload_size);
struct cxl_mbox_cmd mbox_cmd;
struct cxl_mbox_get_log log;
int rc;
@@ -752,17 +759,18 @@ static void cxl_walk_cel(struct cxl_memdev_state *mds, size_t size, u8 *cel)
static struct cxl_mbox_get_supported_logs *cxl_get_gsl(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_get_supported_logs *ret;
struct cxl_mbox_cmd mbox_cmd;
int rc;
- ret = kvmalloc(mds->payload_size, GFP_KERNEL);
+ ret = kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
if (!ret)
return ERR_PTR(-ENOMEM);
mbox_cmd = (struct cxl_mbox_cmd) {
.opcode = CXL_MBOX_OP_GET_SUPPORTED_LOGS,
- .size_out = mds->payload_size,
+ .size_out = cxl_mbox->payload_size,
.payload_out = ret,
/* At least the record number field must be valid */
.min_out = 2,
@@ -910,6 +918,7 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
enum cxl_event_log_type log,
struct cxl_get_event_payload *get_pl)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_clear_event_payload *payload;
u16 total = le16_to_cpu(get_pl->record_count);
u8 max_handles = CXL_CLEAR_EVENT_MAX_HANDLES;
@@ -920,8 +929,8 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
int i;
/* Payload size may limit the max handles */
- if (pl_size > mds->payload_size) {
- max_handles = (mds->payload_size - sizeof(*payload)) /
+ if (pl_size > cxl_mbox->payload_size) {
+ max_handles = (cxl_mbox->payload_size - sizeof(*payload)) /
sizeof(__le16);
pl_size = struct_size(payload, handles, max_handles);
}
@@ -979,6 +988,7 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
enum cxl_event_log_type type)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
struct device *dev = mds->cxlds.dev;
struct cxl_get_event_payload *payload;
@@ -995,7 +1005,7 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
.payload_in = &log_type,
.size_in = sizeof(log_type),
.payload_out = payload,
- .size_out = mds->payload_size,
+ .size_out = cxl_mbox->payload_size,
.min_out = struct_size(payload, records, 0),
};
@@ -1328,6 +1338,7 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
struct cxl_region *cxlr)
{
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_poison_out *po;
struct cxl_mbox_poison_in pi;
int nr_records = 0;
@@ -1346,7 +1357,7 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
.opcode = CXL_MBOX_OP_GET_POISON,
.size_in = sizeof(pi),
.payload_in = &pi,
- .size_out = mds->payload_size,
+ .size_out = cxl_mbox->payload_size,
.payload_out = po,
.min_out = struct_size(po, record, 0),
};
@@ -1382,7 +1393,9 @@ static void free_poison_buf(void *buf)
/* Get Poison List output buffer is protected by mds->poison.lock */
static int cxl_poison_alloc_buf(struct cxl_memdev_state *mds)
{
- mds->poison.list_out = kvmalloc(mds->payload_size, GFP_KERNEL);
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+
+ mds->poison.list_out = kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
if (!mds->poison.list_out)
return -ENOMEM;
@@ -1411,6 +1424,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, CXL);
struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
{
struct cxl_memdev_state *mds;
+ struct cxl_mailbox *cxl_mbox;
mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
if (!mds) {
@@ -1418,7 +1432,9 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
return ERR_PTR(-ENOMEM);
}
- mutex_init(&mds->mbox_mutex);
+ cxl_mbox = &mds->cxlds.cxl_mbox;
+ mutex_init(&cxl_mbox->mbox_mutex);
+
mutex_init(&mds->event.log_lock);
mds->cxlds.dev = dev;
mds->cxlds.reg_map.host = dev;
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 0277726afd04..05bb84cb1274 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -58,7 +58,7 @@ static ssize_t payload_max_show(struct device *dev,
if (!mds)
return sysfs_emit(buf, "\n");
- return sysfs_emit(buf, "%zu\n", mds->payload_size);
+ return sysfs_emit(buf, "%zu\n", cxlds->cxl_mbox.payload_size);
}
static DEVICE_ATTR_RO(payload_max);
@@ -124,15 +124,16 @@ static ssize_t security_state_show(struct device *dev,
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
unsigned long state = mds->security.state;
int rc = 0;
/* sync with latest submission state */
- mutex_lock(&mds->mbox_mutex);
+ mutex_lock(&cxl_mbox->mbox_mutex);
if (mds->security.sanitize_active)
rc = sysfs_emit(buf, "sanitize\n");
- mutex_unlock(&mds->mbox_mutex);
+ mutex_unlock(&cxl_mbox->mbox_mutex);
if (rc)
return rc;
@@ -829,12 +830,13 @@ static enum fw_upload_err cxl_fw_prepare(struct fw_upload *fwl, const u8 *data,
{
struct cxl_memdev_state *mds = fwl->dd_handle;
struct cxl_mbox_transfer_fw *transfer;
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
if (!size)
return FW_UPLOAD_ERR_INVALID_SIZE;
mds->fw.oneshot = struct_size(transfer, data, size) <
- mds->payload_size;
+ cxl_mbox->payload_size;
if (cxl_mem_get_fw_info(mds))
return FW_UPLOAD_ERR_HW_ERROR;
@@ -854,6 +856,7 @@ static enum fw_upload_err cxl_fw_write(struct fw_upload *fwl, const u8 *data,
{
struct cxl_memdev_state *mds = fwl->dd_handle;
struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
struct cxl_memdev *cxlmd = cxlds->cxlmd;
struct cxl_mbox_transfer_fw *transfer;
struct cxl_mbox_cmd mbox_cmd;
@@ -877,7 +880,7 @@ static enum fw_upload_err cxl_fw_write(struct fw_upload *fwl, const u8 *data,
* sizeof(*transfer) is 128. These constraints imply that @cur_size
* will always be 128b aligned.
*/
- cur_size = min_t(size_t, size, mds->payload_size - sizeof(*transfer));
+ cur_size = min_t(size_t, size, cxl_mbox->payload_size - sizeof(*transfer));
remaining = size - cur_size;
size_in = struct_size(transfer, data, cur_size);
@@ -1059,16 +1062,17 @@ EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, CXL);
static void sanitize_teardown_notifier(void *data)
{
struct cxl_memdev_state *mds = data;
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct kernfs_node *state;
/*
* Prevent new irq triggered invocations of the workqueue and
* flush inflight invocations.
*/
- mutex_lock(&mds->mbox_mutex);
+ mutex_lock(&cxl_mbox->mbox_mutex);
state = mds->security.sanitize_node;
mds->security.sanitize_node = NULL;
- mutex_unlock(&mds->mbox_mutex);
+ mutex_unlock(&cxl_mbox->mbox_mutex);
cancel_delayed_work_sync(&mds->security.poll_dwork);
sysfs_put(state);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index afb53d058d62..19609b708b09 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -8,6 +8,7 @@
#include <linux/rcuwait.h>
#include <linux/cxl-event.h>
#include <linux/node.h>
+#include <linux/cxl/mailbox.h>
#include "cxl.h"
/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -105,42 +106,6 @@ static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev);
}
-/**
- * struct cxl_mbox_cmd - A command to be submitted to hardware.
- * @opcode: (input) The command set and command submitted to hardware.
- * @payload_in: (input) Pointer to the input payload.
- * @payload_out: (output) Pointer to the output payload. Must be allocated by
- * the caller.
- * @size_in: (input) Number of bytes to load from @payload_in.
- * @size_out: (input) Max number of bytes loaded into @payload_out.
- * (output) Number of bytes generated by the device. For fixed size
- * outputs commands this is always expected to be deterministic. For
- * variable sized output commands, it tells the exact number of bytes
- * written.
- * @min_out: (input) internal command output payload size validation
- * @poll_count: (input) Number of timeouts to attempt.
- * @poll_interval_ms: (input) Time between mailbox background command polling
- * interval timeouts.
- * @return_code: (output) Error code returned from hardware.
- *
- * This is the primary mechanism used to send commands to the hardware.
- * All the fields except @payload_* correspond exactly to the fields described in
- * Command Register section of the CXL 2.0 8.2.8.4.5. @payload_in and
- * @payload_out are written to, and read from the Command Payload Registers
- * defined in CXL 2.0 8.2.8.4.8.
- */
-struct cxl_mbox_cmd {
- u16 opcode;
- void *payload_in;
- void *payload_out;
- size_t size_in;
- size_t size_out;
- size_t min_out;
- int poll_count;
- int poll_interval_ms;
- u16 return_code;
-};
-
/*
* Per CXL 3.0 Section 8.2.8.4.5.1
*/
@@ -438,6 +403,7 @@ struct cxl_dev_state {
struct resource ram_res;
u64 serial;
enum cxl_devtype type;
+ struct cxl_mailbox cxl_mbox;
};
/**
@@ -448,11 +414,8 @@ struct cxl_dev_state {
* the functionality related to that like Identify Memory Device and Get
* Partition Info
* @cxlds: Core driver state common across Type-2 and Type-3 devices
- * @payload_size: Size of space for payload
- * (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
* @lsa_size: Size of Label Storage Area
* (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
- * @mbox_mutex: Mutex to synchronize mailbox access.
* @firmware_version: Firmware version for the memory device.
* @enabled_cmds: Hardware commands found enabled in CEL.
* @exclusive_cmds: Commands that are kernel-internal only
@@ -470,17 +433,13 @@ struct cxl_dev_state {
* @poison: poison driver state info
* @security: security driver state info
* @fw: firmware upload / activation state
- * @mbox_wait: RCU wait for mbox send completely
- * @mbox_send: @dev specific transport for transmitting mailbox commands
*
* See CXL 3.0 8.2.9.8.2 Capacity Configuration and Label Storage for
* details on capacity parameters.
*/
struct cxl_memdev_state {
struct cxl_dev_state cxlds;
- size_t payload_size;
size_t lsa_size;
- struct mutex mbox_mutex; /* Protects device mailbox and firmware */
char firmware_version[0x10];
DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
@@ -500,10 +459,6 @@ struct cxl_memdev_state {
struct cxl_poison_state poison;
struct cxl_security_state security;
struct cxl_fw_state fw;
-
- struct rcuwait mbox_wait;
- int (*mbox_send)(struct cxl_memdev_state *mds,
- struct cxl_mbox_cmd *cmd);
};
static inline struct cxl_memdev_state *
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 4be35dc22202..faf6f5a49368 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -124,6 +124,7 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
u16 opcode;
struct cxl_dev_id *dev_id = id;
struct cxl_dev_state *cxlds = dev_id->cxlds;
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
if (!cxl_mbox_background_complete(cxlds))
@@ -132,13 +133,13 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
opcode = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
if (opcode == CXL_MBOX_OP_SANITIZE) {
- mutex_lock(&mds->mbox_mutex);
+ mutex_lock(&cxl_mbox->mbox_mutex);
if (mds->security.sanitize_node)
mod_delayed_work(system_wq, &mds->security.poll_dwork, 0);
- mutex_unlock(&mds->mbox_mutex);
+ mutex_unlock(&cxl_mbox->mbox_mutex);
} else {
/* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
- rcuwait_wake_up(&mds->mbox_wait);
+ rcuwait_wake_up(&cxl_mbox->mbox_wait);
}
return IRQ_HANDLED;
@@ -152,8 +153,9 @@ static void cxl_mbox_sanitize_work(struct work_struct *work)
struct cxl_memdev_state *mds =
container_of(work, typeof(*mds), security.poll_dwork.work);
struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
- mutex_lock(&mds->mbox_mutex);
+ mutex_lock(&cxl_mbox->mbox_mutex);
if (cxl_mbox_background_complete(cxlds)) {
mds->security.poll_tmo_secs = 0;
if (mds->security.sanitize_node)
@@ -167,7 +169,7 @@ static void cxl_mbox_sanitize_work(struct work_struct *work)
mds->security.poll_tmo_secs = min(15 * 60, timeout);
schedule_delayed_work(&mds->security.poll_dwork, timeout * HZ);
}
- mutex_unlock(&mds->mbox_mutex);
+ mutex_unlock(&cxl_mbox->mbox_mutex);
}
/**
@@ -192,17 +194,20 @@ static void cxl_mbox_sanitize_work(struct work_struct *work)
* not need to coordinate with each other. The driver only uses the primary
* mailbox.
*/
-static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds,
+static int __cxl_pci_mbox_send_cmd(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *mbox_cmd)
{
- struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct cxl_dev_state *cxlds = container_of(cxl_mbox,
+ struct cxl_dev_state,
+ cxl_mbox);
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
void __iomem *payload = cxlds->regs.mbox + CXLDEV_MBOX_PAYLOAD_OFFSET;
struct device *dev = cxlds->dev;
u64 cmd_reg, status_reg;
size_t out_len;
int rc;
- lockdep_assert_held(&mds->mbox_mutex);
+ lockdep_assert_held(&cxl_mbox->mbox_mutex);
/*
* Here are the steps from 8.2.8.4 of the CXL 2.0 spec.
@@ -315,10 +320,10 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds,
timeout = mbox_cmd->poll_interval_ms;
for (i = 0; i < mbox_cmd->poll_count; i++) {
- if (rcuwait_wait_event_timeout(&mds->mbox_wait,
- cxl_mbox_background_complete(cxlds),
- TASK_UNINTERRUPTIBLE,
- msecs_to_jiffies(timeout)) > 0)
+ if (rcuwait_wait_event_timeout(&cxl_mbox->mbox_wait,
+ cxl_mbox_background_complete(cxlds),
+ TASK_UNINTERRUPTIBLE,
+ msecs_to_jiffies(timeout)) > 0)
break;
}
@@ -360,7 +365,7 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds,
*/
size_t n;
- n = min3(mbox_cmd->size_out, mds->payload_size, out_len);
+ n = min3(mbox_cmd->size_out, cxl_mbox->payload_size, out_len);
memcpy_fromio(mbox_cmd->payload_out, payload, n);
mbox_cmd->size_out = n;
} else {
@@ -370,14 +375,14 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds,
return 0;
}
-static int cxl_pci_mbox_send(struct cxl_memdev_state *mds,
+static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *cmd)
{
int rc;
- mutex_lock_io(&mds->mbox_mutex);
- rc = __cxl_pci_mbox_send_cmd(mds, cmd);
- mutex_unlock(&mds->mbox_mutex);
+ mutex_lock_io(&cxl_mbox->mbox_mutex);
+ rc = __cxl_pci_mbox_send_cmd(cxl_mbox, cmd);
+ mutex_unlock(&cxl_mbox->mbox_mutex);
return rc;
}
@@ -385,6 +390,7 @@ static int cxl_pci_mbox_send(struct cxl_memdev_state *mds,
static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
{
struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
struct device *dev = cxlds->dev;
unsigned long timeout;
@@ -392,6 +398,7 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
u64 md_status;
u32 ctrl;
+ cxl_mbox->host = dev;
timeout = jiffies + mbox_ready_timeout * HZ;
do {
md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
@@ -417,8 +424,8 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
return -ETIMEDOUT;
}
- mds->mbox_send = cxl_pci_mbox_send;
- mds->payload_size =
+ cxl_mbox->mbox_send = cxl_pci_mbox_send;
+ cxl_mbox->payload_size =
1 << FIELD_GET(CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK, cap);
/*
@@ -428,16 +435,16 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
* there's no point in going forward. If the size is too large, there's
 * no harm in soft limiting it.
*/
- mds->payload_size = min_t(size_t, mds->payload_size, SZ_1M);
- if (mds->payload_size < 256) {
+ cxl_mbox->payload_size = min_t(size_t, cxl_mbox->payload_size, SZ_1M);
+ if (cxl_mbox->payload_size < 256) {
dev_err(dev, "Mailbox is too small (%zub)",
- mds->payload_size);
+ cxl_mbox->payload_size);
return -ENXIO;
}
- dev_dbg(dev, "Mailbox payload sized %zu", mds->payload_size);
+ dev_dbg(dev, "Mailbox payload sized %zu", cxl_mbox->payload_size);
- rcuwait_init(&mds->mbox_wait);
+ rcuwait_init(&cxl_mbox->mbox_wait);
INIT_DELAYED_WORK(&mds->security.poll_dwork, cxl_mbox_sanitize_work);
/* background command interrupts are optional */
@@ -578,9 +585,10 @@ static void free_event_buf(void *buf)
*/
static int cxl_mem_alloc_event_buf(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_get_event_payload *buf;
- buf = kvmalloc(mds->payload_size, GFP_KERNEL);
+ buf = kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
if (!buf)
return -ENOMEM;
mds->event.buf = buf;
diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c
index 4ef93da22335..3985ff9ce70e 100644
--- a/drivers/cxl/pmem.c
+++ b/drivers/cxl/pmem.c
@@ -102,13 +102,15 @@ static int cxl_pmem_get_config_size(struct cxl_memdev_state *mds,
struct nd_cmd_get_config_size *cmd,
unsigned int buf_len)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+
if (sizeof(*cmd) > buf_len)
return -EINVAL;
*cmd = (struct nd_cmd_get_config_size){
.config_size = mds->lsa_size,
.max_xfer =
- mds->payload_size - sizeof(struct cxl_mbox_set_lsa),
+ cxl_mbox->payload_size - sizeof(struct cxl_mbox_set_lsa),
};
return 0;
diff --git a/include/linux/cxl/mailbox.h b/include/linux/cxl/mailbox.h
new file mode 100644
index 000000000000..654df6175828
--- /dev/null
+++ b/include/linux/cxl/mailbox.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2024 Intel Corporation. */
+#ifndef __CXL_MBOX_H__
+#define __CXL_MBOX_H__
+
+#include <linux/auxiliary_bus.h>
+
+/**
+ * struct cxl_mbox_cmd - A command to be submitted to hardware.
+ * @opcode: (input) The command set and command submitted to hardware.
+ * @payload_in: (input) Pointer to the input payload.
+ * @payload_out: (output) Pointer to the output payload. Must be allocated by
+ * the caller.
+ * @size_in: (input) Number of bytes to load from @payload_in.
+ * @size_out: (input) Max number of bytes loaded into @payload_out.
+ * (output) Number of bytes generated by the device. For fixed size
+ * output commands this is always expected to be deterministic. For
+ * variable sized output commands, it tells the exact number of bytes
+ * written.
+ * @min_out: (input) internal command output payload size validation
+ * @poll_count: (input) Number of timeouts to attempt.
+ * @poll_interval_ms: (input) Time between mailbox background command polling
+ * interval timeouts.
+ * @return_code: (output) Error code returned from hardware.
+ *
+ * This is the primary mechanism used to send commands to the hardware.
+ * All the fields except @payload_* correspond exactly to the fields described in
+ * the Command Register section of the CXL 2.0 spec (8.2.8.4.5). @payload_in and
+ * @payload_out are written to, and read from the Command Payload Registers
+ * defined in CXL 2.0 8.2.8.4.8.
+ */
+struct cxl_mbox_cmd {
+ u16 opcode;
+ void *payload_in;
+ void *payload_out;
+ size_t size_in;
+ size_t size_out;
+ size_t min_out;
+ int poll_count;
+ int poll_interval_ms;
+ u16 return_code;
+};
+
+/**
+ * struct cxl_mailbox - context for CXL mailbox operations
+ * @host: device that hosts the mailbox
+ * @adev: auxiliary device for fw-ctl
+ * @payload_size: Size of space for payload
+ * (CXL 3.1 8.2.8.4.3 Mailbox Capabilities Register)
+ * @mbox_mutex: mutex protects device mailbox and firmware
+ * @mbox_wait: rcuwait for mailbox
+ * @mbox_send: @host specific transport for transmitting mailbox commands
+ */
+struct cxl_mailbox {
+ struct device *host;
+ struct auxiliary_device adev; /* For fw-ctl */
+ size_t payload_size;
+ struct mutex mbox_mutex; /* lock to protect mailbox context */
+ struct rcuwait mbox_wait;
+ int (*mbox_send)(struct cxl_mailbox *cxl_mbox, struct cxl_mbox_cmd *cmd);
+};
+
+#endif
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 129f179b0ac5..1829b626bb40 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -8,6 +8,7 @@
#include <linux/delay.h>
#include <linux/sizes.h>
#include <linux/bits.h>
+#include <linux/cxl/mailbox.h>
#include <asm/unaligned.h>
#include <crypto/sha2.h>
#include <cxlmem.h>
@@ -534,6 +535,7 @@ static int mock_gsl(struct cxl_mbox_cmd *cmd)
static int mock_get_log(struct cxl_memdev_state *mds, struct cxl_mbox_cmd *cmd)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_get_log *gl = cmd->payload_in;
u32 offset = le32_to_cpu(gl->offset);
u32 length = le32_to_cpu(gl->length);
@@ -542,7 +544,7 @@ static int mock_get_log(struct cxl_memdev_state *mds, struct cxl_mbox_cmd *cmd)
if (cmd->size_in < sizeof(*gl))
return -EINVAL;
- if (length > mds->payload_size)
+ if (length > cxl_mbox->payload_size)
return -EINVAL;
if (offset + length > sizeof(mock_cel))
return -EINVAL;
@@ -617,12 +619,13 @@ void cxl_mockmem_sanitize_work(struct work_struct *work)
{
struct cxl_memdev_state *mds =
container_of(work, typeof(*mds), security.poll_dwork.work);
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
- mutex_lock(&mds->mbox_mutex);
+ mutex_lock(&cxl_mbox->mbox_mutex);
if (mds->security.sanitize_node)
sysfs_notify_dirent(mds->security.sanitize_node);
mds->security.sanitize_active = false;
- mutex_unlock(&mds->mbox_mutex);
+ mutex_unlock(&cxl_mbox->mbox_mutex);
dev_dbg(mds->cxlds.dev, "sanitize complete\n");
}
@@ -631,6 +634,7 @@ static int mock_sanitize(struct cxl_mockmem_data *mdata,
struct cxl_mbox_cmd *cmd)
{
struct cxl_memdev_state *mds = mdata->mds;
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
int rc = 0;
if (cmd->size_in != 0)
@@ -648,14 +652,14 @@ static int mock_sanitize(struct cxl_mockmem_data *mdata,
return -ENXIO;
}
- mutex_lock(&mds->mbox_mutex);
+ mutex_lock(&cxl_mbox->mbox_mutex);
if (schedule_delayed_work(&mds->security.poll_dwork,
msecs_to_jiffies(mdata->sanitize_timeout))) {
mds->security.sanitize_active = true;
dev_dbg(mds->cxlds.dev, "sanitize issued\n");
} else
rc = -EBUSY;
- mutex_unlock(&mds->mbox_mutex);
+ mutex_unlock(&cxl_mbox->mbox_mutex);
return rc;
}
@@ -1333,10 +1337,13 @@ static int mock_activate_fw(struct cxl_mockmem_data *mdata,
return -EINVAL;
}
-static int cxl_mock_mbox_send(struct cxl_memdev_state *mds,
+static int cxl_mock_mbox_send(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *cmd)
{
- struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct cxl_dev_state *cxlds = container_of(cxl_mbox,
+ struct cxl_dev_state,
+ cxl_mbox);
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
struct device *dev = cxlds->dev;
struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
int rc = -EIO;
@@ -1460,6 +1467,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
struct cxl_memdev_state *mds;
struct cxl_dev_state *cxlds;
struct cxl_mockmem_data *mdata;
+ struct cxl_mailbox *cxl_mbox;
int rc;
mdata = devm_kzalloc(dev, sizeof(*mdata), GFP_KERNEL);
@@ -1487,9 +1495,10 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
if (IS_ERR(mds))
return PTR_ERR(mds);
+ cxl_mbox = &mds->cxlds.cxl_mbox;
mdata->mds = mds;
- mds->mbox_send = cxl_mock_mbox_send;
- mds->payload_size = SZ_4K;
+ cxl_mbox->mbox_send = cxl_mock_mbox_send;
+ cxl_mbox->payload_size = SZ_4K;
mds->event.buf = (struct cxl_get_event_payload *) mdata->event_buf;
INIT_DELAYED_WORK(&mds->security.poll_dwork, cxl_mockmem_sanitize_work);
--
2.34.1
* [PATCH v12 05/17] cxl: Fix comment regarding cxl_query_cmd() return data
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
2024-09-11 9:04 ` [PATCH v12 04/17] cxl: Move mailbox related bits to the same context shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-11 9:04 ` [PATCH v12 06/17] cxl: Refactor user ioctl command path from mds to mailbox shiju.jose
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
From: Dave Jiang <dave.jiang@intel.com>
The code returns the min of n_commands and the total number of
commands, but the comment incorrectly says max(). Correct the
comment to min().
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/mbox.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 216937ef9e07..5cbb76c4aa16 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -547,7 +547,7 @@ int cxl_query_cmd(struct cxl_memdev *cxlmd,
return put_user(ARRAY_SIZE(cxl_mem_commands), &q->n_commands);
/*
- * otherwise, return max(n_commands, total commands) cxl_command_info
+ * otherwise, return min(n_commands, total commands) cxl_command_info
* structures.
*/
cxl_for_each_cmd(cmd) {
--
2.34.1
* [PATCH v12 06/17] cxl: Refactor user ioctl command path from mds to mailbox
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
2024-09-11 9:04 ` [PATCH v12 05/17] cxl: Fix comment regarding cxl_query_cmd() return data shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-11 9:04 ` [PATCH v12 07/17] cxl: Add Get Supported Features command for kernel usage shiju.jose
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
From: Dave Jiang <dave.jiang@intel.com>
With the 'struct cxl_mailbox' context introduced, the helper functions
cxl_query_cmd() and cxl_send_cmd() can take a 'struct cxl_mailbox'
directly rather than a 'struct cxl_memdev' parameter. Refactor the user
ioctl command path to use the mailbox context directly.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/core.h | 6 +-
drivers/cxl/core/mbox.c | 120 ++++++++++++++++++------------------
drivers/cxl/core/memdev.c | 39 ++++++++----
drivers/cxl/cxlmem.h | 6 +-
drivers/cxl/pci.c | 6 +-
drivers/cxl/pmem.c | 6 +-
drivers/cxl/security.c | 18 ++++--
include/linux/cxl/mailbox.h | 5 ++
8 files changed, 116 insertions(+), 90 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 72a506c9dbd0..2b9f80d61ee9 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -4,6 +4,8 @@
#ifndef __CXL_CORE_H__
#define __CXL_CORE_H__
+#include <linux/cxl/mailbox.h>
+
extern const struct device_type cxl_nvdimm_bridge_type;
extern const struct device_type cxl_nvdimm_type;
extern const struct device_type cxl_pmu_type;
@@ -65,9 +67,9 @@ static inline void cxl_region_exit(void)
struct cxl_send_command;
struct cxl_mem_query_commands;
-int cxl_query_cmd(struct cxl_memdev *cxlmd,
+int cxl_query_cmd(struct cxl_mailbox *cxl_mbox,
struct cxl_mem_query_commands __user *q);
-int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s);
+int cxl_send_cmd(struct cxl_mailbox *cxl_mbox, struct cxl_send_command __user *s);
void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
resource_size_t length);
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 5cbb76c4aa16..fa1ee495a4e3 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -225,7 +225,7 @@ static const char *cxl_mem_opcode_to_name(u16 opcode)
/**
* cxl_internal_send_cmd() - Kernel internal interface to send a mailbox command
- * @mds: The driver data for the operation
+ * @cxl_mbox: CXL mailbox context for the operation
* @mbox_cmd: initialized command to execute
*
* Context: Any context.
@@ -241,10 +241,9 @@ static const char *cxl_mem_opcode_to_name(u16 opcode)
* error. While this distinction can be useful for commands from userspace, the
* kernel will only be able to use results when both are successful.
*/
-int cxl_internal_send_cmd(struct cxl_memdev_state *mds,
+int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *mbox_cmd)
{
- struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
size_t out_size, min_out;
int rc;
@@ -350,40 +349,40 @@ static bool cxl_payload_from_user_allowed(u16 opcode, void *payload_in)
return true;
}
-static int cxl_mbox_cmd_ctor(struct cxl_mbox_cmd *mbox,
- struct cxl_memdev_state *mds, u16 opcode,
+static int cxl_mbox_cmd_ctor(struct cxl_mbox_cmd *mbox_cmd,
+ struct cxl_mailbox *cxl_mbox, u16 opcode,
size_t in_size, size_t out_size, u64 in_payload)
{
- struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
- *mbox = (struct cxl_mbox_cmd) {
+ *mbox_cmd = (struct cxl_mbox_cmd) {
.opcode = opcode,
.size_in = in_size,
};
if (in_size) {
- mbox->payload_in = vmemdup_user(u64_to_user_ptr(in_payload),
- in_size);
- if (IS_ERR(mbox->payload_in))
- return PTR_ERR(mbox->payload_in);
-
- if (!cxl_payload_from_user_allowed(opcode, mbox->payload_in)) {
- dev_dbg(mds->cxlds.dev, "%s: input payload not allowed\n",
+ mbox_cmd->payload_in = vmemdup_user(u64_to_user_ptr(in_payload),
+ in_size);
+ if (IS_ERR(mbox_cmd->payload_in))
+ return PTR_ERR(mbox_cmd->payload_in);
+
+ if (!cxl_payload_from_user_allowed(opcode,
+ mbox_cmd->payload_in)) {
+ dev_dbg(cxl_mbox->host, "%s: input payload not allowed\n",
cxl_mem_opcode_to_name(opcode));
- kvfree(mbox->payload_in);
+ kvfree(mbox_cmd->payload_in);
return -EBUSY;
}
}
/* Prepare to handle a full payload for variable sized output */
if (out_size == CXL_VARIABLE_PAYLOAD)
- mbox->size_out = cxl_mbox->payload_size;
+ mbox_cmd->size_out = cxl_mbox->payload_size;
else
- mbox->size_out = out_size;
+ mbox_cmd->size_out = out_size;
- if (mbox->size_out) {
- mbox->payload_out = kvzalloc(mbox->size_out, GFP_KERNEL);
- if (!mbox->payload_out) {
- kvfree(mbox->payload_in);
+ if (mbox_cmd->size_out) {
+ mbox_cmd->payload_out = kvzalloc(mbox_cmd->size_out, GFP_KERNEL);
+ if (!mbox_cmd->payload_out) {
+ kvfree(mbox_cmd->payload_in);
return -ENOMEM;
}
}
@@ -398,10 +397,8 @@ static void cxl_mbox_cmd_dtor(struct cxl_mbox_cmd *mbox)
static int cxl_to_mem_cmd_raw(struct cxl_mem_command *mem_cmd,
const struct cxl_send_command *send_cmd,
- struct cxl_memdev_state *mds)
+ struct cxl_mailbox *cxl_mbox)
{
- struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
-
if (send_cmd->raw.rsvd)
return -EINVAL;
@@ -416,7 +413,7 @@ static int cxl_to_mem_cmd_raw(struct cxl_mem_command *mem_cmd,
if (!cxl_mem_raw_command_allowed(send_cmd->raw.opcode))
return -EPERM;
- dev_WARN_ONCE(mds->cxlds.dev, true, "raw command path used\n");
+ dev_WARN_ONCE(cxl_mbox->host, true, "raw command path used\n");
*mem_cmd = (struct cxl_mem_command) {
.info = {
@@ -432,7 +429,7 @@ static int cxl_to_mem_cmd_raw(struct cxl_mem_command *mem_cmd,
static int cxl_to_mem_cmd(struct cxl_mem_command *mem_cmd,
const struct cxl_send_command *send_cmd,
- struct cxl_memdev_state *mds)
+ struct cxl_mailbox *cxl_mbox)
{
struct cxl_mem_command *c = &cxl_mem_commands[send_cmd->id];
const struct cxl_command_info *info = &c->info;
@@ -447,11 +444,11 @@ static int cxl_to_mem_cmd(struct cxl_mem_command *mem_cmd,
return -EINVAL;
/* Check that the command is enabled for hardware */
- if (!test_bit(info->id, mds->enabled_cmds))
+ if (!test_bit(info->id, cxl_mbox->enabled_cmds))
return -ENOTTY;
/* Check that the command is not claimed for exclusive kernel use */
- if (test_bit(info->id, mds->exclusive_cmds))
+ if (test_bit(info->id, cxl_mbox->exclusive_cmds))
return -EBUSY;
/* Check the input buffer is the expected size */
@@ -495,10 +492,9 @@ static int cxl_to_mem_cmd(struct cxl_mem_command *mem_cmd,
* safe to send to the hardware.
*/
static int cxl_validate_cmd_from_user(struct cxl_mbox_cmd *mbox_cmd,
- struct cxl_memdev_state *mds,
+ struct cxl_mailbox *cxl_mbox,
const struct cxl_send_command *send_cmd)
{
- struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mem_command mem_cmd;
int rc;
@@ -515,24 +511,23 @@ static int cxl_validate_cmd_from_user(struct cxl_mbox_cmd *mbox_cmd,
/* Sanitize and construct a cxl_mem_command */
if (send_cmd->id == CXL_MEM_COMMAND_ID_RAW)
- rc = cxl_to_mem_cmd_raw(&mem_cmd, send_cmd, mds);
+ rc = cxl_to_mem_cmd_raw(&mem_cmd, send_cmd, cxl_mbox);
else
- rc = cxl_to_mem_cmd(&mem_cmd, send_cmd, mds);
+ rc = cxl_to_mem_cmd(&mem_cmd, send_cmd, cxl_mbox);
if (rc)
return rc;
/* Sanitize and construct a cxl_mbox_cmd */
- return cxl_mbox_cmd_ctor(mbox_cmd, mds, mem_cmd.opcode,
+ return cxl_mbox_cmd_ctor(mbox_cmd, cxl_mbox, mem_cmd.opcode,
mem_cmd.info.size_in, mem_cmd.info.size_out,
send_cmd->in.payload);
}
-int cxl_query_cmd(struct cxl_memdev *cxlmd,
+int cxl_query_cmd(struct cxl_mailbox *cxl_mbox,
struct cxl_mem_query_commands __user *q)
{
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
- struct device *dev = &cxlmd->dev;
+ struct device *dev = cxl_mbox->host;
struct cxl_mem_command *cmd;
u32 n_commands;
int j = 0;
@@ -553,9 +548,9 @@ int cxl_query_cmd(struct cxl_memdev *cxlmd,
cxl_for_each_cmd(cmd) {
struct cxl_command_info info = cmd->info;
- if (test_bit(info.id, mds->enabled_cmds))
+ if (test_bit(info.id, cxl_mbox->enabled_cmds))
info.flags |= CXL_MEM_COMMAND_FLAG_ENABLED;
- if (test_bit(info.id, mds->exclusive_cmds))
+ if (test_bit(info.id, cxl_mbox->exclusive_cmds))
info.flags |= CXL_MEM_COMMAND_FLAG_EXCLUSIVE;
if (copy_to_user(&q->commands[j++], &info, sizeof(info)))
@@ -570,7 +565,7 @@ int cxl_query_cmd(struct cxl_memdev *cxlmd,
/**
* handle_mailbox_cmd_from_user() - Dispatch a mailbox command for userspace.
- * @mds: The driver data for the operation
+ * @cxl_mbox: The mailbox context for the operation.
* @mbox_cmd: The validated mailbox command.
* @out_payload: Pointer to userspace's output payload.
* @size_out: (Input) Max payload size to copy out.
@@ -591,13 +586,12 @@ int cxl_query_cmd(struct cxl_memdev *cxlmd,
*
* See cxl_send_cmd().
*/
-static int handle_mailbox_cmd_from_user(struct cxl_memdev_state *mds,
+static int handle_mailbox_cmd_from_user(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *mbox_cmd,
u64 out_payload, s32 *size_out,
u32 *retval)
{
- struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
- struct device *dev = mds->cxlds.dev;
+ struct device *dev = cxl_mbox->host;
int rc;
dev_dbg(dev,
@@ -634,10 +628,9 @@ static int handle_mailbox_cmd_from_user(struct cxl_memdev_state *mds,
return rc;
}
-int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s)
+int cxl_send_cmd(struct cxl_mailbox *cxl_mbox, struct cxl_send_command __user *s)
{
- struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
- struct device *dev = &cxlmd->dev;
+ struct device *dev = cxl_mbox->host;
struct cxl_send_command send;
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -647,11 +640,11 @@ int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s)
if (copy_from_user(&send, s, sizeof(send)))
return -EFAULT;
- rc = cxl_validate_cmd_from_user(&mbox_cmd, mds, &send);
+ rc = cxl_validate_cmd_from_user(&mbox_cmd, cxl_mbox, &send);
if (rc)
return rc;
- rc = handle_mailbox_cmd_from_user(mds, &mbox_cmd, send.out.payload,
+ rc = handle_mailbox_cmd_from_user(cxl_mbox, &mbox_cmd, send.out.payload,
&send.out.size, &send.retval);
if (rc)
return rc;
@@ -689,7 +682,7 @@ static int cxl_xfer_log(struct cxl_memdev_state *mds, uuid_t *uuid,
.payload_out = out,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
/*
* The output payload length that indicates the number
@@ -725,6 +718,7 @@ static int cxl_xfer_log(struct cxl_memdev_state *mds, uuid_t *uuid,
*/
static void cxl_walk_cel(struct cxl_memdev_state *mds, size_t size, u8 *cel)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_cel_entry *cel_entry;
const int cel_entries = size / sizeof(*cel_entry);
struct device *dev = mds->cxlds.dev;
@@ -738,7 +732,7 @@ static void cxl_walk_cel(struct cxl_memdev_state *mds, size_t size, u8 *cel)
int enabled = 0;
if (cmd) {
- set_bit(cmd->info.id, mds->enabled_cmds);
+ set_bit(cmd->info.id, cxl_mbox->enabled_cmds);
enabled++;
}
@@ -775,7 +769,7 @@ static struct cxl_mbox_get_supported_logs *cxl_get_gsl(struct cxl_memdev_state *
/* At least the record number field must be valid */
.min_out = 2,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0) {
kvfree(ret);
return ERR_PTR(rc);
@@ -808,6 +802,7 @@ static const uuid_t log_uuid[] = {
*/
int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_get_supported_logs *gsl;
struct device *dev = mds->cxlds.dev;
struct cxl_mem_command *cmd;
@@ -846,7 +841,7 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
/* In case CEL was bogus, enable some default commands. */
cxl_for_each_cmd(cmd)
if (cmd->flags & CXL_CMD_FLAG_FORCE_ENABLE)
- set_bit(cmd->info.id, mds->enabled_cmds);
+ set_bit(cmd->info.id, cxl_mbox->enabled_cmds);
/* Found the required CEL */
rc = 0;
@@ -964,7 +959,7 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
if (i == max_handles) {
payload->nr_recs = i;
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc)
goto free_pl;
i = 0;
@@ -975,7 +970,7 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
if (i) {
payload->nr_recs = i;
mbox_cmd.size_in = struct_size(payload, handles, i);
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc)
goto free_pl;
}
@@ -1009,7 +1004,7 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
.min_out = struct_size(payload, records, 0),
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc) {
dev_err_ratelimited(dev,
"Event log '%d': Failed to query event records : %d",
@@ -1080,6 +1075,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, CXL);
*/
static int cxl_mem_get_partition_info(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_get_partition_info pi;
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -1089,7 +1085,7 @@ static int cxl_mem_get_partition_info(struct cxl_memdev_state *mds)
.size_out = sizeof(pi),
.payload_out = &pi,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc)
return rc;
@@ -1116,6 +1112,7 @@ static int cxl_mem_get_partition_info(struct cxl_memdev_state *mds)
*/
int cxl_dev_state_identify(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
/* See CXL 2.0 Table 175 Identify Memory Device Output Payload */
struct cxl_mbox_identify id;
struct cxl_mbox_cmd mbox_cmd;
@@ -1130,7 +1127,7 @@ int cxl_dev_state_identify(struct cxl_memdev_state *mds)
.size_out = sizeof(id),
.payload_out = &id,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0)
return rc;
@@ -1170,11 +1167,12 @@ static int __cxl_mem_sanitize(struct cxl_memdev_state *mds, u16 cmd)
};
struct cxl_mbox_cmd mbox_cmd = { .opcode = cmd };
struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
if (cmd != CXL_MBOX_OP_SANITIZE && cmd != CXL_MBOX_OP_SECURE_ERASE)
return -EINVAL;
- rc = cxl_internal_send_cmd(mds, &sec_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &sec_cmd);
if (rc < 0) {
dev_err(cxlds->dev, "Failed to get security state : %d", rc);
return rc;
@@ -1193,7 +1191,7 @@ static int __cxl_mem_sanitize(struct cxl_memdev_state *mds, u16 cmd)
sec_out & CXL_PMEM_SEC_STATE_LOCKED)
return -EINVAL;
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0) {
dev_err(cxlds->dev, "Failed to sanitize device : %d", rc);
return rc;
@@ -1310,6 +1308,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_mem_create_range_info, CXL);
int cxl_set_timestamp(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_cmd mbox_cmd;
struct cxl_mbox_set_timestamp_in pi;
int rc;
@@ -1321,7 +1320,7 @@ int cxl_set_timestamp(struct cxl_memdev_state *mds)
.payload_in = &pi,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
/*
* Command is optional. Devices may have another way of providing
* a timestamp, or may return all 0s in timestamp fields.
@@ -1362,7 +1361,7 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
.min_out = struct_size(po, record, 0),
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc)
break;
@@ -1438,6 +1437,7 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
mutex_init(&mds->event.log_lock);
mds->cxlds.dev = dev;
mds->cxlds.reg_map.host = dev;
+ mds->cxlds.cxl_mbox.host = dev;
mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 05bb84cb1274..9f0fe698414d 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -279,6 +279,7 @@ static int cxl_validate_poison_dpa(struct cxl_memdev *cxlmd, u64 dpa)
int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
{
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_inject_poison inject;
struct cxl_poison_record record;
struct cxl_mbox_cmd mbox_cmd;
@@ -308,7 +309,7 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
.size_in = sizeof(inject),
.payload_in = &inject,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc)
goto out;
@@ -334,6 +335,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_inject_poison, CXL);
int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
{
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_clear_poison clear;
struct cxl_poison_record record;
struct cxl_mbox_cmd mbox_cmd;
@@ -372,7 +374,7 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
.payload_in = &clear,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc)
goto out;
@@ -564,9 +566,11 @@ EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, CXL);
void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
unsigned long *cmds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+
down_write(&cxl_memdev_rwsem);
- bitmap_or(mds->exclusive_cmds, mds->exclusive_cmds, cmds,
- CXL_MEM_COMMAND_ID_MAX);
+ bitmap_or(cxl_mbox->exclusive_cmds, cxl_mbox->exclusive_cmds,
+ cmds, CXL_MEM_COMMAND_ID_MAX);
up_write(&cxl_memdev_rwsem);
}
EXPORT_SYMBOL_NS_GPL(set_exclusive_cxl_commands, CXL);
@@ -579,9 +583,11 @@ EXPORT_SYMBOL_NS_GPL(set_exclusive_cxl_commands, CXL);
void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
unsigned long *cmds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+
down_write(&cxl_memdev_rwsem);
- bitmap_andnot(mds->exclusive_cmds, mds->exclusive_cmds, cmds,
- CXL_MEM_COMMAND_ID_MAX);
+ bitmap_andnot(cxl_mbox->exclusive_cmds, cxl_mbox->exclusive_cmds,
+ cmds, CXL_MEM_COMMAND_ID_MAX);
up_write(&cxl_memdev_rwsem);
}
EXPORT_SYMBOL_NS_GPL(clear_exclusive_cxl_commands, CXL);
@@ -656,11 +662,14 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
static long __cxl_memdev_ioctl(struct cxl_memdev *cxlmd, unsigned int cmd,
unsigned long arg)
{
+ struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+
switch (cmd) {
case CXL_MEM_QUERY_COMMANDS:
- return cxl_query_cmd(cxlmd, (void __user *)arg);
+ return cxl_query_cmd(cxl_mbox, (void __user *)arg);
case CXL_MEM_SEND_COMMAND:
- return cxl_send_cmd(cxlmd, (void __user *)arg);
+ return cxl_send_cmd(cxl_mbox, (void __user *)arg);
default:
return -ENOTTY;
}
@@ -715,6 +724,7 @@ static int cxl_memdev_release_file(struct inode *inode, struct file *file)
*/
static int cxl_mem_get_fw_info(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_get_fw_info info;
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -725,7 +735,7 @@ static int cxl_mem_get_fw_info(struct cxl_memdev_state *mds)
.payload_out = &info,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0)
return rc;
@@ -749,6 +759,7 @@ static int cxl_mem_get_fw_info(struct cxl_memdev_state *mds)
*/
static int cxl_mem_activate_fw(struct cxl_memdev_state *mds, int slot)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_activate_fw activate;
struct cxl_mbox_cmd mbox_cmd;
@@ -765,7 +776,7 @@ static int cxl_mem_activate_fw(struct cxl_memdev_state *mds, int slot)
activate.action = CXL_FW_ACTIVATE_OFFLINE;
activate.slot = slot;
- return cxl_internal_send_cmd(mds, &mbox_cmd);
+ return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
}
/**
@@ -780,6 +791,7 @@ static int cxl_mem_activate_fw(struct cxl_memdev_state *mds, int slot)
*/
static int cxl_mem_abort_fw_xfer(struct cxl_memdev_state *mds)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_transfer_fw *transfer;
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -799,7 +811,7 @@ static int cxl_mem_abort_fw_xfer(struct cxl_memdev_state *mds)
transfer->action = CXL_FW_TRANSFER_ACTION_ABORT;
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
kfree(transfer);
return rc;
}
@@ -924,7 +936,7 @@ static enum fw_upload_err cxl_fw_write(struct fw_upload *fwl, const u8 *data,
.poll_count = 30,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0) {
rc = FW_UPLOAD_ERR_RW_ERROR;
goto out_free;
@@ -991,10 +1003,11 @@ static void cxl_remove_fw_upload(void *fwl)
int devm_cxl_setup_fw_upload(struct device *host, struct cxl_memdev_state *mds)
{
struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
struct device *dev = &cxlds->cxlmd->dev;
struct fw_upload *fwl;
- if (!test_bit(CXL_MEM_COMMAND_ID_GET_FW_INFO, mds->enabled_cmds))
+ if (!test_bit(CXL_MEM_COMMAND_ID_GET_FW_INFO, cxl_mbox->enabled_cmds))
return 0;
fwl = firmware_upload_register(THIS_MODULE, dev, dev_name(dev),
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 19609b708b09..d7c6ffe2a884 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -417,8 +417,6 @@ struct cxl_dev_state {
* @lsa_size: Size of Label Storage Area
* (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
* @firmware_version: Firmware version for the memory device.
- * @enabled_cmds: Hardware commands found enabled in CEL.
- * @exclusive_cmds: Commands that are kernel-internal only
* @total_bytes: sum of all possible capacities
* @volatile_only_bytes: hard volatile capacity
* @persistent_only_bytes: hard persistent capacity
@@ -441,8 +439,6 @@ struct cxl_memdev_state {
struct cxl_dev_state cxlds;
size_t lsa_size;
char firmware_version[0x10];
- DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
- DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
u64 total_bytes;
u64 volatile_only_bytes;
u64 persistent_only_bytes;
@@ -769,7 +765,7 @@ enum {
CXL_PMEM_SEC_PASS_USER,
};
-int cxl_internal_send_cmd(struct cxl_memdev_state *mds,
+int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *cmd);
int cxl_dev_state_identify(struct cxl_memdev_state *mds);
int cxl_await_media_ready(struct cxl_dev_state *cxlds);
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index faf6f5a49368..3c73de475bf3 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -661,6 +661,7 @@ static int cxl_event_req_irq(struct cxl_dev_state *cxlds, u8 setting)
static int cxl_event_get_int_policy(struct cxl_memdev_state *mds,
struct cxl_event_interrupt_policy *policy)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_cmd mbox_cmd = {
.opcode = CXL_MBOX_OP_GET_EVT_INT_POLICY,
.payload_out = policy,
@@ -668,7 +669,7 @@ static int cxl_event_get_int_policy(struct cxl_memdev_state *mds,
};
int rc;
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0)
dev_err(mds->cxlds.dev,
"Failed to get event interrupt policy : %d", rc);
@@ -679,6 +680,7 @@ static int cxl_event_get_int_policy(struct cxl_memdev_state *mds,
static int cxl_event_config_msgnums(struct cxl_memdev_state *mds,
struct cxl_event_interrupt_policy *policy)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -695,7 +697,7 @@ static int cxl_event_config_msgnums(struct cxl_memdev_state *mds,
.size_in = sizeof(*policy),
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0) {
dev_err(mds->cxlds.dev, "Failed to set event interrupt policy : %d",
rc);
diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c
index 3985ff9ce70e..5453c0faa295 100644
--- a/drivers/cxl/pmem.c
+++ b/drivers/cxl/pmem.c
@@ -120,6 +120,7 @@ static int cxl_pmem_get_config_data(struct cxl_memdev_state *mds,
struct nd_cmd_get_config_data_hdr *cmd,
unsigned int buf_len)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_get_lsa get_lsa;
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -141,7 +142,7 @@ static int cxl_pmem_get_config_data(struct cxl_memdev_state *mds,
.payload_out = cmd->out_buf,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
cmd->status = 0;
return rc;
@@ -151,6 +152,7 @@ static int cxl_pmem_set_config_data(struct cxl_memdev_state *mds,
struct nd_cmd_set_config_hdr *cmd,
unsigned int buf_len)
{
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_set_lsa *set_lsa;
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -177,7 +179,7 @@ static int cxl_pmem_set_config_data(struct cxl_memdev_state *mds,
.size_in = struct_size(set_lsa, data, cmd->in_length),
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
/*
* Set "firmware" status (4-packed bytes at the end of the input
diff --git a/drivers/cxl/security.c b/drivers/cxl/security.c
index 21856a3f408e..95a92b87c99a 100644
--- a/drivers/cxl/security.c
+++ b/drivers/cxl/security.c
@@ -15,6 +15,7 @@ static unsigned long cxl_pmem_get_security_flags(struct nvdimm *nvdimm,
struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
struct cxl_memdev *cxlmd = cxl_nvd->cxlmd;
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
unsigned long security_flags = 0;
struct cxl_get_security_output {
__le32 flags;
@@ -29,7 +30,7 @@ static unsigned long cxl_pmem_get_security_flags(struct nvdimm *nvdimm,
.payload_out = &out,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0)
return 0;
@@ -71,6 +72,7 @@ static int cxl_pmem_security_change_key(struct nvdimm *nvdimm,
struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
struct cxl_memdev *cxlmd = cxl_nvd->cxlmd;
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_cmd mbox_cmd;
struct cxl_set_pass set_pass;
@@ -87,7 +89,7 @@ static int cxl_pmem_security_change_key(struct nvdimm *nvdimm,
.payload_in = &set_pass,
};
- return cxl_internal_send_cmd(mds, &mbox_cmd);
+ return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
}
static int __cxl_pmem_security_disable(struct nvdimm *nvdimm,
@@ -97,6 +99,7 @@ static int __cxl_pmem_security_disable(struct nvdimm *nvdimm,
struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
struct cxl_memdev *cxlmd = cxl_nvd->cxlmd;
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_disable_pass dis_pass;
struct cxl_mbox_cmd mbox_cmd;
@@ -112,7 +115,7 @@ static int __cxl_pmem_security_disable(struct nvdimm *nvdimm,
.payload_in = &dis_pass,
};
- return cxl_internal_send_cmd(mds, &mbox_cmd);
+ return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
}
static int cxl_pmem_security_disable(struct nvdimm *nvdimm,
@@ -132,11 +135,12 @@ static int cxl_pmem_security_freeze(struct nvdimm *nvdimm)
struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
struct cxl_memdev *cxlmd = cxl_nvd->cxlmd;
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_cmd mbox_cmd = {
.opcode = CXL_MBOX_OP_FREEZE_SECURITY,
};
- return cxl_internal_send_cmd(mds, &mbox_cmd);
+ return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
}
static int cxl_pmem_security_unlock(struct nvdimm *nvdimm,
@@ -145,6 +149,7 @@ static int cxl_pmem_security_unlock(struct nvdimm *nvdimm,
struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
struct cxl_memdev *cxlmd = cxl_nvd->cxlmd;
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
u8 pass[NVDIMM_PASSPHRASE_LEN];
struct cxl_mbox_cmd mbox_cmd;
int rc;
@@ -156,7 +161,7 @@ static int cxl_pmem_security_unlock(struct nvdimm *nvdimm,
.payload_in = pass,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0)
return rc;
@@ -170,6 +175,7 @@ static int cxl_pmem_security_passphrase_erase(struct nvdimm *nvdimm,
struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm);
struct cxl_memdev *cxlmd = cxl_nvd->cxlmd;
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_cmd mbox_cmd;
struct cxl_pass_erase erase;
int rc;
@@ -185,7 +191,7 @@ static int cxl_pmem_security_passphrase_erase(struct nvdimm *nvdimm,
.payload_in = &erase,
};
- rc = cxl_internal_send_cmd(mds, &mbox_cmd);
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
if (rc < 0)
return rc;
diff --git a/include/linux/cxl/mailbox.h b/include/linux/cxl/mailbox.h
index 654df6175828..2380b22d7a12 100644
--- a/include/linux/cxl/mailbox.h
+++ b/include/linux/cxl/mailbox.h
@@ -3,6 +3,7 @@
#ifndef __CXL_MBOX_H__
#define __CXL_MBOX_H__
+#include <uapi/linux/cxl_mem.h>
#include <linux/auxiliary_bus.h>
/**
@@ -45,6 +46,8 @@ struct cxl_mbox_cmd {
* struct cxl_mailbox - context for CXL mailbox operations
* @host: device that hosts the mailbox
* @adev: auxiliary device for fw-ctl
+ * @enabled_cmds: Hardware commands found enabled in CEL.
+ * @exclusive_cmds: Commands that are kernel-internal only
* @payload_size: Size of space for payload
* (CXL 3.1 8.2.8.4.3 Mailbox Capabilities Register)
* @mbox_mutex: mutex protects device mailbox and firmware
@@ -54,6 +57,8 @@ struct cxl_mbox_cmd {
struct cxl_mailbox {
struct device *host;
struct auxiliary_device adev; /* For fw-ctl */
+ DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
+ DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
size_t payload_size;
struct mutex mbox_mutex; /* lock to protect mailbox context */
struct rcuwait mbox_wait;
--
2.34.1
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v12 07/17] cxl: Add Get Supported Features command for kernel usage
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
` (5 preceding siblings ...)
2024-09-11 9:04 ` [PATCH v12 06/17] cxl: Refactor user ioctl command path from mds to mailbox shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-23 23:33 ` Dave Jiang
2024-09-11 9:04 ` [PATCH v12 08/17] cxl/mbox: Add GET_FEATURE mailbox command shiju.jose
` (9 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Dave Jiang <dave.jiang@intel.com>
CXL spec r3.1 8.2.9.6.1 Get Supported Features (Opcode 0500h)
The command retrieves the list of supported device-specific features
(identified by UUID) and general information about each feature.
The driver retrieves the feature entries in order to make checks and
provide information for the Get Feature and Set Feature commands. One of
the main pieces of information retrieved is the set of effects a Set
Feature command would have for a particular feature.
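Since the full feature list rarely fits in one mailbox payload, the patch
below pulls the entries in batches sized to the payload. A minimal
userspace sketch of that batch-count arithmetic, using the header and
entry sizes implied by the structs in this patch (8-byte output header,
48-byte feature entry) and assuming the device always returns full
batches:

```c
#include <assert.h>

/* How many Get Supported Features mailbox round trips are needed to
 * retrieve num_features entries through a payload of payload_size
 * bytes, mirroring the loop in cxl_get_supported_features(). */
static int gsf_round_trips(int num_features, int payload_size)
{
	const int hdr_size = 8;		/* cxl_mbox_get_sup_feats_out header */
	const int feat_size = 48;	/* sizeof(struct cxl_feat_entry) */
	int max_feats = (payload_size - hdr_size) / feat_size;
	int trips = 0, remain = num_features;

	while (remain > 0) {
		remain -= (remain > max_feats) ? max_feats : remain;
		trips++;
	}
	return trips;
}
```

For example, a 256-byte payload holds at most 5 entries per call, so a
device reporting 12 features needs 3 round trips.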
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/mbox.c | 175 +++++++++++++++++++++++++++++++++++
drivers/cxl/cxlmem.h | 51 ++++++++++
drivers/cxl/pci.c | 4 +
include/uapi/linux/cxl_mem.h | 1 +
4 files changed, 231 insertions(+)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index fa1ee495a4e3..fe965ec5802f 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -67,6 +67,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
CXL_CMD(SET_SHUTDOWN_STATE, 0x1, 0, 0),
CXL_CMD(GET_SCAN_MEDIA_CAPS, 0x10, 0x4, 0),
CXL_CMD(GET_TIMESTAMP, 0, 0x8, 0),
+ CXL_CMD(GET_SUPPORTED_FEATURES, 0x8, CXL_VARIABLE_PAYLOAD, 0),
};
/*
@@ -790,6 +791,180 @@ static const uuid_t log_uuid[] = {
[VENDOR_DEBUG_UUID] = DEFINE_CXL_VENDOR_DEBUG_UUID,
};
+static void cxl_free_features(void *features)
+{
+ kvfree(features);
+}
+
+static int cxl_get_supported_features_count(struct cxl_dev_state *cxlds)
+{
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
+ struct cxl_mbox_get_sup_feats_out mbox_out;
+ struct cxl_mbox_get_sup_feats_in mbox_in;
+ struct cxl_mbox_cmd mbox_cmd;
+ int rc;
+
+ memset(&mbox_in, 0, sizeof(mbox_in));
+ mbox_in.count = sizeof(mbox_out);
+ memset(&mbox_out, 0, sizeof(mbox_out));
+ mbox_cmd = (struct cxl_mbox_cmd) {
+ .opcode = CXL_MBOX_OP_GET_SUPPORTED_FEATURES,
+ .size_in = sizeof(mbox_in),
+ .payload_in = &mbox_in,
+ .size_out = sizeof(mbox_out),
+ .payload_out = &mbox_out,
+ .min_out = sizeof(mbox_out),
+ };
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
+ if (rc < 0)
+ return rc;
+
+ cxlds->num_features = le16_to_cpu(mbox_out.supported_feats);
+ if (!cxlds->num_features)
+ return -ENOENT;
+
+ return 0;
+}
+
+int cxl_get_supported_features(struct cxl_dev_state *cxlds)
+{
+ int remain_feats, max_size, max_feats, start, rc;
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
+ int feat_size = sizeof(struct cxl_feat_entry);
+ struct cxl_mbox_get_sup_feats_out *mbox_out;
+ struct cxl_mbox_get_sup_feats_in mbox_in;
+ int hdr_size = sizeof(*mbox_out);
+ struct cxl_mbox_cmd mbox_cmd;
+ struct cxl_mem_command *cmd;
+ void *ptr;
+
+ /* Get supported features is optional, need to check */
+ cmd = cxl_mem_find_command(CXL_MBOX_OP_GET_SUPPORTED_FEATURES);
+ if (!cmd)
+ return -EOPNOTSUPP;
+ if (!test_bit(cmd->info.id, cxl_mbox->enabled_cmds))
+ return -EOPNOTSUPP;
+
+ rc = cxl_get_supported_features_count(cxlds);
+ if (rc)
+ return rc;
+
+ struct cxl_feat_entry *entries __free(kvfree) =
+ kvmalloc(cxlds->num_features * feat_size, GFP_KERNEL);
+
+ if (!entries)
+ return -ENOMEM;
+
+ cxlds->entries = no_free_ptr(entries);
+ rc = devm_add_action_or_reset(cxl_mbox->host, cxl_free_features,
+ cxlds->entries);
+ if (rc)
+ return rc;
+
+ max_size = cxl_mbox->payload_size - hdr_size;
+ /* max feat entries that can fit in mailbox max payload size */
+ max_feats = max_size / feat_size;
+ ptr = &cxlds->entries[0];
+
+ mbox_out = kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
+ if (!mbox_out)
+ return -ENOMEM;
+
+ start = 0;
+ remain_feats = cxlds->num_features;
+ do {
+ int retrieved, alloc_size, copy_feats;
+
+ if (remain_feats > max_feats) {
+ alloc_size = sizeof(*mbox_out) + max_feats * feat_size;
+ remain_feats = remain_feats - max_feats;
+ copy_feats = max_feats;
+ } else {
+ alloc_size = sizeof(*mbox_out) + remain_feats * feat_size;
+ copy_feats = remain_feats;
+ remain_feats = 0;
+ }
+
+ memset(&mbox_in, 0, sizeof(mbox_in));
+ mbox_in.count = alloc_size;
+ mbox_in.start_idx = start;
+ memset(mbox_out, 0, alloc_size);
+ mbox_cmd = (struct cxl_mbox_cmd) {
+ .opcode = CXL_MBOX_OP_GET_SUPPORTED_FEATURES,
+ .size_in = sizeof(mbox_in),
+ .payload_in = &mbox_in,
+ .size_out = alloc_size,
+ .payload_out = mbox_out,
+ .min_out = hdr_size,
+ };
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
+ if (rc < 0) {
+ kfree(mbox_out);
+ return rc;
+ }
+ if (mbox_cmd.size_out <= hdr_size) {
+ rc = -ENXIO;
+ goto err;
+ }
+
+ /*
+ * Make sure retrieved out buffer is multiple of feature
+ * entries.
+ */
+ retrieved = mbox_cmd.size_out - hdr_size;
+ if (retrieved % feat_size) {
+ rc = -ENXIO;
+ goto err;
+ }
+
+ /*
+ * If the reported output entries * defined entry size !=
+ * retrieved output bytes, then the output package is incorrect.
+ */
+ if (mbox_out->num_entries * feat_size != retrieved) {
+ rc = -ENXIO;
+ goto err;
+ }
+
+ memcpy(ptr, mbox_out->ents, retrieved);
+ ptr += retrieved;
+ /*
+ * If the number of output entries is less than expected, add the
+ * remaining entries to the next batch.
+ */
+ remain_feats += copy_feats - mbox_out->num_entries;
+ start += mbox_out->num_entries;
+ } while (remain_feats);
+
+ kfree(mbox_out);
+ return 0;
+
+err:
+ kfree(mbox_out);
+ cxlds->num_features = 0;
+ return rc;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_supported_features, CXL);
+
+int cxl_get_supported_feature_entry(struct cxl_dev_state *cxlds, const uuid_t *feat_uuid,
+ struct cxl_feat_entry *feat_entry_out)
+{
+ struct cxl_feat_entry *feat_entry;
+ int count;
+
+ /* Check CXL dev supports the feature */
+ feat_entry = &cxlds->entries[0];
+ for (count = 0; count < cxlds->num_features; count++, feat_entry++) {
+ if (uuid_equal(&feat_entry->uuid, feat_uuid)) {
+ memcpy(feat_entry_out, feat_entry, sizeof(*feat_entry_out));
+ return 0;
+ }
+ }
+
+ return -EOPNOTSUPP;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_supported_feature_entry, CXL);
+
/**
* cxl_enumerate_cmds() - Enumerate commands for a device.
* @mds: The driver data for the operation
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index d7c6ffe2a884..5d149e64c247 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -389,6 +389,8 @@ struct cxl_dpa_perf {
* @ram_res: Active Volatile memory capacity configuration
* @serial: PCIe Device Serial Number
* @type: Generic Memory Class device or Vendor Specific Memory device
+ * @features: number of supported features
+ * @entries: list of supported feature entries.
*/
struct cxl_dev_state {
struct device *dev;
@@ -404,6 +406,8 @@ struct cxl_dev_state {
u64 serial;
enum cxl_devtype type;
struct cxl_mailbox cxl_mbox;
+ int num_features;
+ struct cxl_feat_entry *entries;
};
/**
@@ -482,6 +486,7 @@ enum cxl_opcode {
CXL_MBOX_OP_GET_LOG_CAPS = 0x0402,
CXL_MBOX_OP_CLEAR_LOG = 0x0403,
CXL_MBOX_OP_GET_SUP_LOG_SUBLIST = 0x0405,
+ CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
CXL_MBOX_OP_IDENTIFY = 0x4000,
CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
@@ -765,6 +770,48 @@ enum {
CXL_PMEM_SEC_PASS_USER,
};
+/* Get Supported Features (0x500h) CXL r3.1 8.2.9.6.1 */
+struct cxl_mbox_get_sup_feats_in {
+ __le32 count;
+ __le16 start_idx;
+ u8 reserved[2];
+} __packed;
+
+/* Supported Feature Entry : Payload out attribute flags */
+#define CXL_FEAT_ENTRY_FLAG_CHANGABLE BIT(0)
+#define CXL_FEAT_ENTRY_FLAG_DEEPEST_RESET_PERSISTENCE_MASK GENMASK(3, 1)
+#define CXL_FEAT_ENTRY_FLAG_PERSIST_ACROSS_FIRMWARE_UPDATE BIT(4)
+#define CXL_FEAT_ENTRY_FLAG_SUPPORT_DEFAULT_SELECTION BIT(5)
+#define CXL_FEAT_ENTRY_FLAG_SUPPORT_SAVED_SELECTION BIT(6)
+
+enum cxl_feat_attr_value_persistence {
+ CXL_FEAT_ATTR_VALUE_PERSISTENCE_NONE,
+ CXL_FEAT_ATTR_VALUE_PERSISTENCE_CXL_RESET,
+ CXL_FEAT_ATTR_VALUE_PERSISTENCE_HOT_RESET,
+ CXL_FEAT_ATTR_VALUE_PERSISTENCE_WARM_RESET,
+ CXL_FEAT_ATTR_VALUE_PERSISTENCE_COLD_RESET,
+ CXL_FEAT_ATTR_VALUE_PERSISTENCE_MAX
+};
+
+struct cxl_feat_entry {
+ uuid_t uuid;
+ __le16 id;
+ __le16 get_feat_size;
+ __le16 set_feat_size;
+ __le32 attr_flags;
+ u8 get_feat_ver;
+ u8 set_feat_ver;
+ __le16 set_effects;
+ u8 reserved[18];
+} __packed;
+
+struct cxl_mbox_get_sup_feats_out {
+ __le16 num_entries;
+ __le16 supported_feats;
+ u8 reserved[4];
+ struct cxl_feat_entry ents[] __counted_by(le32_to_cpu(supported_feats));
+} __packed;
+
int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *cmd);
int cxl_dev_state_identify(struct cxl_memdev_state *mds);
@@ -824,4 +871,8 @@ struct cxl_hdm {
struct seq_file;
struct dentry *cxl_debugfs_create_dir(const char *dir);
void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds);
+
+int cxl_get_supported_features(struct cxl_dev_state *cxlds);
+int cxl_get_supported_feature_entry(struct cxl_dev_state *cxlds, const uuid_t *feat_uuid,
+ struct cxl_feat_entry *feat_entry_out);
#endif /* __CXL_MEM_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 3c73de475bf3..cec88e3a1754 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -872,6 +872,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
return rc;
+ rc = cxl_get_supported_features(cxlds);
+ if (rc)
+ dev_dbg(&pdev->dev, "No features enumerated.\n");
+
rc = cxl_set_timestamp(mds);
if (rc)
return rc;
diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
index c6c0fe27495d..bd2535962f70 100644
--- a/include/uapi/linux/cxl_mem.h
+++ b/include/uapi/linux/cxl_mem.h
@@ -50,6 +50,7 @@
___C(GET_LOG_CAPS, "Get Log Capabilities"), \
___C(CLEAR_LOG, "Clear Log"), \
___C(GET_SUP_LOG_SUBLIST, "Get Supported Logs Sub-List"), \
+ ___C(GET_SUPPORTED_FEATURES, "Get Supported Features"), \
___C(MAX, "invalid / last command")
#define ___C(a, b) CXL_MEM_COMMAND_ID_##a
--
2.34.1
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v12 08/17] cxl/mbox: Add GET_FEATURE mailbox command
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
` (6 preceding siblings ...)
2024-09-11 9:04 ` [PATCH v12 07/17] cxl: Add Get Supported Features command for kernel usage shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-30 16:17 ` Fan Ni
2024-09-11 9:04 ` [PATCH v12 09/17] cxl/mbox: Add SET_FEATURE " shiju.jose
` (8 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add support for the GET_FEATURE mailbox command.
CXL spec 3.1 section 8.2.9.6 describes optional device-specific features.
The settings of a feature can be retrieved using the Get Feature command.
CXL spec 3.1 section 8.2.9.6.2 describes the Get Feature command.
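Because the feature data may exceed the mailbox payload size, the Get
Feature implementation below reads it in chunks, advancing an offset
until the whole output buffer is filled. A self-contained sketch of
that loop, with a plain buffer standing in for the mailbox transport
(names here are illustrative, not from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Chunked read loop mirroring cxl_get_feature(): each "mailbox call"
 * copies at most payload_size bytes at the current offset until
 * out_size bytes have been received or the device runs short. */
static size_t get_feature_chunked(const unsigned char *dev_data,
				  size_t feat_size,
				  unsigned char *out, size_t out_size,
				  size_t payload_size, int *calls)
{
	size_t rcvd = 0;

	*calls = 0;
	while (rcvd < out_size) {
		size_t count = out_size - rcvd;

		if (count > payload_size)
			count = payload_size;
		if (rcvd + count > feat_size)	/* device returns short read */
			count = feat_size - rcvd;
		if (!count)
			break;
		memcpy(out + rcvd, dev_data + rcvd, count);
		rcvd += count;
		(*calls)++;
	}
	return rcvd;
}
```

With a 4-byte payload, a 10-byte feature is fetched in three calls of
4, 4 and 2 bytes.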
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/mbox.c | 41 +++++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxlmem.h | 26 ++++++++++++++++++++++++++
2 files changed, 67 insertions(+)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index fe965ec5802f..3dfe411c6556 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -965,6 +965,47 @@ int cxl_get_supported_feature_entry(struct cxl_dev_state *cxlds, const uuid_t *f
}
EXPORT_SYMBOL_NS_GPL(cxl_get_supported_feature_entry, CXL);
+size_t cxl_get_feature(struct cxl_dev_state *cxlds, const uuid_t feat_uuid,
+ enum cxl_get_feat_selection selection,
+ void *feat_out, size_t feat_out_size)
+{
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
+ size_t data_to_rd_size, size_out;
+ struct cxl_mbox_get_feat_in pi;
+ struct cxl_mbox_cmd mbox_cmd;
+ size_t data_rcvd_size = 0;
+ int rc;
+
+ if (!feat_out || !feat_out_size)
+ return 0;
+
+ size_out = min(feat_out_size, cxl_mbox->payload_size);
+ pi.uuid = feat_uuid;
+ pi.selection = selection;
+ do {
+ data_to_rd_size = min(feat_out_size - data_rcvd_size,
+ cxl_mbox->payload_size);
+ pi.offset = cpu_to_le16(data_rcvd_size);
+ pi.count = cpu_to_le16(data_to_rd_size);
+
+ mbox_cmd = (struct cxl_mbox_cmd) {
+ .opcode = CXL_MBOX_OP_GET_FEATURE,
+ .size_in = sizeof(pi),
+ .payload_in = &pi,
+ .size_out = size_out,
+ .payload_out = feat_out + data_rcvd_size,
+ .min_out = data_to_rd_size,
+ };
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
+ if (rc < 0 || !mbox_cmd.size_out)
+ return 0;
+ data_rcvd_size += mbox_cmd.size_out;
+ } while (data_rcvd_size < feat_out_size);
+
+ return data_rcvd_size;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_feature, CXL);
+
/**
* cxl_enumerate_cmds() - Enumerate commands for a device.
* @mds: The driver data for the operation
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 5d149e64c247..57c9294bb7f3 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -487,6 +487,7 @@ enum cxl_opcode {
CXL_MBOX_OP_CLEAR_LOG = 0x0403,
CXL_MBOX_OP_GET_SUP_LOG_SUBLIST = 0x0405,
CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
+ CXL_MBOX_OP_GET_FEATURE = 0x0501,
CXL_MBOX_OP_IDENTIFY = 0x4000,
CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
@@ -812,6 +813,28 @@ struct cxl_mbox_get_sup_feats_out {
struct cxl_feat_entry ents[] __counted_by(le32_to_cpu(supported_feats));
} __packed;
+/*
+ * Get Feature CXL 3.1 Spec 8.2.9.6.2
+ */
+
+/*
+ * Get Feature input payload
+ * CXL rev 3.1 section 8.2.9.6.2 Table 8-99
+ */
+enum cxl_get_feat_selection {
+ CXL_GET_FEAT_SEL_CURRENT_VALUE,
+ CXL_GET_FEAT_SEL_DEFAULT_VALUE,
+ CXL_GET_FEAT_SEL_SAVED_VALUE,
+ CXL_GET_FEAT_SEL_MAX
+};
+
+struct cxl_mbox_get_feat_in {
+ uuid_t uuid;
+ __le16 offset;
+ __le16 count;
+ u8 selection;
+} __packed;
+
int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *cmd);
int cxl_dev_state_identify(struct cxl_memdev_state *mds);
@@ -875,4 +898,7 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds);
int cxl_get_supported_features(struct cxl_dev_state *cxlds);
int cxl_get_supported_feature_entry(struct cxl_dev_state *cxlds, const uuid_t *feat_uuid,
struct cxl_feat_entry *feat_entry_out);
+size_t cxl_get_feature(struct cxl_dev_state *cxlds, const uuid_t feat_uuid,
+ enum cxl_get_feat_selection selection,
+ void *feat_out, size_t feat_out_size);
#endif /* __CXL_MEM_H__ */
--
2.34.1
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v12 09/17] cxl/mbox: Add SET_FEATURE mailbox command
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
` (7 preceding siblings ...)
2024-09-11 9:04 ` [PATCH v12 08/17] cxl/mbox: Add GET_FEATURE mailbox command shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-30 16:58 ` Fan Ni
2024-09-11 9:04 ` [PATCH v12 10/17] cxl/memfeature: Add CXL memory device patrol scrub control feature shiju.jose
` (7 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add support for the SET_FEATURE mailbox command.
CXL spec 3.1 section 8.2.9.6 describes optional device-specific features.
CXL devices support features with changeable attributes.
The settings of a feature can optionally be modified using the Set
Feature command.
CXL spec 3.1 section 8.2.9.6.3 describes the Set Feature command.
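A payload that fits in a single mailbox transfer is sent with the FULL
data-transfer flag; larger payloads are split into an INITIATE chunk,
zero or more CONTINUE chunks, and a FINISH chunk, as the loop in
cxl_set_feature() below does. A minimal sketch of just that flag
sequencing (the enum names here abbreviate the patch's
CXL_SET_FEAT_FLAG_* constants):

```c
#include <assert.h>
#include <stddef.h>

enum xfer_flag { XFER_FULL, XFER_INITIATE, XFER_CONTINUE, XFER_FINISH };

/* Return the number of transfers needed to send data_size bytes in
 * chunks of at most chunk bytes; flags[] receives the per-chunk
 * data-transfer flag in order. */
static int set_feature_flags(size_t data_size, size_t chunk,
			     enum xfer_flag *flags)
{
	size_t sent = 0;
	int n = 0;

	if (data_size <= chunk) {
		flags[n++] = XFER_FULL;
		return n;
	}
	while (sent < data_size) {
		size_t in = data_size - sent;

		if (in > chunk)
			in = chunk;
		if (sent == 0)
			flags[n] = XFER_INITIATE;
		else if (sent + in >= data_size)
			flags[n] = XFER_FINISH;
		else
			flags[n] = XFER_CONTINUE;
		sent += in;
		n++;
	}
	return n;
}
```

For example, 10 bytes sent through 4-byte chunks yields INITIATE,
CONTINUE, FINISH, while 4 bytes go out in one FULL transfer.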
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/mbox.c | 73 +++++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxlmem.h | 34 +++++++++++++++++++
2 files changed, 107 insertions(+)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 3dfe411c6556..806b1c8087b0 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1006,6 +1006,79 @@ size_t cxl_get_feature(struct cxl_dev_state *cxlds, const uuid_t feat_uuid,
}
EXPORT_SYMBOL_NS_GPL(cxl_get_feature, CXL);
+/*
+ * FEAT_DATA_MIN_PAYLOAD_SIZE - min extra number of bytes should be
+ * available in the mailbox for storing the actual feature data so that
+ * the feature data transfer would work as expected.
+ */
+#define FEAT_DATA_MIN_PAYLOAD_SIZE 10
+int cxl_set_feature(struct cxl_dev_state *cxlds,
+ const uuid_t feat_uuid, u8 feat_version,
+ void *feat_data, size_t feat_data_size,
+ u8 feat_flag)
+{
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
+ struct cxl_memdev_set_feat_pi {
+ struct cxl_mbox_set_feat_hdr hdr;
+ u8 feat_data[];
+ } __packed;
+ size_t data_in_size, data_sent_size = 0;
+ struct cxl_mbox_cmd mbox_cmd;
+ size_t hdr_size;
+ int rc = 0;
+
+ struct cxl_memdev_set_feat_pi *pi __free(kfree) =
+ kmalloc(cxl_mbox->payload_size, GFP_KERNEL);
+ pi->hdr.uuid = feat_uuid;
+ pi->hdr.version = feat_version;
+ feat_flag &= ~CXL_SET_FEAT_FLAG_DATA_TRANSFER_MASK;
+ feat_flag |= CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET;
+ hdr_size = sizeof(pi->hdr);
+ /*
+ * Check minimum mbox payload size is available for
+ * the feature data transfer.
+ */
+ if (hdr_size + FEAT_DATA_MIN_PAYLOAD_SIZE > cxl_mbox->payload_size)
+ return -ENOMEM;
+
+ if ((hdr_size + feat_data_size) <= cxl_mbox->payload_size) {
+ pi->hdr.flags = cpu_to_le32(feat_flag |
+ CXL_SET_FEAT_FLAG_FULL_DATA_TRANSFER);
+ data_in_size = feat_data_size;
+ } else {
+ pi->hdr.flags = cpu_to_le32(feat_flag |
+ CXL_SET_FEAT_FLAG_INITIATE_DATA_TRANSFER);
+ data_in_size = cxl_mbox->payload_size - hdr_size;
+ }
+
+ do {
+ pi->hdr.offset = cpu_to_le16(data_sent_size);
+ memcpy(pi->feat_data, feat_data + data_sent_size, data_in_size);
+ mbox_cmd = (struct cxl_mbox_cmd) {
+ .opcode = CXL_MBOX_OP_SET_FEATURE,
+ .size_in = hdr_size + data_in_size,
+ .payload_in = pi,
+ };
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
+ if (rc < 0)
+ return rc;
+
+ data_sent_size += data_in_size;
+ if (data_sent_size >= feat_data_size)
+ return 0;
+
+ if ((feat_data_size - data_sent_size) <= (cxl_mbox->payload_size - hdr_size)) {
+ data_in_size = feat_data_size - data_sent_size;
+ pi->hdr.flags = cpu_to_le32(feat_flag |
+ CXL_SET_FEAT_FLAG_FINISH_DATA_TRANSFER);
+ } else {
+ pi->hdr.flags = cpu_to_le32(feat_flag |
+ CXL_SET_FEAT_FLAG_CONTINUE_DATA_TRANSFER);
+ }
+ } while (true);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_set_feature, CXL);
+
/**
* cxl_enumerate_cmds() - Enumerate commands for a device.
* @mds: The driver data for the operation
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 57c9294bb7f3..b565a061a4e3 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -488,6 +488,7 @@ enum cxl_opcode {
CXL_MBOX_OP_GET_SUP_LOG_SUBLIST = 0x0405,
CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
CXL_MBOX_OP_GET_FEATURE = 0x0501,
+ CXL_MBOX_OP_SET_FEATURE = 0x0502,
CXL_MBOX_OP_IDENTIFY = 0x4000,
CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
@@ -835,6 +836,35 @@ struct cxl_mbox_get_feat_in {
u8 selection;
} __packed;
+/*
+ * Set Feature CXL 3.1 Spec 8.2.9.6.3
+ */
+
+/*
+ * Set Feature input payload
+ * CXL rev 3.1 section 8.2.9.6.3 Table 8-101
+ */
+/* Set Feature : Payload in flags */
+#define CXL_SET_FEAT_FLAG_DATA_TRANSFER_MASK GENMASK(2, 0)
+enum cxl_set_feat_flag_data_transfer {
+ CXL_SET_FEAT_FLAG_FULL_DATA_TRANSFER,
+ CXL_SET_FEAT_FLAG_INITIATE_DATA_TRANSFER,
+ CXL_SET_FEAT_FLAG_CONTINUE_DATA_TRANSFER,
+ CXL_SET_FEAT_FLAG_FINISH_DATA_TRANSFER,
+ CXL_SET_FEAT_FLAG_ABORT_DATA_TRANSFER,
+ CXL_SET_FEAT_FLAG_DATA_TRANSFER_MAX
+};
+
+#define CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET BIT(3)
+
+struct cxl_mbox_set_feat_hdr {
+ uuid_t uuid;
+ __le32 flags;
+ __le16 offset;
+ u8 version;
+ u8 rsvd[9];
+} __packed;
+
int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *cmd);
int cxl_dev_state_identify(struct cxl_memdev_state *mds);
@@ -901,4 +931,8 @@ int cxl_get_supported_feature_entry(struct cxl_dev_state *cxlds, const uuid_t *f
size_t cxl_get_feature(struct cxl_dev_state *cxlds, const uuid_t feat_uuid,
enum cxl_get_feat_selection selection,
void *feat_out, size_t feat_out_size);
+int cxl_set_feature(struct cxl_dev_state *cxlds,
+ const uuid_t feat_uuid, u8 feat_version,
+ void *feat_data, size_t feat_data_size,
+ u8 feat_flag);
#endif /* __CXL_MEM_H__ */
--
2.34.1
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v12 10/17] cxl/memfeature: Add CXL memory device patrol scrub control feature
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
` (8 preceding siblings ...)
2024-09-11 9:04 ` [PATCH v12 09/17] cxl/mbox: Add SET_FEATURE " shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-30 17:38 ` Fan Ni
2024-10-01 19:47 ` Fan Ni
2024-09-11 9:04 ` [PATCH v12 11/17] cxl/memfeature: Add CXL memory device ECS " shiju.jose
` (6 subsequent siblings)
16 siblings, 2 replies; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub control
feature. The device patrol scrub proactively locates and makes corrections
to errors in a regular cycle.
Allow specifying the number of hours within which the patrol scrub must be
completed, subject to minimum and maximum limits reported by the device.
Also allow disabling scrub, allowing error rates to be traded off against
performance.
Add support for CXL memory device based patrol scrub control.
Register with the EDAC RAS control feature driver, which gets the scrub
attribute descriptors from the EDAC scrub and exposes sysfs scrub control
attributes to userspace.
For example, CXL device based scrub control for the CXL mem0 device is
exposed in /sys/bus/edac/devices/cxl_mem0/scrub*/
Also add support for region based CXL memory patrol scrub control. A CXL
memory region may be interleaved across one or more CXL memory devices.
For example, region based scrub control for CXL region1 is exposed in
/sys/bus/edac/devices/cxl_region1/scrub*/
Open Questions:
Q1: The CXL 3.1 spec defines the patrol scrub control feature at the CXL
memory device level, supporting setting the scrub cycle and
enabling/disabling scrub, but not scrubbing by HPA range. Thus, scrub
control for a region is presently implemented via all associated CXL
memory devices.
What is the exact use case for CXL region based scrub control?
How would the HPA range, which Dan requested for region based scrubbing,
be used?
Is a spec change required for the patrol scrub control feature to support
setting an HPA range?
Q2: Would both CXL device based and CXL region based scrub control be
enabled at the same time in a system?
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
Documentation/edac/edac-scrub.rst | 74 ++++++
drivers/cxl/Kconfig | 18 ++
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/memfeature.c | 372 ++++++++++++++++++++++++++++++
drivers/cxl/core/region.c | 6 +
drivers/cxl/cxlmem.h | 7 +
drivers/cxl/mem.c | 4 +
7 files changed, 482 insertions(+)
create mode 100644 Documentation/edac/edac-scrub.rst
create mode 100644 drivers/cxl/core/memfeature.c
diff --git a/Documentation/edac/edac-scrub.rst b/Documentation/edac/edac-scrub.rst
new file mode 100644
index 000000000000..243035957e99
--- /dev/null
+++ b/Documentation/edac/edac-scrub.rst
@@ -0,0 +1,74 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================
+EDAC Scrub control
+===================
+
+Copyright (c) 2024 HiSilicon Limited.
+
+:Author: Shiju Jose <shiju.jose@huawei.com>
+:License: The GNU Free Documentation License, Version 1.2
+ (dual licensed under the GPL v2)
+:Original Reviewers:
+
+- Written for: 6.12
+- Updated for:
+
+Introduction
+------------
+The EDAC enhancement for RAS features exposes interfaces for controlling
+the memory scrubbers in the system. The scrub device drivers in the
+system register with the EDAC scrub. The driver exposes the
+scrub controls to the user in sysfs.
+
+The File System
+---------------
+
+The control attributes of a registered scrubber instance can be
+accessed under /sys/bus/edac/devices/<dev-name>/scrub*/
+
+sysfs
+-----
+
+Sysfs files are documented in
+`Documentation/ABI/testing/sysfs-edac-scrub-control`.
+
+Example
+-------
+
+The usage takes the form shown in this example::
+
+1. CXL memory device patrol scrubber
+1.1 device based
+root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/min_cycle_duration
+3600
+root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/max_cycle_duration
+918000
+root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
+43200
+root@localhost:~# echo 54000 > /sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
+root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
+54000
+root@localhost:~# echo 1 > /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
+root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
+1
+root@localhost:~# echo 0 > /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
+root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
+0
+
+1.2. region based
+root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/min_cycle_duration
+3600
+root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/max_cycle_duration
+918000
+root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
+43200
+root@localhost:~# echo 54000 > /sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
+root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
+54000
+root@localhost:~# echo 1 > /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
+root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
+1
+root@localhost:~# echo 0 > /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
+root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
+0
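The cycle_duration values in the sysfs examples above are in seconds, while the CXL feature stores the cycle in whole hours in an 8-bit field, so the driver converts with integer arithmetic (the write path truncates toward zero). A standalone sketch of that conversion, assuming the 1-byte register range the patch uses for its max:

```c
#include <assert.h>

#define HOUR_IN_SECS	3600u	/* CXL_DEV_HOUR_IN_SECS in the patch */
#define CYCLE_HRS_MAX	255u	/* 8-bit register field limits the max */

/* Seconds -> hours, as done by the cycle_duration write path (truncates). */
static unsigned int scrub_cycle_secs_to_hrs(unsigned int secs)
{
	return secs / HOUR_IN_SECS;
}

/* Hours -> seconds, as reported by the read helpers. */
static unsigned int scrub_cycle_hrs_to_secs(unsigned int hrs)
{
	return hrs * HOUR_IN_SECS;
}
```

This is why max_cycle_duration reads back 918000 in the example: 255 hours * 3600 seconds, the largest value the 8-bit register can represent.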
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 99b5c25be079..394bdbc4de87 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -145,4 +145,22 @@ config CXL_REGION_INVALIDATION_TEST
If unsure, or if this kernel is meant for production environments,
say N.
+config CXL_RAS_FEAT
+ bool "CXL: Memory RAS features"
+ depends on CXL_PCI
+ depends on CXL_MEM
+ depends on EDAC
+ help
+ The CXL memory RAS feature control is optional and allows the host to
+ control the RAS feature configurations of CXL Type 3 devices.
+
+ Registers with the EDAC device subsystem to expose control attributes
+ of the CXL memory device's RAS features to the user.
+ Provides interface functions to support configuring the CXL memory
+ device's RAS features.
+
+ Say 'y/n' to enable/disable CXL.mem device RAS features control.
+ See section 8.2.9.9.11 of the CXL 3.1 specification for detailed
+ information on CXL memory device features.
+
endif
diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index 9259bcc6773c..2a3c7197bc23 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -16,3 +16,4 @@ cxl_core-y += pmu.o
cxl_core-y += cdat.o
cxl_core-$(CONFIG_TRACING) += trace.o
cxl_core-$(CONFIG_CXL_REGION) += region.o
+cxl_core-$(CONFIG_CXL_RAS_FEAT) += memfeature.o
diff --git a/drivers/cxl/core/memfeature.c b/drivers/cxl/core/memfeature.c
new file mode 100644
index 000000000000..90c68d20b02b
--- /dev/null
+++ b/drivers/cxl/core/memfeature.c
@@ -0,0 +1,372 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * CXL memory RAS feature driver.
+ *
+ * Copyright (c) 2024 HiSilicon Limited.
+ *
+ * - Supports functions to configure RAS features of the
+ * CXL memory devices.
+ * - Registers with the EDAC device subsystem driver to expose
+ * the features sysfs attributes to the user for configuring
+ * CXL memory RAS feature.
+ */
+
+#define pr_fmt(fmt) "CXL MEM FEAT: " fmt
+
+#include <cxlmem.h>
+#include <linux/cleanup.h>
+#include <linux/limits.h>
+#include <cxl.h>
+#include <linux/edac.h>
+
+#define CXL_DEV_NUM_RAS_FEATURES 1
+#define CXL_DEV_HOUR_IN_SECS 3600
+
+#define CXL_SCRUB_NAME_LEN 128
+
+/* CXL memory patrol scrub control definitions */
+static const uuid_t cxl_patrol_scrub_uuid =
+ UUID_INIT(0x96dad7d6, 0xfde8, 0x482b, 0xa7, 0x33, 0x75, 0x77, 0x4e,
+ 0x06, 0xdb, 0x8a);
+
+/* CXL memory patrol scrub control functions */
+struct cxl_patrol_scrub_context {
+ u8 instance;
+ u16 get_feat_size;
+ u16 set_feat_size;
+ u8 get_version;
+ u8 set_version;
+ u16 set_effects;
+ struct cxl_memdev *cxlmd;
+ struct cxl_region *cxlr;
+};
+
+/**
+ * struct cxl_memdev_ps_params - CXL memory patrol scrub parameter data structure.
+ * @enable: [IN & OUT] enable(1)/disable(0) patrol scrub.
+ * @scrub_cycle_changeable: [OUT] scrub cycle attribute of patrol scrub is changeable.
+ * @scrub_cycle_hrs: [IN] Requested patrol scrub cycle in hours.
+ * [OUT] Current patrol scrub cycle in hours.
+ * @min_scrub_cycle_hrs: [OUT] minimum patrol scrub cycle in hours supported.
+ */
+struct cxl_memdev_ps_params {
+ bool enable;
+ bool scrub_cycle_changeable;
+ u16 scrub_cycle_hrs;
+ u16 min_scrub_cycle_hrs;
+};
+
+enum cxl_scrub_param {
+ CXL_PS_PARAM_ENABLE,
+ CXL_PS_PARAM_SCRUB_CYCLE,
+};
+
+#define CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK BIT(0)
+#define CXL_MEMDEV_PS_SCRUB_CYCLE_REALTIME_REPORT_CAP_MASK BIT(1)
+#define CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK GENMASK(7, 0)
+#define CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK GENMASK(15, 8)
+#define CXL_MEMDEV_PS_FLAG_ENABLED_MASK BIT(0)
+
+struct cxl_memdev_ps_rd_attrs {
+ u8 scrub_cycle_cap;
+ __le16 scrub_cycle_hrs;
+ u8 scrub_flags;
+} __packed;
+
+struct cxl_memdev_ps_wr_attrs {
+ u8 scrub_cycle_hrs;
+ u8 scrub_flags;
+} __packed;
+
+static int cxl_mem_ps_get_attrs(struct cxl_dev_state *cxlds,
+ struct cxl_memdev_ps_params *params)
+{
+ size_t rd_data_size = sizeof(struct cxl_memdev_ps_rd_attrs);
+ size_t data_size;
+ struct cxl_memdev_ps_rd_attrs *rd_attrs __free(kfree) =
+ kmalloc(rd_data_size, GFP_KERNEL);
+ if (!rd_attrs)
+ return -ENOMEM;
+
+ data_size = cxl_get_feature(cxlds, cxl_patrol_scrub_uuid,
+ CXL_GET_FEAT_SEL_CURRENT_VALUE,
+ rd_attrs, rd_data_size);
+ if (!data_size)
+ return -EIO;
+
+ params->scrub_cycle_changeable = FIELD_GET(CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK,
+ rd_attrs->scrub_cycle_cap);
+ params->enable = FIELD_GET(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
+ rd_attrs->scrub_flags);
+ params->scrub_cycle_hrs = FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
+ rd_attrs->scrub_cycle_hrs);
+ params->min_scrub_cycle_hrs = FIELD_GET(CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK,
+ rd_attrs->scrub_cycle_hrs);
+
+ return 0;
+}
+
+static int cxl_ps_get_attrs(struct device *dev, void *drv_data,
+ struct cxl_memdev_ps_params *params)
+{
+ struct cxl_patrol_scrub_context *cxl_ps_ctx = drv_data;
+ struct cxl_memdev *cxlmd;
+ struct cxl_dev_state *cxlds;
+ u16 min_scrub_cycle = 0;
+ int i, ret;
+
+ if (cxl_ps_ctx->cxlr) {
+ struct cxl_region *cxlr = cxl_ps_ctx->cxlr;
+ struct cxl_region_params *p = &cxlr->params;
+
+ for (i = p->interleave_ways - 1; i >= 0; i--) {
+ struct cxl_endpoint_decoder *cxled = p->targets[i];
+
+ cxlmd = cxled_to_memdev(cxled);
+ cxlds = cxlmd->cxlds;
+ ret = cxl_mem_ps_get_attrs(cxlds, params);
+ if (ret)
+ return ret;
+
+ if (params->min_scrub_cycle_hrs > min_scrub_cycle)
+ min_scrub_cycle = params->min_scrub_cycle_hrs;
+ }
+ params->min_scrub_cycle_hrs = min_scrub_cycle;
+ return 0;
+ }
+ cxlmd = cxl_ps_ctx->cxlmd;
+ cxlds = cxlmd->cxlds;
+
+ return cxl_mem_ps_get_attrs(cxlds, params);
+}
+
+static int cxl_mem_ps_set_attrs(struct device *dev, void *drv_data,
+ struct cxl_dev_state *cxlds,
+ struct cxl_memdev_ps_params *params,
+ enum cxl_scrub_param param_type)
+{
+ struct cxl_patrol_scrub_context *cxl_ps_ctx = drv_data;
+ struct cxl_memdev_ps_wr_attrs wr_attrs;
+ struct cxl_memdev_ps_params rd_params;
+ int ret;
+
+ ret = cxl_mem_ps_get_attrs(cxlds, &rd_params);
+ if (ret) {
+ dev_err(dev, "Get cxlmemdev patrol scrub params failed ret=%d\n",
+ ret);
+ return ret;
+ }
+
+ switch (param_type) {
+ case CXL_PS_PARAM_ENABLE:
+ wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
+ params->enable);
+ wr_attrs.scrub_cycle_hrs = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
+ rd_params.scrub_cycle_hrs);
+ break;
+ case CXL_PS_PARAM_SCRUB_CYCLE:
+ if (params->scrub_cycle_hrs < rd_params.min_scrub_cycle_hrs) {
+ dev_err(dev, "Invalid CXL patrol scrub cycle(%d) to set\n",
+ params->scrub_cycle_hrs);
+ dev_err(dev, "Minimum supported CXL patrol scrub cycle in hours %d\n",
+ rd_params.min_scrub_cycle_hrs);
+ return -EINVAL;
+ }
+ wr_attrs.scrub_cycle_hrs = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
+ params->scrub_cycle_hrs);
+ wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
+ rd_params.enable);
+ break;
+ }
+
+ ret = cxl_set_feature(cxlds, cxl_patrol_scrub_uuid,
+ cxl_ps_ctx->set_version,
+ &wr_attrs, sizeof(wr_attrs),
+ CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET);
+ if (ret) {
+ dev_err(dev, "CXL patrol scrub set feature failed ret=%d\n", ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int cxl_ps_set_attrs(struct device *dev, void *drv_data,
+ struct cxl_memdev_ps_params *params,
+ enum cxl_scrub_param param_type)
+{
+ struct cxl_patrol_scrub_context *cxl_ps_ctx = drv_data;
+ struct cxl_memdev *cxlmd;
+ struct cxl_dev_state *cxlds;
+ int ret, i;
+
+ if (cxl_ps_ctx->cxlr) {
+ struct cxl_region *cxlr = cxl_ps_ctx->cxlr;
+ struct cxl_region_params *p = &cxlr->params;
+
+ for (i = p->interleave_ways - 1; i >= 0; i--) {
+ struct cxl_endpoint_decoder *cxled = p->targets[i];
+
+ cxlmd = cxled_to_memdev(cxled);
+ cxlds = cxlmd->cxlds;
+ ret = cxl_mem_ps_set_attrs(dev, drv_data, cxlds,
+ params, param_type);
+ if (ret)
+ return ret;
+ }
+ } else {
+ cxlmd = cxl_ps_ctx->cxlmd;
+ cxlds = cxlmd->cxlds;
+
+ return cxl_mem_ps_set_attrs(dev, drv_data, cxlds, params, param_type);
+ }
+
+ return 0;
+}
+
+static int cxl_patrol_scrub_get_enabled_bg(struct device *dev, void *drv_data, bool *enabled)
+{
+ struct cxl_memdev_ps_params params;
+ int ret;
+
+ ret = cxl_ps_get_attrs(dev, drv_data, ¶ms);
+ if (ret)
+ return ret;
+
+ *enabled = params.enable;
+
+ return 0;
+}
+
+static int cxl_patrol_scrub_set_enabled_bg(struct device *dev, void *drv_data, bool enable)
+{
+ struct cxl_memdev_ps_params params = {
+ .enable = enable,
+ };
+
+ return cxl_ps_set_attrs(dev, drv_data, ¶ms, CXL_PS_PARAM_ENABLE);
+}
+
+static int cxl_patrol_scrub_read_min_scrub_cycle(struct device *dev, void *drv_data,
+ u32 *min)
+{
+ struct cxl_memdev_ps_params params;
+ int ret;
+
+ ret = cxl_ps_get_attrs(dev, drv_data, ¶ms);
+ if (ret)
+ return ret;
+ *min = params.min_scrub_cycle_hrs * CXL_DEV_HOUR_IN_SECS;
+
+ return 0;
+}
+
+static int cxl_patrol_scrub_read_max_scrub_cycle(struct device *dev, void *drv_data,
+ u32 *max)
+{
+ *max = U8_MAX * CXL_DEV_HOUR_IN_SECS; /* Max set by register size */
+
+ return 0;
+}
+
+static int cxl_patrol_scrub_read_scrub_cycle(struct device *dev, void *drv_data,
+ u32 *scrub_cycle_secs)
+{
+ struct cxl_memdev_ps_params params;
+ int ret;
+
+ ret = cxl_ps_get_attrs(dev, drv_data, ¶ms);
+ if (ret)
+ return ret;
+
+ *scrub_cycle_secs = params.scrub_cycle_hrs * CXL_DEV_HOUR_IN_SECS;
+
+ return 0;
+}
+
+static int cxl_patrol_scrub_write_scrub_cycle(struct device *dev, void *drv_data,
+ u32 scrub_cycle_secs)
+{
+ struct cxl_memdev_ps_params params = {
+ .scrub_cycle_hrs = scrub_cycle_secs / CXL_DEV_HOUR_IN_SECS,
+ };
+
+ return cxl_ps_set_attrs(dev, drv_data, ¶ms, CXL_PS_PARAM_SCRUB_CYCLE);
+}
+
+static const struct edac_scrub_ops cxl_ps_scrub_ops = {
+ .get_enabled_bg = cxl_patrol_scrub_get_enabled_bg,
+ .set_enabled_bg = cxl_patrol_scrub_set_enabled_bg,
+ .min_cycle_read = cxl_patrol_scrub_read_min_scrub_cycle,
+ .max_cycle_read = cxl_patrol_scrub_read_max_scrub_cycle,
+ .cycle_duration_read = cxl_patrol_scrub_read_scrub_cycle,
+ .cycle_duration_write = cxl_patrol_scrub_write_scrub_cycle,
+};
+
+int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
+{
+ struct edac_dev_feature ras_features[CXL_DEV_NUM_RAS_FEATURES];
+ struct cxl_dev_state *cxlds;
+ struct cxl_patrol_scrub_context *cxl_ps_ctx;
+ struct cxl_feat_entry feat_entry;
+ char cxl_dev_name[CXL_SCRUB_NAME_LEN];
+ int rc, i, num_ras_features = 0;
+
+ if (cxlr) {
+ struct cxl_region_params *p = &cxlr->params;
+
+ for (i = p->interleave_ways - 1; i >= 0; i--) {
+ struct cxl_endpoint_decoder *cxled = p->targets[i];
+
+ cxlmd = cxled_to_memdev(cxled);
+ cxlds = cxlmd->cxlds;
+ memset(&feat_entry, 0, sizeof(feat_entry));
+ rc = cxl_get_supported_feature_entry(cxlds, &cxl_patrol_scrub_uuid,
+ &feat_entry);
+ if (rc < 0)
+ return rc;
+ if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
+ return -EOPNOTSUPP;
+ }
+ } else {
+ cxlds = cxlmd->cxlds;
+ rc = cxl_get_supported_feature_entry(cxlds, &cxl_patrol_scrub_uuid,
+ &feat_entry);
+ if (rc < 0)
+ return rc;
+
+ if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
+ return -EOPNOTSUPP;
+ }
+
+ cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
+ if (!cxl_ps_ctx)
+ return -ENOMEM;
+
+ *cxl_ps_ctx = (struct cxl_patrol_scrub_context) {
+ .instance = cxl_ps_ctx->instance,
+ .get_feat_size = feat_entry.get_feat_size,
+ .set_feat_size = feat_entry.set_feat_size,
+ .get_version = feat_entry.get_feat_ver,
+ .set_version = feat_entry.set_feat_ver,
+ .set_effects = feat_entry.set_effects,
+ };
+ if (cxlr) {
+ snprintf(cxl_dev_name, sizeof(cxl_dev_name),
+ "cxl_region%d", cxlr->id);
+ cxl_ps_ctx->cxlr = cxlr;
+ } else {
+ snprintf(cxl_dev_name, sizeof(cxl_dev_name),
+ "%s_%s", "cxl", dev_name(&cxlmd->dev));
+ cxl_ps_ctx->cxlmd = cxlmd;
+ }
+
+ ras_features[num_ras_features].ft_type = RAS_FEAT_SCRUB;
+ ras_features[num_ras_features].scrub_ops = &cxl_ps_scrub_ops;
+ ras_features[num_ras_features].ctx = cxl_ps_ctx;
+ num_ras_features++;
+
+ return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
+ num_ras_features, ras_features);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_ras_features_init, CXL);
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 21ad5f242875..1cc29ec9ffac 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3434,6 +3434,12 @@ static int cxl_region_probe(struct device *dev)
p->res->start, p->res->end, cxlr,
is_system_ram) > 0)
return 0;
+
+ rc = cxl_mem_ras_features_init(NULL, cxlr);
+ if (rc)
+ dev_warn(&cxlr->dev, "CXL RAS features init for region_id=%d failed\n",
+ cxlr->id);
+
return devm_cxl_add_dax_region(cxlr);
default:
dev_dbg(&cxlr->dev, "unsupported region mode: %d\n",
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index b565a061a4e3..2187c3378eaa 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -889,6 +889,13 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
+#ifdef CONFIG_CXL_RAS_FEAT
+int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr);
+#else
+static inline int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
+{ return 0; }
+#endif
+
#ifdef CONFIG_CXL_SUSPEND
void cxl_mem_active_inc(void);
void cxl_mem_active_dec(void);
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 7de232eaeb17..be2e69548909 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -117,6 +117,10 @@ static int cxl_mem_probe(struct device *dev)
if (!cxlds->media_ready)
return -EBUSY;
+ rc = cxl_mem_ras_features_init(cxlmd, NULL);
+ if (rc)
+ dev_warn(&cxlmd->dev, "CXL RAS features init failed\n");
+
/*
* Someone is trying to reattach this device after it lost its port
* connection (an endpoint port previously registered by this memdev was
--
2.34.1
^ permalink raw reply [flat|nested] 39+ messages in thread
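A quick standalone illustration of the Get Feature decode in cxl_mem_ps_get_attrs() above: the 16-bit scrub_cycle_hrs field carries the current cycle in its low byte and the minimum supported cycle in its high byte, and bit 0 of scrub_flags is the enable bit. A sketch with plain shifts standing in for FIELD_GET() (userspace mock; assumes the value is already in host byte order):

```c
#include <assert.h>
#include <stdint.h>

struct ps_params {
	int enable;
	unsigned int cycle_hrs;
	unsigned int min_cycle_hrs;
};

/* Mirrors the CXL_MEMDEV_PS_* masks: cur = bits [7:0], min = bits [15:8]. */
static struct ps_params ps_decode(uint16_t scrub_cycle_hrs, uint8_t scrub_flags)
{
	struct ps_params p = {
		.enable        = scrub_flags & 0x1,          /* BIT(0) */
		.cycle_hrs     = scrub_cycle_hrs & 0xff,     /* GENMASK(7, 0) */
		.min_cycle_hrs = (scrub_cycle_hrs >> 8) & 0xff, /* GENMASK(15, 8) */
	};
	return p;
}
```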
* [PATCH v12 11/17] cxl/memfeature: Add CXL memory device ECS control feature
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
` (9 preceding siblings ...)
2024-09-11 9:04 ` [PATCH v12 10/17] cxl/memfeature: Add CXL memory device patrol scrub control feature shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-30 18:12 ` Fan Ni
2024-09-11 9:04 ` [PATCH v12 12/17] platform: Add __free() based cleanup function for platform_device_put shiju.jose
` (5 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
CXL spec 3.1 section 8.2.9.9.11.2 describes the DDR5 ECS (Error Check
Scrub) control feature.
Error Check Scrub (ECS) is a feature defined in the JEDEC DDR5 SDRAM
Specification (JESD79-5); it allows the DRAM to internally read data,
correct single-bit errors, and write the corrected data bits back to the
DRAM array, while providing transparency into error counts.
The ECS control allows the requester to change the log entry type, change
the ECS threshold count (provided that the request is within the
definition specified in the DDR5 mode registers), switch between codeword
mode and row count mode, and reset the ECS counter.
Register with the EDAC RAS control feature driver, which gets the ECS
attribute descriptors from the EDAC ECS and exposes sysfs ECS control
attributes to userspace.
For example, ECS control for memory media FRU 0 in the CXL mem0 device is
in /sys/bus/edac/devices/cxl_mem0/ecs_fru0/
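The threshold values accepted by this control are not raw counts but indices into a fixed table (3 -> 256, 4 -> 1024, 5 -> 4096), mirroring ecs_supp_threshold[] and the switch statement in cxl_mem_ecs_set_attrs() in the patch below. A standalone sketch of that mapping (an illustrative mock, not kernel code):

```c
#include <assert.h>

/* Mirrors ecs_supp_threshold[]: indices 3/4/5 -> 256/1024/4096. */
static const unsigned int ecs_supp_threshold[] = { 0, 0, 0, 256, 1024, 4096 };

/* Threshold count -> register index; -1 for unsupported counts. */
static int ecs_threshold_to_index(unsigned int count)
{
	switch (count) {
	case 256:  return 3;
	case 1024: return 4;
	case 4096: return 5;
	default:   return -1;
	}
}
```

Unsupported counts are rejected (the driver returns -EINVAL), which is why the error path lists exactly 256, 1024, and 4096 as the valid values.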
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/memfeature.c | 439 +++++++++++++++++++++++++++++++++-
1 file changed, 438 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/memfeature.c b/drivers/cxl/core/memfeature.c
index 90c68d20b02b..5d4057fa304c 100644
--- a/drivers/cxl/core/memfeature.c
+++ b/drivers/cxl/core/memfeature.c
@@ -19,7 +19,7 @@
#include <cxl.h>
#include <linux/edac.h>
-#define CXL_DEV_NUM_RAS_FEATURES 1
+#define CXL_DEV_NUM_RAS_FEATURES 2
#define CXL_DEV_HOUR_IN_SECS 3600
#define CXL_SCRUB_NAME_LEN 128
@@ -303,6 +303,405 @@ static const struct edac_scrub_ops cxl_ps_scrub_ops = {
.cycle_duration_write = cxl_patrol_scrub_write_scrub_cycle,
};
+/* CXL DDR5 ECS control definitions */
+static const uuid_t cxl_ecs_uuid =
+ UUID_INIT(0xe5b13f22, 0x2328, 0x4a14, 0xb8, 0xba, 0xb9, 0x69, 0x1e,
+ 0x89, 0x33, 0x86);
+
+struct cxl_ecs_context {
+ u16 num_media_frus;
+ u16 get_feat_size;
+ u16 set_feat_size;
+ u8 get_version;
+ u8 set_version;
+ u16 set_effects;
+ struct cxl_memdev *cxlmd;
+};
+
+enum {
+ CXL_ECS_PARAM_LOG_ENTRY_TYPE,
+ CXL_ECS_PARAM_THRESHOLD,
+ CXL_ECS_PARAM_MODE,
+ CXL_ECS_PARAM_RESET_COUNTER,
+};
+
+#define CXL_ECS_LOG_ENTRY_TYPE_MASK GENMASK(1, 0)
+#define CXL_ECS_REALTIME_REPORT_CAP_MASK BIT(0)
+#define CXL_ECS_THRESHOLD_COUNT_MASK GENMASK(2, 0)
+#define CXL_ECS_MODE_MASK BIT(3)
+#define CXL_ECS_RESET_COUNTER_MASK BIT(4)
+
+static const u16 ecs_supp_threshold[] = { 0, 0, 0, 256, 1024, 4096 };
+
+enum {
+ ECS_LOG_ENTRY_TYPE_DRAM = 0x0,
+ ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU = 0x1,
+};
+
+enum {
+ ECS_THRESHOLD_256 = 3,
+ ECS_THRESHOLD_1024 = 4,
+ ECS_THRESHOLD_4096 = 5,
+};
+
+enum cxl_ecs_mode {
+ ECS_MODE_COUNTS_ROWS = 0,
+ ECS_MODE_COUNTS_CODEWORDS = 1,
+};
+
+/**
+ * struct cxl_ecs_params - CXL memory DDR5 ECS parameter data structure.
+ * @log_entry_type: ECS log entry type, per DRAM or per memory media FRU.
+ * @threshold: ECS threshold count per GB of memory cells.
+ * @mode: codeword/row count mode
+ * 0 : ECS counts rows with errors
+ * 1 : ECS counts codewords with errors
+ * @reset_counter: [IN] reset ECS counter to default value.
+ */
+struct cxl_ecs_params {
+ u8 log_entry_type;
+ u16 threshold;
+ enum cxl_ecs_mode mode;
+ bool reset_counter;
+};
+
+struct cxl_ecs_rd_attrs {
+ u8 ecs_log_cap;
+ u8 ecs_cap;
+ __le16 ecs_config;
+ u8 ecs_flags;
+} __packed;
+
+struct cxl_ecs_wr_attrs {
+ u8 ecs_log_cap;
+ __le16 ecs_config;
+} __packed;
+
+/* CXL DDR5 ECS control functions */
+static int cxl_mem_ecs_get_attrs(struct device *dev, void *drv_data, int fru_id,
+ struct cxl_ecs_params *params)
+{
+ struct cxl_ecs_context *cxl_ecs_ctx = drv_data;
+ struct cxl_memdev *cxlmd = cxl_ecs_ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ size_t rd_data_size;
+ u8 threshold_index;
+ size_t data_size;
+
+ rd_data_size = cxl_ecs_ctx->get_feat_size;
+
+ struct cxl_ecs_rd_attrs *rd_attrs __free(kfree) =
+ kmalloc(rd_data_size, GFP_KERNEL);
+ if (!rd_attrs)
+ return -ENOMEM;
+
+ params->log_entry_type = 0;
+ params->threshold = 0;
+ params->mode = 0;
+ data_size = cxl_get_feature(cxlds, cxl_ecs_uuid,
+ CXL_GET_FEAT_SEL_CURRENT_VALUE,
+ rd_attrs, rd_data_size);
+ if (!data_size)
+ return -EIO;
+
+ params->log_entry_type = FIELD_GET(CXL_ECS_LOG_ENTRY_TYPE_MASK,
+ rd_attrs[fru_id].ecs_log_cap);
+ threshold_index = FIELD_GET(CXL_ECS_THRESHOLD_COUNT_MASK,
+ rd_attrs[fru_id].ecs_config);
+ params->threshold = ecs_supp_threshold[threshold_index];
+ params->mode = FIELD_GET(CXL_ECS_MODE_MASK,
+ rd_attrs[fru_id].ecs_config);
+ return 0;
+}
+
+static int cxl_mem_ecs_set_attrs(struct device *dev, void *drv_data, int fru_id,
+ struct cxl_ecs_params *params, u8 param_type)
+{
+ struct cxl_ecs_context *cxl_ecs_ctx = drv_data;
+ struct cxl_memdev *cxlmd = cxl_ecs_ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ size_t rd_data_size, wr_data_size;
+ u16 num_media_frus, count;
+ size_t data_size;
+ int ret;
+
+ num_media_frus = cxl_ecs_ctx->num_media_frus;
+ rd_data_size = cxl_ecs_ctx->get_feat_size;
+ wr_data_size = cxl_ecs_ctx->set_feat_size;
+ struct cxl_ecs_rd_attrs *rd_attrs __free(kfree) =
+ kmalloc(rd_data_size, GFP_KERNEL);
+ if (!rd_attrs)
+ return -ENOMEM;
+
+ data_size = cxl_get_feature(cxlds, cxl_ecs_uuid,
+ CXL_GET_FEAT_SEL_CURRENT_VALUE,
+ rd_attrs, rd_data_size);
+ if (!data_size)
+ return -EIO;
+ struct cxl_ecs_wr_attrs *wr_attrs __free(kfree) =
+ kmalloc(wr_data_size, GFP_KERNEL);
+ if (!wr_attrs)
+ return -ENOMEM;
+
+ /*
+ * Fill writable attributes from the current attributes
+ * read for all the media FRUs.
+ */
+ for (count = 0; count < num_media_frus; count++) {
+ wr_attrs[count].ecs_log_cap = rd_attrs[count].ecs_log_cap;
+ wr_attrs[count].ecs_config = rd_attrs[count].ecs_config;
+ }
+
+ /* Fill attribute to be set for the media FRU */
+ switch (param_type) {
+ case CXL_ECS_PARAM_LOG_ENTRY_TYPE:
+ if (params->log_entry_type != ECS_LOG_ENTRY_TYPE_DRAM &&
+ params->log_entry_type != ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU) {
+ dev_err(dev,
+ "Invalid CXL ECS scrub log entry type(%d) to set\n",
+ params->log_entry_type);
+ dev_err(dev,
+ "Log Entry Type 0: per DRAM 1: per Memory Media FRU\n");
+ return -EINVAL;
+ }
+ wr_attrs[fru_id].ecs_log_cap = FIELD_PREP(CXL_ECS_LOG_ENTRY_TYPE_MASK,
+ params->log_entry_type);
+ break;
+ case CXL_ECS_PARAM_THRESHOLD:
+ wr_attrs[fru_id].ecs_config &= ~CXL_ECS_THRESHOLD_COUNT_MASK;
+ switch (params->threshold) {
+ case 256:
+ wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
+ ECS_THRESHOLD_256);
+ break;
+ case 1024:
+ wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
+ ECS_THRESHOLD_1024);
+ break;
+ case 4096:
+ wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
+ ECS_THRESHOLD_4096);
+ break;
+ default:
+ dev_err(dev,
+ "Invalid CXL ECS scrub threshold count(%d) to set\n",
+ params->threshold);
+ dev_err(dev,
+ "Supported scrub threshold count: 256,1024,4096\n");
+ return -EINVAL;
+ }
+ break;
+ case CXL_ECS_PARAM_MODE:
+ if (params->mode != ECS_MODE_COUNTS_ROWS &&
+ params->mode != ECS_MODE_COUNTS_CODEWORDS) {
+ dev_err(dev,
+ "Invalid CXL ECS scrub mode(%d) to set\n",
+ params->mode);
+ dev_err(dev,
+ "Mode 0: ECS counts rows with errors"
+ " 1: ECS counts codewords with errors\n");
+ return -EINVAL;
+ }
+ wr_attrs[fru_id].ecs_config &= ~CXL_ECS_MODE_MASK;
+ wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_MODE_MASK,
+ params->mode);
+ break;
+ case CXL_ECS_PARAM_RESET_COUNTER:
+ wr_attrs[fru_id].ecs_config &= ~CXL_ECS_RESET_COUNTER_MASK;
+ wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_RESET_COUNTER_MASK,
+ params->reset_counter);
+ break;
+ default:
+ dev_err(dev, "Invalid CXL ECS parameter to set\n");
+ return -EINVAL;
+ }
+
+ ret = cxl_set_feature(cxlds, cxl_ecs_uuid, cxl_ecs_ctx->set_version,
+ wr_attrs, wr_data_size,
+ CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET);
+ if (ret) {
+ dev_err(dev, "CXL ECS set feature failed ret=%d\n", ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int cxl_ecs_get_log_entry_type(struct device *dev, void *drv_data,
+ int fru_id, u32 *val)
+{
+ struct cxl_ecs_params params;
+ int ret;
+
+ ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
+ if (ret)
+ return ret;
+
+ *val = params.log_entry_type;
+
+ return 0;
+}
+
+static int cxl_ecs_set_log_entry_type(struct device *dev, void *drv_data,
+ int fru_id, u32 val)
+{
+ struct cxl_ecs_params params = {
+ .log_entry_type = val,
+ };
+
+ return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
+ ¶ms, CXL_ECS_PARAM_LOG_ENTRY_TYPE);
+}
+
+static int cxl_ecs_get_log_entry_type_per_dram(struct device *dev, void *drv_data,
+ int fru_id, u32 *val)
+{
+ struct cxl_ecs_params params;
+ int ret;
+
+ ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
+ if (ret)
+ return ret;
+
+ if (params.log_entry_type == ECS_LOG_ENTRY_TYPE_DRAM)
+ *val = 1;
+ else
+ *val = 0;
+
+ return 0;
+}
+
+static int cxl_ecs_get_log_entry_type_per_memory_media(struct device *dev,
+ void *drv_data,
+ int fru_id, u32 *val)
+{
+ struct cxl_ecs_params params;
+ int ret;
+
+ ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
+ if (ret)
+ return ret;
+
+ if (params.log_entry_type == ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU)
+ *val = 1;
+ else
+ *val = 0;
+
+ return 0;
+}
+
+static int cxl_ecs_get_mode(struct device *dev, void *drv_data,
+ int fru_id, u32 *val)
+{
+ struct cxl_ecs_params params;
+ int ret;
+
+ ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
+ if (ret)
+ return ret;
+
+ *val = params.mode;
+
+ return 0;
+}
+
+static int cxl_ecs_set_mode(struct device *dev, void *drv_data,
+ int fru_id, u32 val)
+{
+ struct cxl_ecs_params params = {
+ .mode = val,
+ };
+
+ return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
+ ¶ms, CXL_ECS_PARAM_MODE);
+}
+
+static int cxl_ecs_get_mode_counts_rows(struct device *dev, void *drv_data,
+ int fru_id, u32 *val)
+{
+ struct cxl_ecs_params params;
+ int ret;
+
+ ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
+ if (ret)
+ return ret;
+
+ if (params.mode == ECS_MODE_COUNTS_ROWS)
+ *val = 1;
+ else
+ *val = 0;
+
+ return 0;
+}
+
+static int cxl_ecs_get_mode_counts_codewords(struct device *dev, void *drv_data,
+ int fru_id, u32 *val)
+{
+ struct cxl_ecs_params params;
+ int ret;
+
+ ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
+ if (ret)
+ return ret;
+
+ if (params.mode == ECS_MODE_COUNTS_CODEWORDS)
+ *val = 1;
+ else
+ *val = 0;
+
+ return 0;
+}
+
+static int cxl_ecs_reset(struct device *dev, void *drv_data, int fru_id, u32 val)
+{
+ struct cxl_ecs_params params = {
+ .reset_counter = val,
+ };
+
+ return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
+ ¶ms, CXL_ECS_PARAM_RESET_COUNTER);
+}
+
+static int cxl_ecs_get_threshold(struct device *dev, void *drv_data,
+ int fru_id, u32 *val)
+{
+ struct cxl_ecs_params params;
+ int ret;
+
+ ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
+ if (ret)
+ return ret;
+
+ *val = params.threshold;
+
+ return 0;
+}
+
+static int cxl_ecs_set_threshold(struct device *dev, void *drv_data,
+ int fru_id, u32 val)
+{
+ struct cxl_ecs_params params = {
+ .threshold = val,
+ };
+
+ return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
+ ¶ms, CXL_ECS_PARAM_THRESHOLD);
+}
+
+static const struct edac_ecs_ops cxl_ecs_ops = {
+ .get_log_entry_type = cxl_ecs_get_log_entry_type,
+ .set_log_entry_type = cxl_ecs_set_log_entry_type,
+ .get_log_entry_type_per_dram = cxl_ecs_get_log_entry_type_per_dram,
+ .get_log_entry_type_per_memory_media =
+ cxl_ecs_get_log_entry_type_per_memory_media,
+ .get_mode = cxl_ecs_get_mode,
+ .set_mode = cxl_ecs_set_mode,
+ .get_mode_counts_codewords = cxl_ecs_get_mode_counts_codewords,
+ .get_mode_counts_rows = cxl_ecs_get_mode_counts_rows,
+ .reset = cxl_ecs_reset,
+ .get_threshold = cxl_ecs_get_threshold,
+ .set_threshold = cxl_ecs_set_threshold,
+};
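The get/set helpers wired into cxl_ecs_ops all follow one pattern: read the full parameter block, change only the field selected by the param mask, and write the whole block back, so setting one attribute cannot clobber the others. A minimal user-space C sketch of that read-modify-write pattern (the struct, enum, and function names here are illustrative stand-ins, not the driver's):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the driver's parameter block and change masks. */
struct ecs_params {
	uint8_t log_entry_type;
	uint8_t mode;
	uint8_t threshold;
};

enum {
	ECS_PARAM_LOG_ENTRY_TYPE = 1 << 0,
	ECS_PARAM_MODE           = 1 << 1,
	ECS_PARAM_THRESHOLD      = 1 << 2,
};

/*
 * Model of cxl_mem_ecs_set_attrs(): read-modify-write so that setting one
 * attribute leaves the others untouched.
 */
static void ecs_set_attrs(struct ecs_params *hw, const struct ecs_params *req,
			  unsigned int mask)
{
	struct ecs_params cur = *hw;		/* "Get Feature" */

	if (mask & ECS_PARAM_LOG_ENTRY_TYPE)
		cur.log_entry_type = req->log_entry_type;
	if (mask & ECS_PARAM_MODE)
		cur.mode = req->mode;
	if (mask & ECS_PARAM_THRESHOLD)
		cur.threshold = req->threshold;

	*hw = cur;				/* "Set Feature" */
}
```

Each EDAC callback then reduces to filling in one field of `req` and passing the matching mask bit, as cxl_ecs_set_mode() and cxl_ecs_set_threshold() do.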
+
int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
{
struct edac_dev_feature ras_features[CXL_DEV_NUM_RAS_FEATURES];
@@ -310,7 +709,9 @@ int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
struct cxl_patrol_scrub_context *cxl_ps_ctx;
struct cxl_feat_entry feat_entry;
char cxl_dev_name[CXL_SCRUB_NAME_LEN];
+ struct cxl_ecs_context *cxl_ecs_ctx;
int rc, i, num_ras_features = 0;
+ int num_media_frus;
if (cxlr) {
struct cxl_region_params *p = &cxlr->params;
@@ -366,6 +767,42 @@ int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
ras_features[num_ras_features].ctx = cxl_ps_ctx;
num_ras_features++;
+ if (!cxlr) {
+ rc = cxl_get_supported_feature_entry(cxlds, &cxl_ecs_uuid,
+ &feat_entry);
+ if (rc < 0)
+ goto feat_register;
+
+ if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
+ goto feat_register;
+ num_media_frus = feat_entry.get_feat_size /
+ sizeof(struct cxl_ecs_rd_attrs);
+ if (!num_media_frus)
+ goto feat_register;
+
+ cxl_ecs_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ecs_ctx),
+ GFP_KERNEL);
+ if (!cxl_ecs_ctx)
+ goto feat_register;
+ *cxl_ecs_ctx = (struct cxl_ecs_context) {
+ .get_feat_size = feat_entry.get_feat_size,
+ .set_feat_size = feat_entry.set_feat_size,
+ .get_version = feat_entry.get_feat_ver,
+ .set_version = feat_entry.set_feat_ver,
+ .set_effects = feat_entry.set_effects,
+ .num_media_frus = num_media_frus,
+ .cxlmd = cxlmd,
+ };
+
+ ras_features[num_ras_features].ft_type = RAS_FEAT_ECS;
+ ras_features[num_ras_features].ecs_ops = &cxl_ecs_ops;
+ ras_features[num_ras_features].ctx = cxl_ecs_ctx;
+ ras_features[num_ras_features].ecs_info.num_media_frus =
+ num_media_frus;
+ num_ras_features++;
+ }
+
+feat_register:
return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
num_ras_features, ras_features);
}
--
2.34.1
* [PATCH v12 12/17] platform: Add __free() based cleanup function for platform_device_put
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
` (10 preceding siblings ...)
2024-09-11 9:04 ` [PATCH v12 11/17] cxl/memfeature: Add CXL memory device ECS " shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-11 9:04 ` [PATCH v12 13/17] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
` (4 subsequent siblings)
16 siblings, 0 replies; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Add __free() based cleanup function for platform_device_put().
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
include/linux/platform_device.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index d422db6eec63..606533b88f44 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -232,6 +232,7 @@ extern int platform_device_add_data(struct platform_device *pdev,
extern int platform_device_add(struct platform_device *pdev);
extern void platform_device_del(struct platform_device *pdev);
extern void platform_device_put(struct platform_device *pdev);
+DEFINE_FREE(platform_device_put, struct platform_device *, if (_T) platform_device_put(_T))
struct platform_driver {
int (*probe)(struct platform_device *);
--
2.34.1
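The DEFINE_FREE() helper above hooks platform_device_put() into the kernel's scope-based cleanup machinery: a pointer declared with __free(platform_device_put) is put automatically when it goes out of scope, and return_ptr() disarms the cleanup on the success path. A user-space sketch of the same idea using the compiler's cleanup attribute (requires GCC or Clang; all names here are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a platform_device. */
struct pdev { int refs; };

static int puts_called;

static void pdev_put(struct pdev *p)
{
	if (p) {			/* mirrors the "if (_T)" guard */
		p->refs--;
		puts_called++;
	}
}

/* Analog of DEFINE_FREE(platform_device_put, ...): run the put on scope exit. */
static void pdev_put_cleanup(struct pdev **p) { pdev_put(*p); }
#define __free_pdev __attribute__((cleanup(pdev_put_cleanup)))

static struct pdev global_dev = { .refs = 1 };

/* On the error path the cleanup fires; on success the pointer is "stolen". */
static struct pdev *pdev_create(int fail)
{
	struct pdev *p __free_pdev = &global_dev;

	p->refs++;
	if (fail)
		return NULL;		/* cleanup drops the reference */

	struct pdev *ret = p;
	p = NULL;			/* analog of return_ptr(): disarm cleanup */
	return ret;
}
```

The payoff is that error paths in functions such as ras2_add_platform_device() (patch 13) need no explicit `goto`-based unwinding for the device reference.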
* [PATCH v12 13/17] ACPI:RAS2: Add ACPI RAS2 driver
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
` (11 preceding siblings ...)
2024-09-11 9:04 ` [PATCH v12 12/17] platform: Add __free() based cleanup function for platform_device_put shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-10-01 15:47 ` Fan Ni
2024-09-11 9:04 ` [PATCH v12 14/17] ras: mem: Add memory " shiju.jose
` (3 subsequent siblings)
16 siblings, 1 reply; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add support for the ACPI RAS2 feature table (RAS2) defined in the
ACPI 6.5 Specification, section 5.2.21.
The driver contains the RAS2 init code, which extracts the RAS2 table
and adds a platform device for each memory feature, which binds to the
RAS2 memory driver.
The driver uses the PCC mailbox to communicate with the ACPI-compliant
hardware and adds OSPM interfaces to send RAS2 commands.
Co-developed-by: A Somasundaram <somasundaram.a@hpe.com>
Signed-off-by: A Somasundaram <somasundaram.a@hpe.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/acpi/Kconfig | 10 +
drivers/acpi/Makefile | 1 +
drivers/acpi/ras2.c | 391 +++++++++++++++++++++++++++++++++++++++
include/acpi/ras2_acpi.h | 60 ++++++
4 files changed, 462 insertions(+)
create mode 100755 drivers/acpi/ras2.c
create mode 100644 include/acpi/ras2_acpi.h
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index e3a7c2aedd5f..482080f1f0c5 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -284,6 +284,16 @@ config ACPI_CPPC_LIB
If your platform does not support CPPC in firmware,
leave this option disabled.
+config ACPI_RAS2
+ bool "ACPI RAS2 driver"
+ select MAILBOX
+ select PCC
+ help
+ The driver adds support for the ACPI RAS2 feature table (it
+ extracts the RAS2 table from the ACPI system tables) and OSPM
+ interfaces to send RAS2 commands via a PCC mailbox subspace.
+ The driver adds a platform device for each RAS2 memory feature,
+ which binds to the RAS2 memory driver.
+
config ACPI_PROCESSOR
tristate "Processor"
depends on X86 || ARM64 || LOONGARCH || RISCV
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 61ca4afe83dc..84e2a2519bae 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -100,6 +100,7 @@ obj-$(CONFIG_ACPI_EC_DEBUGFS) += ec_sys.o
obj-$(CONFIG_ACPI_BGRT) += bgrt.o
obj-$(CONFIG_ACPI_CPPC_LIB) += cppc_acpi.o
obj-$(CONFIG_ACPI_SPCR_TABLE) += spcr.o
+obj-$(CONFIG_ACPI_RAS2) += ras2.o
obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
obj-$(CONFIG_ACPI_PPTT) += pptt.o
obj-$(CONFIG_ACPI_PFRUT) += pfr_update.o pfr_telemetry.o
diff --git a/drivers/acpi/ras2.c b/drivers/acpi/ras2.c
new file mode 100755
index 000000000000..5daf1510d19e
--- /dev/null
+++ b/drivers/acpi/ras2.c
@@ -0,0 +1,391 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Implementation of ACPI RAS2 driver.
+ *
+ * Copyright (c) 2024 HiSilicon Limited.
+ *
+ * Support for RAS2 - ACPI 6.5 Specification, section 5.2.21
+ *
+ * Driver contains ACPI RAS2 init, which extracts the ACPI RAS2 table and
+ * get the PCC channel subspace for communicating with the ACPI compliant
+ * HW platform which supports ACPI RAS2. Driver adds platform devices
+ * for each RAS2 memory feature which binds to the memory ACPI RAS2 driver.
+ */
+
+#define pr_fmt(fmt) "ACPI RAS2: " fmt
+
+#include <linux/delay.h>
+#include <linux/export.h>
+#include <linux/ktime.h>
+#include <linux/platform_device.h>
+#include <acpi/pcc.h>
+#include <acpi/ras2_acpi.h>
+
+/*
+ * Arbitrary Retries for PCC commands because the
+ * remote processor could be much slower to reply.
+ */
+#define RAS2_NUM_RETRIES 600
+
+#define RAS2_FEATURE_TYPE_MEMORY 0x00
+
+/* global variables for the RAS2 PCC subspaces */
+static DEFINE_MUTEX(ras2_pcc_subspace_lock);
+static LIST_HEAD(ras2_pcc_subspaces);
+
+static int ras2_report_cap_error(u32 cap_status)
+{
+ switch (cap_status) {
+ case ACPI_RAS2_NOT_VALID:
+ case ACPI_RAS2_NOT_SUPPORTED:
+ return -EPERM;
+ case ACPI_RAS2_BUSY:
+ return -EBUSY;
+ case ACPI_RAS2_FAILED:
+ case ACPI_RAS2_ABORTED:
+ case ACPI_RAS2_INVALID_DATA:
+ return -EINVAL;
+ default: /* 0 or other, Success */
+ return 0;
+ }
+}
+
+static int ras2_check_pcc_chan(struct ras2_pcc_subspace *pcc_subspace)
+{
+ struct acpi_ras2_shared_memory __iomem *generic_comm_base = pcc_subspace->pcc_comm_addr;
+ ktime_t next_deadline = ktime_add(ktime_get(), pcc_subspace->deadline);
+ u32 cap_status;
+ u16 status;
+ int ret;
+
+ while (!ktime_after(ktime_get(), next_deadline)) {
+ /*
+ * As per ACPI spec, the PCC space will be initialized by
+ * platform and should have set the command completion bit when
+ * PCC can be used by OSPM
+ */
+ status = readw_relaxed(&generic_comm_base->status);
+ if (status & RAS2_PCC_CMD_ERROR) {
+ cap_status = readw_relaxed(&generic_comm_base->set_capabilities_status);
+ ret = ras2_report_cap_error(cap_status);
+
+ status &= ~RAS2_PCC_CMD_ERROR;
+ writew_relaxed(status, &generic_comm_base->status);
+ return ret;
+ }
+ if (status & RAS2_PCC_CMD_COMPLETE)
+ return 0;
+ /*
+ * Reducing the bus traffic in case this loop takes longer than
+ * a few retries.
+ */
+ msleep(10);
+ }
+
+ return -EIO;
+}
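ras2_check_pcc_chan() polls the shared-memory status word until the platform sets the command-complete bit, maps the error bit to an errno via ras2_report_cap_error(), and gives up once the deadline passes. A self-contained user-space model of that loop, with time and the status source simulated as a trace of readings (names and the tick granularity are assumptions of this sketch):

```c
#include <assert.h>
#include <stdint.h>

#define CMD_COMPLETE 0x1	/* RAS2_PCC_CMD_COMPLETE */
#define CMD_ERROR    0x4	/* RAS2_PCC_CMD_ERROR */

/*
 * Model of ras2_check_pcc_chan(): one loop iteration per simulated tick,
 * deadline expressed in ticks. Returns 0 on completion, a negative value
 * on error or timeout.
 */
static int poll_status(const uint16_t *status_trace, int ticks_until_deadline,
		       int *ticks_used)
{
	for (int t = 0; t < ticks_until_deadline; t++) {
		uint16_t status = status_trace[t];

		if (status & CMD_ERROR)
			return -1;	/* driver maps cap_status to an errno */
		if (status & CMD_COMPLETE) {
			*ticks_used = t;
			return 0;
		}
		/* in the driver: msleep(10) to reduce bus traffic */
	}
	return -5;			/* -EIO: deadline passed */
}
```

The driver's deadline is `RAS2_NUM_RETRIES * latency`, padding the PCCT's nominal latency because the remote processor may be much slower to reply.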
+
+/**
+ * ras2_send_pcc_cmd() - Send RAS2 command via PCC channel
+ * @ras2_ctx: pointer to the RAS2 context structure
+ * @cmd: command to send
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+int ras2_send_pcc_cmd(struct ras2_scrub_ctx *ras2_ctx, u16 cmd)
+{
+ struct ras2_pcc_subspace *pcc_subspace = ras2_ctx->pcc_subspace;
+ struct acpi_ras2_shared_memory *generic_comm_base = pcc_subspace->pcc_comm_addr;
+ static ktime_t last_cmd_cmpl_time, last_mpar_reset;
+ struct mbox_chan *pcc_channel;
+ unsigned int time_delta;
+ static int mpar_count;
+ int ret;
+
+ guard(mutex)(&ras2_pcc_subspace_lock);
+ ret = ras2_check_pcc_chan(pcc_subspace);
+ if (ret < 0)
+ return ret;
+ pcc_channel = pcc_subspace->pcc_chan->mchan;
+
+ /*
+ * Handle the Minimum Request Turnaround Time(MRTT)
+ * "The minimum amount of time that OSPM must wait after the completion
+ * of a command before issuing the next command, in microseconds"
+ */
+ if (pcc_subspace->pcc_mrtt) {
+ time_delta = ktime_us_delta(ktime_get(), last_cmd_cmpl_time);
+ if (pcc_subspace->pcc_mrtt > time_delta)
+ udelay(pcc_subspace->pcc_mrtt - time_delta);
+ }
+
+ /*
+ * Handle the non-zero Maximum Periodic Access Rate(MPAR)
+ * "The maximum number of periodic requests that the subspace channel can
+ * support, reported in commands per minute. 0 indicates no limitation."
+ *
+ * This parameter should be ideally zero or large enough so that it can
+ * handle maximum number of requests that all the cores in the system can
+ * collectively generate. If it is not, we will follow the spec and just
+ * not send the request to the platform after hitting the MPAR limit in
+ * any 60s window
+ */
+ if (pcc_subspace->pcc_mpar) {
+ if (mpar_count == 0) {
+ time_delta = ktime_ms_delta(ktime_get(), last_mpar_reset);
+ if (time_delta < 60 * MSEC_PER_SEC) {
+ dev_dbg(ras2_ctx->dev,
+ "PCC cmd not sent due to MPAR limit");
+ return -EIO;
+ }
+ last_mpar_reset = ktime_get();
+ mpar_count = pcc_subspace->pcc_mpar;
+ }
+ mpar_count--;
+ }
+
+ /* Write to the shared comm region. */
+ writew_relaxed(cmd, &generic_comm_base->command);
+
+ /* Flip CMD COMPLETE bit */
+ writew_relaxed(0, &generic_comm_base->status);
+
+ /* Ring doorbell */
+ ret = mbox_send_message(pcc_channel, &cmd);
+ if (ret < 0) {
+ dev_err(ras2_ctx->dev,
+ "Err sending PCC mbox message. cmd:%d, ret:%d\n",
+ cmd, ret);
+ return ret;
+ }
+
+ /*
+ * If Minimum Request Turnaround Time is non-zero, we need
+ * to record the completion time of both READ and WRITE
+ * command for proper handling of MRTT, so we need to check
+ * for pcc_mrtt in addition to CMD_READ
+ */
+ if (cmd == RAS2_PCC_CMD_EXEC || pcc_subspace->pcc_mrtt) {
+ ret = ras2_check_pcc_chan(pcc_subspace);
+ if (pcc_subspace->pcc_mrtt)
+ last_cmd_cmpl_time = ktime_get();
+ }
+
+ if (pcc_channel->mbox->txdone_irq)
+ mbox_chan_txdone(pcc_channel, ret);
+ else
+ mbox_client_txdone(pcc_channel, ret);
+
+ return ret >= 0 ? 0 : ret;
+}
+EXPORT_SYMBOL_GPL(ras2_send_pcc_cmd);
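The MPAR handling in ras2_send_pcc_cmd() enforces "at most N commands per 60-second window": once the per-window budget is used up, further commands are refused until a minute has passed since the last reset. A testable sketch of that limiter, with the clock injected instead of calling ktime_get() (struct and function names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the MPAR logic: 'mpar' commands per minute, 0 = unlimited.
 * A refused command corresponds to the driver returning -EIO.
 */
struct mpar_state {
	unsigned int mpar;		/* max commands per minute */
	int budget;			/* remaining commands this window */
	int64_t window_start_ms;	/* analog of last_mpar_reset */
};

static int mpar_may_send(struct mpar_state *s, int64_t now_ms)
{
	if (!s->mpar)
		return 1;		/* no limitation */
	if (s->budget == 0) {
		if (now_ms - s->window_start_ms < 60 * 1000)
			return 0;	/* inside the window: refuse */
		s->window_start_ms = now_ms;
		s->budget = s->mpar;	/* new 60 s window */
	}
	s->budget--;
	return 1;
}
```

Ideally MPAR is zero or large enough for the whole system's request rate; the limiter only matters when firmware reports a tight value.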
+
+static int ras2_register_pcc_channel(struct device *dev, struct ras2_scrub_ctx *ras2_ctx,
+ int pcc_subspace_id)
+{
+ struct acpi_pcct_hw_reduced *ras2_ss;
+ struct mbox_client *ras2_mbox_cl;
+ struct pcc_mbox_chan *pcc_chan;
+ struct ras2_pcc_subspace *pcc_subspace;
+
+ if (pcc_subspace_id < 0)
+ return -EINVAL;
+
+ mutex_lock(&ras2_pcc_subspace_lock);
+ list_for_each_entry(pcc_subspace, &ras2_pcc_subspaces, elem) {
+ if (pcc_subspace->pcc_subspace_id == pcc_subspace_id) {
+ ras2_ctx->pcc_subspace = pcc_subspace;
+ pcc_subspace->ref_count++;
+ mutex_unlock(&ras2_pcc_subspace_lock);
+ return 0;
+ }
+ }
+ mutex_unlock(&ras2_pcc_subspace_lock);
+
+ pcc_subspace = kzalloc(sizeof(*pcc_subspace), GFP_KERNEL);
+ if (!pcc_subspace)
+ return -ENOMEM;
+ pcc_subspace->pcc_subspace_id = pcc_subspace_id;
+ ras2_mbox_cl = &pcc_subspace->mbox_client;
+ ras2_mbox_cl->dev = dev;
+ ras2_mbox_cl->knows_txdone = true;
+
+ pcc_chan = pcc_mbox_request_channel(ras2_mbox_cl, pcc_subspace_id);
+ if (IS_ERR(pcc_chan)) {
+ kfree(pcc_subspace);
+ return PTR_ERR(pcc_chan);
+ }
+ pcc_subspace->pcc_chan = pcc_chan;
+ ras2_ss = pcc_chan->mchan->con_priv;
+ pcc_subspace->comm_base_addr = ras2_ss->base_address;
+
+ /*
+ * ras2_ss->latency is just a Nominal value. In reality
+ * the remote processor could be much slower to reply.
+ * So add an arbitrary amount of wait on top of Nominal.
+ */
+ pcc_subspace->deadline = ns_to_ktime(RAS2_NUM_RETRIES * ras2_ss->latency *
+ NSEC_PER_USEC);
+ pcc_subspace->pcc_mrtt = ras2_ss->min_turnaround_time;
+ pcc_subspace->pcc_mpar = ras2_ss->max_access_rate;
+ pcc_subspace->pcc_comm_addr = acpi_os_ioremap(pcc_subspace->comm_base_addr,
+ ras2_ss->length);
+ if (!pcc_subspace->pcc_comm_addr) {
+ pcc_mbox_free_channel(pcc_subspace->pcc_chan);
+ kfree(pcc_subspace);
+ return -ENOMEM;
+ }
+ /* Set flag so that we don't come here for each CPU. */
+ pcc_subspace->pcc_channel_acquired = true;
+
+ mutex_lock(&ras2_pcc_subspace_lock);
+ list_add(&pcc_subspace->elem, &ras2_pcc_subspaces);
+ pcc_subspace->ref_count++;
+ mutex_unlock(&ras2_pcc_subspace_lock);
+ ras2_ctx->pcc_subspace = pcc_subspace;
+
+ return 0;
+}
+
+static void ras2_unregister_pcc_channel(void *ctx)
+{
+ struct ras2_scrub_ctx *ras2_ctx = ctx;
+ struct ras2_pcc_subspace *pcc_subspace = ras2_ctx->pcc_subspace;
+
+ if (!pcc_subspace || !pcc_subspace->pcc_chan)
+ return;
+
+ guard(mutex)(&ras2_pcc_subspace_lock);
+ if (pcc_subspace->ref_count > 0)
+ pcc_subspace->ref_count--;
+ if (!pcc_subspace->ref_count) {
+ list_del(&pcc_subspace->elem);
+ pcc_mbox_free_channel(pcc_subspace->pcc_chan);
+ kfree(pcc_subspace);
+ }
+}
+
+/**
+ * devm_ras2_register_pcc_channel() - Register RAS2 PCC channel
+ * @dev: pointer to the RAS2 device
+ * @ras2_ctx: pointer to the RAS2 context structure
+ * @pcc_subspace_id: identifier of the RAS2 PCC channel.
+ *
+ * Returns: 0 on success, an error otherwise
+ */
+int devm_ras2_register_pcc_channel(struct device *dev, struct ras2_scrub_ctx *ras2_ctx,
+ int pcc_subspace_id)
+{
+ int ret;
+
+ ret = ras2_register_pcc_channel(dev, ras2_ctx, pcc_subspace_id);
+ if (ret)
+ return ret;
+
+ return devm_add_action_or_reset(dev, ras2_unregister_pcc_channel, ras2_ctx);
+}
+EXPORT_SYMBOL_NS_GPL(devm_ras2_register_pcc_channel, ACPI_RAS2);
+
+static struct platform_device *ras2_add_platform_device(char *name, int channel)
+{
+ int ret;
+ struct platform_device *pdev __free(platform_device_put) =
+ platform_device_alloc(name, PLATFORM_DEVID_AUTO);
+ if (!pdev)
+ return ERR_PTR(-ENOMEM);
+
+ ret = platform_device_add_data(pdev, &channel, sizeof(channel));
+ if (ret)
+ return ERR_PTR(ret);
+
+ ret = platform_device_add(pdev);
+ if (ret)
+ return ERR_PTR(ret);
+
+ return_ptr(pdev);
+}
+
+static int __init ras2_acpi_init(void)
+{
+ struct acpi_table_header *pAcpiTable = NULL;
+ struct acpi_ras2_pcc_desc *pcc_desc_list;
+ struct acpi_table_ras2 *pRas2Table;
+ struct platform_device *pdev;
+ int pcc_subspace_id;
+ acpi_size ras2_size;
+ acpi_status status;
+ u8 count = 0, i;
+ int ret;
+
+ status = acpi_get_table("RAS2", 0, &pAcpiTable);
+ if (ACPI_FAILURE(status) || !pAcpiTable) {
+ pr_err("ACPI RAS2 driver failed to initialize, get table failed\n");
+ return -EINVAL;
+ }
+
+ ras2_size = pAcpiTable->length;
+ if (ras2_size < sizeof(struct acpi_table_ras2)) {
+ pr_err("ACPI RAS2 table present but broken (too short #1)\n");
+ ret = -EINVAL;
+ goto free_ras2_table;
+ }
+
+ pRas2Table = (struct acpi_table_ras2 *)pAcpiTable;
+ if (pRas2Table->num_pcc_descs <= 0) {
+ pr_err("ACPI RAS2 table does not contain PCC descriptors\n");
+ ret = -EINVAL;
+ goto free_ras2_table;
+ }
+
+ struct platform_device **pdev_list __free(kfree) =
+ kcalloc(pRas2Table->num_pcc_descs, sizeof(*pdev_list),
+ GFP_KERNEL);
+ if (!pdev_list) {
+ ret = -ENOMEM;
+ goto free_ras2_table;
+ }
+
+ pcc_desc_list = (struct acpi_ras2_pcc_desc *)(pRas2Table + 1);
+ /* Double scan for the case of only one actual controller */
+ pcc_subspace_id = -1;
+ count = 0;
+ for (i = 0; i < pRas2Table->num_pcc_descs; i++, pcc_desc_list++) {
+ if (pcc_desc_list->feature_type != RAS2_FEATURE_TYPE_MEMORY)
+ continue;
+ if (pcc_subspace_id == -1) {
+ pcc_subspace_id = pcc_desc_list->channel_id;
+ count++;
+ }
+ if (pcc_desc_list->channel_id != pcc_subspace_id)
+ count++;
+ }
+ if (count == 1) {
+ pdev = ras2_add_platform_device("acpi_ras2", pcc_subspace_id);
+ if (IS_ERR(pdev)) {
+ ret = PTR_ERR(pdev);
+ goto free_ras2_pdev;
+ }
+ pdev_list[0] = pdev;
+ acpi_put_table(pAcpiTable);
+ return 0;
+ }
+
+ count = 0;
+ pcc_desc_list = (struct acpi_ras2_pcc_desc *)(pRas2Table + 1);
+ for (i = 0; i < pRas2Table->num_pcc_descs; i++, pcc_desc_list++) {
+ if (pcc_desc_list->feature_type != RAS2_FEATURE_TYPE_MEMORY)
+ continue;
+ pcc_subspace_id = pcc_desc_list->channel_id;
+ /* Add the platform device and bind ACPI RAS2 memory driver */
+ pdev = ras2_add_platform_device("acpi_ras2", pcc_subspace_id);
+ if (IS_ERR(pdev)) {
+ ret = PTR_ERR(pdev);
+ goto free_ras2_pdev;
+ }
+ pdev_list[count++] = pdev;
+ }
+
+ acpi_put_table(pAcpiTable);
+ return 0;
+
+free_ras2_pdev:
+ for (i = 0; i < count; i++)
+ platform_device_unregister(pdev_list[i]);
+
+free_ras2_table:
+ acpi_put_table(pAcpiTable);
+
+ return ret;
+}
+late_initcall(ras2_acpi_init)
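The init code scans the PCC descriptors twice: the first pass decides whether all memory features share a single PCC channel, in which case one platform device suffices; otherwise the second pass creates a device per memory descriptor. A user-space model of that first pass (types and names are illustrative; as in the driver, the returned count is only meaningful for distinguishing "exactly one channel" from "more than one"):

```c
#include <assert.h>

#define FEAT_MEMORY 0x00	/* RAS2_FEATURE_TYPE_MEMORY */

struct pcc_desc { int feature_type; int channel_id; };

/*
 * Model of the first scan in ras2_acpi_init(): remember the first memory
 * feature's channel and bump the count for any memory descriptor on a
 * different channel.
 */
static int count_memory_channels(const struct pcc_desc *d, int n)
{
	int first = -1, count = 0;

	for (int i = 0; i < n; i++) {
		if (d[i].feature_type != FEAT_MEMORY)
			continue;
		if (first == -1) {
			first = d[i].channel_id;
			count++;
		} else if (d[i].channel_id != first) {
			count++;
		}
	}
	return count;
}
```

With one channel the driver creates a single "acpi_ras2" device carrying that channel id as platform data; with several it re-walks the table and creates one device per memory descriptor.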
diff --git a/include/acpi/ras2_acpi.h b/include/acpi/ras2_acpi.h
new file mode 100644
index 000000000000..edfca253d88a
--- /dev/null
+++ b/include/acpi/ras2_acpi.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * RAS2 ACPI driver header file
+ *
+ * (C) Copyright 2014, 2015 Hewlett-Packard Enterprises
+ *
+ * Copyright (c) 2024 HiSilicon Limited
+ */
+
+#ifndef _RAS2_ACPI_H
+#define _RAS2_ACPI_H
+
+#include <linux/acpi.h>
+#include <linux/mailbox_client.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+
+#define RAS2_PCC_CMD_COMPLETE BIT(0)
+#define RAS2_PCC_CMD_ERROR BIT(2)
+
+/* RAS2 specific PCC commands */
+#define RAS2_PCC_CMD_EXEC 0x01
+
+struct device;
+
+/* Data structures for PCC communication and RAS2 table */
+struct pcc_mbox_chan;
+
+struct ras2_pcc_subspace {
+ int pcc_subspace_id;
+ struct mbox_client mbox_client;
+ struct pcc_mbox_chan *pcc_chan;
+ struct acpi_ras2_shared_memory __iomem *pcc_comm_addr;
+ u64 comm_base_addr;
+ bool pcc_channel_acquired;
+ ktime_t deadline;
+ unsigned int pcc_mpar;
+ unsigned int pcc_mrtt;
+ struct list_head elem;
+ u16 ref_count;
+};
+
+struct ras2_scrub_ctx {
+ struct device *dev;
+ struct ras2_pcc_subspace *pcc_subspace;
+ int id;
+ u8 instance;
+ struct device *scrub_dev;
+ bool bg;
+ u64 base, size;
+ u8 scrub_cycle_hrs, min_scrub_cycle, max_scrub_cycle;
+ /* Lock to provide mutually exclusive access to PCC channel */
+ struct mutex lock;
+};
+
+int ras2_send_pcc_cmd(struct ras2_scrub_ctx *ras2_ctx, u16 cmd);
+int devm_ras2_register_pcc_channel(struct device *dev, struct ras2_scrub_ctx *ras2_ctx,
+ int pcc_subspace_id);
+
+#endif /* _RAS2_ACPI_H */
--
2.34.1
* [PATCH v12 14/17] ras: mem: Add memory ACPI RAS2 driver
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
` (12 preceding siblings ...)
2024-09-11 9:04 ` [PATCH v12 13/17] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-11 9:04 ` [PATCH v12 15/17] EDAC: Add EDAC PPR control driver shiju.jose
` (2 subsequent siblings)
16 siblings, 0 replies; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
The memory ACPI RAS2 driver binds to the platform device added by the
ACPI RAS2 table parser.
The driver uses a PCC subspace for communicating with the ACPI-compliant
platform, providing control of memory scrub parameters to userspace
via the EDAC scrub interface.
It gets the scrub attribute descriptors from the EDAC scrub and registers
with the EDAC RAS feature driver to expose the sysfs scrub control
attributes to userspace.
For example, scrub control for the RAS2 memory device is exposed in
/sys/bus/edac/devices/acpi_ras2_mem0/scrub0/
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
Documentation/edac/edac-scrub.rst | 41 +++
drivers/ras/Kconfig | 10 +
drivers/ras/Makefile | 1 +
drivers/ras/acpi_ras2.c | 412 ++++++++++++++++++++++++++++++
4 files changed, 464 insertions(+)
create mode 100644 drivers/ras/acpi_ras2.c
diff --git a/Documentation/edac/edac-scrub.rst b/Documentation/edac/edac-scrub.rst
index 243035957e99..b941566bd36b 100644
--- a/Documentation/edac/edac-scrub.rst
+++ b/Documentation/edac/edac-scrub.rst
@@ -72,3 +72,44 @@ root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
root@localhost:~# echo 0 > /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
0
+
+2. RAS2
+2.1 On demand scrubbing for a specific memory region.
+root@localhost:~# echo 0x120000 > /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/addr_range_base
+root@localhost:~# echo 0x150000 > /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/addr_range_size
+root@localhost:~# cat /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/min_cycle_duration
+3600
+root@localhost:~# cat /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/max_cycle_duration
+86400
+root@localhost:~# cat /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/current_cycle_duration
+36000
+root@localhost:~# echo 54000 > /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/current_cycle_duration
+root@localhost:~# echo 1 > /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/enable_on_demand
+root@localhost:~# cat /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/enable_on_demand
+1
+root@localhost:~# cat /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/current_cycle_duration
+54000
+root@localhost:~# cat /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/addr_range_base
+0x120000
root@localhost:~# cat /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/addr_range_size
+0x150000
+root@localhost:~# echo 0 > /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/enable_on_demand
+root@localhost:~# cat /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/enable_on_demand
+0
+
+2.2 Background scrubbing the entire memory
+root@localhost:~# cat /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/min_cycle_duration
+3600
+root@localhost:~# cat /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/max_cycle_duration
+86400
+root@localhost:~# cat /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/current_cycle_duration
+36000
+root@localhost:~# cat /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/enable_background
+0
+root@localhost:~# echo 10800 > /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/current_cycle_duration
+root@localhost:~# echo 1 > /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/enable_background
+root@localhost:~# cat /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/enable_background
+1
+root@localhost:~# cat /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/current_cycle_duration
+10800
+root@localhost:~# echo 0 > /sys/bus/edac/devices/acpi_ras2_mem0/scrub0/enable_background
diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
index fc4f4bb94a4c..b77790bdc73a 100644
--- a/drivers/ras/Kconfig
+++ b/drivers/ras/Kconfig
@@ -46,4 +46,14 @@ config RAS_FMPM
Memory will be retired during boot time and run time depending on
platform-specific policies.
+config MEM_ACPI_RAS2
+ tristate "Memory ACPI RAS2 driver"
+ depends on ACPI_RAS2
+ depends on EDAC
+ help
The driver binds to the platform device added by the ACPI RAS2
table parser. It uses a PCC channel subspace for communicating
with the ACPI-compliant platform to provide control of memory
scrub parameters to the user via the EDAC scrub interface.
+
endif
diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
index 11f95d59d397..a0e6e903d6b0 100644
--- a/drivers/ras/Makefile
+++ b/drivers/ras/Makefile
@@ -2,6 +2,7 @@
obj-$(CONFIG_RAS) += ras.o
obj-$(CONFIG_DEBUG_FS) += debugfs.o
obj-$(CONFIG_RAS_CEC) += cec.o
+obj-$(CONFIG_MEM_ACPI_RAS2) += acpi_ras2.o
obj-$(CONFIG_RAS_FMPM) += amd/fmpm.o
obj-y += amd/atl/
diff --git a/drivers/ras/acpi_ras2.c b/drivers/ras/acpi_ras2.c
new file mode 100644
index 000000000000..5cbcdc208e9c
--- /dev/null
+++ b/drivers/ras/acpi_ras2.c
@@ -0,0 +1,412 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * ACPI RAS2 memory driver
+ *
+ * Copyright (c) 2024 HiSilicon Limited.
+ *
+ */
+
+#define pr_fmt(fmt) "MEMORY ACPI RAS2: " fmt
+
+#include <linux/bitfield.h>
+#include <linux/edac.h>
+#include <linux/platform_device.h>
+#include <acpi/ras2_acpi.h>
+
+#define RAS2_DEV_NUM_RAS_FEATURES 1
+
+#define RAS2_SUPPORT_HW_PARTOL_SCRUB BIT(0)
+#define RAS2_TYPE_PATROL_SCRUB 0x0000
+
+#define RAS2_GET_PATROL_PARAMETERS 0x01
+#define RAS2_START_PATROL_SCRUBBER 0x02
+#define RAS2_STOP_PATROL_SCRUBBER 0x03
+
+#define RAS2_PATROL_SCRUB_SCHRS_IN_MASK GENMASK(15, 8)
+#define RAS2_PATROL_SCRUB_EN_BACKGROUND BIT(0)
+#define RAS2_PATROL_SCRUB_SCHRS_OUT_MASK GENMASK(7, 0)
+#define RAS2_PATROL_SCRUB_MIN_SCHRS_OUT_MASK GENMASK(15, 8)
+#define RAS2_PATROL_SCRUB_MAX_SCHRS_OUT_MASK GENMASK(23, 16)
+#define RAS2_PATROL_SCRUB_FLAG_SCRUBBER_RUNNING BIT(0)
+
+#define RAS2_SCRUB_NAME_LEN 128
+#define RAS2_HOUR_IN_SECS 3600
+
+struct acpi_ras2_ps_shared_mem {
+ struct acpi_ras2_shared_memory common;
+ struct acpi_ras2_patrol_scrub_parameter params;
+};
+
+static int ras2_is_patrol_scrub_support(struct ras2_scrub_ctx *ras2_ctx)
+{
+ struct acpi_ras2_shared_memory __iomem *common = (void *)
+ ras2_ctx->pcc_subspace->pcc_comm_addr;
+
+ guard(mutex)(&ras2_ctx->lock);
+ common->set_capabilities[0] = 0;
+
+ return common->features[0] & RAS2_SUPPORT_HW_PARTOL_SCRUB;
+}
+
+static int ras2_update_patrol_scrub_params_cache(struct ras2_scrub_ctx *ras2_ctx)
+{
+ struct acpi_ras2_ps_shared_mem __iomem *ps_sm = (void *)
+ ras2_ctx->pcc_subspace->pcc_comm_addr;
+ int ret;
+
+ ps_sm->common.set_capabilities[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+ ps_sm->params.patrol_scrub_command = RAS2_GET_PATROL_PARAMETERS;
+
+ ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+ if (ret) {
+ dev_err(ras2_ctx->dev, "failed to read parameters\n");
+ return ret;
+ }
+
+ ras2_ctx->min_scrub_cycle = FIELD_GET(RAS2_PATROL_SCRUB_MIN_SCHRS_OUT_MASK,
+ ps_sm->params.scrub_params_out);
+ ras2_ctx->max_scrub_cycle = FIELD_GET(RAS2_PATROL_SCRUB_MAX_SCHRS_OUT_MASK,
+ ps_sm->params.scrub_params_out);
+ if (!ras2_ctx->bg) {
+ ras2_ctx->base = ps_sm->params.actual_address_range[0];
+ ras2_ctx->size = ps_sm->params.actual_address_range[1];
+ }
+ ras2_ctx->scrub_cycle_hrs = FIELD_GET(RAS2_PATROL_SCRUB_SCHRS_OUT_MASK,
+ ps_sm->params.scrub_params_out);
+
+ return 0;
+}
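The cache update unpacks scrub_params_out with FIELD_GET(): the current scrub cycle sits in bits [7:0], the minimum in [15:8], and the maximum in [23:16], all in hours. A user-space sketch with stand-ins for the kernel's GENMASK()/FIELD_GET() helpers (simplified to 32-bit masks; not the kernel implementations):

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's GENMASK()/FIELD_GET() helpers. */
#define GENMASK(h, l)		(((~0u) >> (31 - (h))) & (~0u << (l)))
#define FIELD_GET(mask, val)	(((val) & (mask)) >> __builtin_ctz(mask))

#define SCHRS_OUT_MASK		GENMASK(7, 0)	/* current cycle, hours */
#define MIN_SCHRS_OUT_MASK	GENMASK(15, 8)	/* minimum cycle, hours */
#define MAX_SCHRS_OUT_MASK	GENMASK(23, 16)	/* maximum cycle, hours */

/* Unpack scrub_params_out as ras2_update_patrol_scrub_params_cache() does. */
static void unpack_scrub_params(uint32_t out, unsigned int *cur,
				unsigned int *min, unsigned int *max)
{
	*cur = FIELD_GET(SCHRS_OUT_MASK, out);
	*min = FIELD_GET(MIN_SCHRS_OUT_MASK, out);
	*max = FIELD_GET(MAX_SCHRS_OUT_MASK, out);
}
```

The EDAC-facing read callbacks then multiply each hour value by 3600 so sysfs reports seconds.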
+
+/* Context - lock must be held */
+static int ras2_get_patrol_scrub_running(struct ras2_scrub_ctx *ras2_ctx,
+ bool *running)
+{
+ struct acpi_ras2_ps_shared_mem __iomem *ps_sm = (void *)
+ ras2_ctx->pcc_subspace->pcc_comm_addr;
+ int ret;
+
+ ps_sm->common.set_capabilities[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+ ps_sm->params.patrol_scrub_command = RAS2_GET_PATROL_PARAMETERS;
+
+ ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+ if (ret) {
+ dev_err(ras2_ctx->dev, "failed to read parameters\n");
+ return ret;
+ }
+
+ *running = ps_sm->params.flags & RAS2_PATROL_SCRUB_FLAG_SCRUBBER_RUNNING;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_read_min_scrub_cycle(struct device *dev, void *drv_data,
+ u32 *min)
+{
+ struct ras2_scrub_ctx *ras2_ctx = drv_data;
+
+ *min = ras2_ctx->min_scrub_cycle * RAS2_HOUR_IN_SECS;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_read_max_scrub_cycle(struct device *dev, void *drv_data,
+ u32 *max)
+{
+ struct ras2_scrub_ctx *ras2_ctx = drv_data;
+
+ *max = ras2_ctx->max_scrub_cycle * RAS2_HOUR_IN_SECS;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_cycle_read(struct device *dev, void *drv_data,
+ u32 *scrub_cycle_secs)
+{
+ struct ras2_scrub_ctx *ras2_ctx = drv_data;
+
+ *scrub_cycle_secs = ras2_ctx->scrub_cycle_hrs * RAS2_HOUR_IN_SECS;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_cycle_write(struct device *dev, void *drv_data,
+ u32 scrub_cycle_secs)
+{
+ u8 scrub_cycle_hrs = scrub_cycle_secs / RAS2_HOUR_IN_SECS;
+ struct ras2_scrub_ctx *ras2_ctx = drv_data;
+ bool running;
+ int ret;
+
+ guard(mutex)(&ras2_ctx->lock);
+ ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+ if (ret)
+ return ret;
+
+ if (running)
+ return -EBUSY;
+
+ if (scrub_cycle_hrs < ras2_ctx->min_scrub_cycle ||
+ scrub_cycle_hrs > ras2_ctx->max_scrub_cycle)
+ return -EINVAL;
+
+ ras2_ctx->scrub_cycle_hrs = scrub_cycle_hrs;
+
+ return 0;
+}
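The cycle write converts the sysfs value from seconds to whole hours by truncating division, then rejects values outside the platform-reported range; e.g. 1800 s truncates to 0 hours and fails the minimum check. A minimal model of that conversion and validation (names are illustrative):

```c
#include <assert.h>

#define HOUR_IN_SECS 3600	/* RAS2_HOUR_IN_SECS */

/*
 * Model of ras2_hw_scrub_cycle_write(): the RAS2 interface works in whole
 * hours, so seconds from sysfs are truncated to hours and range-checked.
 */
static int set_cycle(unsigned int secs, unsigned int min_hrs,
		     unsigned int max_hrs, unsigned int *out_hrs)
{
	unsigned int hrs = secs / HOUR_IN_SECS;	/* truncates, as in the driver */

	if (hrs < min_hrs || hrs > max_hrs)
		return -22;			/* -EINVAL */
	*out_hrs = hrs;
	return 0;
}
```

So writing 54000 to current_cycle_duration stores 15 hours, which reads back as 54000 only because 54000 happens to be a multiple of 3600; a value like 54060 would silently round down.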
+
+static int ras2_hw_scrub_read_range(struct device *dev, void *drv_data, u64 *base, u64 *size)
+{
+ struct ras2_scrub_ctx *ras2_ctx = drv_data;
+
+ /*
+ * When background scrubbing is enabled the actual address range is
+ * not valid. Return -EBUSY until a method is found to retrieve the
+ * actual full physical address range.
+ */
+ if (ras2_ctx->bg)
+ return -EBUSY;
+
+ *base = ras2_ctx->base;
+ *size = ras2_ctx->size;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_write_range(struct device *dev, void *drv_data, u64 base, u64 size)
+{
+ struct ras2_scrub_ctx *ras2_ctx = drv_data;
+ bool running;
+ int ret;
+
+ guard(mutex)(&ras2_ctx->lock);
+ ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+ if (ret)
+ return ret;
+
+ if (running)
+ return -EBUSY;
+
+ if (!base || !size) {
+ dev_warn(dev, "%s: Invalid address range, base=0x%llx size=0x%llx\n",
+ __func__, base, size);
+ return -EINVAL;
+ }
+
+ ras2_ctx->base = base;
+ ras2_ctx->size = size;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_set_enabled_bg(struct device *dev, void *drv_data, bool enable)
+{
+ struct ras2_scrub_ctx *ras2_ctx = drv_data;
+ struct acpi_ras2_ps_shared_mem __iomem *ps_sm = (void *)
+ ras2_ctx->pcc_subspace->pcc_comm_addr;
+ bool running;
+ int ret;
+
+ guard(mutex)(&ras2_ctx->lock);
+ ps_sm->common.set_capabilities[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+ ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+ if (ret)
+ return ret;
+ if (enable) {
+ if (ras2_ctx->bg || running)
+ return -EBUSY;
+ ps_sm->params.requested_address_range[0] = 0;
+ ps_sm->params.requested_address_range[1] = 0;
+ ps_sm->params.scrub_params_in &= ~RAS2_PATROL_SCRUB_SCHRS_IN_MASK;
+ ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PATROL_SCRUB_SCHRS_IN_MASK,
+ ras2_ctx->scrub_cycle_hrs);
+ ps_sm->params.patrol_scrub_command = RAS2_START_PATROL_SCRUBBER;
+ } else {
+		if (!ras2_ctx->bg)
+			return -EPERM;
+		ps_sm->params.patrol_scrub_command = RAS2_STOP_PATROL_SCRUBBER;
+ }
+ ps_sm->params.scrub_params_in &= ~RAS2_PATROL_SCRUB_EN_BACKGROUND;
+ ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PATROL_SCRUB_EN_BACKGROUND,
+ enable);
+ ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+ if (ret) {
+ dev_err(ras2_ctx->dev, "Failed to %s background scrubbing\n",
+ enable ? "enable" : "disable");
+ return ret;
+ }
+ if (enable) {
+ ras2_ctx->bg = true;
+ /* Update the cache to account for rounding of supplied parameters and similar */
+ ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+ } else {
+ ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+ ras2_ctx->bg = false;
+ }
+
+ return ret;
+}
+
+static int ras2_hw_scrub_get_enabled_bg(struct device *dev, void *drv_data, bool *enabled)
+{
+ struct ras2_scrub_ctx *ras2_ctx = drv_data;
+
+ *enabled = ras2_ctx->bg;
+
+ return 0;
+}
+
+static int ras2_hw_scrub_set_enabled_od(struct device *dev, void *drv_data, bool enable)
+{
+ struct ras2_scrub_ctx *ras2_ctx = drv_data;
+ struct acpi_ras2_ps_shared_mem __iomem *ps_sm = (void *)
+ ras2_ctx->pcc_subspace->pcc_comm_addr;
+ bool running;
+ int ret;
+
+ guard(mutex)(&ras2_ctx->lock);
+ ps_sm->common.set_capabilities[0] = RAS2_SUPPORT_HW_PARTOL_SCRUB;
+ if (ras2_ctx->bg)
+ return -EBUSY;
+ ret = ras2_get_patrol_scrub_running(ras2_ctx, &running);
+ if (ret)
+ return ret;
+ if (enable) {
+ if (!ras2_ctx->base || !ras2_ctx->size) {
+			dev_warn(ras2_ctx->dev,
+				 "%s: Invalid address range, base=0x%llx size=0x%llx\n",
+				 __func__, ras2_ctx->base, ras2_ctx->size);
+ return -ERANGE;
+ }
+ if (running)
+ return -EBUSY;
+ ps_sm->params.scrub_params_in &= ~RAS2_PATROL_SCRUB_SCHRS_IN_MASK;
+ ps_sm->params.scrub_params_in |= FIELD_PREP(RAS2_PATROL_SCRUB_SCHRS_IN_MASK,
+ ras2_ctx->scrub_cycle_hrs);
+ ps_sm->params.requested_address_range[0] = ras2_ctx->base;
+ ps_sm->params.requested_address_range[1] = ras2_ctx->size;
+ ps_sm->params.scrub_params_in &= ~RAS2_PATROL_SCRUB_EN_BACKGROUND;
+ ps_sm->params.patrol_scrub_command = RAS2_START_PATROL_SCRUBBER;
+ } else {
+ if (!running)
+ return 0;
+ ps_sm->params.patrol_scrub_command = RAS2_STOP_PATROL_SCRUBBER;
+ }
+
+ ret = ras2_send_pcc_cmd(ras2_ctx, RAS2_PCC_CMD_EXEC);
+ if (ret) {
+ dev_err(ras2_ctx->dev, "Failed to %s demand scrubbing\n",
+ enable ? "enable" : "disable");
+ return ret;
+ }
+
+ return ras2_update_patrol_scrub_params_cache(ras2_ctx);
+}
+
+static int ras2_hw_scrub_get_enabled_od(struct device *dev, void *drv_data, bool *enabled)
+{
+ struct ras2_scrub_ctx *ras2_ctx = drv_data;
+
+ guard(mutex)(&ras2_ctx->lock);
+ if (ras2_ctx->bg) {
+ *enabled = false;
+ return 0;
+ }
+
+ return ras2_get_patrol_scrub_running(ras2_ctx, enabled);
+}
+
+static const struct edac_scrub_ops ras2_scrub_ops = {
+ .read_range = ras2_hw_scrub_read_range,
+ .write_range = ras2_hw_scrub_write_range,
+ .get_enabled_bg = ras2_hw_scrub_get_enabled_bg,
+ .set_enabled_bg = ras2_hw_scrub_set_enabled_bg,
+ .get_enabled_od = ras2_hw_scrub_get_enabled_od,
+ .set_enabled_od = ras2_hw_scrub_set_enabled_od,
+ .min_cycle_read = ras2_hw_scrub_read_min_scrub_cycle,
+ .max_cycle_read = ras2_hw_scrub_read_max_scrub_cycle,
+ .cycle_duration_read = ras2_hw_scrub_cycle_read,
+ .cycle_duration_write = ras2_hw_scrub_cycle_write,
+};
+
+static DEFINE_IDA(ras2_ida);
+
+static void ida_release(void *ctx)
+{
+ struct ras2_scrub_ctx *ras2_ctx = ctx;
+
+ ida_free(&ras2_ida, ras2_ctx->id);
+}
+
+static int ras2_probe(struct platform_device *pdev)
+{
+ struct edac_dev_feature ras_features[RAS2_DEV_NUM_RAS_FEATURES];
+ char scrub_name[RAS2_SCRUB_NAME_LEN];
+ struct ras2_scrub_ctx *ras2_ctx;
+ int num_ras_features = 0;
+ int ret, id;
+
+ /* RAS2 PCC Channel and Scrub specific context */
+ ras2_ctx = devm_kzalloc(&pdev->dev, sizeof(*ras2_ctx), GFP_KERNEL);
+ if (!ras2_ctx)
+ return -ENOMEM;
+
+ ras2_ctx->dev = &pdev->dev;
+ mutex_init(&ras2_ctx->lock);
+
+ ret = devm_ras2_register_pcc_channel(&pdev->dev, ras2_ctx,
+ *((int *)dev_get_platdata(&pdev->dev)));
+ if (ret < 0) {
+ dev_dbg(ras2_ctx->dev,
+ "failed to register pcc channel ret=%d\n", ret);
+ return ret;
+ }
+ if (!ras2_is_patrol_scrub_support(ras2_ctx))
+ return -EOPNOTSUPP;
+
+ ret = ras2_update_patrol_scrub_params_cache(ras2_ctx);
+ if (ret)
+ return ret;
+
+ id = ida_alloc(&ras2_ida, GFP_KERNEL);
+ if (id < 0)
+ return id;
+
+ ras2_ctx->id = id;
+
+ ret = devm_add_action_or_reset(&pdev->dev, ida_release, ras2_ctx);
+ if (ret < 0)
+ return ret;
+
+ snprintf(scrub_name, sizeof(scrub_name), "acpi_ras2_mem%d",
+ ras2_ctx->id);
+
+ ras_features[num_ras_features].ft_type = RAS_FEAT_SCRUB;
+ ras_features[num_ras_features].instance = ras2_ctx->instance;
+ ras_features[num_ras_features].scrub_ops = &ras2_scrub_ops;
+ ras_features[num_ras_features].ctx = ras2_ctx;
+ num_ras_features++;
+
+ return edac_dev_register(&pdev->dev, scrub_name, NULL,
+ num_ras_features, ras_features);
+}
+
+static const struct platform_device_id ras2_id_table[] = {
+ { .name = "acpi_ras2", },
+ { }
+};
+MODULE_DEVICE_TABLE(platform, ras2_id_table);
+
+static struct platform_driver ras2_driver = {
+ .probe = ras2_probe,
+ .driver = {
+ .name = "acpi_ras2",
+ },
+ .id_table = ras2_id_table,
+};
+module_driver(ras2_driver, platform_driver_register, platform_driver_unregister);
+
+MODULE_IMPORT_NS(ACPI_RAS2);
+MODULE_DESCRIPTION("ACPI RAS2 memory driver");
+MODULE_LICENSE("GPL");
--
2.34.1
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v12 15/17] EDAC: Add EDAC PPR control driver
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
` (13 preceding siblings ...)
2024-09-11 9:04 ` [PATCH v12 14/17] ras: mem: Add memory " shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-11 9:04 ` [PATCH v12 16/17] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
2024-09-11 9:04 ` [PATCH v12 17/17] cxl/memfeature: Add CXL memory device PPR control feature shiju.jose
16 siblings, 0 replies; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add a generic EDAC PPR (Post Package Repair) control driver that supports
configuring the memory PPR feature in the system. Both sPPR (soft PPR) and
hPPR (hard PPR) are supported. A device with the PPR feature gets the PPR
descriptor from the EDAC PPR driver and registers with EDAC, which adds the
sysfs PPR control attributes.
The PPR control attributes are available to userspace in
/sys/bus/edac/devices/<dev-name>/pprX/
The generic EDAC PPR driver and the common sysfs PPR interface promote
unambiguous access from userspace irrespective of the underlying memory
device supporting the PPR feature.
A sysfs PPR attribute node is present only if the client driver implements
the corresponding attribute callback functions and passes the ops to EDAC
during registration.
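The "attribute node present only if the callback is implemented" rule can be sketched outside the kernel as a small C model. This is illustrative only: the `ppr_ops` struct and `persist_mode_visibility()` helper below are simplified stand-ins for the kernel's `edac_ppr_ops` and `ppr_attr_visible()` in the patch that follows, not the actual EDAC code.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's edac_ppr_ops (illustrative). */
struct ppr_ops {
	int (*get_persist_mode)(void);
	int (*set_persist_mode)(int mode);
};

/*
 * Return the sysfs mode for the persist_mode attribute:
 * 0644 if both getter and setter exist, 0444 if only the getter,
 * 0 (attribute hidden) otherwise -- mirroring the visibility
 * callback's handling of PPR_PERSIST_MODE.
 */
static unsigned int persist_mode_visibility(const struct ppr_ops *ops)
{
	if (ops->get_persist_mode && ops->set_persist_mode)
		return 0644;
	if (ops->get_persist_mode)
		return 0444;
	return 0;
}
```

A client driver that fills in only the getter would therefore expose a read-only node, and one that fills in neither would see no node at all.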
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
Documentation/ABI/testing/sysfs-edac-ppr | 69 ++++++
drivers/edac/Makefile | 2 +-
drivers/edac/edac_device.c | 4 +
drivers/edac/edac_ppr.c | 255 +++++++++++++++++++++++
include/linux/edac.h | 30 +++
5 files changed, 359 insertions(+), 1 deletion(-)
create mode 100644 Documentation/ABI/testing/sysfs-edac-ppr
create mode 100755 drivers/edac/edac_ppr.c
diff --git a/Documentation/ABI/testing/sysfs-edac-ppr b/Documentation/ABI/testing/sysfs-edac-ppr
new file mode 100644
index 000000000000..aaa645c195fc
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-edac-ppr
@@ -0,0 +1,69 @@
+What: /sys/bus/edac/devices/<dev-name>/ppr*
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ The sysfs EDAC bus devices /<dev-name>/ppr* subdirectory
+ belongs to the memory media PPR (Post Package Repair) control
+ feature, where <dev-name> directory corresponds to a device
+ registered with the EDAC PPR driver and thus registered with
+ the generic EDAC device driver.
+ /ppr* belongs to either sPPR (Soft PPR) or hPPR (Hard PPR)
+ feature for the memory device.
+		The sysfs PPR attribute nodes are present only if PPR is
+		supported.
+
+What: /sys/bus/edac/devices/<dev-name>/ppr*/persist_mode_avail
+Date: Oct 2024
KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+		(RO) Persist PPR modes supported in the device.
+		For example, Hard PPR (hPPR) for a permanent row repair and
+		Soft PPR (sPPR) for a temporary row repair.
+
+What: /sys/bus/edac/devices/<dev-name>/ppr*/persist_mode
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RW) Current persist PPR mode.
+
+What: /sys/bus/edac/devices/<dev-name>/ppr*/dpa_support
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+		(RO) True if the device supports DPA for PPR maintenance operation.
+
+What: /sys/bus/edac/devices/<dev-name>/ppr*/ppr_safe_when_in_use
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+ (RO) True if memory media is accessible and data is retained
+ during the PPR operation.
+
+What: /sys/bus/edac/devices/<dev-name>/ppr*/repair_hpa
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+		(WO) Start the PPR operation for the HPA (host physical
+		address) set. Returns a failure if resources are not
+		available to perform the repair.
+
+What: /sys/bus/edac/devices/<dev-name>/ppr*/repair_dpa
+Date: Oct 2024
+KernelVersion: 6.12
+Contact: linux-edac@vger.kernel.org
+Description:
+		(WO) Start the PPR operation for the DPA (device physical
+		address) set. Returns a failure if resources are not
+		available to perform the repair.
+		In some states of system configuration (e.g. before address
+		decoders have been configured), memory devices (e.g. CXL)
+		may not have an active mapping in the host physical address
+		map. As such, the memory to repair must be identified by a
+		device-specific physical addressing scheme using a DPA. The
+		DPA to use will be presented in related error records.
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index 62115eff6a9a..19ab22a210a1 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o
edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o
edac_core-y += edac_module.o edac_device_sysfs.o wq.o
-edac_core-y += edac_scrub.o edac_ecs.o
+edac_core-y += edac_scrub.o edac_ecs.o edac_ppr.o
edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o
diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
index 9cac9ae75080..8a0640523342 100644
--- a/drivers/edac/edac_device.c
+++ b/drivers/edac/edac_device.c
@@ -630,6 +630,10 @@ static int edac_dev_feat_init(struct device *parent,
case RAS_FEAT_PPR:
dev_data->ppr_ops = ras_feat->ppr_ops;
dev_data->private = ras_feat->ctx;
+ ret = edac_ppr_get_desc(parent, attr_groups,
+ ras_feat->instance);
+ if (ret)
+ return ret;
return 1;
default:
return -EINVAL;
diff --git a/drivers/edac/edac_ppr.c b/drivers/edac/edac_ppr.c
new file mode 100755
index 000000000000..4f97ea4deee3
--- /dev/null
+++ b/drivers/edac/edac_ppr.c
@@ -0,0 +1,255 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * The generic EDAC PPR driver supports controlling memory devices
+ * with the Post Package Repair (PPR) feature in the system. The
+ * common sysfs PPR control interface promotes unambiguous access
+ * from userspace.
+ *
+ * Copyright (c) 2024 HiSilicon Limited.
+ */
+
+#define pr_fmt(fmt) "EDAC PPR: " fmt
+
+#include <linux/edac.h>
+
+enum edac_ppr_attributes {
+ PPR_PERSIST_MODE_AVAIL,
+ PPR_PERSIST_MODE,
+ PPR_DPA_SUPPORT,
+ PPR_SAFE_IN_USE,
+ PPR_HPA,
+ PPR_DPA,
+ PPR_MAX_ATTRS
+};
+
+struct edac_ppr_dev_attr {
+ struct device_attribute dev_attr;
+ u8 instance;
+};
+
+struct edac_ppr_context {
+ char name[EDAC_FEAT_NAME_LEN];
+ struct edac_ppr_dev_attr ppr_dev_attr[PPR_MAX_ATTRS];
+ struct attribute *ppr_attrs[PPR_MAX_ATTRS + 1];
+ struct attribute_group group;
+};
+
+#define to_ppr_dev_attr(_dev_attr) \
+ container_of(_dev_attr, struct edac_ppr_dev_attr, dev_attr)
+
+static ssize_t persist_mode_avail_show(struct device *ras_feat_dev,
+ struct device_attribute *attr, char *buf)
+{
+ u8 inst = ((struct edac_ppr_dev_attr *)to_ppr_dev_attr(attr))->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ppr_ops *ops = ctx->ppr[inst].ppr_ops;
+
+ return ops->get_persist_mode_avail(ras_feat_dev->parent,
+ ctx->ppr[inst].private, buf);
+}
+
+static ssize_t persist_mode_show(struct device *ras_feat_dev,
+ struct device_attribute *attr, char *buf)
+{
+ u8 inst = ((struct edac_ppr_dev_attr *)to_ppr_dev_attr(attr))->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ppr_ops *ops = ctx->ppr[inst].ppr_ops;
+ u32 mode;
+ int ret;
+
+ ret = ops->get_persist_mode(ras_feat_dev->parent, ctx->ppr[inst].private, &mode);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%u\n", mode);
+}
+
+static ssize_t persist_mode_store(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ const char *buf,
+ size_t len)
+{
+ u8 inst = ((struct edac_ppr_dev_attr *)to_ppr_dev_attr(attr))->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ppr_ops *ops = ctx->ppr[inst].ppr_ops;
+ long mode;
+ int ret;
+
+ ret = kstrtol(buf, 0, &mode);
+ if (ret < 0)
+ return ret;
+
+ ret = ops->set_persist_mode(ras_feat_dev->parent, ctx->ppr[inst].private, mode);
+ if (ret)
+ return ret;
+
+ return len;
+}
+
+static ssize_t dpa_support_show(struct device *ras_feat_dev,
+ struct device_attribute *attr, char *buf)
+{
+ u8 inst = ((struct edac_ppr_dev_attr *)to_ppr_dev_attr(attr))->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ppr_ops *ops = ctx->ppr[inst].ppr_ops;
+ int ret;
+ u32 val;
+
+ ret = ops->get_dpa_support(ras_feat_dev->parent, ctx->ppr[inst].private, &val);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%u\n", val);
+}
+
+static ssize_t ppr_safe_when_in_use_show(struct device *ras_feat_dev,
+ struct device_attribute *attr, char *buf)
+{
+ u8 inst = ((struct edac_ppr_dev_attr *)to_ppr_dev_attr(attr))->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ppr_ops *ops = ctx->ppr[inst].ppr_ops;
+ int ret;
+ u32 val;
+
+ ret = ops->get_ppr_safe_when_in_use(ras_feat_dev->parent,
+ ctx->ppr[inst].private, &val);
+ if (ret)
+ return ret;
+
+ return sysfs_emit(buf, "%u\n", val);
+}
+
+static ssize_t repair_hpa_store(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ u8 inst = ((struct edac_ppr_dev_attr *)to_ppr_dev_attr(attr))->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ppr_ops *ops = ctx->ppr[inst].ppr_ops;
+ u64 hpa;
+ int ret;
+
+ ret = kstrtou64(buf, 0, &hpa);
+ if (ret < 0)
+ return ret;
+
+ ret = ops->do_ppr(ras_feat_dev->parent, ctx->ppr[inst].private, true, hpa);
+ if (ret)
+ return ret;
+
+ return len;
+}
+
+static ssize_t repair_dpa_store(struct device *ras_feat_dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ u8 inst = ((struct edac_ppr_dev_attr *)to_ppr_dev_attr(attr))->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ppr_ops *ops = ctx->ppr[inst].ppr_ops;
+ u64 dpa;
+ int ret;
+
+ ret = kstrtou64(buf, 0, &dpa);
+ if (ret < 0)
+ return ret;
+
+	ret = ops->do_ppr(ras_feat_dev->parent, ctx->ppr[inst].private, false, dpa);
+ if (ret)
+ return ret;
+
+ return len;
+}
+
+static umode_t ppr_attr_visible(struct kobject *kobj,
+ struct attribute *a, int attr_id)
+{
+ struct device *ras_feat_dev = kobj_to_dev(kobj);
+ struct device_attribute *dev_attr =
+ container_of(a, struct device_attribute, attr);
+ u8 inst = ((struct edac_ppr_dev_attr *)to_ppr_dev_attr(dev_attr))->instance;
+ struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+ const struct edac_ppr_ops *ops = ctx->ppr[inst].ppr_ops;
+
+ switch (attr_id) {
+ case PPR_PERSIST_MODE_AVAIL:
+ return ops->get_persist_mode_avail ? a->mode : 0;
+ case PPR_PERSIST_MODE:
+ if (ops->get_persist_mode && ops->set_persist_mode)
+ return a->mode;
+ if (ops->get_persist_mode)
+ return 0444;
+ return 0;
+ case PPR_DPA_SUPPORT:
+ return ops->get_dpa_support ? a->mode : 0;
+ case PPR_SAFE_IN_USE:
+ return ops->get_ppr_safe_when_in_use ? a->mode : 0;
+ case PPR_HPA:
+ case PPR_DPA:
+ return ops->do_ppr ? a->mode : 0;
+ default:
+ return 0;
+ }
+}
+
+#define EDAC_PPR_ATTR_RO(_name, _instance) \
+ ((struct edac_ppr_dev_attr) { .dev_attr = __ATTR_RO(_name), \
+ .instance = _instance })
+
+#define EDAC_PPR_ATTR_WO(_name, _instance) \
+ ((struct edac_ppr_dev_attr) { .dev_attr = __ATTR_WO(_name), \
+ .instance = _instance })
+
+#define EDAC_PPR_ATTR_RW(_name, _instance) \
+ ((struct edac_ppr_dev_attr) { .dev_attr = __ATTR_RW(_name), \
+ .instance = _instance })
+
+static int ppr_create_desc(struct device *ppr_dev,
+ const struct attribute_group **attr_groups,
+ u8 instance)
+{
+ struct edac_ppr_context *ppr_ctx;
+ struct attribute_group *group;
+ int i;
+
+ ppr_ctx = devm_kzalloc(ppr_dev, sizeof(*ppr_ctx), GFP_KERNEL);
+ if (!ppr_ctx)
+ return -ENOMEM;
+
+ group = &ppr_ctx->group;
+ ppr_ctx->ppr_dev_attr[0] = EDAC_PPR_ATTR_RO(persist_mode_avail, instance);
+ ppr_ctx->ppr_dev_attr[1] = EDAC_PPR_ATTR_RW(persist_mode, instance);
+ ppr_ctx->ppr_dev_attr[2] = EDAC_PPR_ATTR_RO(dpa_support, instance);
+ ppr_ctx->ppr_dev_attr[3] = EDAC_PPR_ATTR_RO(ppr_safe_when_in_use, instance);
+ ppr_ctx->ppr_dev_attr[4] = EDAC_PPR_ATTR_WO(repair_hpa, instance);
+ ppr_ctx->ppr_dev_attr[5] = EDAC_PPR_ATTR_WO(repair_dpa, instance);
+ for (i = 0; i < PPR_MAX_ATTRS; i++)
+ ppr_ctx->ppr_attrs[i] = &ppr_ctx->ppr_dev_attr[i].dev_attr.attr;
+
+ sprintf(ppr_ctx->name, "%s%d", "ppr", instance);
+ group->name = ppr_ctx->name;
+ group->attrs = ppr_ctx->ppr_attrs;
+ group->is_visible = ppr_attr_visible;
+
+ attr_groups[0] = group;
+
+ return 0;
+}
+
+/**
+ * edac_ppr_get_desc - get EDAC PPR descriptors
+ * @ppr_dev: client PPR device
+ * @attr_groups: pointer to attribute group container
+ * @instance: device's PPR instance number.
+ *
+ * Returns 0 on success, error otherwise.
+ */
+int edac_ppr_get_desc(struct device *ppr_dev,
+ const struct attribute_group **attr_groups,
+ u8 instance)
+{
+ if (!ppr_dev || !attr_groups)
+ return -EINVAL;
+
+ return ppr_create_desc(ppr_dev, attr_groups, instance);
+}
diff --git a/include/linux/edac.h b/include/linux/edac.h
index 90cb90cf5272..bd99b7a6804d 100644
--- a/include/linux/edac.h
+++ b/include/linux/edac.h
@@ -741,6 +741,36 @@ struct edac_ecs_ex_info {
int edac_ecs_get_desc(struct device *ecs_dev,
const struct attribute_group **attr_groups,
u16 num_media_frus);
+
+enum edac_ppr_type {
+ EDAC_TYPE_SPPR, /* soft PPR */
+ EDAC_TYPE_HPPR, /* hard PPR */
+};
+
+/**
+ * struct edac_ppr_ops - PPR(Post Package Repair) device operations
+ * (all elements optional)
+ * @get_persist_mode_avail: get the persist modes supported in the device.
+ * @get_persist_mode: get the persist mode of the PPR instance.
+ * @set_persist_mode: set the persist mode for the PPR instance.
+ * @get_dpa_support: get dpa support flag.
+ * @get_ppr_safe_when_in_use: get whether memory media is accessible and
+ * data is retained during PPR operation.
+ * @do_ppr: start PPR operation for the HPA/DPA set.
+ */
+struct edac_ppr_ops {
+ int (*get_persist_mode_avail)(struct device *dev, void *drv_data, char *buf);
+ int (*get_persist_mode)(struct device *dev, void *drv_data, u32 *mode);
+ int (*set_persist_mode)(struct device *dev, void *drv_data, u32 mode);
+ int (*get_dpa_support)(struct device *dev, void *drv_data, u32 *val);
+ int (*get_ppr_safe_when_in_use)(struct device *dev, void *drv_data, u32 *val);
+ int (*do_ppr)(struct device *dev, void *drv_data, bool hpa, u64 pa);
+};
+
+int edac_ppr_get_desc(struct device *ppr_dev,
+ const struct attribute_group **attr_groups,
+ u8 instance);
+
/*
* EDAC device feature information structure
*/
--
2.34.1
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v12 16/17] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
` (14 preceding siblings ...)
2024-09-11 9:04 ` [PATCH v12 15/17] EDAC: Add EDAC PPR control driver shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
2024-09-11 9:04 ` [PATCH v12 17/17] cxl/memfeature: Add CXL memory device PPR control feature shiju.jose
16 siblings, 0 replies; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Add support for PERFORM_MAINTENANCE mailbox command.
CXL spec 3.1 section 8.2.9.7.1 describes the Perform Maintenance command.
This command requests the device to execute the maintenance operation
specified by the maintenance operation class and the maintenance operation
subclass.
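The size constraint this command imposes on the mailbox payload can be sketched with a small C helper. This is a model under stated assumptions, not the driver code: the 2-byte class/subclass header follows Table 8-102, and `payload_size` stands in for the negotiated mailbox payload capacity.

```c
#include <stddef.h>
#include <stdint.h>

/* Input-payload header per CXL 3.1 Table 8-102 (class + subclass). */
struct do_maintenance_hdr {
	uint8_t op_class;
	uint8_t op_subclass;
} __attribute__((packed));

/*
 * Return nonzero if a maintenance request carrying data_in_size bytes
 * of operation-specific data fits in a mailbox whose payload capacity
 * is payload_size bytes -- the same check the driver performs before
 * building the command.
 */
static int maintenance_fits(size_t payload_size, size_t data_in_size)
{
	return sizeof(struct do_maintenance_hdr) + data_in_size <= payload_size;
}
```

With a 256-byte payload, for instance, at most 254 bytes of operation-specific data can accompany the header.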
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/mbox.c | 40 ++++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxlmem.h | 17 +++++++++++++++++
2 files changed, 57 insertions(+)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 806b1c8087b0..6c91f5f5be48 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1079,6 +1079,46 @@ int cxl_set_feature(struct cxl_dev_state *cxlds,
}
EXPORT_SYMBOL_NS_GPL(cxl_set_feature, CXL);
+int cxl_do_maintenance(struct cxl_dev_state *cxlds,
+ u8 class, u8 subclass,
+ void *data_in, size_t data_in_size)
+{
+ struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
+ struct cxl_memdev_maintenance_pi {
+ struct cxl_mbox_do_maintenance_hdr hdr;
+ u8 data[];
+ } __packed;
+ struct cxl_mbox_cmd mbox_cmd;
+ size_t hdr_size;
+ int rc = 0;
+
+	struct cxl_memdev_maintenance_pi *pi __free(kfree) =
+		kmalloc(cxl_mbox->payload_size, GFP_KERNEL);
+	if (!pi)
+		return -ENOMEM;
+
+	hdr_size = sizeof(pi->hdr);
+	/*
+	 * Check that the minimum mbox payload size is available for
+	 * the maintenance data transfer.
+	 */
+	if (hdr_size + data_in_size > cxl_mbox->payload_size)
+		return -ENOMEM;
+
+	pi->hdr.op_class = class;
+	pi->hdr.op_subclass = subclass;
+	memcpy(pi->data, data_in, data_in_size);
+ mbox_cmd = (struct cxl_mbox_cmd) {
+ .opcode = CXL_MBOX_OP_DO_MAINTENANCE,
+ .size_in = hdr_size + data_in_size,
+ .payload_in = pi,
+ };
+
+ rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
+ if (rc < 0)
+ return rc;
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_do_maintenance, CXL);
+
/**
* cxl_enumerate_cmds() - Enumerate commands for a device.
* @mds: The driver data for the operation
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 2187c3378eaa..ba454a2315ee 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -489,6 +489,7 @@ enum cxl_opcode {
CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
CXL_MBOX_OP_GET_FEATURE = 0x0501,
CXL_MBOX_OP_SET_FEATURE = 0x0502,
+ CXL_MBOX_OP_DO_MAINTENANCE = 0x0600,
CXL_MBOX_OP_IDENTIFY = 0x4000,
CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
@@ -865,6 +866,19 @@ struct cxl_mbox_set_feat_hdr {
u8 rsvd[9];
} __packed;
+/*
+ * Perform Maintenance CXL 3.1 Spec 8.2.9.7.1
+ */
+
+/*
+ * Perform Maintenance input payload
+ * CXL rev 3.1 section 8.2.9.7.1 Table 8-102
+ */
+struct cxl_mbox_do_maintenance_hdr {
+ u8 op_class;
+ u8 op_subclass;
+} __packed;
+
int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
struct cxl_mbox_cmd *cmd);
int cxl_dev_state_identify(struct cxl_memdev_state *mds);
@@ -942,4 +956,7 @@ int cxl_set_feature(struct cxl_dev_state *cxlds,
const uuid_t feat_uuid, u8 feat_version,
void *feat_data, size_t feat_data_size,
u8 feat_flag);
+int cxl_do_maintenance(struct cxl_dev_state *cxlds,
+ u8 class, u8 subclass,
+ void *data_in, size_t data_in_size);
#endif /* __CXL_MEM_H__ */
--
2.34.1
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v12 17/17] cxl/memfeature: Add CXL memory device PPR control feature
2024-09-11 9:04 [PATCH v12 00/17] EDAC: Scrub: introduce generic EDAC RAS control feature driver + CXL/ACPI-RAS2 drivers shiju.jose
` (15 preceding siblings ...)
2024-09-11 9:04 ` [PATCH v12 16/17] cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command shiju.jose
@ 2024-09-11 9:04 ` shiju.jose
16 siblings, 0 replies; 39+ messages in thread
From: shiju.jose @ 2024-09-11 9:04 UTC (permalink / raw)
To: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, yazen.ghannam, jgroves, vsalve, tanxiaofei,
prime.zeng, roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm,
shiju.jose
From: Shiju Jose <shiju.jose@huawei.com>
Post Package Repair (PPR) maintenance operations may be supported by CXL
devices that implement CXL.mem protocol. A PPR maintenance operation
requests the CXL device to perform a repair operation on its media.
For example, a CXL device with DRAM components that support PPR features
may implement PPR Maintenance operations. DRAM components may support two
types of PPR: Hard PPR (hPPR), for a permanent row repair, and Soft PPR
(sPPR), for a temporary row repair. sPPR is much faster than hPPR, but the
repair is lost with a power cycle.
During the execution of a PPR Maintenance operation, a CXL memory device:
- May or may not retain data
- May or may not be able to process CXL.mem requests correctly, including
the ones that target the DPA involved in the repair.
These CXL Memory Device capabilities are specified by Restriction Flags
in the sPPR Feature and hPPR Feature.
sPPR maintenance operation may be executed at runtime, if data is retained
and CXL.mem requests are correctly processed. For CXL devices with DRAM
components, hPPR maintenance operation may be executed only at boot because
data would not be retained.
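The Restriction Flags decoding implied above (the flags encode what is *restricted*, so the usable capability is the inverse of the bit) can be sketched as follows. The mask names and bit positions here mirror the patch below and are otherwise illustrative.

```c
#include <stdint.h>

/* Restriction Flags bits (as used in the patch below). */
#define PPR_RESTRICTION_MEDIA_ACCESSIBLE_MASK	(1u << 0)
#define PPR_RESTRICTION_DATA_RETAINED_MASK	(1u << 2)

struct ppr_restrictions {
	int media_accessible;
	int data_retained;
};

/*
 * Decode the sPPR/hPPR Restriction Flags field: a set bit means the
 * capability is restricted during the maintenance operation, so the
 * reported capability is the inverse of the raw bit.
 */
static struct ppr_restrictions decode_restrictions(uint16_t flags)
{
	struct ppr_restrictions r;

	r.media_accessible = !(flags & PPR_RESTRICTION_MEDIA_ACCESSIBLE_MASK);
	r.data_retained = !(flags & PPR_RESTRICTION_DATA_RETAINED_MASK);
	return r;
}
```

A device reporting flags of 0 is therefore fully accessible with data retained, which is the case where sPPR may run at runtime.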
When a CXL device identifies a failure on a memory component, the device
may inform the host about the need for a PPR maintenance operation by using
an Event Record, where the Maintenance Needed flag is set. The Event Record
specifies the DPA that should be repaired. A CXL device may not keep track
of the requests that have already been sent and the information on which
DPA should be repaired may be lost upon power cycle.
The userspace tool requests a maintenance operation if the number of
corrected errors reported on CXL.mem media exceeds the error threshold.
CXL spec 3.1 section 8.2.9.7.1.2 describes the device's sPPR (soft PPR)
maintenance operation and section 8.2.9.7.1.3 describes the device's
hPPR (hard PPR) maintenance operation feature.
CXL spec 3.1 section 8.2.9.7.2.1 describes the sPPR feature discovery and
configuration.
CXL spec 3.1 section 8.2.9.7.2.12 describes the hPPR feature discovery and
configuration.
Add support for CXL memory device PPR control.
Register with the EDAC driver, which gets the PPR attribute descriptors
from the EDAC PPR driver and exposes the sysfs PPR control attributes to
userspace.
For example, CXL PPR control for the CXL mem0 device is
exposed in /sys/bus/edac/devices/cxl_mem0/pprX/
Tested with QEMU patch for CXL PPR feature.
https://lore.kernel.org/all/20240730045722.71482-1-dave@stgolabs.net/
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
drivers/cxl/core/memfeature.c | 277 +++++++++++++++++++++++++++++++++-
1 file changed, 276 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/memfeature.c b/drivers/cxl/core/memfeature.c
index 5d4057fa304c..ad0b4a616d9d 100644
--- a/drivers/cxl/core/memfeature.c
+++ b/drivers/cxl/core/memfeature.c
@@ -18,8 +18,9 @@
#include <linux/limits.h>
#include <cxl.h>
#include <linux/edac.h>
+#include "core.h"
-#define CXL_DEV_NUM_RAS_FEATURES 2
+#define CXL_DEV_NUM_RAS_FEATURES 3
#define CXL_DEV_HOUR_IN_SECS 3600
#define CXL_SCRUB_NAME_LEN 128
@@ -702,6 +703,247 @@ static const struct edac_ecs_ops cxl_ecs_ops = {
.set_threshold = cxl_ecs_set_threshold,
};
+/* CXL memory soft PPR & hard PPR control definitions */
+static const uuid_t cxl_sppr_uuid =
+ UUID_INIT(0x892ba475, 0xfad8, 0x474e, 0x9d, 0x3e, 0x69, 0x2c, 0x91, \
+ 0x75, 0x68, 0xbb);
+
+static const uuid_t cxl_hppr_uuid =
+ UUID_INIT(0x80ea4521, 0x786f, 0x4127, 0xaf, 0xb1, 0xec, 0x74, 0x59, \
+ 0xfb, 0x0e, 0x24);
+
+struct cxl_ppr_context {
+ uuid_t ppr_uuid;
+ u8 instance;
+ u16 get_feat_size;
+ u16 set_feat_size;
+ u8 get_version;
+ u8 set_version;
+ u16 set_effects;
+ struct cxl_memdev *cxlmd;
+ enum edac_ppr_type ppr_type;
+ u64 dpa;
+ u32 nibble_mask;
+};
+
+/**
+ * struct cxl_memdev_ppr_params - CXL memory PPR parameter data structure.
+ * @op_class[OUT]: PPR operation class.
+ * @op_subclass[OUT]: PPR operation subclass.
+ * @dpa_support[OUT]: device physical address for PPR support.
+ * @media_accessible[OUT]: memory media is accessible or not during PPR operation.
+ * @data_retained[OUT]: data is retained or not during PPR operation.
+ * @dpa[IN]: device physical address.
+ */
+struct cxl_memdev_ppr_params {
+ u8 op_class;
+ u8 op_subclass;
+ bool dpa_support;
+ bool media_accessible;
+ bool data_retained;
+ u64 dpa;
+};
+
+enum cxl_ppr_param {
+ CXL_PPR_PARAM_DO_PPR,
+};
+
+#define CXL_MEMDEV_PPR_QUERY_RESOURCE_FLAG BIT(0)
+
+#define CXL_MEMDEV_PPR_DEVICE_INITIATED_MASK BIT(0)
+#define CXL_MEMDEV_PPR_FLAG_DPA_SUPPORT_MASK BIT(0)
+#define CXL_MEMDEV_PPR_FLAG_NIBBLE_SUPPORT_MASK BIT(1)
+#define CXL_MEMDEV_PPR_FLAG_MEM_SPARING_EV_REC_SUPPORT_MASK BIT(2)
+
+#define CXL_MEMDEV_PPR_RESTRICTION_FLAG_MEDIA_ACCESSIBLE_MASK BIT(0)
+#define CXL_MEMDEV_PPR_RESTRICTION_FLAG_DATA_RETAINED_MASK BIT(2)
+
+#define CXL_MEMDEV_PPR_SPARING_EV_REC_EN_MASK BIT(0)
+
+struct cxl_memdev_ppr_rd_attrs {
+ u8 max_op_latency;
+ __le16 op_cap;
+ __le16 op_mode;
+ u8 op_class;
+ u8 op_subclass;
+ u8 rsvd[9];
+ u8 ppr_flags;
+ __le16 restriction_flags;
+ u8 ppr_op_mode;
+} __packed;
+
+struct cxl_memdev_ppr_wr_attrs {
+ __le16 op_mode;
+ u8 ppr_op_mode;
+} __packed;
+
+struct cxl_memdev_ppr_maintenance_attrs {
+ u8 flags;
+ __le64 dpa;
+ u8 nibble_mask[3];
+} __packed;
+
+static int cxl_mem_ppr_get_attrs(struct device *dev, void *drv_data,
+ struct cxl_memdev_ppr_params *params)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+ struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ size_t rd_data_size = sizeof(struct cxl_memdev_ppr_rd_attrs);
+ size_t data_size;
+ struct cxl_memdev_ppr_rd_attrs *rd_attrs __free(kfree) =
+ kmalloc(rd_data_size, GFP_KERNEL);
+ if (!rd_attrs)
+ return -ENOMEM;
+
+ data_size = cxl_get_feature(cxlds, cxl_ppr_ctx->ppr_uuid,
+ CXL_GET_FEAT_SEL_CURRENT_VALUE,
+ rd_attrs, rd_data_size);
+ if (!data_size)
+ return -EIO;
+
+ params->op_class = rd_attrs->op_class;
+ params->op_subclass = rd_attrs->op_subclass;
+ params->dpa_support = FIELD_GET(CXL_MEMDEV_PPR_FLAG_DPA_SUPPORT_MASK,
+ rd_attrs->ppr_flags);
+ params->media_accessible = FIELD_GET(CXL_MEMDEV_PPR_RESTRICTION_FLAG_MEDIA_ACCESSIBLE_MASK,
+ rd_attrs->restriction_flags) ^ 1;
+ params->data_retained = FIELD_GET(CXL_MEMDEV_PPR_RESTRICTION_FLAG_DATA_RETAINED_MASK,
+ rd_attrs->restriction_flags) ^ 1;
+
+ return 0;
+}
+
+static int cxl_mem_ppr_set_attrs(struct device *dev, void *drv_data,
+ struct cxl_memdev_ppr_params *params,
+ enum cxl_ppr_param param_type)
+{
+ struct cxl_memdev_ppr_maintenance_attrs maintenance_attrs;
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+ struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd;
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_memdev_ppr_params rd_params;
+ struct cxl_region *cxlr;
+ int ret;
+
+ ret = cxl_mem_ppr_get_attrs(dev, drv_data, &rd_params);
+ if (ret) {
+ dev_err(dev, "Get cxlmemdev PPR params failed ret=%d\n",
+ ret);
+ return ret;
+ }
+
+ switch (param_type) {
+ case CXL_PPR_PARAM_DO_PPR:
+ ret = down_read_interruptible(&cxl_region_rwsem);
+ if (ret)
+ return ret;
+ if (!rd_params.media_accessible || !rd_params.data_retained) {
+ /* Check if DPA is mapped */
+ ret = down_read_interruptible(&cxl_dpa_rwsem);
+ if (ret) {
+ up_read(&cxl_region_rwsem);
+ return ret;
+ }
+
+ cxlr = cxl_dpa_to_region(cxlmd, cxl_ppr_ctx->dpa);
+ up_read(&cxl_dpa_rwsem);
+ if (cxlr) {
+ dev_err(dev, "CXL can't do PPR as DPA is mapped\n");
+ up_read(&cxl_region_rwsem);
+ return -EBUSY;
+ }
+ }
+ maintenance_attrs.flags = CXL_MEMDEV_PPR_QUERY_RESOURCE_FLAG;
+ maintenance_attrs.dpa = params->dpa;
+		/* May need to get the nibble mask from the CXL DRAM error record
+		 * via the trace dram event; presently all nibble mask bits are
+		 * set to 1.
+		 */
+ maintenance_attrs.nibble_mask[0] = 0xFF;
+ maintenance_attrs.nibble_mask[1] = 0xFF;
+ maintenance_attrs.nibble_mask[2] = 0xFF;
+ ret = cxl_do_maintenance(cxlds, rd_params.op_class, rd_params.op_subclass,
+ &maintenance_attrs, sizeof(maintenance_attrs));
+ if (ret) {
+ dev_err(dev, "CXL do PPR maintenance failed ret=%d\n", ret);
+ up_read(&cxl_region_rwsem);
+ return ret;
+ }
+ up_read(&cxl_region_rwsem);
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
+static int cxl_ppr_get_persist_mode_avail(struct device *dev, void *drv_data,
+ char *buf)
+{
+ return sysfs_emit(buf, "Soft PPR Hard PPR\n");
+}
+
+static int cxl_ppr_get_persist_mode(struct device *dev, void *drv_data,
+ u32 *persist_mode)
+{
+ struct cxl_ppr_context *cxl_ppr_ctx = drv_data;
+
+ *persist_mode = cxl_ppr_ctx->ppr_type;
+
+ return 0;
+}
+
+static int cxl_ppr_get_dpa_support(struct device *dev, void *drv_data,
+ u32 *dpa_support)
+{
+ struct cxl_memdev_ppr_params params;
+ int ret;
+
+ ret = cxl_mem_ppr_get_attrs(dev, drv_data, ¶ms);
+ if (ret)
+ return ret;
+
+ *dpa_support = params.dpa_support;
+
+ return 0;
+}
+
+static int cxl_get_ppr_safe_when_in_use(struct device *dev, void *drv_data,
+ u32 *safe)
+{
+ struct cxl_memdev_ppr_params params;
+ int ret;
+
+ ret = cxl_mem_ppr_get_attrs(dev, drv_data, ¶ms);
+ if (ret)
+ return ret;
+
+ *safe = params.media_accessible & params.data_retained;
+
+ return 0;
+}
+
+static int cxl_do_ppr(struct device *dev, void *drv_data, bool hpa, u64 pa)
+{
+ struct cxl_memdev_ppr_params params = {
+ .dpa = pa,
+ };
+
+	/* CXL memdev PPR currently takes a DPA only; HPA support may be added later */
+ if (hpa)
+ return -EOPNOTSUPP;
+
+ return cxl_mem_ppr_set_attrs(dev, drv_data, ¶ms,
+ CXL_PPR_PARAM_DO_PPR);
+}
+
+static const struct edac_ppr_ops cxl_sppr_ops = {
+ .get_persist_mode_avail = cxl_ppr_get_persist_mode_avail,
+ .get_persist_mode = cxl_ppr_get_persist_mode,
+ .get_dpa_support = cxl_ppr_get_dpa_support,
+ .get_ppr_safe_when_in_use = cxl_get_ppr_safe_when_in_use,
+ .do_ppr = cxl_do_ppr,
+};
+
int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
{
struct edac_dev_feature ras_features[CXL_DEV_NUM_RAS_FEATURES];
@@ -710,8 +952,10 @@ int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
struct cxl_feat_entry feat_entry;
char cxl_dev_name[CXL_SCRUB_NAME_LEN];
struct cxl_ecs_context *cxl_ecs_ctx;
+ struct cxl_ppr_context *cxl_sppr_ctx;
int rc, i, num_ras_features = 0;
int num_media_frus;
+ u8 ppr_inst = 0;
if (cxlr) {
struct cxl_region_params *p = &cxlr->params;
@@ -800,6 +1044,37 @@ int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
ras_features[num_ras_features].ecs_info.num_media_frus =
num_media_frus;
num_ras_features++;
+
+ /* CXL sPPR */
+ rc = cxl_get_supported_feature_entry(cxlds, &cxl_sppr_uuid,
+ &feat_entry);
+ if (rc < 0)
+ goto feat_register;
+
+ if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
+ goto feat_register;
+
+ cxl_sppr_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_sppr_ctx),
+ GFP_KERNEL);
+ if (!cxl_sppr_ctx)
+ goto feat_register;
+ *cxl_sppr_ctx = (struct cxl_ppr_context) {
+ .ppr_uuid = cxl_sppr_uuid,
+ .get_feat_size = feat_entry.get_feat_size,
+ .set_feat_size = feat_entry.set_feat_size,
+ .get_version = feat_entry.get_feat_ver,
+ .set_version = feat_entry.set_feat_ver,
+ .set_effects = feat_entry.set_effects,
+ .cxlmd = cxlmd,
+ .ppr_type = EDAC_TYPE_SPPR,
+ .instance = ppr_inst++,
+ };
+
+ ras_features[num_ras_features].ft_type = RAS_FEAT_PPR;
+ ras_features[num_ras_features].instance = cxl_sppr_ctx->instance;
+ ras_features[num_ras_features].ppr_ops = &cxl_sppr_ops;
+ ras_features[num_ras_features].ctx = cxl_sppr_ctx;
+ num_ras_features++;
}
feat_register:
--
2.34.1
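A note on the `^ 1` in cxl_mem_ppr_get_attrs() above: the CXL restriction flags appear to encode restrictions (bit set means the media is *not* accessible, or data is *not* retained, while PPR runs), so the driver inverts them to get the positive-sense parameters. A minimal userspace sketch of that decoding, with illustrative mask names mirroring the patch (this is an assumption-laden model, not the driver code itself):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative masks mirroring the patch; a set bit means the capability
 * is RESTRICTED during PPR, hence the inversion in the decoder below. */
#define PPR_MEDIA_ACCESSIBLE_MASK	(1u << 0)
#define PPR_DATA_RETAINED_MASK		(1u << 2)

struct ppr_params {
	bool media_accessible;
	bool data_retained;
};

/* Equivalent of FIELD_GET(mask, restriction_flags) ^ 1 in the patch. */
static void decode_restriction_flags(uint16_t flags, struct ppr_params *p)
{
	p->media_accessible = !(flags & PPR_MEDIA_ACCESSIBLE_MASK);
	p->data_retained = !(flags & PPR_DATA_RETAINED_MASK);
}
```

With no restriction bits set, both parameters decode to true; with both bits set, both decode to false, matching the `^ 1` behavior.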
* Re: [PATCH v12 04/17] cxl: Move mailbox related bits to the same context
2024-09-11 9:04 ` [PATCH v12 04/17] cxl: Move mailbox related bits to the same context shiju.jose
@ 2024-09-11 17:20 ` Dave Jiang
2024-09-12 9:42 ` Shiju Jose
0 siblings, 1 reply; 39+ messages in thread
From: Dave Jiang @ 2024-09-11 17:20 UTC (permalink / raw)
To: shiju.jose, linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, alison.schofield, vishal.l.verma, ira.weiny,
david, Vilas.Sridharan, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen,
mike.malvestuto, gthelen, wschwartz, dferguson, wbs, nifan.cxl,
jgroves, vsalve, tanxiaofei, prime.zeng, roberto.sassu,
kangkang.shen, wanghuiqiang, linuxarm
On 9/11/24 2:04 AM, shiju.jose@huawei.com wrote:
> From: Dave Jiang <dave.jiang@intel.com>
>
> Create a new 'struct cxl_mailbox' and move all mailbox related bits to
> it. This allows isolation of all CXL mailbox data in order to export
> some of the calls to an external caller (fwctl) and avoid exporting
> CXL driver-specific bits such as device states.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Hi Shiju,
Just FYI, there's this series [1] that may require some rebasing of the CXL mailbox bits. I do plan to pull those changes in for the 6.12 merge window.
I should also be able to rebase the fwctl code and post v2 after Plumbers, pending the discussions there. It should reflect the same code WRT features as what you are using here in the RAS series.
[1]: https://lore.kernel.org/linux-cxl/20240905223711.1990186-1-dave.jiang@intel.com/
DJ
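For readers following the refactor in the quoted patch below: the key move is embedding `struct cxl_mailbox` inside `struct cxl_dev_state`, so a mailbox callback that receives only the mailbox pointer can recover its owning device state via container_of(), as `__cxl_pci_mbox_send_cmd()` does. A minimal userspace sketch of that pattern, with stand-in struct fields (names illustrative, not the kernel definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structs; fields are illustrative only. */
struct cxl_mailbox {
	size_t payload_size;
};

struct cxl_dev_state {
	int id;
	struct cxl_mailbox cxl_mbox;	/* mailbox embedded in device state */
};

/* container_of(): recover the enclosing struct from a member pointer. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Given only the embedded mailbox, find the owning device state. */
static struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *mbox)
{
	return container_of(mbox, struct cxl_dev_state, cxl_mbox);
}
```

This is why the refactored `mbox_send` callback can take a `struct cxl_mailbox *` instead of a `struct cxl_memdev_state *` without losing access to driver state.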
> ---
> MAINTAINERS | 1 +
> drivers/cxl/core/mbox.c | 48 ++++++++++++++++++---------
> drivers/cxl/core/memdev.c | 18 +++++++----
> drivers/cxl/cxlmem.h | 49 ++--------------------------
> drivers/cxl/pci.c | 58 +++++++++++++++++++--------------
> drivers/cxl/pmem.c | 4 ++-
> include/linux/cxl/mailbox.h | 63 ++++++++++++++++++++++++++++++++++++
> tools/testing/cxl/test/mem.c | 27 ++++++++++------
> 8 files changed, 163 insertions(+), 105 deletions(-)
> create mode 100644 include/linux/cxl/mailbox.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 878dcd23b331..227c2b214f00 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5619,6 +5619,7 @@ F: Documentation/driver-api/cxl
> F: drivers/cxl/
> F: include/linux/einj-cxl.h
> F: include/linux/cxl-event.h
> +F: include/linux/cxl/
> F: include/uapi/linux/cxl_mem.h
> F: tools/testing/cxl/
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index e5cdeafdf76e..216937ef9e07 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -244,16 +244,17 @@ static const char *cxl_mem_opcode_to_name(u16 opcode)
> int cxl_internal_send_cmd(struct cxl_memdev_state *mds,
> struct cxl_mbox_cmd *mbox_cmd)
> {
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> size_t out_size, min_out;
> int rc;
>
> - if (mbox_cmd->size_in > mds->payload_size ||
> - mbox_cmd->size_out > mds->payload_size)
> + if (mbox_cmd->size_in > cxl_mbox->payload_size ||
> + mbox_cmd->size_out > cxl_mbox->payload_size)
> return -E2BIG;
>
> out_size = mbox_cmd->size_out;
> min_out = mbox_cmd->min_out;
> - rc = mds->mbox_send(mds, mbox_cmd);
> + rc = cxl_mbox->mbox_send(cxl_mbox, mbox_cmd);
> /*
> * EIO is reserved for a payload size mismatch and mbox_send()
> * may not return this error.
> @@ -353,6 +354,7 @@ static int cxl_mbox_cmd_ctor(struct cxl_mbox_cmd *mbox,
> struct cxl_memdev_state *mds, u16 opcode,
> size_t in_size, size_t out_size, u64 in_payload)
> {
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> *mbox = (struct cxl_mbox_cmd) {
> .opcode = opcode,
> .size_in = in_size,
> @@ -374,7 +376,7 @@ static int cxl_mbox_cmd_ctor(struct cxl_mbox_cmd *mbox,
>
> /* Prepare to handle a full payload for variable sized output */
> if (out_size == CXL_VARIABLE_PAYLOAD)
> - mbox->size_out = mds->payload_size;
> + mbox->size_out = cxl_mbox->payload_size;
> else
> mbox->size_out = out_size;
>
> @@ -398,6 +400,8 @@ static int cxl_to_mem_cmd_raw(struct cxl_mem_command *mem_cmd,
> const struct cxl_send_command *send_cmd,
> struct cxl_memdev_state *mds)
> {
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> +
> if (send_cmd->raw.rsvd)
> return -EINVAL;
>
> @@ -406,7 +410,7 @@ static int cxl_to_mem_cmd_raw(struct cxl_mem_command *mem_cmd,
> * gets passed along without further checking, so it must be
> * validated here.
> */
> - if (send_cmd->out.size > mds->payload_size)
> + if (send_cmd->out.size > cxl_mbox->payload_size)
> return -EINVAL;
>
> if (!cxl_mem_raw_command_allowed(send_cmd->raw.opcode))
> @@ -494,6 +498,7 @@ static int cxl_validate_cmd_from_user(struct cxl_mbox_cmd *mbox_cmd,
> struct cxl_memdev_state *mds,
> const struct cxl_send_command *send_cmd)
> {
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> struct cxl_mem_command mem_cmd;
> int rc;
>
> @@ -505,7 +510,7 @@ static int cxl_validate_cmd_from_user(struct cxl_mbox_cmd *mbox_cmd,
> * supports, but output can be arbitrarily large (simply write out as
> * much data as the hardware provides).
> */
> - if (send_cmd->in.size > mds->payload_size)
> + if (send_cmd->in.size > cxl_mbox->payload_size)
> return -EINVAL;
>
> /* Sanitize and construct a cxl_mem_command */
> @@ -591,6 +596,7 @@ static int handle_mailbox_cmd_from_user(struct cxl_memdev_state *mds,
> u64 out_payload, s32 *size_out,
> u32 *retval)
> {
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> struct device *dev = mds->cxlds.dev;
> int rc;
>
> @@ -601,7 +607,7 @@ static int handle_mailbox_cmd_from_user(struct cxl_memdev_state *mds,
> cxl_mem_opcode_to_name(mbox_cmd->opcode),
> mbox_cmd->opcode, mbox_cmd->size_in);
>
> - rc = mds->mbox_send(mds, mbox_cmd);
> + rc = cxl_mbox->mbox_send(cxl_mbox, mbox_cmd);
> if (rc)
> goto out;
>
> @@ -659,11 +665,12 @@ int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s)
> static int cxl_xfer_log(struct cxl_memdev_state *mds, uuid_t *uuid,
> u32 *size, u8 *out)
> {
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> u32 remaining = *size;
> u32 offset = 0;
>
> while (remaining) {
> - u32 xfer_size = min_t(u32, remaining, mds->payload_size);
> + u32 xfer_size = min_t(u32, remaining, cxl_mbox->payload_size);
> struct cxl_mbox_cmd mbox_cmd;
> struct cxl_mbox_get_log log;
> int rc;
> @@ -752,17 +759,18 @@ static void cxl_walk_cel(struct cxl_memdev_state *mds, size_t size, u8 *cel)
>
> static struct cxl_mbox_get_supported_logs *cxl_get_gsl(struct cxl_memdev_state *mds)
> {
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> struct cxl_mbox_get_supported_logs *ret;
> struct cxl_mbox_cmd mbox_cmd;
> int rc;
>
> - ret = kvmalloc(mds->payload_size, GFP_KERNEL);
> + ret = kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
> if (!ret)
> return ERR_PTR(-ENOMEM);
>
> mbox_cmd = (struct cxl_mbox_cmd) {
> .opcode = CXL_MBOX_OP_GET_SUPPORTED_LOGS,
> - .size_out = mds->payload_size,
> + .size_out = cxl_mbox->payload_size,
> .payload_out = ret,
> /* At least the record number field must be valid */
> .min_out = 2,
> @@ -910,6 +918,7 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
> enum cxl_event_log_type log,
> struct cxl_get_event_payload *get_pl)
> {
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> struct cxl_mbox_clear_event_payload *payload;
> u16 total = le16_to_cpu(get_pl->record_count);
> u8 max_handles = CXL_CLEAR_EVENT_MAX_HANDLES;
> @@ -920,8 +929,8 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
> int i;
>
> /* Payload size may limit the max handles */
> - if (pl_size > mds->payload_size) {
> - max_handles = (mds->payload_size - sizeof(*payload)) /
> + if (pl_size > cxl_mbox->payload_size) {
> + max_handles = (cxl_mbox->payload_size - sizeof(*payload)) /
> sizeof(__le16);
> pl_size = struct_size(payload, handles, max_handles);
> }
> @@ -979,6 +988,7 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
> static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> enum cxl_event_log_type type)
> {
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
> struct device *dev = mds->cxlds.dev;
> struct cxl_get_event_payload *payload;
> @@ -995,7 +1005,7 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> .payload_in = &log_type,
> .size_in = sizeof(log_type),
> .payload_out = payload,
> - .size_out = mds->payload_size,
> + .size_out = cxl_mbox->payload_size,
> .min_out = struct_size(payload, records, 0),
> };
>
> @@ -1328,6 +1338,7 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> struct cxl_region *cxlr)
> {
> struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> struct cxl_mbox_poison_out *po;
> struct cxl_mbox_poison_in pi;
> int nr_records = 0;
> @@ -1346,7 +1357,7 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> .opcode = CXL_MBOX_OP_GET_POISON,
> .size_in = sizeof(pi),
> .payload_in = &pi,
> - .size_out = mds->payload_size,
> + .size_out = cxl_mbox->payload_size,
> .payload_out = po,
> .min_out = struct_size(po, record, 0),
> };
> @@ -1382,7 +1393,9 @@ static void free_poison_buf(void *buf)
> /* Get Poison List output buffer is protected by mds->poison.lock */
> static int cxl_poison_alloc_buf(struct cxl_memdev_state *mds)
> {
> - mds->poison.list_out = kvmalloc(mds->payload_size, GFP_KERNEL);
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> +
> + mds->poison.list_out = kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
> if (!mds->poison.list_out)
> return -ENOMEM;
>
> @@ -1411,6 +1424,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, CXL);
> struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
> {
> struct cxl_memdev_state *mds;
> + struct cxl_mailbox *cxl_mbox;
>
> mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
> if (!mds) {
> @@ -1418,7 +1432,9 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
> return ERR_PTR(-ENOMEM);
> }
>
> - mutex_init(&mds->mbox_mutex);
> + cxl_mbox = &mds->cxlds.cxl_mbox;
> + mutex_init(&cxl_mbox->mbox_mutex);
> +
> mutex_init(&mds->event.log_lock);
> mds->cxlds.dev = dev;
> mds->cxlds.reg_map.host = dev;
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index 0277726afd04..05bb84cb1274 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -58,7 +58,7 @@ static ssize_t payload_max_show(struct device *dev,
>
> if (!mds)
> return sysfs_emit(buf, "\n");
> - return sysfs_emit(buf, "%zu\n", mds->payload_size);
> + return sysfs_emit(buf, "%zu\n", cxlds->cxl_mbox.payload_size);
> }
> static DEVICE_ATTR_RO(payload_max);
>
> @@ -124,15 +124,16 @@ static ssize_t security_state_show(struct device *dev,
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
> struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> unsigned long state = mds->security.state;
> int rc = 0;
>
> /* sync with latest submission state */
> - mutex_lock(&mds->mbox_mutex);
> + mutex_lock(&cxl_mbox->mbox_mutex);
> if (mds->security.sanitize_active)
> rc = sysfs_emit(buf, "sanitize\n");
> - mutex_unlock(&mds->mbox_mutex);
> + mutex_unlock(&cxl_mbox->mbox_mutex);
> if (rc)
> return rc;
>
> @@ -829,12 +830,13 @@ static enum fw_upload_err cxl_fw_prepare(struct fw_upload *fwl, const u8 *data,
> {
> struct cxl_memdev_state *mds = fwl->dd_handle;
> struct cxl_mbox_transfer_fw *transfer;
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>
> if (!size)
> return FW_UPLOAD_ERR_INVALID_SIZE;
>
> mds->fw.oneshot = struct_size(transfer, data, size) <
> - mds->payload_size;
> + cxl_mbox->payload_size;
>
> if (cxl_mem_get_fw_info(mds))
> return FW_UPLOAD_ERR_HW_ERROR;
> @@ -854,6 +856,7 @@ static enum fw_upload_err cxl_fw_write(struct fw_upload *fwl, const u8 *data,
> {
> struct cxl_memdev_state *mds = fwl->dd_handle;
> struct cxl_dev_state *cxlds = &mds->cxlds;
> + struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
> struct cxl_memdev *cxlmd = cxlds->cxlmd;
> struct cxl_mbox_transfer_fw *transfer;
> struct cxl_mbox_cmd mbox_cmd;
> @@ -877,7 +880,7 @@ static enum fw_upload_err cxl_fw_write(struct fw_upload *fwl, const u8 *data,
> * sizeof(*transfer) is 128. These constraints imply that @cur_size
> * will always be 128b aligned.
> */
> - cur_size = min_t(size_t, size, mds->payload_size - sizeof(*transfer));
> + cur_size = min_t(size_t, size, cxl_mbox->payload_size - sizeof(*transfer));
>
> remaining = size - cur_size;
> size_in = struct_size(transfer, data, cur_size);
> @@ -1059,16 +1062,17 @@ EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, CXL);
> static void sanitize_teardown_notifier(void *data)
> {
> struct cxl_memdev_state *mds = data;
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> struct kernfs_node *state;
>
> /*
> * Prevent new irq triggered invocations of the workqueue and
> * flush inflight invocations.
> */
> - mutex_lock(&mds->mbox_mutex);
> + mutex_lock(&cxl_mbox->mbox_mutex);
> state = mds->security.sanitize_node;
> mds->security.sanitize_node = NULL;
> - mutex_unlock(&mds->mbox_mutex);
> + mutex_unlock(&cxl_mbox->mbox_mutex);
>
> cancel_delayed_work_sync(&mds->security.poll_dwork);
> sysfs_put(state);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index afb53d058d62..19609b708b09 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -8,6 +8,7 @@
> #include <linux/rcuwait.h>
> #include <linux/cxl-event.h>
> #include <linux/node.h>
> +#include <linux/cxl/mailbox.h>
> #include "cxl.h"
>
> /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
> @@ -105,42 +106,6 @@ static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
> return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev);
> }
>
> -/**
> - * struct cxl_mbox_cmd - A command to be submitted to hardware.
> - * @opcode: (input) The command set and command submitted to hardware.
> - * @payload_in: (input) Pointer to the input payload.
> - * @payload_out: (output) Pointer to the output payload. Must be allocated by
> - * the caller.
> - * @size_in: (input) Number of bytes to load from @payload_in.
> - * @size_out: (input) Max number of bytes loaded into @payload_out.
> - * (output) Number of bytes generated by the device. For fixed size
> - * outputs commands this is always expected to be deterministic. For
> - * variable sized output commands, it tells the exact number of bytes
> - * written.
> - * @min_out: (input) internal command output payload size validation
> - * @poll_count: (input) Number of timeouts to attempt.
> - * @poll_interval_ms: (input) Time between mailbox background command polling
> - * interval timeouts.
> - * @return_code: (output) Error code returned from hardware.
> - *
> - * This is the primary mechanism used to send commands to the hardware.
> - * All the fields except @payload_* correspond exactly to the fields described in
> - * Command Register section of the CXL 2.0 8.2.8.4.5. @payload_in and
> - * @payload_out are written to, and read from the Command Payload Registers
> - * defined in CXL 2.0 8.2.8.4.8.
> - */
> -struct cxl_mbox_cmd {
> - u16 opcode;
> - void *payload_in;
> - void *payload_out;
> - size_t size_in;
> - size_t size_out;
> - size_t min_out;
> - int poll_count;
> - int poll_interval_ms;
> - u16 return_code;
> -};
> -
> /*
> * Per CXL 3.0 Section 8.2.8.4.5.1
> */
> @@ -438,6 +403,7 @@ struct cxl_dev_state {
> struct resource ram_res;
> u64 serial;
> enum cxl_devtype type;
> + struct cxl_mailbox cxl_mbox;
> };
>
> /**
> @@ -448,11 +414,8 @@ struct cxl_dev_state {
> * the functionality related to that like Identify Memory Device and Get
> * Partition Info
> * @cxlds: Core driver state common across Type-2 and Type-3 devices
> - * @payload_size: Size of space for payload
> - * (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
> * @lsa_size: Size of Label Storage Area
> * (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
> - * @mbox_mutex: Mutex to synchronize mailbox access.
> * @firmware_version: Firmware version for the memory device.
> * @enabled_cmds: Hardware commands found enabled in CEL.
> * @exclusive_cmds: Commands that are kernel-internal only
> @@ -470,17 +433,13 @@ struct cxl_dev_state {
> * @poison: poison driver state info
> * @security: security driver state info
> * @fw: firmware upload / activation state
> - * @mbox_wait: RCU wait for mbox send completely
> - * @mbox_send: @dev specific transport for transmitting mailbox commands
> *
> * See CXL 3.0 8.2.9.8.2 Capacity Configuration and Label Storage for
> * details on capacity parameters.
> */
> struct cxl_memdev_state {
> struct cxl_dev_state cxlds;
> - size_t payload_size;
> size_t lsa_size;
> - struct mutex mbox_mutex; /* Protects device mailbox and firmware */
> char firmware_version[0x10];
> DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
> DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
> @@ -500,10 +459,6 @@ struct cxl_memdev_state {
> struct cxl_poison_state poison;
> struct cxl_security_state security;
> struct cxl_fw_state fw;
> -
> - struct rcuwait mbox_wait;
> - int (*mbox_send)(struct cxl_memdev_state *mds,
> - struct cxl_mbox_cmd *cmd);
> };
>
> static inline struct cxl_memdev_state *
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 4be35dc22202..faf6f5a49368 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -124,6 +124,7 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
> u16 opcode;
> struct cxl_dev_id *dev_id = id;
> struct cxl_dev_state *cxlds = dev_id->cxlds;
> + struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
> struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>
> if (!cxl_mbox_background_complete(cxlds))
> @@ -132,13 +133,13 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
> reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
> opcode = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
> if (opcode == CXL_MBOX_OP_SANITIZE) {
> - mutex_lock(&mds->mbox_mutex);
> + mutex_lock(&cxl_mbox->mbox_mutex);
> if (mds->security.sanitize_node)
> mod_delayed_work(system_wq, &mds->security.poll_dwork, 0);
> - mutex_unlock(&mds->mbox_mutex);
> + mutex_unlock(&cxl_mbox->mbox_mutex);
> } else {
> /* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
> - rcuwait_wake_up(&mds->mbox_wait);
> + rcuwait_wake_up(&cxl_mbox->mbox_wait);
> }
>
> return IRQ_HANDLED;
> @@ -152,8 +153,9 @@ static void cxl_mbox_sanitize_work(struct work_struct *work)
> struct cxl_memdev_state *mds =
> container_of(work, typeof(*mds), security.poll_dwork.work);
> struct cxl_dev_state *cxlds = &mds->cxlds;
> + struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
>
> - mutex_lock(&mds->mbox_mutex);
> + mutex_lock(&cxl_mbox->mbox_mutex);
> if (cxl_mbox_background_complete(cxlds)) {
> mds->security.poll_tmo_secs = 0;
> if (mds->security.sanitize_node)
> @@ -167,7 +169,7 @@ static void cxl_mbox_sanitize_work(struct work_struct *work)
> mds->security.poll_tmo_secs = min(15 * 60, timeout);
> schedule_delayed_work(&mds->security.poll_dwork, timeout * HZ);
> }
> - mutex_unlock(&mds->mbox_mutex);
> + mutex_unlock(&cxl_mbox->mbox_mutex);
> }
>
> /**
> @@ -192,17 +194,20 @@ static void cxl_mbox_sanitize_work(struct work_struct *work)
> * not need to coordinate with each other. The driver only uses the primary
> * mailbox.
> */
> -static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds,
> +static int __cxl_pci_mbox_send_cmd(struct cxl_mailbox *cxl_mbox,
> struct cxl_mbox_cmd *mbox_cmd)
> {
> - struct cxl_dev_state *cxlds = &mds->cxlds;
> + struct cxl_dev_state *cxlds = container_of(cxl_mbox,
> + struct cxl_dev_state,
> + cxl_mbox);
> + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> void __iomem *payload = cxlds->regs.mbox + CXLDEV_MBOX_PAYLOAD_OFFSET;
> struct device *dev = cxlds->dev;
> u64 cmd_reg, status_reg;
> size_t out_len;
> int rc;
>
> - lockdep_assert_held(&mds->mbox_mutex);
> + lockdep_assert_held(&cxl_mbox->mbox_mutex);
>
> /*
> * Here are the steps from 8.2.8.4 of the CXL 2.0 spec.
> @@ -315,10 +320,10 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds,
>
> timeout = mbox_cmd->poll_interval_ms;
> for (i = 0; i < mbox_cmd->poll_count; i++) {
> - if (rcuwait_wait_event_timeout(&mds->mbox_wait,
> - cxl_mbox_background_complete(cxlds),
> - TASK_UNINTERRUPTIBLE,
> - msecs_to_jiffies(timeout)) > 0)
> + if (rcuwait_wait_event_timeout(&cxl_mbox->mbox_wait,
> + cxl_mbox_background_complete(cxlds),
> + TASK_UNINTERRUPTIBLE,
> + msecs_to_jiffies(timeout)) > 0)
> break;
> }
>
> @@ -360,7 +365,7 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds,
> */
> size_t n;
>
> - n = min3(mbox_cmd->size_out, mds->payload_size, out_len);
> + n = min3(mbox_cmd->size_out, cxl_mbox->payload_size, out_len);
> memcpy_fromio(mbox_cmd->payload_out, payload, n);
> mbox_cmd->size_out = n;
> } else {
> @@ -370,14 +375,14 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds,
> return 0;
> }
>
> -static int cxl_pci_mbox_send(struct cxl_memdev_state *mds,
> +static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
> struct cxl_mbox_cmd *cmd)
> {
> int rc;
>
> - mutex_lock_io(&mds->mbox_mutex);
> - rc = __cxl_pci_mbox_send_cmd(mds, cmd);
> - mutex_unlock(&mds->mbox_mutex);
> + mutex_lock_io(&cxl_mbox->mbox_mutex);
> + rc = __cxl_pci_mbox_send_cmd(cxl_mbox, cmd);
> + mutex_unlock(&cxl_mbox->mbox_mutex);
>
> return rc;
> }
> @@ -385,6 +390,7 @@ static int cxl_pci_mbox_send(struct cxl_memdev_state *mds,
> static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
> {
> struct cxl_dev_state *cxlds = &mds->cxlds;
> + struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
> const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
> struct device *dev = cxlds->dev;
> unsigned long timeout;
> @@ -392,6 +398,7 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
> u64 md_status;
> u32 ctrl;
>
> + cxl_mbox->host = dev;
> timeout = jiffies + mbox_ready_timeout * HZ;
> do {
> md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
> @@ -417,8 +424,8 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
> return -ETIMEDOUT;
> }
>
> - mds->mbox_send = cxl_pci_mbox_send;
> - mds->payload_size =
> + cxl_mbox->mbox_send = cxl_pci_mbox_send;
> + cxl_mbox->payload_size =
> 1 << FIELD_GET(CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK, cap);
>
> /*
> @@ -428,16 +435,16 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
> * there's no point in going forward. If the size is too large, there's
> * no harm in soft limiting it.
> */
> - mds->payload_size = min_t(size_t, mds->payload_size, SZ_1M);
> - if (mds->payload_size < 256) {
> + cxl_mbox->payload_size = min_t(size_t, cxl_mbox->payload_size, SZ_1M);
> + if (cxl_mbox->payload_size < 256) {
> dev_err(dev, "Mailbox is too small (%zub)",
> - mds->payload_size);
> + cxl_mbox->payload_size);
> return -ENXIO;
> }
>
> - dev_dbg(dev, "Mailbox payload sized %zu", mds->payload_size);
> + dev_dbg(dev, "Mailbox payload sized %zu", cxl_mbox->payload_size);
>
> - rcuwait_init(&mds->mbox_wait);
> + rcuwait_init(&cxl_mbox->mbox_wait);
> INIT_DELAYED_WORK(&mds->security.poll_dwork, cxl_mbox_sanitize_work);
>
> /* background command interrupts are optional */
> @@ -578,9 +585,10 @@ static void free_event_buf(void *buf)
> */
> static int cxl_mem_alloc_event_buf(struct cxl_memdev_state *mds)
> {
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> struct cxl_get_event_payload *buf;
>
> - buf = kvmalloc(mds->payload_size, GFP_KERNEL);
> + buf = kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
> if (!buf)
> return -ENOMEM;
> mds->event.buf = buf;
> diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c
> index 4ef93da22335..3985ff9ce70e 100644
> --- a/drivers/cxl/pmem.c
> +++ b/drivers/cxl/pmem.c
> @@ -102,13 +102,15 @@ static int cxl_pmem_get_config_size(struct cxl_memdev_state *mds,
> struct nd_cmd_get_config_size *cmd,
> unsigned int buf_len)
> {
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> +
> if (sizeof(*cmd) > buf_len)
> return -EINVAL;
>
> *cmd = (struct nd_cmd_get_config_size){
> .config_size = mds->lsa_size,
> .max_xfer =
> - mds->payload_size - sizeof(struct cxl_mbox_set_lsa),
> + cxl_mbox->payload_size - sizeof(struct cxl_mbox_set_lsa),
> };
>
> return 0;
> diff --git a/include/linux/cxl/mailbox.h b/include/linux/cxl/mailbox.h
> new file mode 100644
> index 000000000000..654df6175828
> --- /dev/null
> +++ b/include/linux/cxl/mailbox.h
> @@ -0,0 +1,63 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright(c) 2024 Intel Corporation. */
> +#ifndef __CXL_MBOX_H__
> +#define __CXL_MBOX_H__
> +
> +#include <linux/auxiliary_bus.h>
> +
> +/**
> + * struct cxl_mbox_cmd - A command to be submitted to hardware.
> + * @opcode: (input) The command set and command submitted to hardware.
> + * @payload_in: (input) Pointer to the input payload.
> + * @payload_out: (output) Pointer to the output payload. Must be allocated by
> + * the caller.
> + * @size_in: (input) Number of bytes to load from @payload_in.
> + * @size_out: (input) Max number of bytes loaded into @payload_out.
> + * (output) Number of bytes generated by the device. For fixed size
> + * outputs commands this is always expected to be deterministic. For
> + * variable sized output commands, it tells the exact number of bytes
> + * written.
> + * @min_out: (input) internal command output payload size validation
> + * @poll_count: (input) Number of timeouts to attempt.
> + * @poll_interval_ms: (input) Time between mailbox background command polling
> + * interval timeouts.
> + * @return_code: (output) Error code returned from hardware.
> + *
> + * This is the primary mechanism used to send commands to the hardware.
> + * All the fields except @payload_* correspond exactly to the fields described in
> + * Command Register section of the CXL 2.0 8.2.8.4.5. @payload_in and
> + * @payload_out are written to, and read from the Command Payload Registers
> + * defined in CXL 2.0 8.2.8.4.8.
> + */
> +struct cxl_mbox_cmd {
> + u16 opcode;
> + void *payload_in;
> + void *payload_out;
> + size_t size_in;
> + size_t size_out;
> + size_t min_out;
> + int poll_count;
> + int poll_interval_ms;
> + u16 return_code;
> +};
> +
> +/**
> + * struct cxl_mailbox - context for CXL mailbox operations
> + * @host: device that hosts the mailbox
> + * @adev: auxiliary device for fw-ctl
> + * @payload_size: Size of space for payload
> + * (CXL 3.1 8.2.8.4.3 Mailbox Capabilities Register)
> + * @mbox_mutex: mutex protects device mailbox and firmware
> + * @mbox_wait: rcuwait for mailbox
> + * @mbox_send: @dev specific transport for transmitting mailbox commands
> + */
> +struct cxl_mailbox {
> + struct device *host;
> + struct auxiliary_device adev; /* For fw-ctl */
> + size_t payload_size;
> + struct mutex mbox_mutex; /* lock to protect mailbox context */
> + struct rcuwait mbox_wait;
> + int (*mbox_send)(struct cxl_mailbox *cxl_mbox, struct cxl_mbox_cmd *cmd);
> +};
> +
> +#endif
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index 129f179b0ac5..1829b626bb40 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -8,6 +8,7 @@
> #include <linux/delay.h>
> #include <linux/sizes.h>
> #include <linux/bits.h>
> +#include <linux/cxl/mailbox.h>
> #include <asm/unaligned.h>
> #include <crypto/sha2.h>
> #include <cxlmem.h>
> @@ -534,6 +535,7 @@ static int mock_gsl(struct cxl_mbox_cmd *cmd)
>
> static int mock_get_log(struct cxl_memdev_state *mds, struct cxl_mbox_cmd *cmd)
> {
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> struct cxl_mbox_get_log *gl = cmd->payload_in;
> u32 offset = le32_to_cpu(gl->offset);
> u32 length = le32_to_cpu(gl->length);
> @@ -542,7 +544,7 @@ static int mock_get_log(struct cxl_memdev_state *mds, struct cxl_mbox_cmd *cmd)
>
> if (cmd->size_in < sizeof(*gl))
> return -EINVAL;
> - if (length > mds->payload_size)
> + if (length > cxl_mbox->payload_size)
> return -EINVAL;
> if (offset + length > sizeof(mock_cel))
> return -EINVAL;
> @@ -617,12 +619,13 @@ void cxl_mockmem_sanitize_work(struct work_struct *work)
> {
> struct cxl_memdev_state *mds =
> container_of(work, typeof(*mds), security.poll_dwork.work);
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>
> - mutex_lock(&mds->mbox_mutex);
> + mutex_lock(&cxl_mbox->mbox_mutex);
> if (mds->security.sanitize_node)
> sysfs_notify_dirent(mds->security.sanitize_node);
> mds->security.sanitize_active = false;
> - mutex_unlock(&mds->mbox_mutex);
> + mutex_unlock(&cxl_mbox->mbox_mutex);
>
> dev_dbg(mds->cxlds.dev, "sanitize complete\n");
> }
> @@ -631,6 +634,7 @@ static int mock_sanitize(struct cxl_mockmem_data *mdata,
> struct cxl_mbox_cmd *cmd)
> {
> struct cxl_memdev_state *mds = mdata->mds;
> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> int rc = 0;
>
> if (cmd->size_in != 0)
> @@ -648,14 +652,14 @@ static int mock_sanitize(struct cxl_mockmem_data *mdata,
> return -ENXIO;
> }
>
> - mutex_lock(&mds->mbox_mutex);
> + mutex_lock(&cxl_mbox->mbox_mutex);
> if (schedule_delayed_work(&mds->security.poll_dwork,
> msecs_to_jiffies(mdata->sanitize_timeout))) {
> mds->security.sanitize_active = true;
> dev_dbg(mds->cxlds.dev, "sanitize issued\n");
> } else
> rc = -EBUSY;
> - mutex_unlock(&mds->mbox_mutex);
> + mutex_unlock(&cxl_mbox->mbox_mutex);
>
> return rc;
> }
> @@ -1333,10 +1337,13 @@ static int mock_activate_fw(struct cxl_mockmem_data *mdata,
> return -EINVAL;
> }
>
> -static int cxl_mock_mbox_send(struct cxl_memdev_state *mds,
> +static int cxl_mock_mbox_send(struct cxl_mailbox *cxl_mbox,
> struct cxl_mbox_cmd *cmd)
> {
> - struct cxl_dev_state *cxlds = &mds->cxlds;
> + struct cxl_dev_state *cxlds = container_of(cxl_mbox,
> + struct cxl_dev_state,
> + cxl_mbox);
> + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> struct device *dev = cxlds->dev;
> struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
> int rc = -EIO;
> @@ -1460,6 +1467,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> struct cxl_memdev_state *mds;
> struct cxl_dev_state *cxlds;
> struct cxl_mockmem_data *mdata;
> + struct cxl_mailbox *cxl_mbox;
> int rc;
>
> mdata = devm_kzalloc(dev, sizeof(*mdata), GFP_KERNEL);
> @@ -1487,9 +1495,10 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> if (IS_ERR(mds))
> return PTR_ERR(mds);
>
> + cxl_mbox = &mds->cxlds.cxl_mbox;
> mdata->mds = mds;
> - mds->mbox_send = cxl_mock_mbox_send;
> - mds->payload_size = SZ_4K;
> + cxl_mbox->mbox_send = cxl_mock_mbox_send;
> + cxl_mbox->payload_size = SZ_4K;
> mds->event.buf = (struct cxl_get_event_payload *) mdata->event_buf;
> INIT_DELAYED_WORK(&mds->security.poll_dwork, cxl_mockmem_sanitize_work);
>
^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v12 04/17] cxl: Move mailbox related bits to the same context
2024-09-11 17:20 ` Dave Jiang
@ 2024-09-12 9:42 ` Shiju Jose
0 siblings, 0 replies; 39+ messages in thread
From: Shiju Jose @ 2024-09-12 9:42 UTC (permalink / raw)
To: Dave Jiang, linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
Jonathan Cameron, alison.schofield, vishal.l.verma, ira.weiny,
david, Vilas.Sridharan, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen,
mike.malvestuto, gthelen, wschwartz, dferguson, wbs, nifan.cxl,
jgroves, vsalve, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
>-----Original Message-----
>From: Dave Jiang <dave.jiang@intel.com>
>Sent: 11 September 2024 18:21
>To: Shiju Jose <shiju.jose@huawei.com>; linux-edac@vger.kernel.org; linux-
>cxl@vger.kernel.org; linux-acpi@vger.kernel.org; linux-mm@kvack.org; linux-
>kernel@vger.kernel.org
>Cc: bp@alien8.de; tony.luck@intel.com; rafael@kernel.org; lenb@kernel.org;
>mchehab@kernel.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; alison.schofield@intel.com;
>vishal.l.verma@intel.com; ira.weiny@intel.com; david@redhat.com;
>Vilas.Sridharan@amd.com; leo.duran@amd.com; Yazen.Ghannam@amd.com;
>rientjes@google.com; jiaqiyan@google.com; Jon.Grimm@amd.com;
>dave.hansen@linux.intel.com; naoya.horiguchi@nec.com;
>james.morse@arm.com; jthoughton@google.com; somasundaram.a@hpe.com;
>erdemaktas@google.com; pgonda@google.com; duenwen@google.com;
>mike.malvestuto@intel.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; jgroves@micron.com;
>vsalve@micron.com; tanxiaofei <tanxiaofei@huawei.com>; Zengtao (B)
><prime.zeng@hisilicon.com>; Roberto Sassu <roberto.sassu@huawei.com>;
>kangkang.shen@futurewei.com; wanghuiqiang <wanghuiqiang@huawei.com>;
>Linuxarm <linuxarm@huawei.com>
>Subject: Re: [PATCH v12 04/17] cxl: Move mailbox related bits to the same
>context
>
>
>
>On 9/11/24 2:04 AM, shiju.jose@huawei.com wrote:
>> From: Dave Jiang <dave.jiang@intel.com>
>>
>> Create a new 'struct cxl_mailbox' and move all mailbox related bits to
>> it. This allows isolation of all CXL mailbox data in order to export
>> some of the calls to an external caller (fwctl) and avoid exporting
>> CXL driver-specific bits such as device states.
>>
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
>
>Hi Shiju,
>Just FYI there's this series [1] that may require some rebasing for the cxl mailbox
>bits. I do plan to pull those changes in for 6.12 merge window.
>
>And I should be able to rebase the fwctl code and post v2 after Plumbers,
>pending the discussions there. It should reflect the same code WRT
>features as what you are using here in the RAS series.
>
>[1]: https://lore.kernel.org/linux-cxl/20240905223711.1990186-1-
>dave.jiang@intel.com/
Hi Dave,
Thanks for letting me know.
>
>DJ
>
Regards,
Shiju
>> ---
>> MAINTAINERS | 1 +
>> drivers/cxl/core/mbox.c | 48 ++++++++++++++++++---------
>> drivers/cxl/core/memdev.c | 18 +++++++----
>> drivers/cxl/cxlmem.h | 49 ++--------------------------
>> drivers/cxl/pci.c | 58 +++++++++++++++++++--------------
>> drivers/cxl/pmem.c | 4 ++-
>> include/linux/cxl/mailbox.h | 63 ++++++++++++++++++++++++++++++++++++
>> tools/testing/cxl/test/mem.c | 27 ++++++++++------
>> 8 files changed, 163 insertions(+), 105 deletions(-)
>> create mode 100644 include/linux/cxl/mailbox.h
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 878dcd23b331..227c2b214f00 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -5619,6 +5619,7 @@ F: Documentation/driver-api/cxl
>> F: drivers/cxl/
>> F: include/linux/einj-cxl.h
>> F: include/linux/cxl-event.h
>> +F: include/linux/cxl/
>> F: include/uapi/linux/cxl_mem.h
>> F: tools/testing/cxl/
>>
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index e5cdeafdf76e..216937ef9e07 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -244,16 +244,17 @@ static const char *cxl_mem_opcode_to_name(u16 opcode)
>> int cxl_internal_send_cmd(struct cxl_memdev_state *mds,
>> struct cxl_mbox_cmd *mbox_cmd)
>> {
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> size_t out_size, min_out;
>> int rc;
>>
>> - if (mbox_cmd->size_in > mds->payload_size ||
>> - mbox_cmd->size_out > mds->payload_size)
>> + if (mbox_cmd->size_in > cxl_mbox->payload_size ||
>> + mbox_cmd->size_out > cxl_mbox->payload_size)
>> return -E2BIG;
>>
>> out_size = mbox_cmd->size_out;
>> min_out = mbox_cmd->min_out;
>> - rc = mds->mbox_send(mds, mbox_cmd);
>> + rc = cxl_mbox->mbox_send(cxl_mbox, mbox_cmd);
>> /*
>> * EIO is reserved for a payload size mismatch and mbox_send()
>> * may not return this error.
>> @@ -353,6 +354,7 @@ static int cxl_mbox_cmd_ctor(struct cxl_mbox_cmd *mbox,
>> struct cxl_memdev_state *mds, u16 opcode,
>> size_t in_size, size_t out_size, u64 in_payload)
>> {
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> *mbox = (struct cxl_mbox_cmd) {
>> .opcode = opcode,
>> .size_in = in_size,
>> @@ -374,7 +376,7 @@ static int cxl_mbox_cmd_ctor(struct cxl_mbox_cmd *mbox,
>>
>> /* Prepare to handle a full payload for variable sized output */
>> if (out_size == CXL_VARIABLE_PAYLOAD)
>> - mbox->size_out = mds->payload_size;
>> + mbox->size_out = cxl_mbox->payload_size;
>> else
>> mbox->size_out = out_size;
>>
>> @@ -398,6 +400,8 @@ static int cxl_to_mem_cmd_raw(struct cxl_mem_command *mem_cmd,
>> const struct cxl_send_command *send_cmd,
>> struct cxl_memdev_state *mds)
>> {
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> +
>> if (send_cmd->raw.rsvd)
>> return -EINVAL;
>>
>> @@ -406,7 +410,7 @@ static int cxl_to_mem_cmd_raw(struct cxl_mem_command *mem_cmd,
>> * gets passed along without further checking, so it must be
>> * validated here.
>> */
>> - if (send_cmd->out.size > mds->payload_size)
>> + if (send_cmd->out.size > cxl_mbox->payload_size)
>> return -EINVAL;
>>
>> if (!cxl_mem_raw_command_allowed(send_cmd->raw.opcode))
>> @@ -494,6 +498,7 @@ static int cxl_validate_cmd_from_user(struct cxl_mbox_cmd *mbox_cmd,
>> struct cxl_memdev_state *mds,
>> const struct cxl_send_command *send_cmd)
>> {
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> struct cxl_mem_command mem_cmd;
>> int rc;
>>
>> @@ -505,7 +510,7 @@ static int cxl_validate_cmd_from_user(struct cxl_mbox_cmd *mbox_cmd,
>> * supports, but output can be arbitrarily large (simply write out as
>> * much data as the hardware provides).
>> */
>> - if (send_cmd->in.size > mds->payload_size)
>> + if (send_cmd->in.size > cxl_mbox->payload_size)
>> return -EINVAL;
>>
>> /* Sanitize and construct a cxl_mem_command */
>> @@ -591,6 +596,7 @@ static int handle_mailbox_cmd_from_user(struct cxl_memdev_state *mds,
>> u64 out_payload, s32 *size_out,
>> u32 *retval)
>> {
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> struct device *dev = mds->cxlds.dev;
>> int rc;
>>
>> @@ -601,7 +607,7 @@ static int handle_mailbox_cmd_from_user(struct cxl_memdev_state *mds,
>> cxl_mem_opcode_to_name(mbox_cmd->opcode),
>> mbox_cmd->opcode, mbox_cmd->size_in);
>>
>> - rc = mds->mbox_send(mds, mbox_cmd);
>> + rc = cxl_mbox->mbox_send(cxl_mbox, mbox_cmd);
>> if (rc)
>> goto out;
>>
>> @@ -659,11 +665,12 @@ int cxl_send_cmd(struct cxl_memdev *cxlmd, struct cxl_send_command __user *s)
>> static int cxl_xfer_log(struct cxl_memdev_state *mds, uuid_t *uuid,
>> u32 *size, u8 *out)
>> {
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> u32 remaining = *size;
>> u32 offset = 0;
>>
>> while (remaining) {
>> - u32 xfer_size = min_t(u32, remaining, mds->payload_size);
>> + u32 xfer_size = min_t(u32, remaining, cxl_mbox->payload_size);
>> struct cxl_mbox_cmd mbox_cmd;
>> struct cxl_mbox_get_log log;
>> int rc;
>> @@ -752,17 +759,18 @@ static void cxl_walk_cel(struct cxl_memdev_state *mds, size_t size, u8 *cel)
>>
>> static struct cxl_mbox_get_supported_logs *cxl_get_gsl(struct cxl_memdev_state *mds)
>> {
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> struct cxl_mbox_get_supported_logs *ret;
>> struct cxl_mbox_cmd mbox_cmd;
>> int rc;
>>
>> - ret = kvmalloc(mds->payload_size, GFP_KERNEL);
>> + ret = kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
>> if (!ret)
>> return ERR_PTR(-ENOMEM);
>>
>> mbox_cmd = (struct cxl_mbox_cmd) {
>> .opcode = CXL_MBOX_OP_GET_SUPPORTED_LOGS,
>> - .size_out = mds->payload_size,
>> + .size_out = cxl_mbox->payload_size,
>> .payload_out = ret,
>> /* At least the record number field must be valid */
>> .min_out = 2,
>> @@ -910,6 +918,7 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
>> enum cxl_event_log_type log,
>> struct cxl_get_event_payload *get_pl)
>> {
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> struct cxl_mbox_clear_event_payload *payload;
>> u16 total = le16_to_cpu(get_pl->record_count);
>> u8 max_handles = CXL_CLEAR_EVENT_MAX_HANDLES;
>> @@ -920,8 +929,8 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
>> int i;
>>
>> /* Payload size may limit the max handles */
>> - if (pl_size > mds->payload_size) {
>> - max_handles = (mds->payload_size - sizeof(*payload)) /
>> + if (pl_size > cxl_mbox->payload_size) {
>> + max_handles = (cxl_mbox->payload_size - sizeof(*payload)) /
>> sizeof(__le16);
>> pl_size = struct_size(payload, handles, max_handles);
>> }
>> @@ -979,6 +988,7 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
>> static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
>> enum cxl_event_log_type type)
>> {
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
>> struct device *dev = mds->cxlds.dev;
>> struct cxl_get_event_payload *payload;
>> @@ -995,7 +1005,7 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
>> .payload_in = &log_type,
>> .size_in = sizeof(log_type),
>> .payload_out = payload,
>> - .size_out = mds->payload_size,
>> + .size_out = cxl_mbox->payload_size,
>> .min_out = struct_size(payload, records, 0),
>> };
>>
>> @@ -1328,6 +1338,7 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>> struct cxl_region *cxlr)
>> {
>> struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> struct cxl_mbox_poison_out *po;
>> struct cxl_mbox_poison_in pi;
>> int nr_records = 0;
>> @@ -1346,7 +1357,7 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>> .opcode = CXL_MBOX_OP_GET_POISON,
>> .size_in = sizeof(pi),
>> .payload_in = &pi,
>> - .size_out = mds->payload_size,
>> + .size_out = cxl_mbox->payload_size,
>> .payload_out = po,
>> .min_out = struct_size(po, record, 0),
>> };
>> @@ -1382,7 +1393,9 @@ static void free_poison_buf(void *buf)
>> /* Get Poison List output buffer is protected by mds->poison.lock */
>> static int cxl_poison_alloc_buf(struct cxl_memdev_state *mds)
>> {
>> - mds->poison.list_out = kvmalloc(mds->payload_size, GFP_KERNEL);
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> +
>> + mds->poison.list_out = kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
>> if (!mds->poison.list_out)
>> return -ENOMEM;
>>
>> @@ -1411,6 +1424,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, CXL);
>> struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
>> {
>> struct cxl_memdev_state *mds;
>> + struct cxl_mailbox *cxl_mbox;
>>
>> mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
>> if (!mds) {
>> @@ -1418,7 +1432,9 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
>> return ERR_PTR(-ENOMEM);
>> }
>>
>> - mutex_init(&mds->mbox_mutex);
>> + cxl_mbox = &mds->cxlds.cxl_mbox;
>> + mutex_init(&cxl_mbox->mbox_mutex);
>> +
>> mutex_init(&mds->event.log_lock);
>> mds->cxlds.dev = dev;
>> mds->cxlds.reg_map.host = dev;
>> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
>> index 0277726afd04..05bb84cb1274 100644
>> --- a/drivers/cxl/core/memdev.c
>> +++ b/drivers/cxl/core/memdev.c
>> @@ -58,7 +58,7 @@ static ssize_t payload_max_show(struct device *dev,
>>
>> if (!mds)
>> return sysfs_emit(buf, "\n");
>> - return sysfs_emit(buf, "%zu\n", mds->payload_size);
>> + return sysfs_emit(buf, "%zu\n", cxlds->cxl_mbox.payload_size);
>> }
>> static DEVICE_ATTR_RO(payload_max);
>>
>> @@ -124,15 +124,16 @@ static ssize_t security_state_show(struct device *dev,
>> {
>> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>> struct cxl_dev_state *cxlds = cxlmd->cxlds;
>> + struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
>> struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>> unsigned long state = mds->security.state;
>> int rc = 0;
>>
>> /* sync with latest submission state */
>> - mutex_lock(&mds->mbox_mutex);
>> + mutex_lock(&cxl_mbox->mbox_mutex);
>> if (mds->security.sanitize_active)
>> rc = sysfs_emit(buf, "sanitize\n");
>> - mutex_unlock(&mds->mbox_mutex);
>> + mutex_unlock(&cxl_mbox->mbox_mutex);
>> if (rc)
>> return rc;
>>
>> @@ -829,12 +830,13 @@ static enum fw_upload_err cxl_fw_prepare(struct fw_upload *fwl, const u8 *data,
>> {
>> struct cxl_memdev_state *mds = fwl->dd_handle;
>> struct cxl_mbox_transfer_fw *transfer;
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>>
>> if (!size)
>> return FW_UPLOAD_ERR_INVALID_SIZE;
>>
>> mds->fw.oneshot = struct_size(transfer, data, size) <
>> - mds->payload_size;
>> + cxl_mbox->payload_size;
>>
>> if (cxl_mem_get_fw_info(mds))
>> return FW_UPLOAD_ERR_HW_ERROR;
>> @@ -854,6 +856,7 @@ static enum fw_upload_err cxl_fw_write(struct fw_upload *fwl, const u8 *data,
>> {
>> struct cxl_memdev_state *mds = fwl->dd_handle;
>> struct cxl_dev_state *cxlds = &mds->cxlds;
>> + struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
>> struct cxl_memdev *cxlmd = cxlds->cxlmd;
>> struct cxl_mbox_transfer_fw *transfer;
>> struct cxl_mbox_cmd mbox_cmd;
>> @@ -877,7 +880,7 @@ static enum fw_upload_err cxl_fw_write(struct fw_upload *fwl, const u8 *data,
>> * sizeof(*transfer) is 128. These constraints imply that @cur_size
>> * will always be 128b aligned.
>> */
>> - cur_size = min_t(size_t, size, mds->payload_size - sizeof(*transfer));
>> + cur_size = min_t(size_t, size, cxl_mbox->payload_size - sizeof(*transfer));
>>
>> remaining = size - cur_size;
>> size_in = struct_size(transfer, data, cur_size);
>> @@ -1059,16 +1062,17 @@ EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, CXL);
>> static void sanitize_teardown_notifier(void *data)
>> {
>> struct cxl_memdev_state *mds = data;
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> struct kernfs_node *state;
>>
>> /*
>> * Prevent new irq triggered invocations of the workqueue and
>> * flush inflight invocations.
>> */
>> - mutex_lock(&mds->mbox_mutex);
>> + mutex_lock(&cxl_mbox->mbox_mutex);
>> state = mds->security.sanitize_node;
>> mds->security.sanitize_node = NULL;
>> - mutex_unlock(&mds->mbox_mutex);
>> + mutex_unlock(&cxl_mbox->mbox_mutex);
>>
>> cancel_delayed_work_sync(&mds->security.poll_dwork);
>> sysfs_put(state);
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index afb53d058d62..19609b708b09 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -8,6 +8,7 @@
>> #include <linux/rcuwait.h>
>> #include <linux/cxl-event.h>
>> #include <linux/node.h>
>> +#include <linux/cxl/mailbox.h>
>> #include "cxl.h"
>>
>> /* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
>> @@ -105,42 +106,6 @@ static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
>> return xa_load(&port->endpoints, (unsigned long)&cxlmd->dev);
>> }
>>
>> -/**
>> - * struct cxl_mbox_cmd - A command to be submitted to hardware.
>> - * @opcode: (input) The command set and command submitted to hardware.
>> - * @payload_in: (input) Pointer to the input payload.
>> - * @payload_out: (output) Pointer to the output payload. Must be allocated by
>> - * the caller.
>> - * @size_in: (input) Number of bytes to load from @payload_in.
>> - * @size_out: (input) Max number of bytes loaded into @payload_out.
>> - * (output) Number of bytes generated by the device. For fixed size
>> - * outputs commands this is always expected to be deterministic. For
>> - * variable sized output commands, it tells the exact number of bytes
>> - * written.
>> - * @min_out: (input) internal command output payload size validation
>> - * @poll_count: (input) Number of timeouts to attempt.
>> - * @poll_interval_ms: (input) Time between mailbox background command polling
>> - * interval timeouts.
>> - * @return_code: (output) Error code returned from hardware.
>> - *
>> - * This is the primary mechanism used to send commands to the hardware.
>> - * All the fields except @payload_* correspond exactly to the fields described in
>> - * Command Register section of the CXL 2.0 8.2.8.4.5. @payload_in and
>> - * @payload_out are written to, and read from the Command Payload Registers
>> - * defined in CXL 2.0 8.2.8.4.8.
>> - */
>> -struct cxl_mbox_cmd {
>> - u16 opcode;
>> - void *payload_in;
>> - void *payload_out;
>> - size_t size_in;
>> - size_t size_out;
>> - size_t min_out;
>> - int poll_count;
>> - int poll_interval_ms;
>> - u16 return_code;
>> -};
>> -
>> /*
>> * Per CXL 3.0 Section 8.2.8.4.5.1
>> */
>> @@ -438,6 +403,7 @@ struct cxl_dev_state {
>> struct resource ram_res;
>> u64 serial;
>> enum cxl_devtype type;
>> + struct cxl_mailbox cxl_mbox;
>> };
>>
>> /**
>> @@ -448,11 +414,8 @@ struct cxl_dev_state {
>> * the functionality related to that like Identify Memory Device and Get
>> * Partition Info
>> * @cxlds: Core driver state common across Type-2 and Type-3 devices
>> - * @payload_size: Size of space for payload
>> - * (CXL 2.0 8.2.8.4.3 Mailbox Capabilities Register)
>> * @lsa_size: Size of Label Storage Area
>> * (CXL 2.0 8.2.9.5.1.1 Identify Memory Device)
>> - * @mbox_mutex: Mutex to synchronize mailbox access.
>> * @firmware_version: Firmware version for the memory device.
>> * @enabled_cmds: Hardware commands found enabled in CEL.
>> * @exclusive_cmds: Commands that are kernel-internal only
>> @@ -470,17 +433,13 @@ struct cxl_dev_state {
>> * @poison: poison driver state info
>> * @security: security driver state info
>> * @fw: firmware upload / activation state
>> - * @mbox_wait: RCU wait for mbox send completely
>> - * @mbox_send: @dev specific transport for transmitting mailbox commands
>> *
>> * See CXL 3.0 8.2.9.8.2 Capacity Configuration and Label Storage for
>> * details on capacity parameters.
>> */
>> struct cxl_memdev_state {
>> struct cxl_dev_state cxlds;
>> - size_t payload_size;
>> size_t lsa_size;
>> - struct mutex mbox_mutex; /* Protects device mailbox and firmware */
>> char firmware_version[0x10];
>> DECLARE_BITMAP(enabled_cmds, CXL_MEM_COMMAND_ID_MAX);
>> DECLARE_BITMAP(exclusive_cmds, CXL_MEM_COMMAND_ID_MAX);
>> @@ -500,10 +459,6 @@ struct cxl_memdev_state {
>> struct cxl_poison_state poison;
>> struct cxl_security_state security;
>> struct cxl_fw_state fw;
>> -
>> - struct rcuwait mbox_wait;
>> - int (*mbox_send)(struct cxl_memdev_state *mds,
>> - struct cxl_mbox_cmd *cmd);
>> };
>>
>> static inline struct cxl_memdev_state *
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index 4be35dc22202..faf6f5a49368 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -124,6 +124,7 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
>> u16 opcode;
>> struct cxl_dev_id *dev_id = id;
>> struct cxl_dev_state *cxlds = dev_id->cxlds;
>> + struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
>> struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>>
>> if (!cxl_mbox_background_complete(cxlds))
>> @@ -132,13 +133,13 @@ static irqreturn_t cxl_pci_mbox_irq(int irq, void *id)
>> reg = readq(cxlds->regs.mbox + CXLDEV_MBOX_BG_CMD_STATUS_OFFSET);
>> opcode = FIELD_GET(CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK, reg);
>> if (opcode == CXL_MBOX_OP_SANITIZE) {
>> - mutex_lock(&mds->mbox_mutex);
>> + mutex_lock(&cxl_mbox->mbox_mutex);
>> if (mds->security.sanitize_node)
>> mod_delayed_work(system_wq, &mds->security.poll_dwork, 0);
>> - mutex_unlock(&mds->mbox_mutex);
>> + mutex_unlock(&cxl_mbox->mbox_mutex);
>> } else {
>> /* short-circuit the wait in __cxl_pci_mbox_send_cmd() */
>> - rcuwait_wake_up(&mds->mbox_wait);
>> + rcuwait_wake_up(&cxl_mbox->mbox_wait);
>> }
>>
>> return IRQ_HANDLED;
>> @@ -152,8 +153,9 @@ static void cxl_mbox_sanitize_work(struct work_struct *work)
>> struct cxl_memdev_state *mds =
>> container_of(work, typeof(*mds), security.poll_dwork.work);
>> struct cxl_dev_state *cxlds = &mds->cxlds;
>> + struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
>>
>> - mutex_lock(&mds->mbox_mutex);
>> + mutex_lock(&cxl_mbox->mbox_mutex);
>> if (cxl_mbox_background_complete(cxlds)) {
>> mds->security.poll_tmo_secs = 0;
>> if (mds->security.sanitize_node)
>> @@ -167,7 +169,7 @@ static void cxl_mbox_sanitize_work(struct work_struct *work)
>> mds->security.poll_tmo_secs = min(15 * 60, timeout);
>> schedule_delayed_work(&mds->security.poll_dwork, timeout * HZ);
>> }
>> - mutex_unlock(&mds->mbox_mutex);
>> + mutex_unlock(&cxl_mbox->mbox_mutex);
>> }
>>
>> /**
>> @@ -192,17 +194,20 @@ static void cxl_mbox_sanitize_work(struct work_struct *work)
>> * not need to coordinate with each other. The driver only uses the primary
>> * mailbox.
>> */
>> -static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds,
>> +static int __cxl_pci_mbox_send_cmd(struct cxl_mailbox *cxl_mbox,
>> struct cxl_mbox_cmd *mbox_cmd)
>> {
>> - struct cxl_dev_state *cxlds = &mds->cxlds;
>> + struct cxl_dev_state *cxlds = container_of(cxl_mbox,
>> + struct cxl_dev_state,
>> + cxl_mbox);
>> + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>> void __iomem *payload = cxlds->regs.mbox + CXLDEV_MBOX_PAYLOAD_OFFSET;
>> struct device *dev = cxlds->dev;
>> u64 cmd_reg, status_reg;
>> size_t out_len;
>> int rc;
>>
>> - lockdep_assert_held(&mds->mbox_mutex);
>> + lockdep_assert_held(&cxl_mbox->mbox_mutex);
>>
>> /*
>> * Here are the steps from 8.2.8.4 of the CXL 2.0 spec.
>> @@ -315,10 +320,10 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds,
>>
>> timeout = mbox_cmd->poll_interval_ms;
>> for (i = 0; i < mbox_cmd->poll_count; i++) {
>> - if (rcuwait_wait_event_timeout(&mds->mbox_wait,
>> - cxl_mbox_background_complete(cxlds),
>> - TASK_UNINTERRUPTIBLE,
>> - msecs_to_jiffies(timeout)) > 0)
>> + if (rcuwait_wait_event_timeout(&cxl_mbox->mbox_wait,
>> + cxl_mbox_background_complete(cxlds),
>> + TASK_UNINTERRUPTIBLE,
>> + msecs_to_jiffies(timeout)) > 0)
>> break;
>> }
>>
>> @@ -360,7 +365,7 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds,
>> */
>> size_t n;
>>
>> - n = min3(mbox_cmd->size_out, mds->payload_size, out_len);
>> + n = min3(mbox_cmd->size_out, cxl_mbox->payload_size, out_len);
>> memcpy_fromio(mbox_cmd->payload_out, payload, n);
>> mbox_cmd->size_out = n;
>> } else {
>> @@ -370,14 +375,14 @@ static int __cxl_pci_mbox_send_cmd(struct cxl_memdev_state *mds,
>> return 0;
>> }
>>
>> -static int cxl_pci_mbox_send(struct cxl_memdev_state *mds,
>> +static int cxl_pci_mbox_send(struct cxl_mailbox *cxl_mbox,
>> struct cxl_mbox_cmd *cmd)
>> {
>> int rc;
>>
>> - mutex_lock_io(&mds->mbox_mutex);
>> - rc = __cxl_pci_mbox_send_cmd(mds, cmd);
>> - mutex_unlock(&mds->mbox_mutex);
>> + mutex_lock_io(&cxl_mbox->mbox_mutex);
>> + rc = __cxl_pci_mbox_send_cmd(cxl_mbox, cmd);
>> + mutex_unlock(&cxl_mbox->mbox_mutex);
>>
>> return rc;
>> }
>> @@ -385,6 +390,7 @@ static int cxl_pci_mbox_send(struct cxl_memdev_state *mds,
>> static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
>> {
>> struct cxl_dev_state *cxlds = &mds->cxlds;
>> + struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
>> const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
>> struct device *dev = cxlds->dev;
>> unsigned long timeout;
>> @@ -392,6 +398,7 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
>> u64 md_status;
>> u32 ctrl;
>>
>> + cxl_mbox->host = dev;
>> timeout = jiffies + mbox_ready_timeout * HZ;
>> do {
>> md_status = readq(cxlds->regs.memdev + CXLMDEV_STATUS_OFFSET);
>> @@ -417,8 +424,8 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
>> return -ETIMEDOUT;
>> }
>>
>> - mds->mbox_send = cxl_pci_mbox_send;
>> - mds->payload_size =
>> + cxl_mbox->mbox_send = cxl_pci_mbox_send;
>> + cxl_mbox->payload_size =
>> 1 << FIELD_GET(CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK, cap);
>>
>> /*
>> @@ -428,16 +435,16 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
>> * there's no point in going forward. If the size is too large, there's
>> * no harm is soft limiting it.
>> */
>> - mds->payload_size = min_t(size_t, mds->payload_size, SZ_1M);
>> - if (mds->payload_size < 256) {
>> + cxl_mbox->payload_size = min_t(size_t, cxl_mbox->payload_size, SZ_1M);
>> + if (cxl_mbox->payload_size < 256) {
>> dev_err(dev, "Mailbox is too small (%zub)",
>> - mds->payload_size);
>> + cxl_mbox->payload_size);
>> return -ENXIO;
>> }
>>
>> - dev_dbg(dev, "Mailbox payload sized %zu", mds->payload_size);
>> + dev_dbg(dev, "Mailbox payload sized %zu", cxl_mbox->payload_size);
>>
>> - rcuwait_init(&mds->mbox_wait);
>> + rcuwait_init(&cxl_mbox->mbox_wait);
>> INIT_DELAYED_WORK(&mds->security.poll_dwork, cxl_mbox_sanitize_work);
>>
>> /* background command interrupts are optional */
>> @@ -578,9 +585,10 @@ static void free_event_buf(void *buf)
>> */
>> static int cxl_mem_alloc_event_buf(struct cxl_memdev_state *mds)
>> {
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> struct cxl_get_event_payload *buf;
>>
>> - buf = kvmalloc(mds->payload_size, GFP_KERNEL);
>> + buf = kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
>> if (!buf)
>> return -ENOMEM;
>> mds->event.buf = buf;
>> diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c
>> index 4ef93da22335..3985ff9ce70e 100644
>> --- a/drivers/cxl/pmem.c
>> +++ b/drivers/cxl/pmem.c
>> @@ -102,13 +102,15 @@ static int cxl_pmem_get_config_size(struct cxl_memdev_state *mds,
>> struct nd_cmd_get_config_size *cmd,
>> unsigned int buf_len)
>> {
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> +
>> if (sizeof(*cmd) > buf_len)
>> return -EINVAL;
>>
>> *cmd = (struct nd_cmd_get_config_size){
>> .config_size = mds->lsa_size,
>> .max_xfer =
>> - mds->payload_size - sizeof(struct cxl_mbox_set_lsa),
>> + cxl_mbox->payload_size - sizeof(struct cxl_mbox_set_lsa),
>> };
>>
>> return 0;
>> diff --git a/include/linux/cxl/mailbox.h b/include/linux/cxl/mailbox.h
>> new file mode 100644
>> index 000000000000..654df6175828
>> --- /dev/null
>> +++ b/include/linux/cxl/mailbox.h
>> @@ -0,0 +1,63 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/* Copyright(c) 2024 Intel Corporation. */
>> +#ifndef __CXL_MBOX_H__
>> +#define __CXL_MBOX_H__
>> +
>> +#include <linux/auxiliary_bus.h>
>> +
>> +/**
>> + * struct cxl_mbox_cmd - A command to be submitted to hardware.
>> + * @opcode: (input) The command set and command submitted to hardware.
>> + * @payload_in: (input) Pointer to the input payload.
>> + * @payload_out: (output) Pointer to the output payload. Must be allocated by
>> + * the caller.
>> + * @size_in: (input) Number of bytes to load from @payload_in.
>> + * @size_out: (input) Max number of bytes loaded into @payload_out.
>> + * (output) Number of bytes generated by the device. For fixed size
>> + * outputs commands this is always expected to be deterministic. For
>> + * variable sized output commands, it tells the exact number of bytes
>> + * written.
>> + * @min_out: (input) internal command output payload size validation
>> + * @poll_count: (input) Number of timeouts to attempt.
>> + * @poll_interval_ms: (input) Time between mailbox background command polling
>> + * interval timeouts.
>> + * @return_code: (output) Error code returned from hardware.
>> + *
>> + * This is the primary mechanism used to send commands to the hardware.
>> + * All the fields except @payload_* correspond exactly to the fields described in
>> + * Command Register section of the CXL 2.0 8.2.8.4.5. @payload_in and
>> + * @payload_out are written to, and read from the Command Payload Registers
>> + * defined in CXL 2.0 8.2.8.4.8.
>> + */
>> +struct cxl_mbox_cmd {
>> + u16 opcode;
>> + void *payload_in;
>> + void *payload_out;
>> + size_t size_in;
>> + size_t size_out;
>> + size_t min_out;
>> + int poll_count;
>> + int poll_interval_ms;
>> + u16 return_code;
>> +};
>> +
>> +/**
>> + * struct cxl_mailbox - context for CXL mailbox operations
>> + * @host: device that hosts the mailbox
>> + * @adev: auxiliary device for fw-ctl
>> + * @payload_size: Size of space for payload
>> + * (CXL 3.1 8.2.8.4.3 Mailbox Capabilities Register)
>> + * @mbox_mutex: mutex protects device mailbox and firmware
>> + * @mbox_wait: rcuwait for mailbox
>> + * @mbox_send: @dev specific transport for transmitting mailbox commands
>> + */
>> +struct cxl_mailbox {
>> + struct device *host;
>> + struct auxiliary_device adev; /* For fw-ctl */
>> + size_t payload_size;
>> + struct mutex mbox_mutex; /* lock to protect mailbox context */
>> + struct rcuwait mbox_wait;
>> + int (*mbox_send)(struct cxl_mailbox *cxl_mbox, struct cxl_mbox_cmd *cmd);
>> +};
>> +
>> +#endif
>> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
>> index 129f179b0ac5..1829b626bb40 100644
>> --- a/tools/testing/cxl/test/mem.c
>> +++ b/tools/testing/cxl/test/mem.c
>> @@ -8,6 +8,7 @@
>> #include <linux/delay.h>
>> #include <linux/sizes.h>
>> #include <linux/bits.h>
>> +#include <linux/cxl/mailbox.h>
>> #include <asm/unaligned.h>
>> #include <crypto/sha2.h>
>> #include <cxlmem.h>
>> @@ -534,6 +535,7 @@ static int mock_gsl(struct cxl_mbox_cmd *cmd)
>>
>> static int mock_get_log(struct cxl_memdev_state *mds, struct cxl_mbox_cmd *cmd)
>> {
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> struct cxl_mbox_get_log *gl = cmd->payload_in;
>> u32 offset = le32_to_cpu(gl->offset);
>> u32 length = le32_to_cpu(gl->length);
>> @@ -542,7 +544,7 @@ static int mock_get_log(struct cxl_memdev_state *mds, struct cxl_mbox_cmd *cmd)
>>
>> if (cmd->size_in < sizeof(*gl))
>> return -EINVAL;
>> - if (length > mds->payload_size)
>> + if (length > cxl_mbox->payload_size)
>> return -EINVAL;
>> if (offset + length > sizeof(mock_cel))
>> return -EINVAL;
>> @@ -617,12 +619,13 @@ static void cxl_mockmem_sanitize_work(struct work_struct *work)
>> {
>> struct cxl_memdev_state *mds =
>> container_of(work, typeof(*mds), security.poll_dwork.work);
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>>
>> - mutex_lock(&mds->mbox_mutex);
>> + mutex_lock(&cxl_mbox->mbox_mutex);
>> if (mds->security.sanitize_node)
>> sysfs_notify_dirent(mds->security.sanitize_node);
>> mds->security.sanitize_active = false;
>> - mutex_unlock(&mds->mbox_mutex);
>> + mutex_unlock(&cxl_mbox->mbox_mutex);
>>
>> dev_dbg(mds->cxlds.dev, "sanitize complete\n");
>> }
>> @@ -631,6 +634,7 @@ static int mock_sanitize(struct cxl_mockmem_data *mdata,
>> struct cxl_mbox_cmd *cmd)
>> {
>> struct cxl_memdev_state *mds = mdata->mds;
>> + struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
>> int rc = 0;
>>
>> if (cmd->size_in != 0)
>> @@ -648,14 +652,14 @@ static int mock_sanitize(struct cxl_mockmem_data *mdata,
>> return -ENXIO;
>> }
>>
>> - mutex_lock(&mds->mbox_mutex);
>> + mutex_lock(&cxl_mbox->mbox_mutex);
>> if (schedule_delayed_work(&mds->security.poll_dwork,
>> msecs_to_jiffies(mdata->sanitize_timeout))) {
>> mds->security.sanitize_active = true;
>> dev_dbg(mds->cxlds.dev, "sanitize issued\n");
>> } else
>> rc = -EBUSY;
>> - mutex_unlock(&mds->mbox_mutex);
>> + mutex_unlock(&cxl_mbox->mbox_mutex);
>>
>> return rc;
>> }
>> @@ -1333,10 +1337,13 @@ static int mock_activate_fw(struct cxl_mockmem_data *mdata,
>> return -EINVAL;
>> }
>>
>> -static int cxl_mock_mbox_send(struct cxl_memdev_state *mds,
>> +static int cxl_mock_mbox_send(struct cxl_mailbox *cxl_mbox,
>> struct cxl_mbox_cmd *cmd)
>> {
>> - struct cxl_dev_state *cxlds = &mds->cxlds;
>> + struct cxl_dev_state *cxlds = container_of(cxl_mbox,
>> + struct cxl_dev_state,
>> + cxl_mbox);
>> + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>> struct device *dev = cxlds->dev;
>> struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
>> int rc = -EIO;
>> @@ -1460,6 +1467,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>> struct cxl_memdev_state *mds;
>> struct cxl_dev_state *cxlds;
>> struct cxl_mockmem_data *mdata;
>> + struct cxl_mailbox *cxl_mbox;
>> int rc;
>>
>> mdata = devm_kzalloc(dev, sizeof(*mdata), GFP_KERNEL);
>> @@ -1487,9 +1495,10 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>> if (IS_ERR(mds))
>> return PTR_ERR(mds);
>>
>> + cxl_mbox = &mds->cxlds.cxl_mbox;
>> mdata->mds = mds;
>> - mds->mbox_send = cxl_mock_mbox_send;
>> - mds->payload_size = SZ_4K;
>> + cxl_mbox->mbox_send = cxl_mock_mbox_send;
>> + cxl_mbox->payload_size = SZ_4K;
>> mds->event.buf = (struct cxl_get_event_payload *) mdata->event_buf;
>> INIT_DELAYED_WORK(&mds->security.poll_dwork,
>> cxl_mockmem_sanitize_work);
>>
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v12 01/17] EDAC: Add support for EDAC device features control
2024-09-11 9:04 ` [PATCH v12 01/17] EDAC: Add support for EDAC device features control shiju.jose
@ 2024-09-13 16:40 ` Borislav Petkov
2024-09-16 9:21 ` Shiju Jose
0 siblings, 1 reply; 39+ messages in thread
From: Borislav Petkov @ 2024-09-13 16:40 UTC (permalink / raw)
To: shiju.jose
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, jgroves, vsalve, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm
On Wed, Sep 11, 2024 at 10:04:30AM +0100, shiju.jose@huawei.com wrote:
> +/**
> + * edac_dev_feature_init - Init a RAS feature
> + * @parent: client device.
> + * @dev_data: pointer to the edac_dev_data structure, which contains
> + * client device specific info.
> + * @feat: pointer to struct edac_dev_feature.
> + * @attr_groups: pointer to attribute group's container.
> + *
> + * Returns number of scrub features attribute groups on success,
Not "scrub" - this is an interface initializing a generic feature.
> + * error otherwise.
> + */
> +static int edac_dev_feat_init(struct device *parent,
> + struct edac_dev_data *dev_data,
> + const struct edac_dev_feature *ras_feat,
> + const struct attribute_group **attr_groups)
> +{
> + int num;
> +
> + switch (ras_feat->ft_type) {
> + case RAS_FEAT_SCRUB:
> + dev_data->scrub_ops = ras_feat->scrub_ops;
> + dev_data->private = ras_feat->ctx;
> + return 1;
> + case RAS_FEAT_ECS:
> + num = ras_feat->ecs_info.num_media_frus;
> + dev_data->ecs_ops = ras_feat->ecs_ops;
> + dev_data->private = ras_feat->ctx;
> + return num;
> + case RAS_FEAT_PPR:
> + dev_data->ppr_ops = ras_feat->ppr_ops;
> + dev_data->private = ras_feat->ctx;
> + return 1;
> + default:
> + return -EINVAL;
> + }
> +}
And why does this function even exist and has kernel-doc comments when all it
does is assign a couple of values? And it gets called exactly once?
Just merge its body into the call site. There you can reuse the switch-case
there too. No need for too much noodling around.
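A rough sketch of the merged shape Boris is asking for (userspace-compilable stand-ins for the kernel types, not the actual patch; -1 stands in for -EINVAL):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types in the patch under review. */
enum edac_dev_feat { RAS_FEAT_SCRUB, RAS_FEAT_ECS, RAS_FEAT_PPR, RAS_FEAT_MAX };

struct edac_dev_feature {
	enum edac_dev_feat ft_type;
	int num_media_frus;	/* only meaningful for RAS_FEAT_ECS */
	void *ops;
	void *ctx;
};

struct edac_dev_data {
	void *ops;
	void *private;
};

/*
 * edac_dev_feat_init() folded into its only call site: one switch-case
 * both stores the ops/context and yields the number of attribute groups
 * the feature contributes.
 */
static int feat_attr_group_count(struct edac_dev_data *dev_data,
				 const struct edac_dev_feature *feat)
{
	switch (feat->ft_type) {
	case RAS_FEAT_SCRUB:
	case RAS_FEAT_PPR:
		dev_data->ops = feat->ops;
		dev_data->private = feat->ctx;
		return 1;
	case RAS_FEAT_ECS:
		dev_data->ops = feat->ops;
		dev_data->private = feat->ctx;
		return feat->num_media_frus;
	default:
		return -1;	/* -EINVAL in the kernel */
	}
}
```

The same switch then grows the feature-specific setup calls in later patches, which is the tension Shiju raises in his reply below.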
> diff --git a/include/linux/edac.h b/include/linux/edac.h
> index b4ee8961e623..b337254cf5b8 100644
> --- a/include/linux/edac.h
> +++ b/include/linux/edac.h
> @@ -661,4 +661,59 @@ static inline struct dimm_info *edac_get_dimm(struct mem_ctl_info *mci,
>
> return mci->dimms[index];
> }
> +
> +/* EDAC device features */
> +
> +#define EDAC_FEAT_NAME_LEN 128
> +
> +/* RAS feature type */
> +enum edac_dev_feat {
> + RAS_FEAT_SCRUB,
> + RAS_FEAT_ECS,
> + RAS_FEAT_PPR,
> + RAS_FEAT_MAX
I still don't know what ECS or PPR is.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v12 02/17] EDAC: Add EDAC scrub control driver
2024-09-11 9:04 ` [PATCH v12 02/17] EDAC: Add EDAC scrub control driver shiju.jose
@ 2024-09-13 17:25 ` Borislav Petkov
2024-09-16 9:22 ` Shiju Jose
2024-09-26 23:04 ` Fan Ni
1 sibling, 1 reply; 39+ messages in thread
From: Borislav Petkov @ 2024-09-13 17:25 UTC (permalink / raw)
To: shiju.jose
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, jgroves, vsalve, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm
On Wed, Sep 11, 2024 at 10:04:31AM +0100, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Add generic EDAC scrub control driver supports configuring the memory scrubbers
s/supports configuring the/in order to configure/
> in the system. The device with scrub feature, get the scrub descriptor from the
> EDAC scrub and registers with the EDAC RAS feature driver, which adds the sysfs
> scrub control interface.
That sentence reads wrong.
> The scrub control attributes for a scrub instance are
> available to userspace in /sys/bus/edac/devices/<dev-name>/scrub*/.
>
> Generic EDAC scrub driver and the common sysfs scrub interface promotes
> unambiguous access from the userspace irrespective of the underlying scrub
> devices.
Huh?
Do you wanna say something along the lines that the common sysfs scrub
interface abstracts the control of an arbitrary scrubbing functionality into
a common set of functions or so?
> The sysfs scrub attribute nodes would be present only if the client driver
> has implemented the corresponding attribute callback function and pass in ops
> to the EDAC RAS feature driver during registration.
>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> Documentation/ABI/testing/sysfs-edac-scrub | 69 ++++
> drivers/edac/Makefile | 1 +
> drivers/edac/edac_device.c | 6 +-
> drivers/edac/edac_scrub.c | 377 +++++++++++++++++++++
> include/linux/edac.h | 30 ++
> 5 files changed, 482 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/ABI/testing/sysfs-edac-scrub
> create mode 100755 drivers/edac/edac_scrub.c
>
> diff --git a/Documentation/ABI/testing/sysfs-edac-scrub b/Documentation/ABI/testing/sysfs-edac-scrub
> new file mode 100644
> index 000000000000..f465cc91423f
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-edac-scrub
...
> +What: /sys/bus/edac/devices/<dev-name>/scrub*/current_cycle_duration
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RW) The current scrub cycle duration in seconds and must be
> + within the supported range by the memory scrubber.
So in reading about that interface, where is the user doc explaining how one
should use scrubbers?
> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
> index 4edfb83ffbee..fbf0e39ec678 100644
> --- a/drivers/edac/Makefile
> +++ b/drivers/edac/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o
>
> edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o
> edac_core-y += edac_module.o edac_device_sysfs.o wq.o
> +edac_core-y += edac_scrub.o
Just scrub.[co]. The file is already in drivers/edac/. Too many "edac"
strings. :)
>
> edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o
>
> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
> index e4a5d010ea2d..6381896b6424 100644
> --- a/drivers/edac/edac_device.c
> +++ b/drivers/edac/edac_device.c
> @@ -608,12 +608,16 @@ static int edac_dev_feat_init(struct device *parent,
> const struct edac_dev_feature *ras_feat,
> const struct attribute_group **attr_groups)
> {
> - int num;
> + int num, ret;
>
> switch (ras_feat->ft_type) {
> case RAS_FEAT_SCRUB:
> dev_data->scrub_ops = ras_feat->scrub_ops;
> dev_data->private = ras_feat->ctx;
> + ret = edac_scrub_get_desc(parent, attr_groups,
> + ras_feat->instance);
> + if (ret)
> + return ret;
> return 1;
> case RAS_FEAT_ECS:
> num = ras_feat->ecs_info.num_media_frus;
> diff --git a/drivers/edac/edac_scrub.c b/drivers/edac/edac_scrub.c
> new file mode 100755
> index 000000000000..3f8f37629acf
> --- /dev/null
> +++ b/drivers/edac/edac_scrub.c
> @@ -0,0 +1,377 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Generic EDAC scrub driver supports controlling the memory
> + * scrubbers in the system and the common sysfs scrub interface
> + * promotes unambiguous access from the userspace.
> + *
> + * Copyright (c) 2024 HiSilicon Limited.
> + */
> +
> +#define pr_fmt(fmt) "EDAC SCRUB: " fmt
> +
> +#include <linux/edac.h>
> +
> +enum edac_scrub_attributes {
> + SCRUB_ADDR_RANGE_BASE,
> + SCRUB_ADDR_RANGE_SIZE,
> + SCRUB_ENABLE_BACKGROUND,
> + SCRUB_ENABLE_ON_DEMAND,
> + SCRUB_MIN_CYCLE_DURATION,
> + SCRUB_MAX_CYCLE_DURATION,
> + SCRUB_CURRENT_CYCLE_DURATION,
> + SCRUB_MAX_ATTRS
> +};
> +
> +struct edac_scrub_dev_attr {
> + struct device_attribute dev_attr;
> + u8 instance;
> +};
> +
> +struct edac_scrub_context {
> + char name[EDAC_FEAT_NAME_LEN];
> + struct edac_scrub_dev_attr scrub_dev_attr[SCRUB_MAX_ATTRS];
> + struct attribute *scrub_attrs[SCRUB_MAX_ATTRS + 1];
> + struct attribute_group group;
> +};
> +
> +#define to_scrub_dev_attr(_dev_attr) \
> + container_of(_dev_attr, struct edac_scrub_dev_attr, dev_attr)
> +
> +static ssize_t addr_range_base_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + u64 base, size;
> + int ret;
> +
> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "0x%llx\n", base);
> +}
> +
> +static ssize_t addr_range_size_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + u64 base, size;
> + int ret;
> +
> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "0x%llx\n", size);
> +}
> +
> +static ssize_t addr_range_base_store(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + u64 base, size;
> + int ret;
> +
> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
> + if (ret)
> + return ret;
> +
> + ret = kstrtou64(buf, 0, &base);
> + if (ret < 0)
> + return ret;
> +
> + ret = ops->write_range(ras_feat_dev->parent, ctx->scrub[inst].private, base, size);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static ssize_t addr_range_size_store(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + const char *buf,
> + size_t len)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + u64 base, size;
> + int ret;
> +
> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
> + if (ret)
> + return ret;
> +
Can all that repetitive code be abstracted away in macros pls?
Below too.
> + ret = kstrtou64(buf, 0, &size);
> + if (ret < 0)
> + return ret;
> +
> + ret = ops->write_range(ras_feat_dev->parent, ctx->scrub[inst].private, base, size);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
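One hedged sketch of the macro abstraction Boris asks for above, with the ops and sysfs plumbing reduced to userspace stand-ins so it compiles on its own (the real kernel version would generate full device_attribute show/store bodies):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the scrub ops in the quoted patch. */
struct scrub_ops {
	int (*read_range)(unsigned long long *base, unsigned long long *size);
};

static int stub_read_range(unsigned long long *base, unsigned long long *size)
{
	*base = 0x1000;
	*size = 0x2000;
	return 0;
}

/*
 * One macro emits one show() body; _pick selects which of the two
 * values returned by read_range() gets printed.
 */
#define SCRUB_RANGE_SHOW(_name, _pick)					\
static int _name##_show(const struct scrub_ops *ops, char *buf)		\
{									\
	unsigned long long base, size;					\
	int ret = ops->read_range(&base, &size);			\
	if (ret)							\
		return ret;						\
	return sprintf(buf, "0x%llx\n", (unsigned long long)(_pick));	\
}

SCRUB_RANGE_SHOW(addr_range_base, base)
SCRUB_RANGE_SHOW(addr_range_size, size)
```

The store() pairs could be generated the same way, collapsing the seven near-identical bodies in the patch to a handful of macro invocations.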
> +
> +static ssize_t enable_background_store(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + bool enable;
> + int ret;
> +
> + ret = kstrtobool(buf, &enable);
> + if (ret < 0)
> + return ret;
> +
> + ret = ops->set_enabled_bg(ras_feat_dev->parent, ctx->scrub[inst].private, enable);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static ssize_t enable_background_show(struct device *ras_feat_dev,
> + struct device_attribute *attr, char *buf)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + bool enable;
> + int ret;
> +
> + ret = ops->get_enabled_bg(ras_feat_dev->parent, ctx->scrub[inst].private, &enable);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%d\n", enable);
> +}
> +
> +static ssize_t enable_on_demand_show(struct device *ras_feat_dev,
> + struct device_attribute *attr, char *buf)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + bool enable;
> + int ret;
> +
> + ret = ops->get_enabled_od(ras_feat_dev->parent, ctx->scrub[inst].private, &enable);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%d\n", enable);
> +}
> +
> +static ssize_t enable_on_demand_store(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + bool enable;
> + int ret;
> +
> + ret = kstrtobool(buf, &enable);
> + if (ret < 0)
> + return ret;
> +
> + ret = ops->set_enabled_od(ras_feat_dev->parent, ctx->scrub[inst].private, enable);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static ssize_t min_cycle_duration_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + u32 val;
> + int ret;
> +
> + ret = ops->min_cycle_read(ras_feat_dev->parent, ctx->scrub[inst].private, &val);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%u\n", val);
> +}
> +
> +static ssize_t max_cycle_duration_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + u32 val;
> + int ret;
> +
> + ret = ops->max_cycle_read(ras_feat_dev->parent, ctx->scrub[inst].private, &val);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%u\n", val);
> +}
> +
> +static ssize_t current_cycle_duration_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + u32 val;
> + int ret;
> +
> + ret = ops->cycle_duration_read(ras_feat_dev->parent, ctx->scrub[inst].private, &val);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%u\n", val);
> +}
> +
> +static ssize_t current_cycle_duration_store(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + long val;
> + int ret;
> +
> + ret = kstrtol(buf, 0, &val);
> + if (ret < 0)
> + return ret;
> +
> + ret = ops->cycle_duration_write(ras_feat_dev->parent, ctx->scrub[inst].private, val);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static umode_t scrub_attr_visible(struct kobject *kobj,
> + struct attribute *a, int attr_id)
> +{
> + struct device *ras_feat_dev = kobj_to_dev(kobj);
> + struct device_attribute *dev_attr =
> + container_of(a, struct device_attribute, attr);
No silly linebreaks like that pls. Check your whole patchset.
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(dev_attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> +
> + switch (attr_id) {
> + case SCRUB_ADDR_RANGE_BASE:
> + case SCRUB_ADDR_RANGE_SIZE:
> + if (ops->read_range && ops->write_range)
> + return a->mode;
> + if (ops->read_range)
> + return 0444;
if (...read_range) {
if (...write_range)
return a->mode;
else
return 0444:
}
break;
and now put a single "return 0;" at the end of the function.
Below too.
> + return 0;
> + case SCRUB_ENABLE_BACKGROUND:
> + if (ops->get_enabled_bg && ops->set_enabled_bg)
> + return a->mode;
> + if (ops->get_enabled_bg)
> + return 0444;
> + return 0;
> + case SCRUB_ENABLE_ON_DEMAND:
> + if (ops->get_enabled_od && ops->set_enabled_od)
> + return a->mode;
> + if (ops->get_enabled_od)
> + return 0444;
> + return 0;
> + case SCRUB_MIN_CYCLE_DURATION:
> + return ops->min_cycle_read ? a->mode : 0;
if (ops->min_cycle_read)
return a->mode;
> + case SCRUB_MAX_CYCLE_DURATION:
> + return ops->max_cycle_read ? a->mode : 0;
> + case SCRUB_CURRENT_CYCLE_DURATION:
> + if (ops->cycle_duration_read && ops->cycle_duration_write)
> + return a->mode;
> + if (ops->cycle_duration_read)
> + return 0444;
> + return 0;
> + default:
> + return 0;
> + }
> +}
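Applying the nested-if shape Boris suggests to the whole callback might look like this sketch (a subset of the attribute ids; the ops are reduced to presence flags so it compiles standalone):

```c
#include <assert.h>

/* Attribute ids from the quoted patch (subset suffices for the sketch). */
enum {
	SCRUB_ADDR_RANGE_BASE,
	SCRUB_ENABLE_BACKGROUND,
	SCRUB_MIN_CYCLE_DURATION,
	SCRUB_CURRENT_CYCLE_DURATION
};

/* Ops reduced to "callback implemented?" flags. */
struct scrub_ops {
	int read_range, write_range;
	int get_enabled_bg, set_enabled_bg;
	int min_cycle_read;
	int cycle_duration_read, cycle_duration_write;
};

static unsigned int scrub_attr_visible(const struct scrub_ops *ops,
				       int attr_id, unsigned int mode)
{
	switch (attr_id) {
	case SCRUB_ADDR_RANGE_BASE:
		if (ops->read_range) {
			if (ops->write_range)
				return mode;
			return 0444;
		}
		break;
	case SCRUB_ENABLE_BACKGROUND:
		if (ops->get_enabled_bg) {
			if (ops->set_enabled_bg)
				return mode;
			return 0444;
		}
		break;
	case SCRUB_MIN_CYCLE_DURATION:
		if (ops->min_cycle_read)
			return mode;
		break;
	case SCRUB_CURRENT_CYCLE_DURATION:
		if (ops->cycle_duration_read) {
			if (ops->cycle_duration_write)
				return mode;
			return 0444;
		}
		break;
	}

	return 0;	/* single "not visible" exit, as suggested */
}
```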
> +
> +#define EDAC_SCRUB_ATTR_RO(_name, _instance) \
> + ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_RO(_name), \
> + .instance = _instance })
> +
> +#define EDAC_SCRUB_ATTR_WO(_name, _instance) \
> + ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_WO(_name), \
> + .instance = _instance })
> +
> +#define EDAC_SCRUB_ATTR_RW(_name, _instance) \
> + ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_RW(_name), \
> + .instance = _instance })
> +
> +static int scrub_create_desc(struct device *scrub_dev,
> + const struct attribute_group **attr_groups,
> + u8 instance)
> +{
> + struct edac_scrub_context *scrub_ctx;
> + struct attribute_group *group;
> + int i;
> +
> + scrub_ctx = devm_kzalloc(scrub_dev, sizeof(*scrub_ctx), GFP_KERNEL);
> + if (!scrub_ctx)
> + return -ENOMEM;
> +
> + group = &scrub_ctx->group;
> + scrub_ctx->scrub_dev_attr[0] = EDAC_SCRUB_ATTR_RW(addr_range_base, instance);
> + scrub_ctx->scrub_dev_attr[1] = EDAC_SCRUB_ATTR_RW(addr_range_size, instance);
> + scrub_ctx->scrub_dev_attr[2] = EDAC_SCRUB_ATTR_RW(enable_background, instance);
> + scrub_ctx->scrub_dev_attr[3] = EDAC_SCRUB_ATTR_RW(enable_on_demand, instance);
> + scrub_ctx->scrub_dev_attr[4] = EDAC_SCRUB_ATTR_RO(min_cycle_duration, instance);
> + scrub_ctx->scrub_dev_attr[5] = EDAC_SCRUB_ATTR_RO(max_cycle_duration, instance);
> + scrub_ctx->scrub_dev_attr[6] = EDAC_SCRUB_ATTR_RW(current_cycle_duration, instance);
Why use the naked numbers when you have enum edac_scrub_attributes?
> + for (i = 0; i < SCRUB_MAX_ATTRS; i++)
> + scrub_ctx->scrub_attrs[i] = &scrub_ctx->scrub_dev_attr[i].dev_attr.attr;
> +
> + sprintf(scrub_ctx->name, "%s%d", "scrub", instance);
> + group->name = scrub_ctx->name;
> + group->attrs = scrub_ctx->scrub_attrs;
> + group->is_visible = scrub_attr_visible;
> +
> + attr_groups[0] = group;
> +
> + return 0;
> +}
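Using the existing enumerators instead of the naked 0..6 indices could read like this sketch (struct edac_scrub_dev_attr reduced to a name so it compiles standalone):

```c
#include <assert.h>
#include <string.h>

/* The attribute enum already present in the patch. */
enum edac_scrub_attributes {
	SCRUB_ADDR_RANGE_BASE,
	SCRUB_ADDR_RANGE_SIZE,
	SCRUB_ENABLE_BACKGROUND,
	SCRUB_ENABLE_ON_DEMAND,
	SCRUB_MIN_CYCLE_DURATION,
	SCRUB_MAX_CYCLE_DURATION,
	SCRUB_CURRENT_CYCLE_DURATION,
	SCRUB_MAX_ATTRS
};

/* Stand-in for struct edac_scrub_dev_attr. */
struct scrub_attr { const char *name; };

static struct scrub_attr scrub_dev_attr[SCRUB_MAX_ATTRS];

/* Index by enumerator instead of naked 0..6. */
static void scrub_fill_attrs(void)
{
	scrub_dev_attr[SCRUB_ADDR_RANGE_BASE].name = "addr_range_base";
	scrub_dev_attr[SCRUB_ADDR_RANGE_SIZE].name = "addr_range_size";
	scrub_dev_attr[SCRUB_ENABLE_BACKGROUND].name = "enable_background";
	scrub_dev_attr[SCRUB_ENABLE_ON_DEMAND].name = "enable_on_demand";
	scrub_dev_attr[SCRUB_MIN_CYCLE_DURATION].name = "min_cycle_duration";
	scrub_dev_attr[SCRUB_MAX_CYCLE_DURATION].name = "max_cycle_duration";
	scrub_dev_attr[SCRUB_CURRENT_CYCLE_DURATION].name = "current_cycle_duration";
}
```

The enumerator names then double as documentation at each assignment, and a mismatch between table order and enum order becomes impossible.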
> +
> +/**
> + * edac_scrub_get_desc - get EDAC scrub descriptors
> + * @scrub_dev: client device, with scrub support
> + * @attr_groups: pointer to attrribute group container
+ * @attr_groups: pointer to attrribute group container
Unknown word [attrribute] in comment.
Suggestions: ['attribute', 'attributed', 'attributes', "attribute's", 'attributive', 'tribute']
Please introduce a spellchecker into your patch creation workflow.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v12 01/17] EDAC: Add support for EDAC device features control
2024-09-13 16:40 ` Borislav Petkov
@ 2024-09-16 9:21 ` Shiju Jose
2024-09-16 10:50 ` Jonathan Cameron
0 siblings, 1 reply; 39+ messages in thread
From: Shiju Jose @ 2024-09-16 9:21 UTC (permalink / raw)
To: Borislav Petkov
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
Jonathan Cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, jgroves, vsalve, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
Thanks for reviewing.
>On Wed, Sep 11, 2024 at 10:04:30AM +0100, shiju.jose@huawei.com wrote:
>> +/**
>> + * edac_dev_feature_init - Init a RAS feature
>> + * @parent: client device.
>> + * @dev_data: pointer to the edac_dev_data structure, which contains
>> + * client device specific info.
>> + * @feat: pointer to struct edac_dev_feature.
>> + * @attr_groups: pointer to attribute group's container.
>> + *
>> + * Returns number of scrub features attribute groups on success,
>
>Not "scrub" - this is an interface initializing a generic feature.
Will correct.
>
>> + * error otherwise.
>> + */
>> +static int edac_dev_feat_init(struct device *parent,
>> + struct edac_dev_data *dev_data,
>> + const struct edac_dev_feature *ras_feat,
>> + const struct attribute_group **attr_groups) {
>> + int num;
>> +
>> + switch (ras_feat->ft_type) {
>> + case RAS_FEAT_SCRUB:
>> + dev_data->scrub_ops = ras_feat->scrub_ops;
>> + dev_data->private = ras_feat->ctx;
>> + return 1;
>> + case RAS_FEAT_ECS:
>> + num = ras_feat->ecs_info.num_media_frus;
>> + dev_data->ecs_ops = ras_feat->ecs_ops;
>> + dev_data->private = ras_feat->ctx;
>> + return num;
>> + case RAS_FEAT_PPR:
>> + dev_data->ppr_ops = ras_feat->ppr_ops;
>> + dev_data->private = ras_feat->ctx;
>> + return 1;
>> + default:
>> + return -EINVAL;
>> + }
>> +}
>
>And why does this function even exist and has kernel-doc comments when all it
>does is assign a couple of values? And it gets called exactly once?
>
>Just merge its body into the call site. There you can reuse the switch-case there
>too. No need for too much noodling around.
edac_dev_feat_init() gains feature-specific calls in the subsequent EDAC feature patches,
which is why it was kept as a separate function.
>
>> diff --git a/include/linux/edac.h b/include/linux/edac.h index
>> b4ee8961e623..b337254cf5b8 100644
>> --- a/include/linux/edac.h
>> +++ b/include/linux/edac.h
>> @@ -661,4 +661,59 @@ static inline struct dimm_info
>> *edac_get_dimm(struct mem_ctl_info *mci,
>>
>> return mci->dimms[index];
>> }
>> +
>> +/* EDAC device features */
>> +
>> +#define EDAC_FEAT_NAME_LEN 128
>> +
>> +/* RAS feature type */
>> +enum edac_dev_feat {
>> + RAS_FEAT_SCRUB,
>> + RAS_FEAT_ECS,
>> + RAS_FEAT_PPR,
>> + RAS_FEAT_MAX
>
>I still don't know what ECS or PPR is.
I will add a comment/documentation here with a short explanation of the features,
if that makes sense.
Each feature is described in the subsequent EDAC feature-specific patches.
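For instance, a commented version of the enum could read as below (expansions taken from the cover letter; a sketch of the comment, not final wording):

```c
#include <assert.h>

/* RAS feature types, with each acronym spelled out. */
enum edac_dev_feat {
	RAS_FEAT_SCRUB,	/* Memory scrub control */
	RAS_FEAT_ECS,	/* Error Check Scrub control */
	RAS_FEAT_PPR,	/* Post Package Repair control */
	RAS_FEAT_MAX
};
```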
>
>--
>Regards/Gruss,
> Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette
Thanks,
Shiju
^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v12 02/17] EDAC: Add EDAC scrub control driver
2024-09-13 17:25 ` Borislav Petkov
@ 2024-09-16 9:22 ` Shiju Jose
0 siblings, 0 replies; 39+ messages in thread
From: Shiju Jose @ 2024-09-16 9:22 UTC (permalink / raw)
To: Borislav Petkov
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
Jonathan Cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, jgroves, vsalve, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
Thanks for reviewing.
>-----Original Message-----
>From: Borislav Petkov <bp@alien8.de>
>Sent: 13 September 2024 18:25
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-cxl@vger.kernel.org; linux-
>acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org;
>tony.luck@intel.com; rafael@kernel.org; lenb@kernel.org;
>mchehab@kernel.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>david@redhat.com; Vilas.Sridharan@amd.com; leo.duran@amd.com;
>Yazen.Ghannam@amd.com; rientjes@google.com; jiaqiyan@google.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; mike.malvestuto@intel.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; jgroves@micron.com;
>vsalve@micron.com; tanxiaofei <tanxiaofei@huawei.com>; Zengtao (B)
><prime.zeng@hisilicon.com>; Roberto Sassu <roberto.sassu@huawei.com>;
>kangkang.shen@futurewei.com; wanghuiqiang <wanghuiqiang@huawei.com>;
>Linuxarm <linuxarm@huawei.com>
>Subject: Re: [PATCH v12 02/17] EDAC: Add EDAC scrub control driver
>
>On Wed, Sep 11, 2024 at 10:04:31AM +0100, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> Add generic EDAC scrub control driver supports configuring the memory
>> scrubbers
>
>s/supports configuring the/in order to configure/
Will change.
>
>> in the system. The device with scrub feature, get the scrub descriptor
>> from the EDAC scrub and registers with the EDAC RAS feature driver,
>> which adds the sysfs scrub control interface.
>
>That sentence reads wrong.
You are right. Will update.
>
>> The scrub control attributes for a scrub instance are available to
>> userspace in /sys/bus/edac/devices/<dev-name>/scrub*/.
>>
>> Generic EDAC scrub driver and the common sysfs scrub interface
>> promotes unambiguous access from the userspace irrespective of the
>> underlying scrub devices.
>
>Huh?
>
>Do you wanna say something along the lines that the common sysfs scrub
>interface abstracts the control of an arbitrary scrubbing functionality into a
>common set of functions or so?
Sure. Will change.
>
>> The sysfs scrub attribute nodes would be present only if the client
>> driver has implemented the corresponding attribute callback function
>> and pass in ops to the EDAC RAS feature driver during registration.
>>
>> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>> Documentation/ABI/testing/sysfs-edac-scrub | 69 ++++
>> drivers/edac/Makefile | 1 +
>> drivers/edac/edac_device.c | 6 +-
>> drivers/edac/edac_scrub.c | 377 +++++++++++++++++++++
>> include/linux/edac.h | 30 ++
>> 5 files changed, 482 insertions(+), 1 deletion(-) create mode 100644
>> Documentation/ABI/testing/sysfs-edac-scrub
>> create mode 100755 drivers/edac/edac_scrub.c
>>
>> diff --git a/Documentation/ABI/testing/sysfs-edac-scrub
>> b/Documentation/ABI/testing/sysfs-edac-scrub
>> new file mode 100644
>> index 000000000000..f465cc91423f
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-edac-scrub
>
>...
>
>> +What: /sys/bus/edac/devices/<dev-
>name>/scrub*/current_cycle_duration
>> +Date: Oct 2024
>> +KernelVersion: 6.12
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RW) The current scrub cycle duration in seconds and must be
>> + within the supported range by the memory scrubber.
>
>So in reading about that interface, where is the user doc explaining how one
>should use scrubbers?
User documentation for the scrub feature is part of the patches for
CXL patrol scrub, https://lore.kernel.org/linux-mm/20240911090447.751-11-shiju.jose@huawei.com/
RAS2 scrub, https://lore.kernel.org/linux-mm/20240911090447.751-15-shiju.jose@huawei.com/
>
>> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile index
>> 4edfb83ffbee..fbf0e39ec678 100644
>> --- a/drivers/edac/Makefile
>> +++ b/drivers/edac/Makefile
>> @@ -10,6 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o
>>
>> edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o
>> edac_core-y += edac_module.o edac_device_sysfs.o wq.o
>> +edac_core-y += edac_scrub.o
>
>Just scrub.[co]. The file is already in drivers/edac/. Too many "edac"
>strings. :)
Ok. Will change the file names.
>
>>
>> edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o
>>
>> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
>> index e4a5d010ea2d..6381896b6424 100644
>> --- a/drivers/edac/edac_device.c
>> +++ b/drivers/edac/edac_device.c
>> @@ -608,12 +608,16 @@ static int edac_dev_feat_init(struct device *parent,
>> const struct edac_dev_feature *ras_feat,
>> const struct attribute_group **attr_groups) {
>> - int num;
>> + int num, ret;
>>
>> switch (ras_feat->ft_type) {
>> case RAS_FEAT_SCRUB:
>> dev_data->scrub_ops = ras_feat->scrub_ops;
>> dev_data->private = ras_feat->ctx;
>> + ret = edac_scrub_get_desc(parent, attr_groups,
>> + ras_feat->instance);
>> + if (ret)
>> + return ret;
>> return 1;
>> case RAS_FEAT_ECS:
>> num = ras_feat->ecs_info.num_media_frus;
>> diff --git a/drivers/edac/edac_scrub.c b/drivers/edac/edac_scrub.c new
>> file mode 100755 index 000000000000..3f8f37629acf
>> --- /dev/null
>> +++ b/drivers/edac/edac_scrub.c
>> @@ -0,0 +1,377 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Generic EDAC scrub driver supports controlling the memory
>> + * scrubbers in the system and the common sysfs scrub interface
>> + * promotes unambiguous access from the userspace.
>> + *
>> + * Copyright (c) 2024 HiSilicon Limited.
>> + */
>> +
>> +#define pr_fmt(fmt) "EDAC SCRUB: " fmt
>> +
>> +#include <linux/edac.h>
>> +
>> +enum edac_scrub_attributes {
>> + SCRUB_ADDR_RANGE_BASE,
>> + SCRUB_ADDR_RANGE_SIZE,
>> + SCRUB_ENABLE_BACKGROUND,
>> + SCRUB_ENABLE_ON_DEMAND,
>> + SCRUB_MIN_CYCLE_DURATION,
>> + SCRUB_MAX_CYCLE_DURATION,
>> + SCRUB_CURRENT_CYCLE_DURATION,
>> + SCRUB_MAX_ATTRS
>> +};
>> +
>> +struct edac_scrub_dev_attr {
>> + struct device_attribute dev_attr;
>> + u8 instance;
>> +};
>> +
>> +struct edac_scrub_context {
>> + char name[EDAC_FEAT_NAME_LEN];
>> + struct edac_scrub_dev_attr scrub_dev_attr[SCRUB_MAX_ATTRS];
>> + struct attribute *scrub_attrs[SCRUB_MAX_ATTRS + 1];
>> + struct attribute_group group;
>> +};
>> +
>> +#define to_scrub_dev_attr(_dev_attr) \
>> + container_of(_dev_attr, struct edac_scrub_dev_attr, dev_attr)
>> +
>> +static ssize_t addr_range_base_show(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + char *buf)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))-
>>instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + u64 base, size;
>> + int ret;
>> +
>> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private,
>&base, &size);
>> + if (ret)
>> + return ret;
>> +
>> + return sysfs_emit(buf, "0x%llx\n", base); }
>> +
>> +static ssize_t addr_range_size_show(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + char *buf)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))-
>>instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + u64 base, size;
>> + int ret;
>> +
>> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private,
>&base, &size);
>> + if (ret)
>> + return ret;
>> +
>> + return sysfs_emit(buf, "0x%llx\n", size); }
>> +
>> +static ssize_t addr_range_base_store(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + const char *buf, size_t len) {
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))-
>>instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + u64 base, size;
>> + int ret;
>> +
>> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private,
>&base, &size);
>> + if (ret)
>> + return ret;
>
>> +
>> + ret = kstrtou64(buf, 0, &base);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = ops->write_range(ras_feat_dev->parent, ctx->scrub[inst].private,
>base, size);
>> + if (ret)
>> + return ret;
>> +
>> + return len;
>> +}
>> +
>> +static ssize_t addr_range_size_store(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + const char *buf,
>> + size_t len)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))-
>>instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + u64 base, size;
>> + int ret;
>> +
>> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private,
>&base, &size);
>> + if (ret)
>> + return ret;
>> +
>
>Can all that repetitive code be abstracted away in macros pls?
>
>Below too.
Sure. Will do.
>
>> + ret = kstrtou64(buf, 0, &size);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = ops->write_range(ras_feat_dev->parent, ctx->scrub[inst].private,
>base, size);
>> + if (ret)
>> + return ret;
>> +
>> + return len;
>> +}
>> +
>> +static ssize_t enable_background_store(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + const char *buf, size_t len) {
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))-
>>instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + bool enable;
>> + int ret;
>> +
>> + ret = kstrtobool(buf, &enable);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = ops->set_enabled_bg(ras_feat_dev->parent, ctx-
>>scrub[inst].private, enable);
>> + if (ret)
>> + return ret;
>> +
>> + return len;
>> +}
>> +
>> +static ssize_t enable_background_show(struct device *ras_feat_dev,
>> + struct device_attribute *attr, char *buf) {
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))-
>>instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + bool enable;
>> + int ret;
>> +
>> + ret = ops->get_enabled_bg(ras_feat_dev->parent, ctx-
>>scrub[inst].private, &enable);
>> + if (ret)
>> + return ret;
>> +
>> + return sysfs_emit(buf, "%d\n", enable); }
>> +
>> +static ssize_t enable_on_demand_show(struct device *ras_feat_dev,
>> + struct device_attribute *attr, char *buf) {
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))-
>>instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + bool enable;
>> + int ret;
>> +
>> + ret = ops->get_enabled_od(ras_feat_dev->parent, ctx-
>>scrub[inst].private, &enable);
>> + if (ret)
>> + return ret;
>> +
>> + return sysfs_emit(buf, "%d\n", enable); }
>> +
>> +static ssize_t enable_on_demand_store(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + const char *buf, size_t len) {
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))-
>>instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + bool enable;
>> + int ret;
>> +
>> + ret = kstrtobool(buf, &enable);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = ops->set_enabled_od(ras_feat_dev->parent, ctx-
>>scrub[inst].private, enable);
>> + if (ret)
>> + return ret;
>> +
>> + return len;
>> +}
>> +
>> +static ssize_t min_cycle_duration_show(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + char *buf)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))-
>>instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + u32 val;
>> + int ret;
>> +
>> + ret = ops->min_cycle_read(ras_feat_dev->parent, ctx-
>>scrub[inst].private, &val);
>> + if (ret)
>> + return ret;
>> +
>> + return sysfs_emit(buf, "%u\n", val); }
>> +
>> +static ssize_t max_cycle_duration_show(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + char *buf)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))-
>>instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + u32 val;
>> + int ret;
>> +
>> + ret = ops->max_cycle_read(ras_feat_dev->parent, ctx-
>>scrub[inst].private, &val);
>> + if (ret)
>> + return ret;
>> +
>> + return sysfs_emit(buf, "%u\n", val); }
>> +
>> +static ssize_t current_cycle_duration_show(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + char *buf)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))-
>>instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + u32 val;
>> + int ret;
>> +
>> + ret = ops->cycle_duration_read(ras_feat_dev->parent, ctx-
>>scrub[inst].private, &val);
>> + if (ret)
>> + return ret;
>> +
>> + return sysfs_emit(buf, "%u\n", val); }
>> +
>> +static ssize_t current_cycle_duration_store(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + const char *buf, size_t len) {
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))-
>>instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + long val;
>> + int ret;
>> +
>> + ret = kstrtol(buf, 0, &val);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = ops->cycle_duration_write(ras_feat_dev->parent, ctx-
>>scrub[inst].private, val);
>> + if (ret)
>> + return ret;
>> +
>> + return len;
>> +}
>> +
>> +static umode_t scrub_attr_visible(struct kobject *kobj,
>> + struct attribute *a, int attr_id) {
>> + struct device *ras_feat_dev = kobj_to_dev(kobj);
>> + struct device_attribute *dev_attr =
>> + container_of(a, struct device_attribute, attr);
>
>No silly linebreaks like that pls. Check your whole patchset.
Will change.
>
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(dev_attr))-
>>instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> +
>> + switch (attr_id) {
>> + case SCRUB_ADDR_RANGE_BASE:
>> + case SCRUB_ADDR_RANGE_SIZE:
>> + if (ops->read_range && ops->write_range)
>> + return a->mode;
>> + if (ops->read_range)
>> + return 0444;
>
> if (...read_range) {
> if (...write_range)
> return a->mode;
> else
> return 0444:
> }
> break;
>
>and now put a single "return 0;" at the end of the function.
Will do, if the increase in lines of code is acceptable?
>
>Below too.
>
>> + return 0;
>> + case SCRUB_ENABLE_BACKGROUND:
>> + if (ops->get_enabled_bg && ops->set_enabled_bg)
>> + return a->mode;
>> + if (ops->get_enabled_bg)
>> + return 0444;
>> + return 0;
>> + case SCRUB_ENABLE_ON_DEMAND:
>> + if (ops->get_enabled_od && ops->set_enabled_od)
>> + return a->mode;
>> + if (ops->get_enabled_od)
>> + return 0444;
>> + return 0;
>> + case SCRUB_MIN_CYCLE_DURATION:
>> + return ops->min_cycle_read ? a->mode : 0;
>
> if (ops->min_cycle_read)
> return a->mode;
>
>> + case SCRUB_MAX_CYCLE_DURATION:
>> + return ops->max_cycle_read ? a->mode : 0;
>> + case SCRUB_CURRENT_CYCLE_DURATION:
>> + if (ops->cycle_duration_read && ops->cycle_duration_write)
>> + return a->mode;
>> + if (ops->cycle_duration_read)
>> + return 0444;
>> + return 0;
>> + default:
>> + return 0;
>> + }
>> +}
>> +
>> +#define EDAC_SCRUB_ATTR_RO(_name, _instance) \
>> + ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_RO(_name), \
>> + .instance = _instance })
>> +
>> +#define EDAC_SCRUB_ATTR_WO(_name, _instance) \
>> + ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_WO(_name), \
>> + .instance = _instance })
>> +
>> +#define EDAC_SCRUB_ATTR_RW(_name, _instance) \
>> + ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_RW(_name), \
>> + .instance = _instance })
>> +
>> +static int scrub_create_desc(struct device *scrub_dev,
>> + const struct attribute_group **attr_groups,
>> + u8 instance)
>> +{
>> + struct edac_scrub_context *scrub_ctx;
>> + struct attribute_group *group;
>> + int i;
>> +
>> + scrub_ctx = devm_kzalloc(scrub_dev, sizeof(*scrub_ctx), GFP_KERNEL);
>> + if (!scrub_ctx)
>> + return -ENOMEM;
>> +
>> + group = &scrub_ctx->group;
>> + scrub_ctx->scrub_dev_attr[0] =
>EDAC_SCRUB_ATTR_RW(addr_range_base, instance);
>> + scrub_ctx->scrub_dev_attr[1] =
>EDAC_SCRUB_ATTR_RW(addr_range_size, instance);
>> + scrub_ctx->scrub_dev_attr[2] =
>EDAC_SCRUB_ATTR_RW(enable_background, instance);
>> + scrub_ctx->scrub_dev_attr[3] =
>EDAC_SCRUB_ATTR_RW(enable_on_demand, instance);
>> + scrub_ctx->scrub_dev_attr[4] =
>EDAC_SCRUB_ATTR_RO(min_cycle_duration, instance);
>> + scrub_ctx->scrub_dev_attr[5] =
>EDAC_SCRUB_ATTR_RO(max_cycle_duration, instance);
>> + scrub_ctx->scrub_dev_attr[6] =
>> +EDAC_SCRUB_ATTR_RW(current_cycle_duration, instance);
>
>Why use the naked numbers when you have enum edac_scrub_attributes?
Sure.
>
>> + for (i = 0; i < SCRUB_MAX_ATTRS; i++)
>> + scrub_ctx->scrub_attrs[i] =
>> +&scrub_ctx->scrub_dev_attr[i].dev_attr.attr;
>> +
>> + sprintf(scrub_ctx->name, "%s%d", "scrub", instance);
>> + group->name = scrub_ctx->name;
>> + group->attrs = scrub_ctx->scrub_attrs;
>> + group->is_visible = scrub_attr_visible;
>> +
>> + attr_groups[0] = group;
>> +
>> + return 0;
>> +}
>> +
>> +/**
>> + * edac_scrub_get_desc - get EDAC scrub descriptors
>> + * @scrub_dev: client device, with scrub support
>> + * @attr_groups: pointer to attrribute group container
>
>+ * @attr_groups: pointer to attrribute group container
>Unknown word [attrribute] in comment.
>Suggestions: ['attribute', 'attributed', 'attributes', "attribute's", 'attributive',
>'tribute']
>
>Please introduce a spellchecker into your patch creation workflow.
Sure.
>
>--
>Regards/Gruss,
> Boris.
>
>https://people.kernel.org/tglx/notes-about-netiquette
>
Thanks,
Shiju
* Re: [PATCH v12 01/17] EDAC: Add support for EDAC device features control
2024-09-16 9:21 ` Shiju Jose
@ 2024-09-16 10:50 ` Jonathan Cameron
2024-09-16 16:16 ` Shiju Jose
0 siblings, 1 reply; 39+ messages in thread
From: Jonathan Cameron @ 2024-09-16 10:50 UTC (permalink / raw)
To: Shiju Jose
Cc: Borislav Petkov, linux-edac, linux-cxl, linux-acpi, linux-mm,
linux-kernel, tony.luck, rafael, lenb, mchehab, dan.j.williams,
dave, dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
david, Vilas.Sridharan, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen,
mike.malvestuto, gthelen, wschwartz, dferguson, wbs, nifan.cxl,
jgroves, vsalve, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
On Mon, 16 Sep 2024 10:21:58 +0100
Shiju Jose <shiju.jose@huawei.com> wrote:
> Thanks for reviewing.
>
> >-----Original Message-----
> >From: Borislav Petkov <bp@alien8.de>
> >Sent: 13 September 2024 17:41
> >To: Shiju Jose <shiju.jose@huawei.com>
> >Subject: Re: [PATCH v12 01/17] EDAC: Add support for EDAC device features
> >control
> >
> >On Wed, Sep 11, 2024 at 10:04:30AM +0100, shiju.jose@huawei.com wrote:
> >> +/**
> >> + * edac_dev_feature_init - Init a RAS feature
> >> + * @parent: client device.
> >> + * @dev_data: pointer to the edac_dev_data structure, which contains
> >> + * client device specific info.
> >> + * @feat: pointer to struct edac_dev_feature.
> >> + * @attr_groups: pointer to attribute group's container.
> >> + *
> >> + * Returns number of scrub features attribute groups on success,
> >
> >Not "scrub" - this is an interface initializing a generic feature.
> Will correct.
> >
> >> + * error otherwise.
> >> + */
> >> +static int edac_dev_feat_init(struct device *parent,
> >> + struct edac_dev_data *dev_data,
> >> + const struct edac_dev_feature *ras_feat,
> >> + const struct attribute_group **attr_groups) {
> >> + int num;
> >> +
> >> + switch (ras_feat->ft_type) {
> >> + case RAS_FEAT_SCRUB:
> >> + dev_data->scrub_ops = ras_feat->scrub_ops;
> >> + dev_data->private = ras_feat->ctx;
> >> + return 1;
> >> + case RAS_FEAT_ECS:
> >> + num = ras_feat->ecs_info.num_media_frus;
> >> + dev_data->ecs_ops = ras_feat->ecs_ops;
> >> + dev_data->private = ras_feat->ctx;
> >> + return num;
> >> + case RAS_FEAT_PPR:
> >> + dev_data->ppr_ops = ras_feat->ppr_ops;
> >> + dev_data->private = ras_feat->ctx;
> >> + return 1;
> >> + default:
> >> + return -EINVAL;
> >> + }
> >> +}
> >
> >And why does this function even exist and has kernel-doc comments when all it
> >does is assign a couple of values? And it gets called exactly once?
> >
> >Just merge its body into the call site. There you can reuse the switch-case there
> >too. No need for too much noodling around.
> The edac_dev_feat_init() function is extended with feature-specific calls in
> the subsequent EDAC feature patches; that is why it is a separate function.
> >
> >> diff --git a/include/linux/edac.h b/include/linux/edac.h index
> >> b4ee8961e623..b337254cf5b8 100644
> >> --- a/include/linux/edac.h
> >> +++ b/include/linux/edac.h
> >> @@ -661,4 +661,59 @@ static inline struct dimm_info
> >> *edac_get_dimm(struct mem_ctl_info *mci,
> >>
> >> return mci->dimms[index];
> >> }
> >> +
> >> +/* EDAC device features */
> >> +
> >> +#define EDAC_FEAT_NAME_LEN 128
> >> +
> >> +/* RAS feature type */
> >> +enum edac_dev_feat {
> >> + RAS_FEAT_SCRUB,
> >> + RAS_FEAT_ECS,
> >> + RAS_FEAT_PPR,
> >> + RAS_FEAT_MAX
> >
> >I still don't know what ECS or PPR is.
> I will add a comment/documentation here with a short explanation of the
> features, if that makes sense?
> Each feature is described in the subsequent EDAC feature specific patches.
Can you bring the enum entries in with those patches?
That way there is no reference to them before we have the information
on what they are.
J
> >
> >--
> >Regards/Gruss,
> > Boris.
> >
> >https://people.kernel.org/tglx/notes-about-netiquette
>
> Thanks,
> Shiju
>
* RE: [PATCH v12 01/17] EDAC: Add support for EDAC device features control
2024-09-16 10:50 ` Jonathan Cameron
@ 2024-09-16 16:16 ` Shiju Jose
0 siblings, 0 replies; 39+ messages in thread
From: Shiju Jose @ 2024-09-16 16:16 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Borislav Petkov, linux-edac, linux-cxl, linux-acpi, linux-mm,
linux-kernel, tony.luck, rafael, lenb, mchehab, dan.j.williams,
dave, dave.jiang, alison.schofield, vishal.l.verma, ira.weiny,
david, Vilas.Sridharan, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen,
mike.malvestuto, gthelen, wschwartz, dferguson, wbs, nifan.cxl,
jgroves, vsalve, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
>-----Original Message-----
>From: Jonathan Cameron <jonathan.cameron@huawei.com>
>Sent: 16 September 2024 11:50
>To: Shiju Jose <shiju.jose@huawei.com>
>Subject: Re: [PATCH v12 01/17] EDAC: Add support for EDAC device features
>control
>
>On Mon, 16 Sep 2024 10:21:58 +0100
>Shiju Jose <shiju.jose@huawei.com> wrote:
>
>> Thanks for reviewing.
>>
>> >-----Original Message-----
>> >From: Borislav Petkov <bp@alien8.de>
>> >Sent: 13 September 2024 17:41
>> >To: Shiju Jose <shiju.jose@huawei.com>
>> >Subject: Re: [PATCH v12 01/17] EDAC: Add support for EDAC device
>> >features control
>> >
>> >On Wed, Sep 11, 2024 at 10:04:30AM +0100, shiju.jose@huawei.com wrote:
>> >> +/**
>> >> + * edac_dev_feature_init - Init a RAS feature
>> >> + * @parent: client device.
>> >> + * @dev_data: pointer to the edac_dev_data structure, which
>> >> +contains
>> >> + * client device specific info.
>> >> + * @feat: pointer to struct edac_dev_feature.
>> >> + * @attr_groups: pointer to attribute group's container.
>> >> + *
>> >> + * Returns number of scrub features attribute groups on success,
>> >
>> >Not "scrub" - this is an interface initializing a generic feature.
>> Will correct.
>> >
>> >> + * error otherwise.
>> >> + */
>> >> +static int edac_dev_feat_init(struct device *parent,
>> >> + struct edac_dev_data *dev_data,
>> >> + const struct edac_dev_feature *ras_feat,
>> >> + const struct attribute_group **attr_groups) {
>> >> + int num;
>> >> +
>> >> + switch (ras_feat->ft_type) {
>> >> + case RAS_FEAT_SCRUB:
>> >> + dev_data->scrub_ops = ras_feat->scrub_ops;
>> >> + dev_data->private = ras_feat->ctx;
>> >> + return 1;
>> >> + case RAS_FEAT_ECS:
>> >> + num = ras_feat->ecs_info.num_media_frus;
>> >> + dev_data->ecs_ops = ras_feat->ecs_ops;
>> >> + dev_data->private = ras_feat->ctx;
>> >> + return num;
>> >> + case RAS_FEAT_PPR:
>> >> + dev_data->ppr_ops = ras_feat->ppr_ops;
>> >> + dev_data->private = ras_feat->ctx;
>> >> + return 1;
>> >> + default:
>> >> + return -EINVAL;
>> >> + }
>> >> +}
>> >
>> >And why does this function even exist and has kernel-doc comments
>> >when all it does is assign a couple of values? And it gets called exactly once?
>> >
>> >Just merge its body into the call site. There you can reuse the
>> >switch-case there too. No need for too much noodling around.
>> The edac_dev_feat_init() function is extended with feature-specific calls in
>> the subsequent EDAC feature patches; that is why it is a separate function.
>> >
>> >> diff --git a/include/linux/edac.h b/include/linux/edac.h index
>> >> b4ee8961e623..b337254cf5b8 100644
>> >> --- a/include/linux/edac.h
>> >> +++ b/include/linux/edac.h
>> >> @@ -661,4 +661,59 @@ static inline struct dimm_info
>> >> *edac_get_dimm(struct mem_ctl_info *mci,
>> >>
>> >> return mci->dimms[index];
>> >> }
>> >> +
>> >> +/* EDAC device features */
>> >> +
>> >> +#define EDAC_FEAT_NAME_LEN 128
>> >> +
>> >> +/* RAS feature type */
>> >> +enum edac_dev_feat {
>> >> + RAS_FEAT_SCRUB,
>> >> + RAS_FEAT_ECS,
>> >> + RAS_FEAT_PPR,
>> >> + RAS_FEAT_MAX
>> >
>> >I still don't know what ECS or PPR is.
>> I will add a comment/documentation here with a short explanation of the
>> features, if that makes sense?
>> Each feature is described in the subsequent EDAC feature specific patches.
>Can you bring the enum entries in with those patches?
>That way there is no reference to them before we have the information on what
>they are.
Will do.
>
>J
>> >
>> >--
>> >Regards/Gruss,
>> > Boris.
>> >
>> >https://people.kernel.org/tglx/notes-about-netiquette
>>
Thanks,
Shiju
* Re: [PATCH v12 07/17] cxl: Add Get Supported Features command for kernel usage
2024-09-11 9:04 ` [PATCH v12 07/17] cxl: Add Get Supported Features command for kernel usage shiju.jose
@ 2024-09-23 23:33 ` Dave Jiang
2024-09-25 11:18 ` Shiju Jose
0 siblings, 1 reply; 39+ messages in thread
From: Dave Jiang @ 2024-09-23 23:33 UTC (permalink / raw)
To: shiju.jose, linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, alison.schofield, vishal.l.verma, ira.weiny,
david, Vilas.Sridharan, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen,
mike.malvestuto, gthelen, wschwartz, dferguson, wbs, nifan.cxl,
jgroves, vsalve, tanxiaofei, prime.zeng, roberto.sassu,
kangkang.shen, wanghuiqiang, linuxarm
On 9/11/24 2:04 AM, shiju.jose@huawei.com wrote:
> From: Dave Jiang <dave.jiang@intel.com>
>
> CXL spec r3.1 8.2.9.6.1 Get Supported Features (Opcode 0500h)
> The command retrieves the list of supported device-specific features
> (identified by UUID) and general information about each Feature.
>
> The driver will retrieve the feature entries in order to make checks and
> provide information for the Get Feature and Set Feature commands. One of
> the main pieces of information retrieved is the effects a Set Feature
> command would have for a particular feature.
>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> drivers/cxl/core/mbox.c | 175 +++++++++++++++++++++++++++++++++++
> drivers/cxl/cxlmem.h | 51 ++++++++++
> drivers/cxl/pci.c | 4 +
> include/uapi/linux/cxl_mem.h | 1 +
> 4 files changed, 231 insertions(+)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index fa1ee495a4e3..fe965ec5802f 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -67,6 +67,7 @@ static struct cxl_mem_command cxl_mem_commands[CXL_MEM_COMMAND_ID_MAX] = {
> CXL_CMD(SET_SHUTDOWN_STATE, 0x1, 0, 0),
> CXL_CMD(GET_SCAN_MEDIA_CAPS, 0x10, 0x4, 0),
> CXL_CMD(GET_TIMESTAMP, 0, 0x8, 0),
> + CXL_CMD(GET_SUPPORTED_FEATURES, 0x8, CXL_VARIABLE_PAYLOAD, 0),
> };
>
> /*
> @@ -790,6 +791,180 @@ static const uuid_t log_uuid[] = {
> [VENDOR_DEBUG_UUID] = DEFINE_CXL_VENDOR_DEBUG_UUID,
> };
>
> +static void cxl_free_features(void *features)
> +{
> + kvfree(features);
> +}
> +
> +static int cxl_get_supported_features_count(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
> + struct cxl_mbox_get_sup_feats_out mbox_out;
> + struct cxl_mbox_get_sup_feats_in mbox_in;
> + struct cxl_mbox_cmd mbox_cmd;
> + int rc;
> +
> + memset(&mbox_in, 0, sizeof(mbox_in));
> + mbox_in.count = sizeof(mbox_out);
> + memset(&mbox_out, 0, sizeof(mbox_out));
> + mbox_cmd = (struct cxl_mbox_cmd) {
> + .opcode = CXL_MBOX_OP_GET_SUPPORTED_FEATURES,
> + .size_in = sizeof(mbox_in),
> + .payload_in = &mbox_in,
> + .size_out = sizeof(mbox_out),
> + .payload_out = &mbox_out,
> + .min_out = sizeof(mbox_out),
> + };
> + rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
> + if (rc < 0)
> + return rc;
> +
> + cxlds->num_features = le16_to_cpu(mbox_out.supported_feats);
> + if (!cxlds->num_features)
> + return -ENOENT;
> +
> + return 0;
> +}
> +
> +int cxl_get_supported_features(struct cxl_dev_state *cxlds)
> +{
> + int remain_feats, max_size, max_feats, start, rc;
> + struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
> + int feat_size = sizeof(struct cxl_feat_entry);
> + struct cxl_mbox_get_sup_feats_out *mbox_out;
> + struct cxl_mbox_get_sup_feats_in mbox_in;
> + int hdr_size = sizeof(*mbox_out);
> + struct cxl_mbox_cmd mbox_cmd;
> + struct cxl_mem_command *cmd;
> + void *ptr;
> +
> + /* Get supported features is optional, need to check */
> + cmd = cxl_mem_find_command(CXL_MBOX_OP_GET_SUPPORTED_FEATURES);
> + if (!cmd)
> + return -EOPNOTSUPP;
> + if (!test_bit(cmd->info.id, cxl_mbox->enabled_cmds))
> + return -EOPNOTSUPP;
> +
> + rc = cxl_get_supported_features_count(cxlds);
> + if (rc)
> + return rc;
> +
> + struct cxl_feat_entry *entries __free(kvfree) =
> + kvmalloc(cxlds->num_features * feat_size, GFP_KERNEL);
> +
> + if (!entries)
> + return -ENOMEM;
> +
> + cxlds->entries = no_free_ptr(entries);
> + rc = devm_add_action_or_reset(cxl_mbox->host, cxl_free_features,
> + cxlds->entries);
> + if (rc)
> + return rc;
> +
> + max_size = cxl_mbox->payload_size - hdr_size;
> + /* max feat entries that can fit in mailbox max payload size */
> + max_feats = max_size / feat_size;
> + ptr = &cxlds->entries[0];
> +
> + mbox_out = kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
> + if (!mbox_out)
> + return -ENOMEM;
> +
> + start = 0;
> + remain_feats = cxlds->num_features;
> + do {
> + int retrieved, alloc_size, copy_feats;
> +
> + if (remain_feats > max_feats) {
> + alloc_size = sizeof(*mbox_out) + max_feats * feat_size;
> + remain_feats = remain_feats - max_feats;
> + copy_feats = max_feats;
> + } else {
> + alloc_size = sizeof(*mbox_out) + remain_feats * feat_size;
> + copy_feats = remain_feats;
> + remain_feats = 0;
> + }
> +
> + memset(&mbox_in, 0, sizeof(mbox_in));
> + mbox_in.count = alloc_size;
> + mbox_in.start_idx = start;
> + memset(mbox_out, 0, alloc_size);
> + mbox_cmd = (struct cxl_mbox_cmd) {
> + .opcode = CXL_MBOX_OP_GET_SUPPORTED_FEATURES,
> + .size_in = sizeof(mbox_in),
> + .payload_in = &mbox_in,
> + .size_out = alloc_size,
> + .payload_out = mbox_out,
> + .min_out = hdr_size,
> + };
> + rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
> + if (rc < 0) {
> + kfree(mbox_out);
> + return rc;
> + }
> + if (mbox_cmd.size_out <= hdr_size) {
> + rc = -ENXIO;
> + goto err;
> + }
> +
> + /*
> + * Make sure retrieved out buffer is multiple of feature
> + * entries.
> + */
> + retrieved = mbox_cmd.size_out - hdr_size;
> + if (retrieved % feat_size) {
> + rc = -ENXIO;
> + goto err;
> + }
> +
> + /*
> + * If the reported output entries * defined entry size !=
> + * retrieved output bytes, then the output package is incorrect.
> + */
> + if (mbox_out->num_entries * feat_size != retrieved) {
> + rc = -ENXIO;
> + goto err;
> + }
> +
> + memcpy(ptr, mbox_out->ents, retrieved);
> + ptr += retrieved;
> + /*
> + * If the number of output entries is less than expected, add the
> + * remaining entries to the next batch.
> + */
> + remain_feats += copy_feats - mbox_out->num_entries;
> + start += mbox_out->num_entries;
> + } while (remain_feats);
> +
> + kfree(mbox_out);
> + return 0;
> +
> +err:
> + kfree(mbox_out);
> + cxlds->num_features = 0;
> + return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_supported_features, CXL);
> +
> +int cxl_get_supported_feature_entry(struct cxl_dev_state *cxlds, const uuid_t *feat_uuid,
> + struct cxl_feat_entry *feat_entry_out)
> +{
> + struct cxl_feat_entry *feat_entry;
> + int count;
> +
> + /* Check CXL dev supports the feature */
> + feat_entry = &cxlds->entries[0];
> + for (count = 0; count < cxlds->num_features; count++, feat_entry++) {
> + if (uuid_equal(&feat_entry->uuid, feat_uuid)) {
> + memcpy(feat_entry_out, feat_entry, sizeof(*feat_entry_out));
> + return 0;
> + }
> + }
> +
> + return -EOPNOTSUPP;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_supported_feature_entry, CXL);
> +
> /**
> * cxl_enumerate_cmds() - Enumerate commands for a device.
> * @mds: The driver data for the operation
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index d7c6ffe2a884..5d149e64c247 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -389,6 +389,8 @@ struct cxl_dpa_perf {
> * @ram_res: Active Volatile memory capacity configuration
> * @serial: PCIe Device Serial Number
> * @type: Generic Memory Class device or Vendor Specific Memory device
> + * @num_features: number of supported features
> + * @entries: list of supported feature entries.
> */
> struct cxl_dev_state {
> struct device *dev;
> @@ -404,6 +406,8 @@ struct cxl_dev_state {
> u64 serial;
> enum cxl_devtype type;
> struct cxl_mailbox cxl_mbox;
> + int num_features;
> + struct cxl_feat_entry *entries;
Hi Shiju,
Going back and refactoring the fwctl code, I think this needs to stay with
cxl_mailbox. Otherwise it makes it really hard to verify the feature effects
for userspace. Preferably, I don't think we want to expose 'struct
cxl_dev_state' to fwctl if we don't need to.
DJ
> };
>
> /**
> @@ -482,6 +486,7 @@ enum cxl_opcode {
> CXL_MBOX_OP_GET_LOG_CAPS = 0x0402,
> CXL_MBOX_OP_CLEAR_LOG = 0x0403,
> CXL_MBOX_OP_GET_SUP_LOG_SUBLIST = 0x0405,
> + CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
> CXL_MBOX_OP_IDENTIFY = 0x4000,
> CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
> CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
> @@ -765,6 +770,48 @@ enum {
> CXL_PMEM_SEC_PASS_USER,
> };
>
> +/* Get Supported Features (Opcode 0500h) CXL r3.1 8.2.9.6.1 */
> +struct cxl_mbox_get_sup_feats_in {
> + __le32 count;
> + __le16 start_idx;
> + u8 reserved[2];
> +} __packed;
> +
> +/* Supported Feature Entry : Payload out attribute flags */
> +#define CXL_FEAT_ENTRY_FLAG_CHANGABLE BIT(0)
> +#define CXL_FEAT_ENTRY_FLAG_DEEPEST_RESET_PERSISTENCE_MASK GENMASK(3, 1)
> +#define CXL_FEAT_ENTRY_FLAG_PERSIST_ACROSS_FIRMWARE_UPDATE BIT(4)
> +#define CXL_FEAT_ENTRY_FLAG_SUPPORT_DEFAULT_SELECTION BIT(5)
> +#define CXL_FEAT_ENTRY_FLAG_SUPPORT_SAVED_SELECTION BIT(6)
> +
> +enum cxl_feat_attr_value_persistence {
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_NONE,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_CXL_RESET,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_HOT_RESET,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_WARM_RESET,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_COLD_RESET,
> + CXL_FEAT_ATTR_VALUE_PERSISTENCE_MAX
> +};
> +
> +struct cxl_feat_entry {
> + uuid_t uuid;
> + __le16 id;
> + __le16 get_feat_size;
> + __le16 set_feat_size;
> + __le32 attr_flags;
> + u8 get_feat_ver;
> + u8 set_feat_ver;
> + __le16 set_effects;
> + u8 reserved[18];
> +} __packed;
> +
> +struct cxl_mbox_get_sup_feats_out {
> + __le16 num_entries;
> + __le16 supported_feats;
> + u8 reserved[4];
> +	struct cxl_feat_entry ents[] __counted_by_le(num_entries);
> +} __packed;
> +
> int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
> struct cxl_mbox_cmd *cmd);
> int cxl_dev_state_identify(struct cxl_memdev_state *mds);
> @@ -824,4 +871,8 @@ struct cxl_hdm {
> struct seq_file;
> struct dentry *cxl_debugfs_create_dir(const char *dir);
> void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds);
> +
> +int cxl_get_supported_features(struct cxl_dev_state *cxlds);
> +int cxl_get_supported_feature_entry(struct cxl_dev_state *cxlds, const uuid_t *feat_uuid,
> + struct cxl_feat_entry *feat_entry_out);
> #endif /* __CXL_MEM_H__ */
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 3c73de475bf3..cec88e3a1754 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -872,6 +872,10 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (rc)
> return rc;
>
> + rc = cxl_get_supported_features(cxlds);
> + if (rc)
> + dev_dbg(&pdev->dev, "No features enumerated.\n");
> +
> rc = cxl_set_timestamp(mds);
> if (rc)
> return rc;
> diff --git a/include/uapi/linux/cxl_mem.h b/include/uapi/linux/cxl_mem.h
> index c6c0fe27495d..bd2535962f70 100644
> --- a/include/uapi/linux/cxl_mem.h
> +++ b/include/uapi/linux/cxl_mem.h
> @@ -50,6 +50,7 @@
> ___C(GET_LOG_CAPS, "Get Log Capabilities"), \
> ___C(CLEAR_LOG, "Clear Log"), \
> ___C(GET_SUP_LOG_SUBLIST, "Get Supported Logs Sub-List"), \
> + ___C(GET_SUPPORTED_FEATURES, "Get Supported Features"), \
> ___C(MAX, "invalid / last command")
>
> #define ___C(a, b) CXL_MEM_COMMAND_ID_##a
^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v12 07/17] cxl: Add Get Supported Features command for kernel usage
2024-09-23 23:33 ` Dave Jiang
@ 2024-09-25 11:18 ` Shiju Jose
0 siblings, 0 replies; 39+ messages in thread
From: Shiju Jose @ 2024-09-25 11:18 UTC (permalink / raw)
To: Dave Jiang, linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel
Cc: bp, tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
Jonathan Cameron, alison.schofield, vishal.l.verma, ira.weiny,
david, Vilas.Sridharan, leo.duran, Yazen.Ghannam, rientjes,
jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi, james.morse,
jthoughton, somasundaram.a, erdemaktas, pgonda, duenwen,
mike.malvestuto, gthelen, wschwartz, dferguson, wbs, nifan.cxl,
jgroves, vsalve, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
>On 9/11/24 2:04 AM, shiju.jose@huawei.com wrote:
>> From: Dave Jiang <dave.jiang@intel.com>
>>
>> [patch quoted in full in the parent message; trimmed]
>>
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> @@ -404,6 +406,8 @@ struct cxl_dev_state {
>> u64 serial;
>> enum cxl_devtype type;
>> struct cxl_mailbox cxl_mbox;
>> + int num_features;
>> + struct cxl_feat_entry *entries;
>
>Hi Shiju,
>Going back and refactoring the fwctl code, I think this needs to stay with
>cxl_mailbox. Otherwise it makes it really hard to verify the feature effects for
>userspace. Preferably I don't think we want to expose 'struct cxl_dev_state' to
>fwctl if we don't need to.
>
>DJ
Hi Dave,
OK. I have reverted this to stay with struct cxl_mailbox and tested it. The changes will be in the next version.
Thanks,
Shiju
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v12 02/17] EDAC: Add EDAC scrub control driver
2024-09-11 9:04 ` [PATCH v12 02/17] EDAC: Add EDAC scrub control driver shiju.jose
2024-09-13 17:25 ` Borislav Petkov
@ 2024-09-26 23:04 ` Fan Ni
2024-09-27 11:17 ` Shiju Jose
1 sibling, 1 reply; 39+ messages in thread
From: Fan Ni @ 2024-09-26 23:04 UTC (permalink / raw)
To: shiju.jose
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel, bp,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, jgroves, vsalve, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm
On Wed, Sep 11, 2024 at 10:04:31AM +0100, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Add a generic EDAC scrub control driver that supports configuring the memory
> scrubbers in the system. A device with the scrub feature gets the scrub
> descriptor from the EDAC scrub driver and registers with the EDAC RAS feature
> driver, which adds the sysfs scrub control interface. The scrub control
> attributes for a scrub instance are available to userspace in
> /sys/bus/edac/devices/<dev-name>/scrub*/.
>
> The generic EDAC scrub driver and the common sysfs scrub interface promote
> unambiguous access from userspace irrespective of the underlying scrub
> devices.
>
> The sysfs scrub attribute nodes are present only if the client driver
> has implemented the corresponding attribute callback function and passed in
> ops to the EDAC RAS feature driver during registration.
>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> Documentation/ABI/testing/sysfs-edac-scrub | 69 ++++
> drivers/edac/Makefile | 1 +
> drivers/edac/edac_device.c | 6 +-
> drivers/edac/edac_scrub.c | 377 +++++++++++++++++++++
> include/linux/edac.h | 30 ++
> 5 files changed, 482 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/ABI/testing/sysfs-edac-scrub
> create mode 100755 drivers/edac/edac_scrub.c
>
> diff --git a/Documentation/ABI/testing/sysfs-edac-scrub b/Documentation/ABI/testing/sysfs-edac-scrub
> new file mode 100644
> index 000000000000..f465cc91423f
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-edac-scrub
> @@ -0,0 +1,69 @@
> +What: /sys/bus/edac/devices/<dev-name>/scrub*
Based on the code below, we can only have scrub0, scrub1, etc.
So should we use scrubX instead of scrub* here?
The same applies below.
Fan
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> +		The sysfs EDAC bus devices /<dev-name>/scrub* subdirectory
> +		belongs to an instance of the memory scrub control feature,
> +		where the <dev-name> directory corresponds to a device/memory
> +		region registered with the EDAC scrub driver and thus
> +		registered with the generic EDAC RAS driver.
> +		The sysfs scrub attr nodes are present only if the
> +		client driver has implemented the corresponding attr
> +		callback function and passed in ops to the EDAC RAS feature
> +		driver during registration.
> +
> +What: /sys/bus/edac/devices/<dev-name>/scrub*/addr_range_base
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RW) The base of the address range of the memory region
> + to be scrubbed (on-demand scrubbing).
> +
> +What: /sys/bus/edac/devices/<dev-name>/scrub*/addr_range_size
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RW) The size of the address range of the memory region
> + to be scrubbed (on-demand scrubbing).
> +
> +What: /sys/bus/edac/devices/<dev-name>/scrub*/enable_background
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RW) Start/Stop background(patrol) scrubbing if supported.
> +
> +What: /sys/bus/edac/devices/<dev-name>/scrub*/enable_on_demand
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RW) Start/Stop on-demand scrubbing the memory region
> + if supported.
> +
> +What: /sys/bus/edac/devices/<dev-name>/scrub*/min_cycle_duration
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RO) Supported minimum scrub cycle duration in seconds
> + by the memory scrubber.
> +
> +What: /sys/bus/edac/devices/<dev-name>/scrub*/max_cycle_duration
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RO) Supported maximum scrub cycle duration in seconds
> + by the memory scrubber.
> +
> +What: /sys/bus/edac/devices/<dev-name>/scrub*/current_cycle_duration
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RW) The current scrub cycle duration in seconds and must be
> + within the supported range by the memory scrubber.
> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
> index 4edfb83ffbee..fbf0e39ec678 100644
> --- a/drivers/edac/Makefile
> +++ b/drivers/edac/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o
>
> edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o
> edac_core-y += edac_module.o edac_device_sysfs.o wq.o
> +edac_core-y += edac_scrub.o
>
> edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o
>
> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
> index e4a5d010ea2d..6381896b6424 100644
> --- a/drivers/edac/edac_device.c
> +++ b/drivers/edac/edac_device.c
> @@ -608,12 +608,16 @@ static int edac_dev_feat_init(struct device *parent,
> const struct edac_dev_feature *ras_feat,
> const struct attribute_group **attr_groups)
> {
> - int num;
> + int num, ret;
>
> switch (ras_feat->ft_type) {
> case RAS_FEAT_SCRUB:
> dev_data->scrub_ops = ras_feat->scrub_ops;
> dev_data->private = ras_feat->ctx;
> + ret = edac_scrub_get_desc(parent, attr_groups,
> + ras_feat->instance);
> + if (ret)
> + return ret;
> return 1;
> case RAS_FEAT_ECS:
> num = ras_feat->ecs_info.num_media_frus;
> diff --git a/drivers/edac/edac_scrub.c b/drivers/edac/edac_scrub.c
> new file mode 100755
> index 000000000000..3f8f37629acf
> --- /dev/null
> +++ b/drivers/edac/edac_scrub.c
> @@ -0,0 +1,377 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * The generic EDAC scrub driver supports controlling the memory
> + * scrubbers in the system, and the common sysfs scrub interface
> + * promotes unambiguous access from userspace.
> + *
> + * Copyright (c) 2024 HiSilicon Limited.
> + */
> +
> +#define pr_fmt(fmt) "EDAC SCRUB: " fmt
> +
> +#include <linux/edac.h>
> +
> +enum edac_scrub_attributes {
> + SCRUB_ADDR_RANGE_BASE,
> + SCRUB_ADDR_RANGE_SIZE,
> + SCRUB_ENABLE_BACKGROUND,
> + SCRUB_ENABLE_ON_DEMAND,
> + SCRUB_MIN_CYCLE_DURATION,
> + SCRUB_MAX_CYCLE_DURATION,
> + SCRUB_CURRENT_CYCLE_DURATION,
> + SCRUB_MAX_ATTRS
> +};
> +
> +struct edac_scrub_dev_attr {
> + struct device_attribute dev_attr;
> + u8 instance;
> +};
> +
> +struct edac_scrub_context {
> + char name[EDAC_FEAT_NAME_LEN];
> + struct edac_scrub_dev_attr scrub_dev_attr[SCRUB_MAX_ATTRS];
> + struct attribute *scrub_attrs[SCRUB_MAX_ATTRS + 1];
> + struct attribute_group group;
> +};
> +
> +#define to_scrub_dev_attr(_dev_attr) \
> + container_of(_dev_attr, struct edac_scrub_dev_attr, dev_attr)
> +
> +static ssize_t addr_range_base_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + u64 base, size;
> + int ret;
> +
> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "0x%llx\n", base);
> +}
> +
> +static ssize_t addr_range_size_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + u64 base, size;
> + int ret;
> +
> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "0x%llx\n", size);
> +}
> +
> +static ssize_t addr_range_base_store(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + u64 base, size;
> + int ret;
> +
> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
> + if (ret)
> + return ret;
> +
> + ret = kstrtou64(buf, 0, &base);
> + if (ret < 0)
> + return ret;
> +
> + ret = ops->write_range(ras_feat_dev->parent, ctx->scrub[inst].private, base, size);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static ssize_t addr_range_size_store(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + const char *buf,
> + size_t len)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + u64 base, size;
> + int ret;
> +
> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
> + if (ret)
> + return ret;
> +
> + ret = kstrtou64(buf, 0, &size);
> + if (ret < 0)
> + return ret;
> +
> + ret = ops->write_range(ras_feat_dev->parent, ctx->scrub[inst].private, base, size);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static ssize_t enable_background_store(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + bool enable;
> + int ret;
> +
> + ret = kstrtobool(buf, &enable);
> + if (ret < 0)
> + return ret;
> +
> + ret = ops->set_enabled_bg(ras_feat_dev->parent, ctx->scrub[inst].private, enable);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static ssize_t enable_background_show(struct device *ras_feat_dev,
> + struct device_attribute *attr, char *buf)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + bool enable;
> + int ret;
> +
> + ret = ops->get_enabled_bg(ras_feat_dev->parent, ctx->scrub[inst].private, &enable);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%d\n", enable);
> +}
> +
> +static ssize_t enable_on_demand_show(struct device *ras_feat_dev,
> + struct device_attribute *attr, char *buf)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + bool enable;
> + int ret;
> +
> + ret = ops->get_enabled_od(ras_feat_dev->parent, ctx->scrub[inst].private, &enable);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%d\n", enable);
> +}
> +
> +static ssize_t enable_on_demand_store(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + bool enable;
> + int ret;
> +
> + ret = kstrtobool(buf, &enable);
> + if (ret < 0)
> + return ret;
> +
> + ret = ops->set_enabled_od(ras_feat_dev->parent, ctx->scrub[inst].private, enable);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static ssize_t min_cycle_duration_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + u32 val;
> + int ret;
> +
> + ret = ops->min_cycle_read(ras_feat_dev->parent, ctx->scrub[inst].private, &val);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%u\n", val);
> +}
> +
> +static ssize_t max_cycle_duration_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + u32 val;
> + int ret;
> +
> + ret = ops->max_cycle_read(ras_feat_dev->parent, ctx->scrub[inst].private, &val);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%u\n", val);
> +}
> +
> +static ssize_t current_cycle_duration_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + u32 val;
> + int ret;
> +
> + ret = ops->cycle_duration_read(ras_feat_dev->parent, ctx->scrub[inst].private, &val);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%u\n", val);
> +}
> +
> +static ssize_t current_cycle_duration_store(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> + long val;
> + int ret;
> +
> + ret = kstrtol(buf, 0, &val);
> + if (ret < 0)
> + return ret;
> +
> + ret = ops->cycle_duration_write(ras_feat_dev->parent, ctx->scrub[inst].private, val);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static umode_t scrub_attr_visible(struct kobject *kobj,
> + struct attribute *a, int attr_id)
> +{
> + struct device *ras_feat_dev = kobj_to_dev(kobj);
> + struct device_attribute *dev_attr =
> + container_of(a, struct device_attribute, attr);
> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(dev_attr))->instance;
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
> +
> + switch (attr_id) {
> + case SCRUB_ADDR_RANGE_BASE:
> + case SCRUB_ADDR_RANGE_SIZE:
> + if (ops->read_range && ops->write_range)
> + return a->mode;
> + if (ops->read_range)
> + return 0444;
> + return 0;
> + case SCRUB_ENABLE_BACKGROUND:
> + if (ops->get_enabled_bg && ops->set_enabled_bg)
> + return a->mode;
> + if (ops->get_enabled_bg)
> + return 0444;
> + return 0;
> + case SCRUB_ENABLE_ON_DEMAND:
> + if (ops->get_enabled_od && ops->set_enabled_od)
> + return a->mode;
> + if (ops->get_enabled_od)
> + return 0444;
> + return 0;
> + case SCRUB_MIN_CYCLE_DURATION:
> + return ops->min_cycle_read ? a->mode : 0;
> + case SCRUB_MAX_CYCLE_DURATION:
> + return ops->max_cycle_read ? a->mode : 0;
> + case SCRUB_CURRENT_CYCLE_DURATION:
> + if (ops->cycle_duration_read && ops->cycle_duration_write)
> + return a->mode;
> + if (ops->cycle_duration_read)
> + return 0444;
> + return 0;
> + default:
> + return 0;
> + }
> +}
> +
> +#define EDAC_SCRUB_ATTR_RO(_name, _instance) \
> + ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_RO(_name), \
> + .instance = _instance })
> +
> +#define EDAC_SCRUB_ATTR_WO(_name, _instance) \
> + ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_WO(_name), \
> + .instance = _instance })
> +
> +#define EDAC_SCRUB_ATTR_RW(_name, _instance) \
> + ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_RW(_name), \
> + .instance = _instance })
> +
> +static int scrub_create_desc(struct device *scrub_dev,
> + const struct attribute_group **attr_groups,
> + u8 instance)
> +{
> + struct edac_scrub_context *scrub_ctx;
> + struct attribute_group *group;
> + int i;
> +
> + scrub_ctx = devm_kzalloc(scrub_dev, sizeof(*scrub_ctx), GFP_KERNEL);
> + if (!scrub_ctx)
> + return -ENOMEM;
> +
> + group = &scrub_ctx->group;
> + scrub_ctx->scrub_dev_attr[0] = EDAC_SCRUB_ATTR_RW(addr_range_base, instance);
> + scrub_ctx->scrub_dev_attr[1] = EDAC_SCRUB_ATTR_RW(addr_range_size, instance);
> + scrub_ctx->scrub_dev_attr[2] = EDAC_SCRUB_ATTR_RW(enable_background, instance);
> + scrub_ctx->scrub_dev_attr[3] = EDAC_SCRUB_ATTR_RW(enable_on_demand, instance);
> + scrub_ctx->scrub_dev_attr[4] = EDAC_SCRUB_ATTR_RO(min_cycle_duration, instance);
> + scrub_ctx->scrub_dev_attr[5] = EDAC_SCRUB_ATTR_RO(max_cycle_duration, instance);
> + scrub_ctx->scrub_dev_attr[6] = EDAC_SCRUB_ATTR_RW(current_cycle_duration, instance);
> + for (i = 0; i < SCRUB_MAX_ATTRS; i++)
> + scrub_ctx->scrub_attrs[i] = &scrub_ctx->scrub_dev_attr[i].dev_attr.attr;
> +
> + sprintf(scrub_ctx->name, "%s%d", "scrub", instance);
> + group->name = scrub_ctx->name;
> + group->attrs = scrub_ctx->scrub_attrs;
> + group->is_visible = scrub_attr_visible;
> +
> + attr_groups[0] = group;
> +
> + return 0;
> +}
> +
> +/**
> + * edac_scrub_get_desc - get EDAC scrub descriptors
> + * @scrub_dev: client device, with scrub support
> + * @attr_groups: pointer to attribute group container
> + * @instance: device's scrub instance number.
> + *
> + * Returns 0 on success, error otherwise.
> + */
> +int edac_scrub_get_desc(struct device *scrub_dev,
> + const struct attribute_group **attr_groups,
> + u8 instance)
> +{
> + if (!scrub_dev || !attr_groups)
> + return -EINVAL;
> +
> + return scrub_create_desc(scrub_dev, attr_groups, instance);
> +}
> diff --git a/include/linux/edac.h b/include/linux/edac.h
> index b337254cf5b8..aae8262b9863 100644
> --- a/include/linux/edac.h
> +++ b/include/linux/edac.h
> @@ -674,6 +674,36 @@ enum edac_dev_feat {
> RAS_FEAT_MAX
> };
>
> +/**
> + * struct edac_scrub_ops - scrub device operations (all elements optional)
> + * @read_range: read the base and size of the scrubbing range.
> + * @write_range: set the base and size of the scrubbing range.
> + * @get_enabled_bg: check if currently performing background scrub.
> + * @set_enabled_bg: start or stop a bg-scrub.
> + * @get_enabled_od: check if currently performing on-demand scrub.
> + * @set_enabled_od: start or stop an on-demand scrub.
> + * @min_cycle_read: minimum supported scrub cycle duration in seconds.
> + * @max_cycle_read: maximum supported scrub cycle duration in seconds.
> + * @cycle_duration_read: get the scrub cycle duration in seconds.
> + * @cycle_duration_write: set the scrub cycle duration in seconds.
> + */
> +struct edac_scrub_ops {
> + int (*read_range)(struct device *dev, void *drv_data, u64 *base, u64 *size);
> + int (*write_range)(struct device *dev, void *drv_data, u64 base, u64 size);
> + int (*get_enabled_bg)(struct device *dev, void *drv_data, bool *enable);
> + int (*set_enabled_bg)(struct device *dev, void *drv_data, bool enable);
> + int (*get_enabled_od)(struct device *dev, void *drv_data, bool *enable);
> + int (*set_enabled_od)(struct device *dev, void *drv_data, bool enable);
> + int (*min_cycle_read)(struct device *dev, void *drv_data, u32 *min);
> + int (*max_cycle_read)(struct device *dev, void *drv_data, u32 *max);
> + int (*cycle_duration_read)(struct device *dev, void *drv_data, u32 *cycle);
> + int (*cycle_duration_write)(struct device *dev, void *drv_data, u32 cycle);
> +};
> +
> +int edac_scrub_get_desc(struct device *scrub_dev,
> + const struct attribute_group **attr_groups,
> + u8 instance);
> +
> struct edac_ecs_ex_info {
> u16 num_media_frus;
> };
> --
> 2.34.1
>
--
Fan Ni
^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v12 02/17] EDAC: Add EDAC scrub control driver
2024-09-26 23:04 ` Fan Ni
@ 2024-09-27 11:17 ` Shiju Jose
0 siblings, 0 replies; 39+ messages in thread
From: Shiju Jose @ 2024-09-27 11:17 UTC (permalink / raw)
To: Fan Ni
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel, bp,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
Jonathan Cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
jgroves, vsalve, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
>-----Original Message-----
>From: Fan Ni <nifan.cxl@gmail.com>
>Sent: 27 September 2024 00:05
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-cxl@vger.kernel.org; linux-
>acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org;
>bp@alien8.de; tony.luck@intel.com; rafael@kernel.org; lenb@kernel.org;
>mchehab@kernel.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>david@redhat.com; Vilas.Sridharan@amd.com; leo.duran@amd.com;
>Yazen.Ghannam@amd.com; rientjes@google.com; jiaqiyan@google.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; mike.malvestuto@intel.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; jgroves@micron.com;
>vsalve@micron.com; tanxiaofei <tanxiaofei@huawei.com>; Zengtao (B)
><prime.zeng@hisilicon.com>; Roberto Sassu <roberto.sassu@huawei.com>;
>kangkang.shen@futurewei.com; wanghuiqiang <wanghuiqiang@huawei.com>;
>Linuxarm <linuxarm@huawei.com>
>Subject: Re: [PATCH v12 02/17] EDAC: Add EDAC scrub control driver
>
>On Wed, Sep 11, 2024 at 10:04:31AM +0100, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> Add a generic EDAC scrub control driver that supports configuring the
>> memory scrubbers in the system. A device with the scrub feature gets
>> the scrub descriptor from the EDAC scrub driver and registers with the
>> EDAC RAS feature driver, which adds the sysfs scrub control interface.
>> The scrub control attributes for a scrub instance are available to
>> userspace in /sys/bus/edac/devices/<dev-name>/scrub*/.
>>
>> The generic EDAC scrub driver and its common sysfs scrub interface
>> promote unambiguous access from userspace irrespective of the
>> underlying scrub devices.
>>
>> The sysfs scrub attribute nodes would be present only if the client
>> driver has implemented the corresponding attribute callback function
>> and passed the ops to the EDAC RAS feature driver during registration.
>>
>> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>> Documentation/ABI/testing/sysfs-edac-scrub | 69 ++++
>> drivers/edac/Makefile | 1 +
>> drivers/edac/edac_device.c | 6 +-
>> drivers/edac/edac_scrub.c | 377 +++++++++++++++++++++
>> include/linux/edac.h | 30 ++
>> 5 files changed, 482 insertions(+), 1 deletion(-)
>> create mode 100644 Documentation/ABI/testing/sysfs-edac-scrub
>> create mode 100755 drivers/edac/edac_scrub.c
>>
>> diff --git a/Documentation/ABI/testing/sysfs-edac-scrub b/Documentation/ABI/testing/sysfs-edac-scrub
>> new file mode 100644
>> index 000000000000..f465cc91423f
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-edac-scrub
>> @@ -0,0 +1,69 @@
>> +What: /sys/bus/edac/devices/<dev-name>/scrub*
>
>Based on the code below, we can only have scrub0, scrub1, etc.
>So should we use scrubX instead of scrub* here?
>
>The same for below.
>
Thanks. Changed in other patches as well.
>Fan
>
Thanks,
Shiju
>> +Date: Oct 2024
>> +KernelVersion: 6.12
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + The sysfs EDAC bus devices /<dev-name>/scrub* subdirectory
>> + belongs to an instance of the memory scrub control feature,
>> + where the <dev-name> directory corresponds to a device/memory
>> + region registered with the EDAC scrub driver and thus
>> + registered with the generic EDAC RAS driver.
>> + The sysfs scrub attribute nodes are present only if the
>> + client driver has implemented the corresponding attribute
>> + callback function and passed the ops to the EDAC RAS
>> + feature driver during registration.
>> +
>> +What: /sys/bus/edac/devices/<dev-name>/scrub*/addr_range_base
>> +Date: Oct 2024
>> +KernelVersion: 6.12
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RW) The base of the address range of the memory region
>> + to be scrubbed (on-demand scrubbing).
>> +
>> +What: /sys/bus/edac/devices/<dev-name>/scrub*/addr_range_size
>> +Date: Oct 2024
>> +KernelVersion: 6.12
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RW) The size of the address range of the memory region
>> + to be scrubbed (on-demand scrubbing).
>> +
>> +What: /sys/bus/edac/devices/<dev-name>/scrub*/enable_background
>> +Date: Oct 2024
>> +KernelVersion: 6.12
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RW) Start/Stop background(patrol) scrubbing if supported.
>> +
>> +What: /sys/bus/edac/devices/<dev-name>/scrub*/enable_on_demand
>> +Date: Oct 2024
>> +KernelVersion: 6.12
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RW) Start/Stop on-demand scrubbing the memory region
>> + if supported.
>> +
>> +What: /sys/bus/edac/devices/<dev-name>/scrub*/min_cycle_duration
>> +Date: Oct 2024
>> +KernelVersion: 6.12
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RO) Supported minimum scrub cycle duration in seconds
>> + by the memory scrubber.
>> +
>> +What: /sys/bus/edac/devices/<dev-name>/scrub*/max_cycle_duration
>> +Date: Oct 2024
>> +KernelVersion: 6.12
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RO) Supported maximum scrub cycle duration in seconds
>> + by the memory scrubber.
>> +
>> +What: /sys/bus/edac/devices/<dev-name>/scrub*/current_cycle_duration
>> +Date: Oct 2024
>> +KernelVersion: 6.12
>> +Contact: linux-edac@vger.kernel.org
>> +Description:
>> + (RW) The current scrub cycle duration in seconds and must be
>> + within the supported range by the memory scrubber.
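
A quick sketch of how these attributes might be driven from a shell. The paths and values are illustrative only; the snippet simulates the scrub0 layout in a temp directory so it can run anywhere, while on real hardware the files would live under /sys/bus/edac/devices/<dev-name>/scrub0/ and be backed by the driver.

```shell
# Simulated scrub0 directory standing in for the real sysfs node.
SCRUB=$(mktemp -d)/scrub0
mkdir -p "$SCRUB"
echo 3600   > "$SCRUB/min_cycle_duration"   # pretend driver-reported min (s)
echo 918000 > "$SCRUB/max_cycle_duration"   # pretend driver-reported max (s)

# Pick a daily scrub cycle, but only if it falls in the advertised window.
min=$(cat "$SCRUB/min_cycle_duration")
max=$(cat "$SCRUB/max_cycle_duration")
want=86400
if [ "$want" -ge "$min" ] && [ "$want" -le "$max" ]; then
	echo "$want" > "$SCRUB/current_cycle_duration"
fi
cat "$SCRUB/current_cycle_duration"   # prints 86400

# On-demand scrub of a region: set base/size, then kick it off.
echo 0x100000000 > "$SCRUB/addr_range_base"
echo 0x40000000  > "$SCRUB/addr_range_size"
echo 1           > "$SCRUB/enable_on_demand"
```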
>> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
>> index 4edfb83ffbee..fbf0e39ec678 100644
>> --- a/drivers/edac/Makefile
>> +++ b/drivers/edac/Makefile
>> @@ -10,6 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o
>>
>> edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o
>> edac_core-y += edac_module.o edac_device_sysfs.o wq.o
>> +edac_core-y += edac_scrub.o
>>
>> edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o
>>
>> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
>> index e4a5d010ea2d..6381896b6424 100644
>> --- a/drivers/edac/edac_device.c
>> +++ b/drivers/edac/edac_device.c
>> @@ -608,12 +608,16 @@ static int edac_dev_feat_init(struct device *parent,
>> const struct edac_dev_feature *ras_feat,
>> const struct attribute_group **attr_groups) {
>> - int num;
>> + int num, ret;
>>
>> switch (ras_feat->ft_type) {
>> case RAS_FEAT_SCRUB:
>> dev_data->scrub_ops = ras_feat->scrub_ops;
>> dev_data->private = ras_feat->ctx;
>> + ret = edac_scrub_get_desc(parent, attr_groups,
>> + ras_feat->instance);
>> + if (ret)
>> + return ret;
>> return 1;
>> case RAS_FEAT_ECS:
>> num = ras_feat->ecs_info.num_media_frus;
>> diff --git a/drivers/edac/edac_scrub.c b/drivers/edac/edac_scrub.c
>> new file mode 100755
>> index 000000000000..3f8f37629acf
>> --- /dev/null
>> +++ b/drivers/edac/edac_scrub.c
>> @@ -0,0 +1,377 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Generic EDAC scrub driver controls the memory scrubbers in the
>> + * system; its common sysfs scrub interface promotes unambiguous
>> + * access from userspace.
>> + *
>> + * Copyright (c) 2024 HiSilicon Limited.
>> + */
>> +
>> +#define pr_fmt(fmt) "EDAC SCRUB: " fmt
>> +
>> +#include <linux/edac.h>
>> +
>> +enum edac_scrub_attributes {
>> + SCRUB_ADDR_RANGE_BASE,
>> + SCRUB_ADDR_RANGE_SIZE,
>> + SCRUB_ENABLE_BACKGROUND,
>> + SCRUB_ENABLE_ON_DEMAND,
>> + SCRUB_MIN_CYCLE_DURATION,
>> + SCRUB_MAX_CYCLE_DURATION,
>> + SCRUB_CURRENT_CYCLE_DURATION,
>> + SCRUB_MAX_ATTRS
>> +};
>> +
>> +struct edac_scrub_dev_attr {
>> + struct device_attribute dev_attr;
>> + u8 instance;
>> +};
>> +
>> +struct edac_scrub_context {
>> + char name[EDAC_FEAT_NAME_LEN];
>> + struct edac_scrub_dev_attr scrub_dev_attr[SCRUB_MAX_ATTRS];
>> + struct attribute *scrub_attrs[SCRUB_MAX_ATTRS + 1];
>> + struct attribute_group group;
>> +};
>> +
>> +#define to_scrub_dev_attr(_dev_attr) \
>> + container_of(_dev_attr, struct edac_scrub_dev_attr, dev_attr)
>> +
>> +static ssize_t addr_range_base_show(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + char *buf)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + u64 base, size;
>> + int ret;
>> +
>> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
>> + if (ret)
>> + return ret;
>> +
>> + return sysfs_emit(buf, "0x%llx\n", base);
>> +}
>> +
>> +static ssize_t addr_range_size_show(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + char *buf)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + u64 base, size;
>> + int ret;
>> +
>> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
>> + if (ret)
>> + return ret;
>> +
>> + return sysfs_emit(buf, "0x%llx\n", size);
>> +}
>> +
>> +static ssize_t addr_range_base_store(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + const char *buf, size_t len)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + u64 base, size;
>> + int ret;
>> +
>> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
>> + if (ret)
>> + return ret;
>> +
>> + ret = kstrtou64(buf, 0, &base);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = ops->write_range(ras_feat_dev->parent, ctx->scrub[inst].private, base, size);
>> + if (ret)
>> + return ret;
>> +
>> + return len;
>> +}
>> +
>> +static ssize_t addr_range_size_store(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + const char *buf,
>> + size_t len)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + u64 base, size;
>> + int ret;
>> +
>> + ret = ops->read_range(ras_feat_dev->parent, ctx->scrub[inst].private, &base, &size);
>> + if (ret)
>> + return ret;
>> +
>> + ret = kstrtou64(buf, 0, &size);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = ops->write_range(ras_feat_dev->parent, ctx->scrub[inst].private, base, size);
>> + if (ret)
>> + return ret;
>> +
>> + return len;
>> +}
>> +
>> +static ssize_t enable_background_store(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + const char *buf, size_t len)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + bool enable;
>> + int ret;
>> +
>> + ret = kstrtobool(buf, &enable);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = ops->set_enabled_bg(ras_feat_dev->parent, ctx->scrub[inst].private, enable);
>> + if (ret)
>> + return ret;
>> +
>> + return len;
>> +}
>> +
>> +static ssize_t enable_background_show(struct device *ras_feat_dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + bool enable;
>> + int ret;
>> +
>> + ret = ops->get_enabled_bg(ras_feat_dev->parent, ctx->scrub[inst].private, &enable);
>> + if (ret)
>> + return ret;
>> +
>> + return sysfs_emit(buf, "%d\n", enable);
>> +}
>> +
>> +static ssize_t enable_on_demand_show(struct device *ras_feat_dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + bool enable;
>> + int ret;
>> +
>> + ret = ops->get_enabled_od(ras_feat_dev->parent, ctx->scrub[inst].private, &enable);
>> + if (ret)
>> + return ret;
>> +
>> + return sysfs_emit(buf, "%d\n", enable);
>> +}
>> +
>> +static ssize_t enable_on_demand_store(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + const char *buf, size_t len)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + bool enable;
>> + int ret;
>> +
>> + ret = kstrtobool(buf, &enable);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = ops->set_enabled_od(ras_feat_dev->parent, ctx->scrub[inst].private, enable);
>> + if (ret)
>> + return ret;
>> +
>> + return len;
>> +}
>> +
>> +static ssize_t min_cycle_duration_show(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + char *buf)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + u32 val;
>> + int ret;
>> +
>> + ret = ops->min_cycle_read(ras_feat_dev->parent, ctx->scrub[inst].private, &val);
>> + if (ret)
>> + return ret;
>> +
>> + return sysfs_emit(buf, "%u\n", val);
>> +}
>> +
>> +static ssize_t max_cycle_duration_show(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + char *buf)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + u32 val;
>> + int ret;
>> +
>> + ret = ops->max_cycle_read(ras_feat_dev->parent, ctx->scrub[inst].private, &val);
>> + if (ret)
>> + return ret;
>> +
>> + return sysfs_emit(buf, "%u\n", val);
>> +}
>> +
>> +static ssize_t current_cycle_duration_show(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + char *buf)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + u32 val;
>> + int ret;
>> +
>> + ret = ops->cycle_duration_read(ras_feat_dev->parent, ctx->scrub[inst].private, &val);
>> + if (ret)
>> + return ret;
>> +
>> + return sysfs_emit(buf, "%u\n", val);
>> +}
>> +
>> +static ssize_t current_cycle_duration_store(struct device *ras_feat_dev,
>> + struct device_attribute *attr,
>> + const char *buf, size_t len)
>> +{
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(attr))->instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> + long val;
>> + int ret;
>> +
>> + ret = kstrtol(buf, 0, &val);
>> + if (ret < 0)
>> + return ret;
>> +
>> + ret = ops->cycle_duration_write(ras_feat_dev->parent, ctx->scrub[inst].private, val);
>> + if (ret)
>> + return ret;
>> +
>> + return len;
>> +}
>> +
>> +static umode_t scrub_attr_visible(struct kobject *kobj,
>> + struct attribute *a, int attr_id)
>> +{
>> + struct device *ras_feat_dev = kobj_to_dev(kobj);
>> + struct device_attribute *dev_attr =
>> + container_of(a, struct device_attribute, attr);
>> + u8 inst = ((struct edac_scrub_dev_attr *)to_scrub_dev_attr(dev_attr))->instance;
>> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
>> + const struct edac_scrub_ops *ops = ctx->scrub[inst].scrub_ops;
>> +
>> + switch (attr_id) {
>> + case SCRUB_ADDR_RANGE_BASE:
>> + case SCRUB_ADDR_RANGE_SIZE:
>> + if (ops->read_range && ops->write_range)
>> + return a->mode;
>> + if (ops->read_range)
>> + return 0444;
>> + return 0;
>> + case SCRUB_ENABLE_BACKGROUND:
>> + if (ops->get_enabled_bg && ops->set_enabled_bg)
>> + return a->mode;
>> + if (ops->get_enabled_bg)
>> + return 0444;
>> + return 0;
>> + case SCRUB_ENABLE_ON_DEMAND:
>> + if (ops->get_enabled_od && ops->set_enabled_od)
>> + return a->mode;
>> + if (ops->get_enabled_od)
>> + return 0444;
>> + return 0;
>> + case SCRUB_MIN_CYCLE_DURATION:
>> + return ops->min_cycle_read ? a->mode : 0;
>> + case SCRUB_MAX_CYCLE_DURATION:
>> + return ops->max_cycle_read ? a->mode : 0;
>> + case SCRUB_CURRENT_CYCLE_DURATION:
>> + if (ops->cycle_duration_read && ops->cycle_duration_write)
>> + return a->mode;
>> + if (ops->cycle_duration_read)
>> + return 0444;
>> + return 0;
>> + default:
>> + return 0;
>> + }
>> +}
>> +
>> +#define EDAC_SCRUB_ATTR_RO(_name, _instance) \
>> + ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_RO(_name), \
>> + .instance = _instance })
>> +
>> +#define EDAC_SCRUB_ATTR_WO(_name, _instance) \
>> + ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_WO(_name), \
>> + .instance = _instance })
>> +
>> +#define EDAC_SCRUB_ATTR_RW(_name, _instance) \
>> + ((struct edac_scrub_dev_attr) { .dev_attr = __ATTR_RW(_name), \
>> + .instance = _instance })
>> +
>> +static int scrub_create_desc(struct device *scrub_dev,
>> + const struct attribute_group **attr_groups,
>> + u8 instance)
>> +{
>> + struct edac_scrub_context *scrub_ctx;
>> + struct attribute_group *group;
>> + int i;
>> +
>> + scrub_ctx = devm_kzalloc(scrub_dev, sizeof(*scrub_ctx), GFP_KERNEL);
>> + if (!scrub_ctx)
>> + return -ENOMEM;
>> +
>> + group = &scrub_ctx->group;
>> + scrub_ctx->scrub_dev_attr[0] = EDAC_SCRUB_ATTR_RW(addr_range_base, instance);
>> + scrub_ctx->scrub_dev_attr[1] = EDAC_SCRUB_ATTR_RW(addr_range_size, instance);
>> + scrub_ctx->scrub_dev_attr[2] = EDAC_SCRUB_ATTR_RW(enable_background, instance);
>> + scrub_ctx->scrub_dev_attr[3] = EDAC_SCRUB_ATTR_RW(enable_on_demand, instance);
>> + scrub_ctx->scrub_dev_attr[4] = EDAC_SCRUB_ATTR_RO(min_cycle_duration, instance);
>> + scrub_ctx->scrub_dev_attr[5] = EDAC_SCRUB_ATTR_RO(max_cycle_duration, instance);
>> + scrub_ctx->scrub_dev_attr[6] = EDAC_SCRUB_ATTR_RW(current_cycle_duration, instance);
>> + for (i = 0; i < SCRUB_MAX_ATTRS; i++)
>> + scrub_ctx->scrub_attrs[i] = &scrub_ctx->scrub_dev_attr[i].dev_attr.attr;
>> +
>> + sprintf(scrub_ctx->name, "%s%d", "scrub", instance);
>> + group->name = scrub_ctx->name;
>> + group->attrs = scrub_ctx->scrub_attrs;
>> + group->is_visible = scrub_attr_visible;
>> +
>> + attr_groups[0] = group;
>> +
>> + return 0;
>> +}
>> +
>> +/**
>> + * edac_scrub_get_desc - get EDAC scrub descriptors
>> + * @scrub_dev: client device, with scrub support
>> + * @attr_groups: pointer to attribute group container
>> + * @instance: device's scrub instance number.
>> + *
>> + * Returns 0 on success, error otherwise.
>> + */
>> +int edac_scrub_get_desc(struct device *scrub_dev,
>> + const struct attribute_group **attr_groups,
>> + u8 instance)
>> +{
>> + if (!scrub_dev || !attr_groups)
>> + return -EINVAL;
>> +
>> + return scrub_create_desc(scrub_dev, attr_groups, instance);
>> +}
>> diff --git a/include/linux/edac.h b/include/linux/edac.h
>> index b337254cf5b8..aae8262b9863 100644
>> --- a/include/linux/edac.h
>> +++ b/include/linux/edac.h
>> @@ -674,6 +674,36 @@ enum edac_dev_feat {
>> RAS_FEAT_MAX
>> };
>>
>> +/**
>> + * struct edac_scrub_ops - scrub device operations (all elements optional)
>> + * @read_range: read base and offset of scrubbing range.
>> + * @write_range: set the base and offset of the scrubbing range.
>> + * @get_enabled_bg: check if currently performing background scrub.
>> + * @set_enabled_bg: start or stop a bg-scrub.
>> + * @get_enabled_od: check if currently performing on-demand scrub.
>> + * @set_enabled_od: start or stop an on-demand scrub.
>> + * @min_cycle_read: minimum supported scrub cycle duration in seconds.
>> + * @max_cycle_read: maximum supported scrub cycle duration in seconds.
>> + * @cycle_duration_read: get the scrub cycle duration in seconds.
>> + * @cycle_duration_write: set the scrub cycle duration in seconds.
>> + */
>> +struct edac_scrub_ops {
>> + int (*read_range)(struct device *dev, void *drv_data, u64 *base, u64 *size);
>> + int (*write_range)(struct device *dev, void *drv_data, u64 base, u64 size);
>> + int (*get_enabled_bg)(struct device *dev, void *drv_data, bool *enable);
>> + int (*set_enabled_bg)(struct device *dev, void *drv_data, bool enable);
>> + int (*get_enabled_od)(struct device *dev, void *drv_data, bool *enable);
>> + int (*set_enabled_od)(struct device *dev, void *drv_data, bool enable);
>> + int (*min_cycle_read)(struct device *dev, void *drv_data, u32 *min);
>> + int (*max_cycle_read)(struct device *dev, void *drv_data, u32 *max);
>> + int (*cycle_duration_read)(struct device *dev, void *drv_data, u32 *cycle);
>> + int (*cycle_duration_write)(struct device *dev, void *drv_data, u32 cycle);
>> +};
>> +
>> +int edac_scrub_get_desc(struct device *scrub_dev,
>> + const struct attribute_group **attr_groups,
>> + u8 instance);
>> +
>> struct edac_ecs_ex_info {
>> u16 num_media_frus;
>> };
>> --
>> 2.34.1
>>
>
>--
>Fan Ni
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v12 03/17] EDAC: Add EDAC ECS control driver
2024-09-11 9:04 ` [PATCH v12 03/17] EDAC: Add EDAC ECS " shiju.jose
@ 2024-09-27 16:28 ` Fan Ni
0 siblings, 0 replies; 39+ messages in thread
From: Fan Ni @ 2024-09-27 16:28 UTC (permalink / raw)
To: shiju.jose
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel, bp,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, jgroves, vsalve, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm
On Wed, Sep 11, 2024 at 10:04:32AM +0100, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Add EDAC ECS (Error Check Scrub) control driver supports configuring
s/supports/to support/
> the memory device's ECS feature.
>
> The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM
> Specification (JESD79-5) and allows the DRAM to internally read, correct
> single-bit errors, and write back corrected data bits to the DRAM array
> while providing transparency to error counts.
>
> The DDR5 device contains number of memory media FRUs per device. The
> DDR5 ECS feature and thus the ECS control driver supports configuring
> the ECS parameters per FRU.
>
> The memory devices support ECS feature register with the EDAC ECS driver
> and thus with the generic EDAC RAS feature driver, which adds the sysfs
> ECS control interface. The ECS control attributes are exposed to
> userspace in /sys/bus/edac/devices/<dev-name>/ecs_fruX/.
>
> Generic EDAC ECS driver and the common sysfs ECS interface promotes
> unambiguous control from the userspace irrespective of the underlying
> devices, support ECS feature.
s/, support/which support/ ???
>
> The support for ECS feature is added separately because the DDR5 ECS
> features control attributes are dissimilar from those of the scrub
> feature.
>
> The sysfs ECS attr nodes would be present only if the client driver
> has implemented the corresponding attr callback function and pass
s/pass/passed/
> in ops to the EDAC RAS feature driver during registration.
>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> Documentation/ABI/testing/sysfs-edac-ecs | 78 +++++
> drivers/edac/Makefile | 2 +-
> drivers/edac/edac_device.c | 3 +
> drivers/edac/edac_ecs.c | 376 +++++++++++++++++++++++
> include/linux/edac.h | 33 ++
> 5 files changed, 491 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/ABI/testing/sysfs-edac-ecs
> create mode 100755 drivers/edac/edac_ecs.c
>
> diff --git a/Documentation/ABI/testing/sysfs-edac-ecs b/Documentation/ABI/testing/sysfs-edac-ecs
> new file mode 100644
> index 000000000000..1eb35acd4e5e
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-edac-ecs
> @@ -0,0 +1,78 @@
> +What: /sys/bus/edac/devices/<dev-name>/ecs_fru*
Maybe s/ecs_fru*/ecs_fruX/??
The same for below.
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + The sysfs EDAC bus devices /<dev-name>/ecs_fru* subdirectory
> + belongs to the memory media ECS (Error Check Scrub) control
> + feature, where <dev-name> directory corresponds to a device
> + registered with the EDAC ECS driver and thus registered with
> + the generic EDAC RAS driver too.
> + /ecs_fru* belongs to the media FRUs (Field replaceable unit)
> + under the memory device.
> + The sysfs ECS attr nodes would be present only if the client
> + driver has implemented the corresponding attr callback
> + function and pass in ops to the EDAC RAS feature driver
s/pass/passed/
Fan
> + during registration.
> +
> +What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/log_entry_type
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RW) The log entry type of how the DDR5 ECS log is reported.
> + 00b - per DRAM.
> + 01b - per memory media FRU.
> +
> +What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/log_entry_type_per_dram
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RO) True if current log entry type is per DRAM.
> +
> +What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/log_entry_type_per_memory_media
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RO) True if current log entry type is per memory media FRU.
> +
> +What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/mode
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RW) The mode of how the DDR5 ECS counts the errors.
> + 0 - ECS counts rows with errors.
> + 1 - ECS counts codewords with errors.
> +
> +What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/mode_counts_rows
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RO) True if current mode is ECS counts rows with errors.
> +
> +What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/mode_counts_codewords
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RO) True if current mode is ECS counts codewords with errors.
> +
> +What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/reset
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (WO) ECS reset ECC counter.
> + 0 - normal, ECC counter running actively.
> + 1 - reset ECC counter to the default value.
> +
> +What: /sys/bus/edac/devices/<dev-name>/ecs_fru*/threshold
> +Date: Oct 2024
> +KernelVersion: 6.12
> +Contact: linux-edac@vger.kernel.org
> +Description:
> + (RW) ECS threshold count per GB of memory cells.
> diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
> index fbf0e39ec678..62115eff6a9a 100644
> --- a/drivers/edac/Makefile
> +++ b/drivers/edac/Makefile
> @@ -10,7 +10,7 @@ obj-$(CONFIG_EDAC) := edac_core.o
>
> edac_core-y := edac_mc.o edac_device.o edac_mc_sysfs.o
> edac_core-y += edac_module.o edac_device_sysfs.o wq.o
> -edac_core-y += edac_scrub.o
> +edac_core-y += edac_scrub.o edac_ecs.o
>
> edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o
>
> diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
> index 6381896b6424..9cac9ae75080 100644
> --- a/drivers/edac/edac_device.c
> +++ b/drivers/edac/edac_device.c
> @@ -623,6 +623,9 @@ static int edac_dev_feat_init(struct device *parent,
> num = ras_feat->ecs_info.num_media_frus;
> dev_data->ecs_ops = ras_feat->ecs_ops;
> dev_data->private = ras_feat->ctx;
> + ret = edac_ecs_get_desc(parent, attr_groups, num);
> + if (ret)
> + return ret;
> return num;
> case RAS_FEAT_PPR:
> dev_data->ppr_ops = ras_feat->ppr_ops;
> diff --git a/drivers/edac/edac_ecs.c b/drivers/edac/edac_ecs.c
> new file mode 100755
> index 000000000000..50915ab1e769
> --- /dev/null
> +++ b/drivers/edac/edac_ecs.c
> @@ -0,0 +1,376 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * ECS driver for controlling on-die error check scrub
> + * (e.g. DDR5 ECS). The common sysfs ECS interface promotes
> + * unambiguous access from the userspace.
> + *
> + * Copyright (c) 2024 HiSilicon Limited.
> + */
> +
> +#define pr_fmt(fmt) "EDAC ECS: " fmt
> +
> +#include <linux/edac.h>
> +
> +#define EDAC_ECS_FRU_NAME "ecs_fru"
> +
> +enum edac_ecs_attributes {
> + ECS_LOG_ENTRY_TYPE,
> + ECS_LOG_ENTRY_TYPE_PER_DRAM,
> + ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA,
> + ECS_MODE,
> + ECS_MODE_COUNTS_ROWS,
> + ECS_MODE_COUNTS_CODEWORDS,
> + ECS_RESET,
> + ECS_THRESHOLD,
> + ECS_MAX_ATTRS
> +};
> +
> +struct edac_ecs_dev_attr {
> + struct device_attribute dev_attr;
> + int fru_id;
> +};
> +
> +struct edac_ecs_fru_context {
> + char name[EDAC_FEAT_NAME_LEN];
> + struct edac_ecs_dev_attr ecs_dev_attr[ECS_MAX_ATTRS];
> + struct attribute *ecs_attrs[ECS_MAX_ATTRS + 1];
> + struct attribute_group group;
> +};
> +
> +struct edac_ecs_context {
> + u16 num_media_frus;
> + struct edac_ecs_fru_context *fru_ctxs;
> +};
> +
> +#define to_ecs_dev_attr(_dev_attr) \
> + container_of(_dev_attr, struct edac_ecs_dev_attr, dev_attr)
> +
> +static ssize_t log_entry_type_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
> + u32 val;
> + int ret;
> +
> + ret = ops->get_log_entry_type(ras_feat_dev->parent, ctx->ecs.private,
> + ecs_dev_attr->fru_id, &val);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%u\n", val);
> +}
> +
> +static ssize_t log_entry_type_store(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
> + long val;
> + int ret;
> +
> + ret = kstrtol(buf, 0, &val);
> + if (ret < 0)
> + return ret;
> +
> + ret = ops->set_log_entry_type(ras_feat_dev->parent, ctx->ecs.private,
> + ecs_dev_attr->fru_id, val);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static ssize_t log_entry_type_per_dram_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
> + u32 val;
> + int ret;
> +
> + ret = ops->get_log_entry_type_per_dram(ras_feat_dev->parent, ctx->ecs.private,
> + ecs_dev_attr->fru_id, &val);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%u\n", val);
> +}
> +
> +static ssize_t log_entry_type_per_memory_media_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
> + u32 val;
> + int ret;
> +
> + ret = ops->get_log_entry_type_per_memory_media(ras_feat_dev->parent,
> + ctx->ecs.private,
> + ecs_dev_attr->fru_id, &val);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%u\n", val);
> +}
> +
> +static ssize_t mode_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
> + u32 val;
> + int ret;
> +
> + ret = ops->get_mode(ras_feat_dev->parent, ctx->ecs.private,
> + ecs_dev_attr->fru_id, &val);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%u\n", val);
> +}
> +
> +static ssize_t mode_store(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
> + long val;
> + int ret;
> +
> + ret = kstrtol(buf, 0, &val);
> + if (ret < 0)
> + return ret;
> +
> + ret = ops->set_mode(ras_feat_dev->parent, ctx->ecs.private,
> + ecs_dev_attr->fru_id, val);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static ssize_t mode_counts_rows_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
> + u32 val;
> + int ret;
> +
> + ret = ops->get_mode_counts_rows(ras_feat_dev->parent, ctx->ecs.private,
> + ecs_dev_attr->fru_id, &val);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%u\n", val);
> +}
> +
> +static ssize_t mode_counts_codewords_show(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
> + u32 val;
> + int ret;
> +
> + ret = ops->get_mode_counts_codewords(ras_feat_dev->parent, ctx->ecs.private,
> + ecs_dev_attr->fru_id, &val);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%u\n", val);
> +}
> +
> +static ssize_t reset_store(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
> + long val;
> + int ret;
> +
> + ret = kstrtol(buf, 0, &val);
> + if (ret < 0)
> + return ret;
> +
> + ret = ops->reset(ras_feat_dev->parent, ctx->ecs.private,
> + ecs_dev_attr->fru_id, val);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static ssize_t threshold_show(struct device *ras_feat_dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
> + int ret;
> + u32 val;
> +
> + ret = ops->get_threshold(ras_feat_dev->parent, ctx->ecs.private,
> + ecs_dev_attr->fru_id, &val);
> + if (ret)
> + return ret;
> +
> + return sysfs_emit(buf, "%u\n", val);
> +}
> +
> +static ssize_t threshold_store(struct device *ras_feat_dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + struct edac_ecs_dev_attr *ecs_dev_attr = to_ecs_dev_attr(attr);
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
> + long val;
> + int ret;
> +
> + ret = kstrtol(buf, 0, &val);
> + if (ret < 0)
> + return ret;
> +
> + ret = ops->set_threshold(ras_feat_dev->parent, ctx->ecs.private,
> + ecs_dev_attr->fru_id, val);
> + if (ret)
> + return ret;
> +
> + return len;
> +}
> +
> +static umode_t ecs_attr_visible(struct kobject *kobj,
> + struct attribute *a, int attr_id)
> +{
> + struct device *ras_feat_dev = kobj_to_dev(kobj);
> + struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
> + const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
> +
> + switch (attr_id) {
> + case ECS_LOG_ENTRY_TYPE:
> + if (ops->get_log_entry_type && ops->set_log_entry_type)
> + return a->mode;
> + if (ops->get_log_entry_type)
> + return 0444;
> + return 0;
> + case ECS_LOG_ENTRY_TYPE_PER_DRAM:
> + return ops->get_log_entry_type_per_dram ? a->mode : 0;
> + case ECS_LOG_ENTRY_TYPE_PER_MEMORY_MEDIA:
> + return ops->get_log_entry_type_per_memory_media ? a->mode : 0;
> + case ECS_MODE:
> + if (ops->get_mode && ops->set_mode)
> + return a->mode;
> + if (ops->get_mode)
> + return 0444;
> + return 0;
> + case ECS_MODE_COUNTS_ROWS:
> + return ops->get_mode_counts_rows ? a->mode : 0;
> + case ECS_MODE_COUNTS_CODEWORDS:
> + return ops->get_mode_counts_codewords ? a->mode : 0;
> + case ECS_RESET:
> + return ops->reset ? a->mode : 0;
> + case ECS_THRESHOLD:
> + if (ops->get_threshold && ops->set_threshold)
> + return a->mode;
> + if (ops->get_threshold)
> + return 0444;
> + return 0;
> + default:
> + return 0;
> + }
> +}
> +
> +#define EDAC_ECS_ATTR_RO(_name, _fru_id) \
> + ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RO(_name), \
> + .fru_id = _fru_id })
> +
> +#define EDAC_ECS_ATTR_WO(_name, _fru_id) \
> + ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_WO(_name), \
> + .fru_id = _fru_id })
> +
> +#define EDAC_ECS_ATTR_RW(_name, _fru_id) \
> + ((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RW(_name), \
> + .fru_id = _fru_id })
> +
> +static int ecs_create_desc(struct device *ecs_dev,
> + const struct attribute_group **attr_groups,
> + u16 num_media_frus)
> +{
> + struct edac_ecs_context *ecs_ctx;
> + u32 fru;
> +
> + ecs_ctx = devm_kzalloc(ecs_dev, sizeof(*ecs_ctx), GFP_KERNEL);
> + if (!ecs_ctx)
> + return -ENOMEM;
> +
> + ecs_ctx->num_media_frus = num_media_frus;
> + ecs_ctx->fru_ctxs = devm_kcalloc(ecs_dev, num_media_frus,
> + sizeof(*ecs_ctx->fru_ctxs),
> + GFP_KERNEL);
> + if (!ecs_ctx->fru_ctxs)
> + return -ENOMEM;
> +
> + for (fru = 0; fru < num_media_frus; fru++) {
> + struct edac_ecs_fru_context *fru_ctx = &ecs_ctx->fru_ctxs[fru];
> + struct attribute_group *group = &fru_ctx->group;
> + int i;
> +
> + fru_ctx->ecs_dev_attr[0] = EDAC_ECS_ATTR_RW(log_entry_type, fru);
> + fru_ctx->ecs_dev_attr[1] = EDAC_ECS_ATTR_RO(log_entry_type_per_dram, fru);
> + fru_ctx->ecs_dev_attr[2] = EDAC_ECS_ATTR_RO(log_entry_type_per_memory_media, fru);
> + fru_ctx->ecs_dev_attr[3] = EDAC_ECS_ATTR_RW(mode, fru);
> + fru_ctx->ecs_dev_attr[4] = EDAC_ECS_ATTR_RO(mode_counts_rows, fru);
> + fru_ctx->ecs_dev_attr[5] = EDAC_ECS_ATTR_RO(mode_counts_codewords, fru);
> + fru_ctx->ecs_dev_attr[6] = EDAC_ECS_ATTR_WO(reset, fru);
> + fru_ctx->ecs_dev_attr[7] = EDAC_ECS_ATTR_RW(threshold, fru);
> + for (i = 0; i < ECS_MAX_ATTRS; i++)
> + fru_ctx->ecs_attrs[i] = &fru_ctx->ecs_dev_attr[i].dev_attr.attr;
> +
> + sprintf(fru_ctx->name, "%s%d", EDAC_ECS_FRU_NAME, fru);
> + group->name = fru_ctx->name;
> + group->attrs = fru_ctx->ecs_attrs;
> + group->is_visible = ecs_attr_visible;
> +
> + attr_groups[fru] = group;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * edac_ecs_get_desc - get EDAC ECS descriptors
> + * @ecs_dev: client device, supports ECS feature
> + * @attr_groups: pointer to attribute group container
> + * @num_media_frus: number of media FRUs in the device
> + *
> + * Returns 0 on success, error otherwise.
> + */
> +int edac_ecs_get_desc(struct device *ecs_dev,
> + const struct attribute_group **attr_groups,
> + u16 num_media_frus)
> +{
> + if (!ecs_dev || !attr_groups || !num_media_frus)
> + return -EINVAL;
> +
> + return ecs_create_desc(ecs_dev, attr_groups, num_media_frus);
> +}
> diff --git a/include/linux/edac.h b/include/linux/edac.h
> index aae8262b9863..90cb90cf5272 100644
> --- a/include/linux/edac.h
> +++ b/include/linux/edac.h
> @@ -704,10 +704,43 @@ int edac_scrub_get_desc(struct device *scrub_dev,
> const struct attribute_group **attr_groups,
> u8 instance);
>
> +/**
> + * struct edac_ecs_ops - ECS device operations (all elements optional)
> + * @get_log_entry_type: read the log entry type value.
> + * @set_log_entry_type: set the log entry type value.
> + * @get_log_entry_type_per_dram: read the log entry type per dram value.
> + * @get_log_entry_type_per_memory_media: read the log entry type per memory media value.
> + * @get_mode: read the mode value.
> + * @set_mode: set the mode value.
> + * @get_mode_counts_rows: read the mode counts rows value.
> + * @get_mode_counts_codewords: read the mode counts codewords value.
> + * @reset: reset the ECS counter.
> + * @get_threshold: read the threshold value.
> + * @set_threshold: set the threshold value.
> + */
> +struct edac_ecs_ops {
> + int (*get_log_entry_type)(struct device *dev, void *drv_data, int fru_id, u32 *val);
> + int (*set_log_entry_type)(struct device *dev, void *drv_data, int fru_id, u32 val);
> + int (*get_log_entry_type_per_dram)(struct device *dev, void *drv_data,
> + int fru_id, u32 *val);
> + int (*get_log_entry_type_per_memory_media)(struct device *dev, void *drv_data,
> + int fru_id, u32 *val);
> + int (*get_mode)(struct device *dev, void *drv_data, int fru_id, u32 *val);
> + int (*set_mode)(struct device *dev, void *drv_data, int fru_id, u32 val);
> + int (*get_mode_counts_rows)(struct device *dev, void *drv_data, int fru_id, u32 *val);
> + int (*get_mode_counts_codewords)(struct device *dev, void *drv_data, int fru_id, u32 *val);
> + int (*reset)(struct device *dev, void *drv_data, int fru_id, u32 val);
> + int (*get_threshold)(struct device *dev, void *drv_data, int fru_id, u32 *threshold);
> + int (*set_threshold)(struct device *dev, void *drv_data, int fru_id, u32 threshold);
> +};
> +
> struct edac_ecs_ex_info {
> u16 num_media_frus;
> };
>
> +int edac_ecs_get_desc(struct device *ecs_dev,
> + const struct attribute_group **attr_groups,
> + u16 num_media_frus);
> /*
> * EDAC device feature information structure
> */
> --
> 2.34.1
>
--
Fan Ni
* Re: [PATCH v12 08/17] cxl/mbox: Add GET_FEATURE mailbox command
2024-09-11 9:04 ` [PATCH v12 08/17] cxl/mbox: Add GET_FEATURE mailbox command shiju.jose
@ 2024-09-30 16:17 ` Fan Ni
0 siblings, 0 replies; 39+ messages in thread
From: Fan Ni @ 2024-09-30 16:17 UTC (permalink / raw)
To: shiju.jose
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel, bp,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, jgroves, vsalve, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm
On Wed, Sep 11, 2024 at 10:04:37AM +0100, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Add support for GET_FEATURE mailbox command.
>
> CXL spec 3.1 section 8.2.9.6 describes optional device specific features.
> The settings of a feature can be retrieved using Get Feature command.
> CXL spec 3.1 section 8.2.9.6.2 describes Get Feature command.
>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
Reviewed-by: Fan Ni <fan.ni@samsung.com>
> drivers/cxl/core/mbox.c | 41 +++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxlmem.h | 26 ++++++++++++++++++++++++++
> 2 files changed, 67 insertions(+)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index fe965ec5802f..3dfe411c6556 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -965,6 +965,47 @@ int cxl_get_supported_feature_entry(struct cxl_dev_state *cxlds, const uuid_t *f
> }
> EXPORT_SYMBOL_NS_GPL(cxl_get_supported_feature_entry, CXL);
>
> +size_t cxl_get_feature(struct cxl_dev_state *cxlds, const uuid_t feat_uuid,
> + enum cxl_get_feat_selection selection,
> + void *feat_out, size_t feat_out_size)
> +{
> + struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
> + size_t data_to_rd_size, size_out;
> + struct cxl_mbox_get_feat_in pi;
> + struct cxl_mbox_cmd mbox_cmd;
> + size_t data_rcvd_size = 0;
> + int rc;
> +
> + if (!feat_out || !feat_out_size)
> + return 0;
> +
> + size_out = min(feat_out_size, cxl_mbox->payload_size);
> + pi.uuid = feat_uuid;
> + pi.selection = selection;
> + do {
> + data_to_rd_size = min(feat_out_size - data_rcvd_size,
> + cxl_mbox->payload_size);
> + pi.offset = cpu_to_le16(data_rcvd_size);
> + pi.count = cpu_to_le16(data_to_rd_size);
> +
> + mbox_cmd = (struct cxl_mbox_cmd) {
> + .opcode = CXL_MBOX_OP_GET_FEATURE,
> + .size_in = sizeof(pi),
> + .payload_in = &pi,
> + .size_out = size_out,
> + .payload_out = feat_out + data_rcvd_size,
> + .min_out = data_to_rd_size,
> + };
> + rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
> + if (rc < 0 || !mbox_cmd.size_out)
> + return 0;
> + data_rcvd_size += mbox_cmd.size_out;
> + } while (data_rcvd_size < feat_out_size);
> +
> + return data_rcvd_size;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_feature, CXL);
> +
> /**
> * cxl_enumerate_cmds() - Enumerate commands for a device.
> * @mds: The driver data for the operation
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 5d149e64c247..57c9294bb7f3 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -487,6 +487,7 @@ enum cxl_opcode {
> CXL_MBOX_OP_CLEAR_LOG = 0x0403,
> CXL_MBOX_OP_GET_SUP_LOG_SUBLIST = 0x0405,
> CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
> + CXL_MBOX_OP_GET_FEATURE = 0x0501,
> CXL_MBOX_OP_IDENTIFY = 0x4000,
> CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
> CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
> @@ -812,6 +813,28 @@ struct cxl_mbox_get_sup_feats_out {
> struct cxl_feat_entry ents[] __counted_by(le32_to_cpu(supported_feats));
> } __packed;
>
> +/*
> + * Get Feature CXL 3.1 Spec 8.2.9.6.2
> + */
> +
> +/*
> + * Get Feature input payload
> + * CXL rev 3.1 section 8.2.9.6.2 Table 8-99
> + */
> +enum cxl_get_feat_selection {
> + CXL_GET_FEAT_SEL_CURRENT_VALUE,
> + CXL_GET_FEAT_SEL_DEFAULT_VALUE,
> + CXL_GET_FEAT_SEL_SAVED_VALUE,
> + CXL_GET_FEAT_SEL_MAX
> +};
> +
> +struct cxl_mbox_get_feat_in {
> + uuid_t uuid;
> + __le16 offset;
> + __le16 count;
> + u8 selection;
> +} __packed;
> +
> int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
> struct cxl_mbox_cmd *cmd);
> int cxl_dev_state_identify(struct cxl_memdev_state *mds);
> @@ -875,4 +898,7 @@ void cxl_dpa_debug(struct seq_file *file, struct cxl_dev_state *cxlds);
> int cxl_get_supported_features(struct cxl_dev_state *cxlds);
> int cxl_get_supported_feature_entry(struct cxl_dev_state *cxlds, const uuid_t *feat_uuid,
> struct cxl_feat_entry *feat_entry_out);
> +size_t cxl_get_feature(struct cxl_dev_state *cxlds, const uuid_t feat_uuid,
> + enum cxl_get_feat_selection selection,
> + void *feat_out, size_t feat_out_size);
> #endif /* __CXL_MEM_H__ */
> --
> 2.34.1
>
--
Fan Ni
* Re: [PATCH v12 09/17] cxl/mbox: Add SET_FEATURE mailbox command
2024-09-11 9:04 ` [PATCH v12 09/17] cxl/mbox: Add SET_FEATURE " shiju.jose
@ 2024-09-30 16:58 ` Fan Ni
0 siblings, 0 replies; 39+ messages in thread
From: Fan Ni @ 2024-09-30 16:58 UTC (permalink / raw)
To: shiju.jose
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel, bp,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, jgroves, vsalve, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm
On Wed, Sep 11, 2024 at 10:04:38AM +0100, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Add support for SET_FEATURE mailbox command.
>
> CXL spec 3.1 section 8.2.9.6 describes optional device specific features.
> CXL devices supports features with changeable attributes.
s/supports/support/
Fan
> The settings of a feature can be optionally modified using Set Feature
> command.
> CXL spec 3.1 section 8.2.9.6.3 describes Set Feature command.
>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> drivers/cxl/core/mbox.c | 73 +++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxlmem.h | 34 +++++++++++++++++++
> 2 files changed, 107 insertions(+)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 3dfe411c6556..806b1c8087b0 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1006,6 +1006,79 @@ size_t cxl_get_feature(struct cxl_dev_state *cxlds, const uuid_t feat_uuid,
> }
> EXPORT_SYMBOL_NS_GPL(cxl_get_feature, CXL);
>
> +/*
> + * FEAT_DATA_MIN_PAYLOAD_SIZE - minimum number of extra bytes that must be
> + * available in the mailbox payload for storing the actual feature data so
> + * that the feature data transfer works as expected.
> + */
> +#define FEAT_DATA_MIN_PAYLOAD_SIZE 10
> +int cxl_set_feature(struct cxl_dev_state *cxlds,
> + const uuid_t feat_uuid, u8 feat_version,
> + void *feat_data, size_t feat_data_size,
> + u8 feat_flag)
> +{
> + struct cxl_mailbox *cxl_mbox = &cxlds->cxl_mbox;
> + struct cxl_memdev_set_feat_pi {
> + struct cxl_mbox_set_feat_hdr hdr;
> + u8 feat_data[];
> + } __packed;
> + size_t data_in_size, data_sent_size = 0;
> + struct cxl_mbox_cmd mbox_cmd;
> + size_t hdr_size;
> + int rc = 0;
> +
> +	struct cxl_memdev_set_feat_pi *pi __free(kfree) =
> +		kmalloc(cxl_mbox->payload_size, GFP_KERNEL);
> +	if (!pi)
> +		return -ENOMEM;
> +
> +	pi->hdr.uuid = feat_uuid;
> + pi->hdr.version = feat_version;
> + feat_flag &= ~CXL_SET_FEAT_FLAG_DATA_TRANSFER_MASK;
> + feat_flag |= CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET;
> + hdr_size = sizeof(pi->hdr);
> + /*
> + * Check minimum mbox payload size is available for
> + * the feature data transfer.
> + */
> + if (hdr_size + FEAT_DATA_MIN_PAYLOAD_SIZE > cxl_mbox->payload_size)
> + return -ENOMEM;
> +
> + if ((hdr_size + feat_data_size) <= cxl_mbox->payload_size) {
> + pi->hdr.flags = cpu_to_le32(feat_flag |
> + CXL_SET_FEAT_FLAG_FULL_DATA_TRANSFER);
> + data_in_size = feat_data_size;
> + } else {
> + pi->hdr.flags = cpu_to_le32(feat_flag |
> + CXL_SET_FEAT_FLAG_INITIATE_DATA_TRANSFER);
> + data_in_size = cxl_mbox->payload_size - hdr_size;
> + }
> +
> + do {
> + pi->hdr.offset = cpu_to_le16(data_sent_size);
> + memcpy(pi->feat_data, feat_data + data_sent_size, data_in_size);
> + mbox_cmd = (struct cxl_mbox_cmd) {
> + .opcode = CXL_MBOX_OP_SET_FEATURE,
> + .size_in = hdr_size + data_in_size,
> + .payload_in = pi,
> + };
> + rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
> + if (rc < 0)
> + return rc;
> +
> + data_sent_size += data_in_size;
> + if (data_sent_size >= feat_data_size)
> + return 0;
> +
> + if ((feat_data_size - data_sent_size) <= (cxl_mbox->payload_size - hdr_size)) {
> + data_in_size = feat_data_size - data_sent_size;
> + pi->hdr.flags = cpu_to_le32(feat_flag |
> + CXL_SET_FEAT_FLAG_FINISH_DATA_TRANSFER);
> + } else {
> + pi->hdr.flags = cpu_to_le32(feat_flag |
> + CXL_SET_FEAT_FLAG_CONTINUE_DATA_TRANSFER);
> + }
> + } while (true);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_set_feature, CXL);
> +
> /**
> * cxl_enumerate_cmds() - Enumerate commands for a device.
> * @mds: The driver data for the operation
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 57c9294bb7f3..b565a061a4e3 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -488,6 +488,7 @@ enum cxl_opcode {
> CXL_MBOX_OP_GET_SUP_LOG_SUBLIST = 0x0405,
> CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500,
> CXL_MBOX_OP_GET_FEATURE = 0x0501,
> + CXL_MBOX_OP_SET_FEATURE = 0x0502,
> CXL_MBOX_OP_IDENTIFY = 0x4000,
> CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100,
> CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101,
> @@ -835,6 +836,35 @@ struct cxl_mbox_get_feat_in {
> u8 selection;
> } __packed;
>
> +/*
> + * Set Feature CXL 3.1 Spec 8.2.9.6.3
> + */
> +
> +/*
> + * Set Feature input payload
> + * CXL rev 3.1 section 8.2.9.6.3 Table 8-101
> + */
> +/* Set Feature : Payload in flags */
> +#define CXL_SET_FEAT_FLAG_DATA_TRANSFER_MASK GENMASK(2, 0)
> +enum cxl_set_feat_flag_data_transfer {
> + CXL_SET_FEAT_FLAG_FULL_DATA_TRANSFER,
> + CXL_SET_FEAT_FLAG_INITIATE_DATA_TRANSFER,
> + CXL_SET_FEAT_FLAG_CONTINUE_DATA_TRANSFER,
> + CXL_SET_FEAT_FLAG_FINISH_DATA_TRANSFER,
> + CXL_SET_FEAT_FLAG_ABORT_DATA_TRANSFER,
> + CXL_SET_FEAT_FLAG_DATA_TRANSFER_MAX
> +};
> +
> +#define CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET BIT(3)
> +
> +struct cxl_mbox_set_feat_hdr {
> + uuid_t uuid;
> + __le32 flags;
> + __le16 offset;
> + u8 version;
> + u8 rsvd[9];
> +} __packed;
> +
> int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
> struct cxl_mbox_cmd *cmd);
> int cxl_dev_state_identify(struct cxl_memdev_state *mds);
> @@ -901,4 +931,8 @@ int cxl_get_supported_feature_entry(struct cxl_dev_state *cxlds, const uuid_t *f
> size_t cxl_get_feature(struct cxl_dev_state *cxlds, const uuid_t feat_uuid,
> enum cxl_get_feat_selection selection,
> void *feat_out, size_t feat_out_size);
> +int cxl_set_feature(struct cxl_dev_state *cxlds,
> + const uuid_t feat_uuid, u8 feat_version,
> + void *feat_data, size_t feat_data_size,
> + u8 feat_flag);
> #endif /* __CXL_MEM_H__ */
> --
> 2.34.1
>
--
Fan Ni
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v12 10/17] cxl/memfeature: Add CXL memory device patrol scrub control feature
2024-09-11 9:04 ` [PATCH v12 10/17] cxl/memfeature: Add CXL memory device patrol scrub control feature shiju.jose
@ 2024-09-30 17:38 ` Fan Ni
2024-10-01 8:38 ` Shiju Jose
2024-10-01 19:47 ` Fan Ni
1 sibling, 1 reply; 39+ messages in thread
From: Fan Ni @ 2024-09-30 17:38 UTC (permalink / raw)
To: shiju.jose
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel, bp,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, jgroves, vsalve, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm
On Wed, Sep 11, 2024 at 10:04:39AM +0100, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub control
> feature. The device patrol scrub proactively locates and makes corrections
> to errors in a regular cycle.
>
> Allow specifying the number of hours within which the patrol scrub must be
> completed, subject to minimum and maximum limits reported by the device.
> Also allow disabling the scrub, trading off error rates against
> performance.
>
> Add support for CXL memory device based patrol scrub control.
> Register with the EDAC RAS control feature driver, which gets the scrub
> attribute descriptors from the EDAC scrub and exposes sysfs scrub control
> attributes to userspace.
> For example, CXL device based scrub control for the CXL mem0 device is
> exposed in /sys/bus/edac/devices/cxl_mem0/scrub*/
>
> Also add support for region based CXL memory patrol scrub control. A
> CXL memory region may be interleaved across one or more CXL memory devices.
> For example, region based scrub control for CXL region1 is exposed in
> /sys/bus/edac/devices/cxl_region1/scrub*/
>
> Open Questions:
> Q1: The CXL 3.1 spec defines the patrol scrub control feature at the CXL
> memory device level, with support for setting the scrub cycle and
> enabling/disabling scrub, but not per HPA range. Thus scrub control for a
> region is presently implemented via all the associated CXL memory devices.
> What is the exact use case for the CXL region based scrub control?
> How is the HPA range, which Dan asked for in region based scrubbing, used?
> Is a spec change required for the patrol scrub control feature to support
> setting the HPA range?
>
> Q2: Would both CXL device based and CXL region based scrub control be
> enabled at the same time in a system?
>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Hi Shiju,
When trying the following ops with this patchset, I actually noticed
something unexpected.
---------------------------------
root@localhost:~# dmesg -C
root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/min_cycle_duration
3600
root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/max_cycle_duration
918000
root@localhost:~# echo 3200 > /sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
-bash: echo: write error: Invalid argument
root@localhost:~# dmesg
[ 4950.038767] cxl_pci:__cxl_pci_mbox_send_cmd:263: cxl_pci 0000:0d:00.0: Sending command: 0x0501
[ 4950.038952] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0d:00.0: Doorbell wait took 0ms
[ 4972.487087] cxl_pci:__cxl_pci_mbox_send_cmd:263: cxl_pci 0000:0d:00.0: Sending command: 0x0501
[ 4972.487339] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0d:00.0: Doorbell wait took 0ms
[ 4972.487509] cxl_mem mem0: Invalid CXL patrol scrub cycle(0) to set
[ 4972.488287] cxl_mem mem0: Minimum supported CXL patrol scrub cycle in hour 0
-----------------------
If you check the last line of the dmesg output, it seems we did not
print out the minimum scrub cycle duration correctly.
Fan
> ---
> Documentation/edac/edac-scrub.rst | 74 ++++++
> drivers/cxl/Kconfig | 18 ++
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/memfeature.c | 372 ++++++++++++++++++++++++++++++
> drivers/cxl/core/region.c | 6 +
> drivers/cxl/cxlmem.h | 7 +
> drivers/cxl/mem.c | 4 +
> 7 files changed, 482 insertions(+)
> create mode 100644 Documentation/edac/edac-scrub.rst
> create mode 100644 drivers/cxl/core/memfeature.c
>
> diff --git a/Documentation/edac/edac-scrub.rst b/Documentation/edac/edac-scrub.rst
> new file mode 100644
> index 000000000000..243035957e99
> --- /dev/null
> +++ b/Documentation/edac/edac-scrub.rst
> @@ -0,0 +1,74 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===================
> +EDAC Scrub control
> +===================
> +
> +Copyright (c) 2024 HiSilicon Limited.
> +
> +:Author: Shiju Jose <shiju.jose@huawei.com>
> +:License: The GNU Free Documentation License, Version 1.2
> + (dual licensed under the GPL v2)
> +:Original Reviewers:
> +
> +- Written for: 6.12
> +- Updated for:
> +
> +Introduction
> +------------
> +The EDAC enhancement for RAS features exposes interfaces for controlling
> +the memory scrubbers in the system. The scrub device drivers in the
> +system register with the EDAC scrub. The driver exposes the
> +scrub controls to the user in sysfs.
> +
> +The File System
> +---------------
> +
> +The control attributes of a registered scrubber instance can be
> +accessed under /sys/bus/edac/devices/<dev-name>/scrub*/
> +
> +sysfs
> +-----
> +
> +Sysfs files are documented in
> +`Documentation/ABI/testing/sysfs-edac-scrub-control`.
> +
> +Example
> +-------
> +
> +The usage takes the form shown in this example::
> +
> +1. CXL memory device patrol scrubber
> +1.1 device based
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/min_cycle_duration
> +3600
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/max_cycle_duration
> +918000
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
> +43200
> +root@localhost:~# echo 54000 > /sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
> +54000
> +root@localhost:~# echo 1 > /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
> +1
> +root@localhost:~# echo 0 > /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
> +0
> +
> +1.2. region based
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/min_cycle_duration
> +3600
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/max_cycle_duration
> +918000
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
> +43200
> +root@localhost:~# echo 54000 > /sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
> +54000
> +root@localhost:~# echo 1 > /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
> +1
> +root@localhost:~# echo 0 > /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
> +0
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 99b5c25be079..394bdbc4de87 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -145,4 +145,22 @@ config CXL_REGION_INVALIDATION_TEST
> If unsure, or if this kernel is meant for production environments,
> say N.
>
> +config CXL_RAS_FEAT
> + bool "CXL: Memory RAS features"
> + depends on CXL_PCI
> + depends on CXL_MEM
> + depends on EDAC
> + help
> + The CXL memory RAS feature control is optional and allows the host to
> + control the RAS feature configurations of CXL Type 3 devices.
> +
> + Registers with the EDAC device subsystem to expose control attributes
> + of CXL memory device's RAS features to the user.
> + Provides interface functions to support configuring the CXL memory
> + device's RAS features.
> +
> + Say 'y/n' to enable/disable CXL.mem device's RAS features control.
> + See section 8.2.9.9.11 of the CXL 3.1 specification for detailed
> + information on CXL memory device features.
> +
> endif
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 9259bcc6773c..2a3c7197bc23 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -16,3 +16,4 @@ cxl_core-y += pmu.o
> cxl_core-y += cdat.o
> cxl_core-$(CONFIG_TRACING) += trace.o
> cxl_core-$(CONFIG_CXL_REGION) += region.o
> +cxl_core-$(CONFIG_CXL_RAS_FEAT) += memfeature.o
> diff --git a/drivers/cxl/core/memfeature.c b/drivers/cxl/core/memfeature.c
> new file mode 100644
> index 000000000000..90c68d20b02b
> --- /dev/null
> +++ b/drivers/cxl/core/memfeature.c
> @@ -0,0 +1,372 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * CXL memory RAS feature driver.
> + *
> + * Copyright (c) 2024 HiSilicon Limited.
> + *
> + * - Supports functions to configure RAS features of the
> + * CXL memory devices.
> + * - Registers with the EDAC device subsystem driver to expose
> + * the feature sysfs attributes to the user for configuring
> + * CXL memory RAS features.
> + */
> +
> +#define pr_fmt(fmt) "CXL MEM FEAT: " fmt
> +
> +#include <cxlmem.h>
> +#include <linux/cleanup.h>
> +#include <linux/limits.h>
> +#include <cxl.h>
> +#include <linux/edac.h>
> +
> +#define CXL_DEV_NUM_RAS_FEATURES 1
> +#define CXL_DEV_HOUR_IN_SECS 3600
> +
> +#define CXL_SCRUB_NAME_LEN 128
> +
> +/* CXL memory patrol scrub control definitions */
> +static const uuid_t cxl_patrol_scrub_uuid =
> + UUID_INIT(0x96dad7d6, 0xfde8, 0x482b, 0xa7, 0x33, 0x75, 0x77, 0x4e, \
> + 0x06, 0xdb, 0x8a);
> +
> +/* CXL memory patrol scrub control functions */
> +struct cxl_patrol_scrub_context {
> + u8 instance;
> + u16 get_feat_size;
> + u16 set_feat_size;
> + u8 get_version;
> + u8 set_version;
> + u16 set_effects;
> + struct cxl_memdev *cxlmd;
> + struct cxl_region *cxlr;
> +};
> +
> +/**
> + * struct cxl_memdev_ps_params - CXL memory patrol scrub parameter data structure.
> + * @enable: [IN & OUT] enable(1)/disable(0) patrol scrub.
> + * @scrub_cycle_changeable: [OUT] scrub cycle attribute of patrol scrub is changeable.
> + * @scrub_cycle_hrs: [IN] Requested patrol scrub cycle in hours.
> + * [OUT] Current patrol scrub cycle in hours.
> + * @min_scrub_cycle_hrs:[OUT] minimum patrol scrub cycle in hours supported.
> + */
> +struct cxl_memdev_ps_params {
> + bool enable;
> + bool scrub_cycle_changeable;
> + u16 scrub_cycle_hrs;
> + u16 min_scrub_cycle_hrs;
> +};
> +
> +enum cxl_scrub_param {
> + CXL_PS_PARAM_ENABLE,
> + CXL_PS_PARAM_SCRUB_CYCLE,
> +};
> +
> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK BIT(0)
> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_REALTIME_REPORT_CAP_MASK BIT(1)
> +#define CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK GENMASK(7, 0)
> +#define CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK GENMASK(15, 8)
> +#define CXL_MEMDEV_PS_FLAG_ENABLED_MASK BIT(0)
> +
> +struct cxl_memdev_ps_rd_attrs {
> + u8 scrub_cycle_cap;
> + __le16 scrub_cycle_hrs;
> + u8 scrub_flags;
> +} __packed;
> +
> +struct cxl_memdev_ps_wr_attrs {
> + u8 scrub_cycle_hrs;
> + u8 scrub_flags;
> +} __packed;
> +
> +static int cxl_mem_ps_get_attrs(struct cxl_dev_state *cxlds,
> + struct cxl_memdev_ps_params *params)
> +{
> + size_t rd_data_size = sizeof(struct cxl_memdev_ps_rd_attrs);
> + size_t data_size;
> + struct cxl_memdev_ps_rd_attrs *rd_attrs __free(kfree) =
> + kmalloc(rd_data_size, GFP_KERNEL);
> + if (!rd_attrs)
> + return -ENOMEM;
> +
> + data_size = cxl_get_feature(cxlds, cxl_patrol_scrub_uuid,
> + CXL_GET_FEAT_SEL_CURRENT_VALUE,
> + rd_attrs, rd_data_size);
> + if (!data_size)
> + return -EIO;
> +
> + params->scrub_cycle_changeable = FIELD_GET(CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK,
> + rd_attrs->scrub_cycle_cap);
> + params->enable = FIELD_GET(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
> + rd_attrs->scrub_flags);
> + params->scrub_cycle_hrs = FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
> + rd_attrs->scrub_cycle_hrs);
> + params->min_scrub_cycle_hrs = FIELD_GET(CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK,
> + rd_attrs->scrub_cycle_hrs);
> +
> + return 0;
> +}
> +
> +static int cxl_ps_get_attrs(struct device *dev, void *drv_data,
> + struct cxl_memdev_ps_params *params)
> +{
> + struct cxl_patrol_scrub_context *cxl_ps_ctx = drv_data;
> + struct cxl_memdev *cxlmd;
> + struct cxl_dev_state *cxlds;
> + u16 min_scrub_cycle = 0;
> + int i, ret;
> +
> + if (cxl_ps_ctx->cxlr) {
> + struct cxl_region *cxlr = cxl_ps_ctx->cxlr;
> + struct cxl_region_params *p = &cxlr->params;
> +
> + for (i = p->interleave_ways - 1; i >= 0; i--) {
> + struct cxl_endpoint_decoder *cxled = p->targets[i];
> +
> + cxlmd = cxled_to_memdev(cxled);
> + cxlds = cxlmd->cxlds;
> + ret = cxl_mem_ps_get_attrs(cxlds, params);
> + if (ret)
> + return ret;
> +
> + if (params->min_scrub_cycle_hrs > min_scrub_cycle)
> + min_scrub_cycle = params->min_scrub_cycle_hrs;
> + }
> + params->min_scrub_cycle_hrs = min_scrub_cycle;
> + return 0;
> + }
> + cxlmd = cxl_ps_ctx->cxlmd;
> + cxlds = cxlmd->cxlds;
> +
> + return cxl_mem_ps_get_attrs(cxlds, params);
> +}
> +
> +static int cxl_mem_ps_set_attrs(struct device *dev, void *drv_data,
> + struct cxl_dev_state *cxlds,
> + struct cxl_memdev_ps_params *params,
> + enum cxl_scrub_param param_type)
> +{
> + struct cxl_patrol_scrub_context *cxl_ps_ctx = drv_data;
> + struct cxl_memdev_ps_wr_attrs wr_attrs;
> + struct cxl_memdev_ps_params rd_params;
> + int ret;
> +
> + ret = cxl_mem_ps_get_attrs(cxlds, &rd_params);
> + if (ret) {
> + dev_err(dev, "Get cxlmemdev patrol scrub params failed ret=%d\n",
> + ret);
> + return ret;
> + }
> +
> + switch (param_type) {
> + case CXL_PS_PARAM_ENABLE:
> + wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
> + params->enable);
> + wr_attrs.scrub_cycle_hrs = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
> + rd_params.scrub_cycle_hrs);
> + break;
> + case CXL_PS_PARAM_SCRUB_CYCLE:
> + if (params->scrub_cycle_hrs < rd_params.min_scrub_cycle_hrs) {
> + dev_err(dev, "Invalid CXL patrol scrub cycle(%d) to set\n",
> + params->scrub_cycle_hrs);
> + dev_err(dev, "Minimum supported CXL patrol scrub cycle in hour %d\n",
> + params->min_scrub_cycle_hrs);
> + return -EINVAL;
> + }
> + wr_attrs.scrub_cycle_hrs = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
> + params->scrub_cycle_hrs);
> + wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
> + rd_params.enable);
> + break;
> + }
> +
> + ret = cxl_set_feature(cxlds, cxl_patrol_scrub_uuid,
> + cxl_ps_ctx->set_version,
> + &wr_attrs, sizeof(wr_attrs),
> + CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET);
> + if (ret) {
> + dev_err(dev, "CXL patrol scrub set feature failed ret=%d\n", ret);
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static int cxl_ps_set_attrs(struct device *dev, void *drv_data,
> + struct cxl_memdev_ps_params *params,
> + enum cxl_scrub_param param_type)
> +{
> + struct cxl_patrol_scrub_context *cxl_ps_ctx = drv_data;
> + struct cxl_memdev *cxlmd;
> + struct cxl_dev_state *cxlds;
> + int ret, i;
> +
> + if (cxl_ps_ctx->cxlr) {
> + struct cxl_region *cxlr = cxl_ps_ctx->cxlr;
> + struct cxl_region_params *p = &cxlr->params;
> +
> + for (i = p->interleave_ways - 1; i >= 0; i--) {
> + struct cxl_endpoint_decoder *cxled = p->targets[i];
> +
> + cxlmd = cxled_to_memdev(cxled);
> + cxlds = cxlmd->cxlds;
> + ret = cxl_mem_ps_set_attrs(dev, drv_data, cxlds,
> + params, param_type);
> + if (ret)
> + return ret;
> + }
> + } else {
> + cxlmd = cxl_ps_ctx->cxlmd;
> + cxlds = cxlmd->cxlds;
> +
> + return cxl_mem_ps_set_attrs(dev, drv_data, cxlds, params, param_type);
> + }
> +
> + return 0;
> +}
> +
> +static int cxl_patrol_scrub_get_enabled_bg(struct device *dev, void *drv_data, bool *enabled)
> +{
> + struct cxl_memdev_ps_params params;
> + int ret;
> +
> + ret = cxl_ps_get_attrs(dev, drv_data, ¶ms);
> + if (ret)
> + return ret;
> +
> + *enabled = params.enable;
> +
> + return 0;
> +}
> +
> +static int cxl_patrol_scrub_set_enabled_bg(struct device *dev, void *drv_data, bool enable)
> +{
> + struct cxl_memdev_ps_params params = {
> + .enable = enable,
> + };
> +
> + return cxl_ps_set_attrs(dev, drv_data, ¶ms, CXL_PS_PARAM_ENABLE);
> +}
> +
> +static int cxl_patrol_scrub_read_min_scrub_cycle(struct device *dev, void *drv_data,
> + u32 *min)
> +{
> + struct cxl_memdev_ps_params params;
> + int ret;
> +
> + ret = cxl_ps_get_attrs(dev, drv_data, ¶ms);
> + if (ret)
> + return ret;
> + *min = params.min_scrub_cycle_hrs * CXL_DEV_HOUR_IN_SECS;
> +
> + return 0;
> +}
> +
> +static int cxl_patrol_scrub_read_max_scrub_cycle(struct device *dev, void *drv_data,
> + u32 *max)
> +{
> + *max = U8_MAX * CXL_DEV_HOUR_IN_SECS; /* Max set by register size */
> +
> + return 0;
> +}
> +
> +static int cxl_patrol_scrub_read_scrub_cycle(struct device *dev, void *drv_data,
> + u32 *scrub_cycle_secs)
> +{
> + struct cxl_memdev_ps_params params;
> + int ret;
> +
> + ret = cxl_ps_get_attrs(dev, drv_data, ¶ms);
> + if (ret)
> + return ret;
> +
> + *scrub_cycle_secs = params.scrub_cycle_hrs * CXL_DEV_HOUR_IN_SECS;
> +
> + return 0;
> +}
> +
> +static int cxl_patrol_scrub_write_scrub_cycle(struct device *dev, void *drv_data,
> + u32 scrub_cycle_secs)
> +{
> + struct cxl_memdev_ps_params params = {
> + .scrub_cycle_hrs = scrub_cycle_secs / CXL_DEV_HOUR_IN_SECS,
> + };
> +
> + return cxl_ps_set_attrs(dev, drv_data, ¶ms, CXL_PS_PARAM_SCRUB_CYCLE);
> +}
> +
> +static const struct edac_scrub_ops cxl_ps_scrub_ops = {
> + .get_enabled_bg = cxl_patrol_scrub_get_enabled_bg,
> + .set_enabled_bg = cxl_patrol_scrub_set_enabled_bg,
> + .min_cycle_read = cxl_patrol_scrub_read_min_scrub_cycle,
> + .max_cycle_read = cxl_patrol_scrub_read_max_scrub_cycle,
> + .cycle_duration_read = cxl_patrol_scrub_read_scrub_cycle,
> + .cycle_duration_write = cxl_patrol_scrub_write_scrub_cycle,
> +};
> +
> +int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
> +{
> + struct edac_dev_feature ras_features[CXL_DEV_NUM_RAS_FEATURES];
> + struct cxl_dev_state *cxlds;
> + struct cxl_patrol_scrub_context *cxl_ps_ctx;
> + struct cxl_feat_entry feat_entry;
> + char cxl_dev_name[CXL_SCRUB_NAME_LEN];
> + int rc, i, num_ras_features = 0;
> +
> + if (cxlr) {
> + struct cxl_region_params *p = &cxlr->params;
> +
> + for (i = p->interleave_ways - 1; i >= 0; i--) {
> + struct cxl_endpoint_decoder *cxled = p->targets[i];
> +
> + cxlmd = cxled_to_memdev(cxled);
> + cxlds = cxlmd->cxlds;
> + memset(&feat_entry, 0, sizeof(feat_entry));
> + rc = cxl_get_supported_feature_entry(cxlds, &cxl_patrol_scrub_uuid,
> + &feat_entry);
> + if (rc < 0)
> + return rc;
> + if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
> + return -EOPNOTSUPP;
> + }
> + } else {
> + cxlds = cxlmd->cxlds;
> + rc = cxl_get_supported_feature_entry(cxlds, &cxl_patrol_scrub_uuid,
> + &feat_entry);
> + if (rc < 0)
> + return rc;
> +
> + if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
> + return -EOPNOTSUPP;
> + }
> +
> + cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
> + if (!cxl_ps_ctx)
> + return -ENOMEM;
> +
> + *cxl_ps_ctx = (struct cxl_patrol_scrub_context) {
> + .instance = cxl_ps_ctx->instance,
> + .get_feat_size = feat_entry.get_feat_size,
> + .set_feat_size = feat_entry.set_feat_size,
> + .get_version = feat_entry.get_feat_ver,
> + .set_version = feat_entry.set_feat_ver,
> + .set_effects = feat_entry.set_effects,
> + };
> + if (cxlr) {
> + snprintf(cxl_dev_name, sizeof(cxl_dev_name),
> + "cxl_region%d", cxlr->id);
> + cxl_ps_ctx->cxlr = cxlr;
> + } else {
> + snprintf(cxl_dev_name, sizeof(cxl_dev_name),
> + "%s_%s", "cxl", dev_name(&cxlmd->dev));
> + cxl_ps_ctx->cxlmd = cxlmd;
> + }
> +
> + ras_features[num_ras_features].ft_type = RAS_FEAT_SCRUB;
> + ras_features[num_ras_features].scrub_ops = &cxl_ps_scrub_ops;
> + ras_features[num_ras_features].ctx = cxl_ps_ctx;
> + num_ras_features++;
> +
> + return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
> + num_ras_features, ras_features);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_ras_features_init, CXL);
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 21ad5f242875..1cc29ec9ffac 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3434,6 +3434,12 @@ static int cxl_region_probe(struct device *dev)
> p->res->start, p->res->end, cxlr,
> is_system_ram) > 0)
> return 0;
> +
> + rc = cxl_mem_ras_features_init(NULL, cxlr);
> + if (rc)
> + dev_warn(&cxlr->dev, "CXL RAS features init for region_id=%d failed\n",
> + cxlr->id);
> +
> return devm_cxl_add_dax_region(cxlr);
> default:
> dev_dbg(&cxlr->dev, "unsupported region mode: %d\n",
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index b565a061a4e3..2187c3378eaa 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -889,6 +889,13 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
> int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
> int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
>
> +#ifdef CONFIG_CXL_RAS_FEAT
> +int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr);
> +#else
> +static inline int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
> +{ return 0; }
> +#endif
> +
> #ifdef CONFIG_CXL_SUSPEND
> void cxl_mem_active_inc(void);
> void cxl_mem_active_dec(void);
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 7de232eaeb17..be2e69548909 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -117,6 +117,10 @@ static int cxl_mem_probe(struct device *dev)
> if (!cxlds->media_ready)
> return -EBUSY;
>
> + rc = cxl_mem_ras_features_init(cxlmd, NULL);
> + if (rc)
> + dev_warn(&cxlmd->dev, "CXL RAS features init failed\n");
> +
> /*
> * Someone is trying to reattach this device after it lost its port
> * connection (an endpoint port previously registered by this memdev was
> --
> 2.34.1
>
--
Fan Ni
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v12 11/17] cxl/memfeature: Add CXL memory device ECS control feature
2024-09-11 9:04 ` [PATCH v12 11/17] cxl/memfeature: Add CXL memory device ECS " shiju.jose
@ 2024-09-30 18:12 ` Fan Ni
2024-10-01 8:39 ` Shiju Jose
0 siblings, 1 reply; 39+ messages in thread
From: Fan Ni @ 2024-09-30 18:12 UTC (permalink / raw)
To: shiju.jose
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel, bp,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, jgroves, vsalve, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm
On Wed, Sep 11, 2024 at 10:04:40AM +0100, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> CXL spec 3.1 section 8.2.9.9.11.2 describes the DDR5 ECS (Error Check
> Scrub) control feature.
> The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM
> Specification (JESD79-5) and allows the DRAM to internally read, correct
> single-bit errors, and write back corrected data bits to the DRAM array
> while providing transparency to error counts.
>
> The ECS control allows the requester to change the log entry type and the
> ECS threshold count (provided the request is within the definition
> specified in the DDR5 mode registers), to switch between codeword mode and
> row count mode, and to reset the ECS counter.
>
> Register with the EDAC RAS control feature driver, which gets the ECS
> attribute descriptors from the EDAC ECS and exposes sysfs ECS control
> attributes to userspace.
> For example, ECS control for memory media FRU 0 in the CXL mem0 device is
> in /sys/bus/edac/devices/cxl_mem0/ecs_fru0/
>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> drivers/cxl/core/memfeature.c | 439 +++++++++++++++++++++++++++++++++-
> 1 file changed, 438 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/memfeature.c b/drivers/cxl/core/memfeature.c
> index 90c68d20b02b..5d4057fa304c 100644
> --- a/drivers/cxl/core/memfeature.c
> +++ b/drivers/cxl/core/memfeature.c
> @@ -19,7 +19,7 @@
> #include <cxl.h>
> #include <linux/edac.h>
>
> -#define CXL_DEV_NUM_RAS_FEATURES 1
> +#define CXL_DEV_NUM_RAS_FEATURES 2
> #define CXL_DEV_HOUR_IN_SECS 3600
>
> #define CXL_SCRUB_NAME_LEN 128
> @@ -303,6 +303,405 @@ static const struct edac_scrub_ops cxl_ps_scrub_ops = {
> .cycle_duration_write = cxl_patrol_scrub_write_scrub_cycle,
> };
>
...
> + case CXL_ECS_PARAM_THRESHOLD:
> + wr_attrs[fru_id].ecs_config &= ~CXL_ECS_THRESHOLD_COUNT_MASK;
> + switch (params->threshold) {
> + case 256:
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
> + ECS_THRESHOLD_256);
> + break;
> + case 1024:
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
> + ECS_THRESHOLD_1024);
> + break;
> + case 4096:
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
> + ECS_THRESHOLD_4096);
> + break;
> + default:
> + dev_err(dev,
> + "Invalid CXL ECS scrub threshold count(%d) to set\n",
> + params->threshold);
> + dev_err(dev,
> + "Supported scrub threshold count: 256,1024,4096\n");
> + return -EINVAL;
> + }
> + break;
> + case CXL_ECS_PARAM_MODE:
> + if (params->mode != ECS_MODE_COUNTS_ROWS &&
> + params->mode != ECS_MODE_COUNTS_CODEWORDS) {
> + dev_err(dev,
> + "Invalid CXL ECS scrub mode(%d) to set\n",
> + params->mode);
> + dev_err(dev,
> + "Mode 0: ECS counts rows with errors"
> + " 1: ECS counts codewords with errors\n");
The messaging here can be improved. When printed out in dmesg, it looks
like
root@localhost:~# echo 2 > /sys/bus/edac/devices/cxl_mem0/ecs_fru0/mode
----
[ 6099.073006] cxl_mem mem0: Invalid CXL ECS scrub mode(2) to set
[ 6099.074407] cxl_mem mem0: Mode 0: ECS counts rows with errors 1: ECS counts codewords with errors
----
Maybe use similar message format as threshold above, like
+ dev_err(dev,
+ "Supported ECS mode: 0: ECS counts rows with errors; 1: ECS counts codewords with errors\n");
Fan
> + return -EINVAL;
> + }
> + wr_attrs[fru_id].ecs_config &= ~CXL_ECS_MODE_MASK;
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_MODE_MASK,
> + params->mode);
> + break;
> + case CXL_ECS_PARAM_RESET_COUNTER:
> + wr_attrs[fru_id].ecs_config &= ~CXL_ECS_RESET_COUNTER_MASK;
> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_RESET_COUNTER_MASK,
> + params->reset_counter);
> + break;
> + default:
> + dev_err(dev, "Invalid CXL ECS parameter to set\n");
> + return -EINVAL;
> + }
> +
> + ret = cxl_set_feature(cxlds, cxl_ecs_uuid, cxl_ecs_ctx->set_version,
> + wr_attrs, wr_data_size,
> + CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET);
> + if (ret) {
> + dev_err(dev, "CXL ECS set feature failed ret=%d\n", ret);
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_get_log_entry_type(struct device *dev, void *drv_data,
> + int fru_id, u32 *val)
> +{
> + struct cxl_ecs_params params;
> + int ret;
> +
> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
> + if (ret)
> + return ret;
> +
> + *val = params.log_entry_type;
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_set_log_entry_type(struct device *dev, void *drv_data,
> + int fru_id, u32 val)
> +{
> + struct cxl_ecs_params params = {
> + .log_entry_type = val,
> + };
> +
> + return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
> + ¶ms, CXL_ECS_PARAM_LOG_ENTRY_TYPE);
> +}
> +
> +static int cxl_ecs_get_log_entry_type_per_dram(struct device *dev, void *drv_data,
> + int fru_id, u32 *val)
> +{
> + struct cxl_ecs_params params;
> + int ret;
> +
> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
> + if (ret)
> + return ret;
> +
> + if (params.log_entry_type == ECS_LOG_ENTRY_TYPE_DRAM)
> + *val = 1;
> + else
> + *val = 0;
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_get_log_entry_type_per_memory_media(struct device *dev,
> + void *drv_data,
> + int fru_id, u32 *val)
> +{
> + struct cxl_ecs_params params;
> + int ret;
> +
> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, ¶ms);
> + if (ret)
> + return ret;
> +
> + if (params.log_entry_type == ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU)
> + *val = 1;
> + else
> + *val = 0;
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_get_mode(struct device *dev, void *drv_data,
> + int fru_id, u32 *val)
> +{
> + struct cxl_ecs_params params;
> + int ret;
> +
> +	ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, &params);
> + if (ret)
> + return ret;
> +
> + *val = params.mode;
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_set_mode(struct device *dev, void *drv_data,
> + int fru_id, u32 val)
> +{
> + struct cxl_ecs_params params = {
> + .mode = val,
> + };
> +
> + return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
> +				     &params, CXL_ECS_PARAM_MODE);
> +}
> +
> +static int cxl_ecs_get_mode_counts_rows(struct device *dev, void *drv_data,
> + int fru_id, u32 *val)
> +{
> + struct cxl_ecs_params params;
> + int ret;
> +
> +	ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, &params);
> + if (ret)
> + return ret;
> +
> + if (params.mode == ECS_MODE_COUNTS_ROWS)
> + *val = 1;
> + else
> + *val = 0;
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_get_mode_counts_codewords(struct device *dev, void *drv_data,
> + int fru_id, u32 *val)
> +{
> + struct cxl_ecs_params params;
> + int ret;
> +
> +	ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, &params);
> + if (ret)
> + return ret;
> +
> + if (params.mode == ECS_MODE_COUNTS_CODEWORDS)
> + *val = 1;
> + else
> + *val = 0;
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_reset(struct device *dev, void *drv_data, int fru_id, u32 val)
> +{
> + struct cxl_ecs_params params = {
> + .reset_counter = val,
> + };
> +
> + return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
> +				     &params, CXL_ECS_PARAM_RESET_COUNTER);
> +}
> +
> +static int cxl_ecs_get_threshold(struct device *dev, void *drv_data,
> + int fru_id, u32 *val)
> +{
> + struct cxl_ecs_params params;
> + int ret;
> +
> +	ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, &params);
> + if (ret)
> + return ret;
> +
> + *val = params.threshold;
> +
> + return 0;
> +}
> +
> +static int cxl_ecs_set_threshold(struct device *dev, void *drv_data,
> + int fru_id, u32 val)
> +{
> + struct cxl_ecs_params params = {
> + .threshold = val,
> + };
> +
> + return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
> +				     &params, CXL_ECS_PARAM_THRESHOLD);
> +}
> +
> +static const struct edac_ecs_ops cxl_ecs_ops = {
> + .get_log_entry_type = cxl_ecs_get_log_entry_type,
> + .set_log_entry_type = cxl_ecs_set_log_entry_type,
> + .get_log_entry_type_per_dram = cxl_ecs_get_log_entry_type_per_dram,
> + .get_log_entry_type_per_memory_media =
> + cxl_ecs_get_log_entry_type_per_memory_media,
> + .get_mode = cxl_ecs_get_mode,
> + .set_mode = cxl_ecs_set_mode,
> + .get_mode_counts_codewords = cxl_ecs_get_mode_counts_codewords,
> + .get_mode_counts_rows = cxl_ecs_get_mode_counts_rows,
> + .reset = cxl_ecs_reset,
> + .get_threshold = cxl_ecs_get_threshold,
> + .set_threshold = cxl_ecs_set_threshold,
> +};
> +
> int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
> {
> struct edac_dev_feature ras_features[CXL_DEV_NUM_RAS_FEATURES];
> @@ -310,7 +709,9 @@ int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
> struct cxl_patrol_scrub_context *cxl_ps_ctx;
> struct cxl_feat_entry feat_entry;
> char cxl_dev_name[CXL_SCRUB_NAME_LEN];
> + struct cxl_ecs_context *cxl_ecs_ctx;
> int rc, i, num_ras_features = 0;
> + int num_media_frus;
>
> if (cxlr) {
> struct cxl_region_params *p = &cxlr->params;
> @@ -366,6 +767,42 @@ int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
> ras_features[num_ras_features].ctx = cxl_ps_ctx;
> num_ras_features++;
>
> + if (!cxlr) {
> + rc = cxl_get_supported_feature_entry(cxlds, &cxl_ecs_uuid,
> + &feat_entry);
> + if (rc < 0)
> + goto feat_register;
> +
> + if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
> + goto feat_register;
> + num_media_frus = feat_entry.get_feat_size /
> + sizeof(struct cxl_ecs_rd_attrs);
> + if (!num_media_frus)
> + goto feat_register;
> +
> + cxl_ecs_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ecs_ctx),
> + GFP_KERNEL);
> + if (!cxl_ecs_ctx)
> + goto feat_register;
> + *cxl_ecs_ctx = (struct cxl_ecs_context) {
> + .get_feat_size = feat_entry.get_feat_size,
> + .set_feat_size = feat_entry.set_feat_size,
> + .get_version = feat_entry.get_feat_ver,
> + .set_version = feat_entry.set_feat_ver,
> + .set_effects = feat_entry.set_effects,
> + .num_media_frus = num_media_frus,
> + .cxlmd = cxlmd,
> + };
> +
> + ras_features[num_ras_features].ft_type = RAS_FEAT_ECS;
> + ras_features[num_ras_features].ecs_ops = &cxl_ecs_ops;
> + ras_features[num_ras_features].ctx = cxl_ecs_ctx;
> + ras_features[num_ras_features].ecs_info.num_media_frus =
> + num_media_frus;
> + num_ras_features++;
> + }
> +
> +feat_register:
> return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
> num_ras_features, ras_features);
> }
> --
> 2.34.1
>
--
Fan Ni
^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v12 10/17] cxl/memfeature: Add CXL memory device patrol scrub control feature
2024-09-30 17:38 ` Fan Ni
@ 2024-10-01 8:38 ` Shiju Jose
0 siblings, 0 replies; 39+ messages in thread
From: Shiju Jose @ 2024-10-01 8:38 UTC (permalink / raw)
To: Fan Ni
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel, bp,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
Jonathan Cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
jgroves, vsalve, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
>-----Original Message-----
>From: Fan Ni <nifan.cxl@gmail.com>
>Sent: 30 September 2024 18:39
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-cxl@vger.kernel.org; linux-
>acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org;
>bp@alien8.de; tony.luck@intel.com; rafael@kernel.org; lenb@kernel.org;
>mchehab@kernel.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>david@redhat.com; Vilas.Sridharan@amd.com; leo.duran@amd.com;
>Yazen.Ghannam@amd.com; rientjes@google.com; jiaqiyan@google.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; mike.malvestuto@intel.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; jgroves@micron.com;
>vsalve@micron.com; tanxiaofei <tanxiaofei@huawei.com>; Zengtao (B)
><prime.zeng@hisilicon.com>; Roberto Sassu <roberto.sassu@huawei.com>;
>kangkang.shen@futurewei.com; wanghuiqiang <wanghuiqiang@huawei.com>;
>Linuxarm <linuxarm@huawei.com>
>Subject: Re: [PATCH v12 10/17] cxl/memfeature: Add CXL memory device patrol
>scrub control feature
>
>On Wed, Sep 11, 2024 at 10:04:39AM +0100, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub
>> control feature. The device patrol scrub proactively locates and
>> corrects errors on a regular cycle.
>>
>> Allow specifying the number of hours within which the patrol scrub
>> must be completed, subject to minimum and maximum limits reported by the device.
>> Also allow disabling scrub, trading off error rates against
>> performance.
>>
>> Add support for CXL memory device based patrol scrub control.
>> Register with the EDAC RAS control feature driver, which gets the scrub
>> attr descriptors from the EDAC scrub and exposes sysfs scrub control
>> attributes to userspace.
>> For example CXL device based scrub control for the CXL mem0 device is
>> exposed in /sys/bus/edac/devices/cxl_mem0/scrub*/
>>
>> Also add support for region based CXL memory patrol scrub control.
>> A CXL memory region may be interleaved across one or more CXL memory devices.
>> For example region based scrub control for CXL region1 is exposed in
>> /sys/bus/edac/devices/cxl_region1/scrub*/
>>
>> Open Questions:
>> Q1: The CXL 3.1 spec defines the patrol scrub control feature at the
>> CXL memory device level, supporting setting the scrub cycle and
>> enabling/disabling scrub, but not scrubbing by HPA range. Thus scrub
>> control for a region is presently implemented via all associated CXL
>> memory devices.
>> What is the exact use case for CXL region based scrub control?
>> How would the HPA range, which Dan asked for in region based scrubbing,
>> be used?
>> Is a spec change required for the patrol scrub control feature to
>> support setting an HPA range?
>>
>> Q2: Would both CXL device based and CXL region based scrub control be
>> enabled at the same time in a system?
>>
>> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>
>Hi Shiju,
>
>When trying the following ops with this patchset, I actually noticed
>something unexpected.
>
>---------------------------------
>root@localhost:~# dmesg -C
>root@localhost:~# cat
>/sys/bus/edac/devices/cxl_mem0/scrub0/min_cycle_duration
>3600
>root@localhost:~# cat
>/sys/bus/edac/devices/cxl_mem0/scrub0/max_cycle_duration
>918000
>root@localhost:~# echo 3200 >
>/sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
>-bash: echo: write error: Invalid argument
>root@localhost:~# dmesg
>[ 4950.038767] cxl_pci:__cxl_pci_mbox_send_cmd:263: cxl_pci 0000:0d:00.0: Sending command: 0x0501
>[ 4950.038952] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0d:00.0: Doorbell wait took 0ms
>[ 4972.487087] cxl_pci:__cxl_pci_mbox_send_cmd:263: cxl_pci 0000:0d:00.0: Sending command: 0x0501
>[ 4972.487339] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0d:00.0: Doorbell wait took 0ms
>[ 4972.487509] cxl_mem mem0: Invalid CXL patrol scrub cycle(0) to set
>[ 4972.488287] cxl_mem mem0: Minimum supported CXL patrol scrub cycle in hour 0
>-----------------------
>
>If you check the last line of the dmesg output, it seems we did not print out the
>minimum scrub cycle duration correctly.
Hi Fan,
Thanks for checking and reporting the bug.
dev_err(dev, "Minimum supported CXL patrol scrub cycle in hour %d\n",
params->min_scrub_cycle_hrs);
In the above error print, I will change params->min_scrub_cycle_hrs to
rd_params.min_scrub_cycle_hrs.
>
>Fan
Thanks,
Shiju
>
>
>> ---
>> Documentation/edac/edac-scrub.rst | 74 ++++++
>> drivers/cxl/Kconfig | 18 ++
>> drivers/cxl/core/Makefile | 1 +
>> drivers/cxl/core/memfeature.c | 372 ++++++++++++++++++++++++++++++
>> drivers/cxl/core/region.c | 6 +
>> drivers/cxl/cxlmem.h | 7 +
>> drivers/cxl/mem.c | 4 +
>> 7 files changed, 482 insertions(+)
>> create mode 100644 Documentation/edac/edac-scrub.rst
>> create mode 100644 drivers/cxl/core/memfeature.c
>>
>> diff --git a/Documentation/edac/edac-scrub.rst
>> b/Documentation/edac/edac-scrub.rst
>> new file mode 100644
>> index 000000000000..243035957e99
>> --- /dev/null
>> +++ b/Documentation/edac/edac-scrub.rst
>> @@ -0,0 +1,74 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +===================
>> +EDAC Scrub control
>> +===================
>> +
>> +Copyright (c) 2024 HiSilicon Limited.
>> +
>> +:Author: Shiju Jose <shiju.jose@huawei.com>
>> +:License: The GNU Free Documentation License, Version 1.2
>> + (dual licensed under the GPL v2)
>> +:Original Reviewers:
>> +
>> +- Written for: 6.12
>> +- Updated for:
>> +
>> +Introduction
>> +------------
>> +The EDAC enhancement for RAS features exposes interfaces for
>> +controlling the memory scrubbers in the system. The scrub device
>> +drivers in the system register with the EDAC scrub. The driver
>> +exposes the scrub controls to the user in sysfs.
>> +
>> +The File System
>> +---------------
>> +
>> +The control attributes of the registered scrubber instance can be
>> +accessed under /sys/bus/edac/devices/<dev-name>/scrub*/
>> +
>> +sysfs
>> +-----
>> +
>> +Sysfs files are documented in
>> +`Documentation/ABI/testing/sysfs-edac-scrub-control`.
>> +
>> +Example
>> +-------
>> +
>> +The usage takes the form shown in this example::
>> +
>> +1. CXL memory device patrol scrubber
>> +1.1 device based
>> +root@localhost:~# cat
>> +/sys/bus/edac/devices/cxl_mem0/scrub0/min_cycle_duration
>> +3600
>> +root@localhost:~# cat
>> +/sys/bus/edac/devices/cxl_mem0/scrub0/max_cycle_duration
>> +918000
>> +root@localhost:~# cat
>> +/sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
>> +43200
>> +root@localhost:~# echo 54000 >
>> +/sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
>> +root@localhost:~# cat
>> +/sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
>> +54000
>> +root@localhost:~# echo 1 >
>> +/sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
>> +root@localhost:~# cat
>> +/sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
>> +1
>> +root@localhost:~# echo 0 >
>> +/sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
>> +root@localhost:~# cat
>> +/sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
>> +0
>> +
>> +1.2. region based
>> +root@localhost:~# cat
>> +/sys/bus/edac/devices/cxl_region0/scrub0/min_cycle_duration
>> +3600
>> +root@localhost:~# cat
>> +/sys/bus/edac/devices/cxl_region0/scrub0/max_cycle_duration
>> +918000
>> +root@localhost:~# cat
>> +/sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
>> +43200
>> +root@localhost:~# echo 54000 >
>> +/sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
>> +root@localhost:~# cat
>> +/sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
>> +54000
>> +root@localhost:~# echo 1 >
>> +/sys/bus/edac/devices/cxl_region0/scrub0/enable_background
>> +root@localhost:~# cat
>> +/sys/bus/edac/devices/cxl_region0/scrub0/enable_background
>> +1
>> +root@localhost:~# echo 0 >
>> +/sys/bus/edac/devices/cxl_region0/scrub0/enable_background
>> +root@localhost:~# cat
>> +/sys/bus/edac/devices/cxl_region0/scrub0/enable_background
>> +0
>> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
>> index 99b5c25be079..394bdbc4de87 100644
>> --- a/drivers/cxl/Kconfig
>> +++ b/drivers/cxl/Kconfig
>> @@ -145,4 +145,22 @@ config CXL_REGION_INVALIDATION_TEST
>> If unsure, or if this kernel is meant for production environments,
>> say N.
>>
>> +config CXL_RAS_FEAT
>> + bool "CXL: Memory RAS features"
>> + depends on CXL_PCI
>> + depends on CXL_MEM
>> + depends on EDAC
>> + help
>> + The CXL memory RAS feature control is optional and allows the host
>> + to control the RAS feature configurations of CXL Type 3 devices.
>> +
>> + Registers with the EDAC device subsystem to expose control attributes
>> + of CXL memory device's RAS features to the user.
>> + Provides interface functions to support configuring the CXL memory
>> + device's RAS features.
>> +
>> + Say 'y/n' to enable/disable CXL.mem device RAS features control.
>> + See section 8.2.9.9.11 of the CXL 3.1 specification for detailed
>> + information on CXL memory device features.
>> +
>> endif
>> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
>> index 9259bcc6773c..2a3c7197bc23 100644
>> --- a/drivers/cxl/core/Makefile
>> +++ b/drivers/cxl/core/Makefile
>> @@ -16,3 +16,4 @@ cxl_core-y += pmu.o
>> cxl_core-y += cdat.o
>> cxl_core-$(CONFIG_TRACING) += trace.o
>> cxl_core-$(CONFIG_CXL_REGION) += region.o
>> +cxl_core-$(CONFIG_CXL_RAS_FEAT) += memfeature.o
>> diff --git a/drivers/cxl/core/memfeature.c b/drivers/cxl/core/memfeature.c
>> new file mode 100644
>> index 000000000000..90c68d20b02b
>> --- /dev/null
>> +++ b/drivers/cxl/core/memfeature.c
>> @@ -0,0 +1,372 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * CXL memory RAS feature driver.
>> + *
>> + * Copyright (c) 2024 HiSilicon Limited.
>> + *
>> + * - Supports functions to configure RAS features of the
>> + * CXL memory devices.
>> + * - Registers with the EDAC device subsystem driver to expose
>> + * the features sysfs attributes to the user for configuring
>> + * CXL memory RAS feature.
>> + */
>> +
>> +#define pr_fmt(fmt) "CXL MEM FEAT: " fmt
>> +
>> +#include <cxlmem.h>
>> +#include <linux/cleanup.h>
>> +#include <linux/limits.h>
>> +#include <cxl.h>
>> +#include <linux/edac.h>
>> +
>> +#define CXL_DEV_NUM_RAS_FEATURES 1
>> +#define CXL_DEV_HOUR_IN_SECS 3600
>> +
>> +#define CXL_SCRUB_NAME_LEN 128
>> +
>> +/* CXL memory patrol scrub control definitions */
>> +static const uuid_t cxl_patrol_scrub_uuid =
>> +	UUID_INIT(0x96dad7d6, 0xfde8, 0x482b, 0xa7, 0x33, 0x75, 0x77, 0x4e,
>> +		  0x06, 0xdb, 0x8a);
>> +
>> +/* CXL memory patrol scrub control functions */
>> +struct cxl_patrol_scrub_context {
>> + u8 instance;
>> + u16 get_feat_size;
>> + u16 set_feat_size;
>> + u8 get_version;
>> + u8 set_version;
>> + u16 set_effects;
>> + struct cxl_memdev *cxlmd;
>> + struct cxl_region *cxlr;
>> +};
>> +
>> +/**
>> + * struct cxl_memdev_ps_params - CXL memory patrol scrub parameter data
>structure.
>> + * @enable: [IN & OUT] enable(1)/disable(0) patrol scrub.
>> + * @scrub_cycle_changeable: [OUT] scrub cycle attribute of patrol scrub is
>changeable.
>> + * @scrub_cycle_hrs: [IN] Requested patrol scrub cycle in hours.
>> + * [OUT] Current patrol scrub cycle in hours.
>> + * @min_scrub_cycle_hrs:[OUT] minimum patrol scrub cycle in hours
>supported.
>> + */
>> +struct cxl_memdev_ps_params {
>> + bool enable;
>> + bool scrub_cycle_changeable;
>> + u16 scrub_cycle_hrs;
>> + u16 min_scrub_cycle_hrs;
>> +};
>> +
>> +enum cxl_scrub_param {
>> + CXL_PS_PARAM_ENABLE,
>> + CXL_PS_PARAM_SCRUB_CYCLE,
>> +};
>> +
>> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK BIT(0)
>> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_REALTIME_REPORT_CAP_MASK BIT(1)
>> +#define CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK GENMASK(7, 0)
>> +#define CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK GENMASK(15, 8)
>> +#define CXL_MEMDEV_PS_FLAG_ENABLED_MASK BIT(0)
>> +
>> +struct cxl_memdev_ps_rd_attrs {
>> + u8 scrub_cycle_cap;
>> + __le16 scrub_cycle_hrs;
>> + u8 scrub_flags;
>> +} __packed;
>> +
>> +struct cxl_memdev_ps_wr_attrs {
>> + u8 scrub_cycle_hrs;
>> + u8 scrub_flags;
>> +} __packed;
>> +
>> +static int cxl_mem_ps_get_attrs(struct cxl_dev_state *cxlds,
>> + struct cxl_memdev_ps_params *params) {
>> + size_t rd_data_size = sizeof(struct cxl_memdev_ps_rd_attrs);
>> + size_t data_size;
>> + struct cxl_memdev_ps_rd_attrs *rd_attrs __free(kfree) =
>> +		kmalloc(rd_data_size, GFP_KERNEL);
>> + if (!rd_attrs)
>> + return -ENOMEM;
>> +
>> + data_size = cxl_get_feature(cxlds, cxl_patrol_scrub_uuid,
>> + CXL_GET_FEAT_SEL_CURRENT_VALUE,
>> + rd_attrs, rd_data_size);
>> + if (!data_size)
>> + return -EIO;
>> +
>> +	params->scrub_cycle_changeable = FIELD_GET(CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK,
>> +						   rd_attrs->scrub_cycle_cap);
>> +	params->enable = FIELD_GET(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
>> +				   rd_attrs->scrub_flags);
>> +	params->scrub_cycle_hrs = FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
>> +					    rd_attrs->scrub_cycle_hrs);
>> +	params->min_scrub_cycle_hrs = FIELD_GET(CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK,
>> +						rd_attrs->scrub_cycle_hrs);
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_ps_get_attrs(struct device *dev, void *drv_data,
>> + struct cxl_memdev_ps_params *params) {
>> + struct cxl_patrol_scrub_context *cxl_ps_ctx = drv_data;
>> + struct cxl_memdev *cxlmd;
>> + struct cxl_dev_state *cxlds;
>> + u16 min_scrub_cycle = 0;
>> + int i, ret;
>> +
>> + if (cxl_ps_ctx->cxlr) {
>> + struct cxl_region *cxlr = cxl_ps_ctx->cxlr;
>> + struct cxl_region_params *p = &cxlr->params;
>> +
>> + for (i = p->interleave_ways - 1; i >= 0; i--) {
>> + struct cxl_endpoint_decoder *cxled = p->targets[i];
>> +
>> + cxlmd = cxled_to_memdev(cxled);
>> + cxlds = cxlmd->cxlds;
>> + ret = cxl_mem_ps_get_attrs(cxlds, params);
>> + if (ret)
>> + return ret;
>> +
>> + if (params->min_scrub_cycle_hrs > min_scrub_cycle)
>> +				min_scrub_cycle = params->min_scrub_cycle_hrs;
>> + }
>> + params->min_scrub_cycle_hrs = min_scrub_cycle;
>> + return 0;
>> + }
>> + cxlmd = cxl_ps_ctx->cxlmd;
>> + cxlds = cxlmd->cxlds;
>> +
>> + return cxl_mem_ps_get_attrs(cxlds, params); }
>> +
>> +static int cxl_mem_ps_set_attrs(struct device *dev, void *drv_data,
>> + struct cxl_dev_state *cxlds,
>> + struct cxl_memdev_ps_params *params,
>> + enum cxl_scrub_param param_type)
>> +{
>> + struct cxl_patrol_scrub_context *cxl_ps_ctx = drv_data;
>> + struct cxl_memdev_ps_wr_attrs wr_attrs;
>> + struct cxl_memdev_ps_params rd_params;
>> + int ret;
>> +
>> + ret = cxl_mem_ps_get_attrs(cxlds, &rd_params);
>> + if (ret) {
>> +		dev_err(dev, "Get cxlmemdev patrol scrub params failed ret=%d\n",
>> +			ret);
>> + return ret;
>> + }
>> +
>> + switch (param_type) {
>> + case CXL_PS_PARAM_ENABLE:
>> +		wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
>> +						  params->enable);
>> +		wr_attrs.scrub_cycle_hrs = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
>> +						      rd_params.scrub_cycle_hrs);
>> + break;
>> + case CXL_PS_PARAM_SCRUB_CYCLE:
>> +		if (params->scrub_cycle_hrs < rd_params.min_scrub_cycle_hrs) {
>> +			dev_err(dev, "Invalid CXL patrol scrub cycle(%d) to set\n",
>> +				params->scrub_cycle_hrs);
>> +			dev_err(dev, "Minimum supported CXL patrol scrub cycle in hour %d\n",
>> +				params->min_scrub_cycle_hrs);
>> +			return -EINVAL;
>> +		}
>> +		wr_attrs.scrub_cycle_hrs = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
>> +						      params->scrub_cycle_hrs);
>> +		wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
>> +						  rd_params.enable);
>> +		break;
>> + }
>> +
>> + ret = cxl_set_feature(cxlds, cxl_patrol_scrub_uuid,
>> + cxl_ps_ctx->set_version,
>> + &wr_attrs, sizeof(wr_attrs),
>> +			      CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET);
>> +	if (ret) {
>> +		dev_err(dev, "CXL patrol scrub set feature failed ret=%d\n", ret);
>> + return ret;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_ps_set_attrs(struct device *dev, void *drv_data,
>> + struct cxl_memdev_ps_params *params,
>> + enum cxl_scrub_param param_type) {
>> + struct cxl_patrol_scrub_context *cxl_ps_ctx = drv_data;
>> + struct cxl_memdev *cxlmd;
>> + struct cxl_dev_state *cxlds;
>> + int ret, i;
>> +
>> + if (cxl_ps_ctx->cxlr) {
>> + struct cxl_region *cxlr = cxl_ps_ctx->cxlr;
>> + struct cxl_region_params *p = &cxlr->params;
>> +
>> + for (i = p->interleave_ways - 1; i >= 0; i--) {
>> + struct cxl_endpoint_decoder *cxled = p->targets[i];
>> +
>> + cxlmd = cxled_to_memdev(cxled);
>> + cxlds = cxlmd->cxlds;
>> + ret = cxl_mem_ps_set_attrs(dev, drv_data, cxlds,
>> + params, param_type);
>> + if (ret)
>> + return ret;
>> + }
>> + } else {
>> + cxlmd = cxl_ps_ctx->cxlmd;
>> + cxlds = cxlmd->cxlds;
>> +
>> +		return cxl_mem_ps_set_attrs(dev, drv_data, cxlds, params, param_type);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_patrol_scrub_get_enabled_bg(struct device *dev, void
>> +*drv_data, bool *enabled) {
>> + struct cxl_memdev_ps_params params;
>> + int ret;
>> +
>> +	ret = cxl_ps_get_attrs(dev, drv_data, &params);
>> + if (ret)
>> + return ret;
>> +
>> + *enabled = params.enable;
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_patrol_scrub_set_enabled_bg(struct device *dev, void
>> +*drv_data, bool enable) {
>> + struct cxl_memdev_ps_params params = {
>> + .enable = enable,
>> + };
>> +
>> +	return cxl_ps_set_attrs(dev, drv_data, &params,
>> +CXL_PS_PARAM_ENABLE); }
>> +
>> +static int cxl_patrol_scrub_read_min_scrub_cycle(struct device *dev, void *drv_data,
>> + u32 *min)
>> +{
>> + struct cxl_memdev_ps_params params;
>> + int ret;
>> +
>> +	ret = cxl_ps_get_attrs(dev, drv_data, &params);
>> + if (ret)
>> + return ret;
>> + *min = params.min_scrub_cycle_hrs * CXL_DEV_HOUR_IN_SECS;
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_patrol_scrub_read_max_scrub_cycle(struct device *dev, void *drv_data,
>> + u32 *max)
>> +{
>> +	*max = U8_MAX * CXL_DEV_HOUR_IN_SECS; /* Max set by register size */
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_patrol_scrub_read_scrub_cycle(struct device *dev, void *drv_data,
>> + u32 *scrub_cycle_secs)
>> +{
>> + struct cxl_memdev_ps_params params;
>> + int ret;
>> +
>> +	ret = cxl_ps_get_attrs(dev, drv_data, &params);
>> + if (ret)
>> + return ret;
>> +
>> +	*scrub_cycle_secs = params.scrub_cycle_hrs * CXL_DEV_HOUR_IN_SECS;
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_patrol_scrub_write_scrub_cycle(struct device *dev, void *drv_data,
>> + u32 scrub_cycle_secs)
>> +{
>> + struct cxl_memdev_ps_params params = {
>> +		.scrub_cycle_hrs = scrub_cycle_secs / CXL_DEV_HOUR_IN_SECS,
>> + };
>> +
>> +	return cxl_ps_set_attrs(dev, drv_data, &params,
>> +CXL_PS_PARAM_SCRUB_CYCLE); }
>> +
>> +static const struct edac_scrub_ops cxl_ps_scrub_ops = {
>> + .get_enabled_bg = cxl_patrol_scrub_get_enabled_bg,
>> + .set_enabled_bg = cxl_patrol_scrub_set_enabled_bg,
>> + .min_cycle_read = cxl_patrol_scrub_read_min_scrub_cycle,
>> + .max_cycle_read = cxl_patrol_scrub_read_max_scrub_cycle,
>> + .cycle_duration_read = cxl_patrol_scrub_read_scrub_cycle,
>> + .cycle_duration_write = cxl_patrol_scrub_write_scrub_cycle,
>> +};
>> +
>> +int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct
>> +cxl_region *cxlr) {
>> + struct edac_dev_feature ras_features[CXL_DEV_NUM_RAS_FEATURES];
>> + struct cxl_dev_state *cxlds;
>> + struct cxl_patrol_scrub_context *cxl_ps_ctx;
>> + struct cxl_feat_entry feat_entry;
>> + char cxl_dev_name[CXL_SCRUB_NAME_LEN];
>> + int rc, i, num_ras_features = 0;
>> +
>> + if (cxlr) {
>> + struct cxl_region_params *p = &cxlr->params;
>> +
>> + for (i = p->interleave_ways - 1; i >= 0; i--) {
>> + struct cxl_endpoint_decoder *cxled = p->targets[i];
>> +
>> + cxlmd = cxled_to_memdev(cxled);
>> + cxlds = cxlmd->cxlds;
>> + memset(&feat_entry, 0, sizeof(feat_entry));
>> +			rc = cxl_get_supported_feature_entry(cxlds, &cxl_patrol_scrub_uuid,
>> +							     &feat_entry);
>> + if (rc < 0)
>> + return rc;
>> +			if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
>> +				return -EOPNOTSUPP;
>> + }
>> + } else {
>> + cxlds = cxlmd->cxlds;
>> +		rc = cxl_get_supported_feature_entry(cxlds, &cxl_patrol_scrub_uuid,
>> +						     &feat_entry);
>> + if (rc < 0)
>> + return rc;
>> +
>> +		if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
>> +			return -EOPNOTSUPP;
>> + }
>> +
>> +	cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
>> + if (!cxl_ps_ctx)
>> + return -ENOMEM;
>> +
>> + *cxl_ps_ctx = (struct cxl_patrol_scrub_context) {
>> + .instance = cxl_ps_ctx->instance,
>> + .get_feat_size = feat_entry.get_feat_size,
>> + .set_feat_size = feat_entry.set_feat_size,
>> + .get_version = feat_entry.get_feat_ver,
>> + .set_version = feat_entry.set_feat_ver,
>> + .set_effects = feat_entry.set_effects,
>> + };
>> + if (cxlr) {
>> + snprintf(cxl_dev_name, sizeof(cxl_dev_name),
>> + "cxl_region%d", cxlr->id);
>> + cxl_ps_ctx->cxlr = cxlr;
>> + } else {
>> + snprintf(cxl_dev_name, sizeof(cxl_dev_name),
>> + "%s_%s", "cxl", dev_name(&cxlmd->dev));
>> + cxl_ps_ctx->cxlmd = cxlmd;
>> + }
>> +
>> + ras_features[num_ras_features].ft_type = RAS_FEAT_SCRUB;
>> + ras_features[num_ras_features].scrub_ops = &cxl_ps_scrub_ops;
>> + ras_features[num_ras_features].ctx = cxl_ps_ctx;
>> + num_ras_features++;
>> +
>> + return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
>> + num_ras_features, ras_features); }
>> +EXPORT_SYMBOL_NS_GPL(cxl_mem_ras_features_init, CXL);
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index 21ad5f242875..1cc29ec9ffac 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -3434,6 +3434,12 @@ static int cxl_region_probe(struct device *dev)
>> p->res->start, p->res->end, cxlr,
>> is_system_ram) > 0)
>> return 0;
>> +
>> + rc = cxl_mem_ras_features_init(NULL, cxlr);
>> + if (rc)
>> +		dev_warn(&cxlr->dev, "CXL RAS features init for region_id=%d failed\n",
>> + cxlr->id);
>> +
>> return devm_cxl_add_dax_region(cxlr);
>> default:
>> 		dev_dbg(&cxlr->dev, "unsupported region mode: %d\n",
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index b565a061a4e3..2187c3378eaa 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -889,6 +889,13 @@ int cxl_trigger_poison_list(struct cxl_memdev
>> *cxlmd); int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
>> int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
>>
>> +#ifdef CONFIG_CXL_RAS_FEAT
>> +int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr);
>> +#else
>> +static inline int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd,
>> +					    struct cxl_region *cxlr)
>> +{ return 0; }
>> +#endif
>> +
>> #ifdef CONFIG_CXL_SUSPEND
>> void cxl_mem_active_inc(void);
>> void cxl_mem_active_dec(void);
>> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
>> index 7de232eaeb17..be2e69548909 100644
>> --- a/drivers/cxl/mem.c
>> +++ b/drivers/cxl/mem.c
>> @@ -117,6 +117,10 @@ static int cxl_mem_probe(struct device *dev)
>> if (!cxlds->media_ready)
>> return -EBUSY;
>>
>> + rc = cxl_mem_ras_features_init(cxlmd, NULL);
>> + if (rc)
>> + dev_warn(&cxlmd->dev, "CXL RAS features init failed\n");
>> +
>> /*
>> * Someone is trying to reattach this device after it lost its port
>> * connection (an endpoint port previously registered by this memdev
>> was
>> --
>> 2.34.1
>>
>
>--
>Fan Ni
^ permalink raw reply [flat|nested] 39+ messages in thread
* RE: [PATCH v12 11/17] cxl/memfeature: Add CXL memory device ECS control feature
2024-09-30 18:12 ` Fan Ni
@ 2024-10-01 8:39 ` Shiju Jose
0 siblings, 0 replies; 39+ messages in thread
From: Shiju Jose @ 2024-10-01 8:39 UTC (permalink / raw)
To: Fan Ni
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel, bp,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
Jonathan Cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
jgroves, vsalve, tanxiaofei, Zengtao (B),
Roberto Sassu, kangkang.shen, wanghuiqiang, Linuxarm
>-----Original Message-----
>From: Fan Ni <nifan.cxl@gmail.com>
>Sent: 30 September 2024 19:13
>To: Shiju Jose <shiju.jose@huawei.com>
>Cc: linux-edac@vger.kernel.org; linux-cxl@vger.kernel.org; linux-
>acpi@vger.kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org;
>bp@alien8.de; tony.luck@intel.com; rafael@kernel.org; lenb@kernel.org;
>mchehab@kernel.org; dan.j.williams@intel.com; dave@stgolabs.net; Jonathan
>Cameron <jonathan.cameron@huawei.com>; dave.jiang@intel.com;
>alison.schofield@intel.com; vishal.l.verma@intel.com; ira.weiny@intel.com;
>david@redhat.com; Vilas.Sridharan@amd.com; leo.duran@amd.com;
>Yazen.Ghannam@amd.com; rientjes@google.com; jiaqiyan@google.com;
>Jon.Grimm@amd.com; dave.hansen@linux.intel.com;
>naoya.horiguchi@nec.com; james.morse@arm.com; jthoughton@google.com;
>somasundaram.a@hpe.com; erdemaktas@google.com; pgonda@google.com;
>duenwen@google.com; mike.malvestuto@intel.com; gthelen@google.com;
>wschwartz@amperecomputing.com; dferguson@amperecomputing.com;
>wbs@os.amperecomputing.com; nifan.cxl@gmail.com; jgroves@micron.com;
>vsalve@micron.com; tanxiaofei <tanxiaofei@huawei.com>; Zengtao (B)
><prime.zeng@hisilicon.com>; Roberto Sassu <roberto.sassu@huawei.com>;
>kangkang.shen@futurewei.com; wanghuiqiang <wanghuiqiang@huawei.com>;
>Linuxarm <linuxarm@huawei.com>
>Subject: Re: [PATCH v12 11/17] cxl/memfeature: Add CXL memory device ECS
>control feature
>
>On Wed, Sep 11, 2024 at 10:04:40AM +0100, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> CXL spec 3.1 section 8.2.9.9.11.2 describes the DDR5 ECS (Error Check
>> Scrub) control feature.
>> The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM
>> Specification (JESD79-5) and allows the DRAM to internally read,
>> correct single-bit errors, and write back corrected data bits to the
>> DRAM array while providing transparency to error counts.
>>
>> The ECS control allows the requester to change the log entry type and
>> the ECS threshold count (provided the request is within the definition
>> specified in the DDR5 mode registers), to switch between codeword mode
>> and row count mode, and to reset the ECS counter.
>>
>> Register with the EDAC RAS control feature driver, which gets the ECS
>> attr descriptors from the EDAC ECS and exposes sysfs ECS control
>> attributes to userspace.
>> For example, the ECS control for memory media FRU 0 in the CXL mem0
>> device is in /sys/bus/edac/devices/cxl_mem0/ecs_fru0/
>>
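[Editorial note: the ECS attributes above are packed into a single configuration byte with mask/FIELD_PREP arithmetic, as the driver code later in this patch does. A minimal user-space sketch of that pattern; the mask values and threshold codes here are illustrative assumptions, not the real CXL register layout.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout for illustration only: the real masks live in
 * drivers/cxl/core/memfeature.c per CXL spec 3.1 section 8.2.9.9.11.2. */
#define ECS_THRESHOLD_MASK   0x07u   /* bits 0-2: threshold code */
#define ECS_MODE_MASK        0x08u   /* bit 3: rows vs codewords */

#define ECS_THRESHOLD_256    3u
#define ECS_THRESHOLD_1024   4u
#define ECS_THRESHOLD_4096   5u

/* Open-coded FIELD_PREP(): shift a value into the position of the
 * lowest set bit of the mask, then keep only the masked bits. */
static inline uint8_t field_prep(uint8_t mask, uint8_t val)
{
	unsigned int shift = 0;

	while (!((mask >> shift) & 1))
		shift++;
	return (uint8_t)((val << shift) & mask);
}

/* Encode a threshold count the way the driver's switch statement does;
 * returns -1 for unsupported counts (the driver returns -EINVAL). */
static int ecs_encode_threshold(uint8_t *config, unsigned int count)
{
	uint8_t code;

	switch (count) {
	case 256:  code = ECS_THRESHOLD_256;  break;
	case 1024: code = ECS_THRESHOLD_1024; break;
	case 4096: code = ECS_THRESHOLD_4096; break;
	default:   return -1;
	}
	*config &= (uint8_t)~ECS_THRESHOLD_MASK;  /* clear old field */
	*config |= field_prep(ECS_THRESHOLD_MASK, code);
	return 0;
}
```

The driver follows the same clear-then-set order (`&= ~MASK` followed by `|= FIELD_PREP(...)`), which is what keeps unrelated fields in the same configuration byte intact.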
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> ---
>> drivers/cxl/core/memfeature.c | 439 +++++++++++++++++++++++++++++++++-
>> 1 file changed, 438 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cxl/core/memfeature.c b/drivers/cxl/core/memfeature.c
>> index 90c68d20b02b..5d4057fa304c 100644
>> --- a/drivers/cxl/core/memfeature.c
>> +++ b/drivers/cxl/core/memfeature.c
>> @@ -19,7 +19,7 @@
>> #include <cxl.h>
>> #include <linux/edac.h>
>>
>> -#define CXL_DEV_NUM_RAS_FEATURES 1
>> +#define CXL_DEV_NUM_RAS_FEATURES 2
>> #define CXL_DEV_HOUR_IN_SECS 3600
>>
>> #define CXL_SCRUB_NAME_LEN 128
>> @@ -303,6 +303,405 @@ static const struct edac_scrub_ops cxl_ps_scrub_ops = {
>> .cycle_duration_write = cxl_patrol_scrub_write_scrub_cycle,
>> };
>>
>...
>> + case CXL_ECS_PARAM_THRESHOLD:
>> + wr_attrs[fru_id].ecs_config &= ~CXL_ECS_THRESHOLD_COUNT_MASK;
>> + switch (params->threshold) {
>> + case 256:
>> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
>> + ECS_THRESHOLD_256);
>> + break;
>> + case 1024:
>> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
>> + ECS_THRESHOLD_1024);
>> + break;
>> + case 4096:
>> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
>> + ECS_THRESHOLD_4096);
>> + break;
>> + default:
>> + dev_err(dev,
>> + "Invalid CXL ECS scrub threshold count(%d) to set\n",
>> + params->threshold);
>> + dev_err(dev,
>> + "Supported scrub threshold count: 256,1024,4096\n");
>> + return -EINVAL;
>> + }
>> + break;
>> + case CXL_ECS_PARAM_MODE:
>> + if (params->mode != ECS_MODE_COUNTS_ROWS &&
>> + params->mode != ECS_MODE_COUNTS_CODEWORDS) {
>> + dev_err(dev,
>> + "Invalid CXL ECS scrub mode(%d) to set\n",
>> + params->mode);
>> + dev_err(dev,
>> + "Mode 0: ECS counts rows with errors"
>> + " 1: ECS counts codewords with errors\n");
>The messaging here can be improved. When printed out in dmesg, it looks like
>
>root@localhost:~# echo 2 > /sys/bus/edac/devices/cxl_mem0/ecs_fru0/mode
>----
>[ 6099.073006] cxl_mem mem0: Invalid CXL ECS scrub mode(2) to set
>[ 6099.074407] cxl_mem mem0: Mode 0: ECS counts rows with errors 1: ECS counts codewords with errors
>----
>Maybe use similar message format as threshold above, like
>+ dev_err(dev,
>+ "Supported ECS mode: 0: ECS counts rows with errors; 1: ECS counts codewords with errors\n");
Will modify.
>
>Fan
Thanks,
Shiju
>> + return -EINVAL;
>> + }
>> + wr_attrs[fru_id].ecs_config &= ~CXL_ECS_MODE_MASK;
>> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_MODE_MASK,
>> + params->mode);
>> + break;
>> + case CXL_ECS_PARAM_RESET_COUNTER:
>> + wr_attrs[fru_id].ecs_config &= ~CXL_ECS_RESET_COUNTER_MASK;
>> + wr_attrs[fru_id].ecs_config |= FIELD_PREP(CXL_ECS_RESET_COUNTER_MASK,
>> + params->reset_counter);
>> + break;
>> + default:
>> + dev_err(dev, "Invalid CXL ECS parameter to set\n");
>> + return -EINVAL;
>> + }
>> +
>> + ret = cxl_set_feature(cxlds, cxl_ecs_uuid, cxl_ecs_ctx->set_version,
>> + wr_attrs, wr_data_size,
>> + CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET);
>> + if (ret) {
>> + dev_err(dev, "CXL ECS set feature failed ret=%d\n", ret);
>> + return ret;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_ecs_get_log_entry_type(struct device *dev, void *drv_data,
>> + int fru_id, u32 *val)
>> +{
>> + struct cxl_ecs_params params;
>> + int ret;
>> +
>> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, &params);
>> + if (ret)
>> + return ret;
>> +
>> + *val = params.log_entry_type;
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_ecs_set_log_entry_type(struct device *dev, void *drv_data,
>> + int fru_id, u32 val)
>> +{
>> + struct cxl_ecs_params params = {
>> + .log_entry_type = val,
>> + };
>> +
>> + return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
>> + &params, CXL_ECS_PARAM_LOG_ENTRY_TYPE);
>> +}
>> +
>> +static int cxl_ecs_get_log_entry_type_per_dram(struct device *dev, void *drv_data,
>> + int fru_id, u32 *val)
>> +{
>> + struct cxl_ecs_params params;
>> + int ret;
>> +
>> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, &params);
>> + if (ret)
>> + return ret;
>> +
>> + if (params.log_entry_type == ECS_LOG_ENTRY_TYPE_DRAM)
>> + *val = 1;
>> + else
>> + *val = 0;
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_ecs_get_log_entry_type_per_memory_media(struct device *dev,
>> + void *drv_data,
>> + int fru_id, u32 *val)
>> +{
>> + struct cxl_ecs_params params;
>> + int ret;
>> +
>> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, &params);
>> + if (ret)
>> + return ret;
>> +
>> + if (params.log_entry_type == ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU)
>> + *val = 1;
>> + else
>> + *val = 0;
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_ecs_get_mode(struct device *dev, void *drv_data,
>> + int fru_id, u32 *val)
>> +{
>> + struct cxl_ecs_params params;
>> + int ret;
>> +
>> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, &params);
>> + if (ret)
>> + return ret;
>> +
>> + *val = params.mode;
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_ecs_set_mode(struct device *dev, void *drv_data,
>> + int fru_id, u32 val)
>> +{
>> + struct cxl_ecs_params params = {
>> + .mode = val,
>> + };
>> +
>> + return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
>> + &params, CXL_ECS_PARAM_MODE);
>> +}
>> +
>> +static int cxl_ecs_get_mode_counts_rows(struct device *dev, void *drv_data,
>> + int fru_id, u32 *val)
>> +{
>> + struct cxl_ecs_params params;
>> + int ret;
>> +
>> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, &params);
>> + if (ret)
>> + return ret;
>> +
>> + if (params.mode == ECS_MODE_COUNTS_ROWS)
>> + *val = 1;
>> + else
>> + *val = 0;
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_ecs_get_mode_counts_codewords(struct device *dev, void *drv_data,
>> + int fru_id, u32 *val)
>> +{
>> + struct cxl_ecs_params params;
>> + int ret;
>> +
>> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, &params);
>> + if (ret)
>> + return ret;
>> +
>> + if (params.mode == ECS_MODE_COUNTS_CODEWORDS)
>> + *val = 1;
>> + else
>> + *val = 0;
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_ecs_reset(struct device *dev, void *drv_data, int fru_id, u32 val)
>> +{
>> + struct cxl_ecs_params params = {
>> + .reset_counter = val,
>> + };
>> +
>> + return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
>> + &params, CXL_ECS_PARAM_RESET_COUNTER);
>> +}
>> +
>> +static int cxl_ecs_get_threshold(struct device *dev, void *drv_data,
>> + int fru_id, u32 *val)
>> +{
>> + struct cxl_ecs_params params;
>> + int ret;
>> +
>> + ret = cxl_mem_ecs_get_attrs(dev, drv_data, fru_id, &params);
>> + if (ret)
>> + return ret;
>> +
>> + *val = params.threshold;
>> +
>> + return 0;
>> +}
>> +
>> +static int cxl_ecs_set_threshold(struct device *dev, void *drv_data,
>> + int fru_id, u32 val)
>> +{
>> + struct cxl_ecs_params params = {
>> + .threshold = val,
>> + };
>> +
>> + return cxl_mem_ecs_set_attrs(dev, drv_data, fru_id,
>> + &params, CXL_ECS_PARAM_THRESHOLD);
>> +}
>> +
>> +static const struct edac_ecs_ops cxl_ecs_ops = {
>> + .get_log_entry_type = cxl_ecs_get_log_entry_type,
>> + .set_log_entry_type = cxl_ecs_set_log_entry_type,
>> + .get_log_entry_type_per_dram = cxl_ecs_get_log_entry_type_per_dram,
>> + .get_log_entry_type_per_memory_media = cxl_ecs_get_log_entry_type_per_memory_media,
>> + .get_mode = cxl_ecs_get_mode,
>> + .set_mode = cxl_ecs_set_mode,
>> + .get_mode_counts_codewords = cxl_ecs_get_mode_counts_codewords,
>> + .get_mode_counts_rows = cxl_ecs_get_mode_counts_rows,
>> + .reset = cxl_ecs_reset,
>> + .get_threshold = cxl_ecs_get_threshold,
>> + .set_threshold = cxl_ecs_set_threshold,
>> +};
>> +
>> int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
>> {
>> struct edac_dev_feature ras_features[CXL_DEV_NUM_RAS_FEATURES];
>> @@ -310,7 +709,9 @@ int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
>> struct cxl_patrol_scrub_context *cxl_ps_ctx;
>> struct cxl_feat_entry feat_entry;
>> char cxl_dev_name[CXL_SCRUB_NAME_LEN];
>> + struct cxl_ecs_context *cxl_ecs_ctx;
>> int rc, i, num_ras_features = 0;
>> + int num_media_frus;
>>
>> if (cxlr) {
>> struct cxl_region_params *p = &cxlr->params; @@ -366,6
>+767,42 @@
>> int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region
>*cxlr)
>> ras_features[num_ras_features].ctx = cxl_ps_ctx;
>> num_ras_features++;
>>
>> + if (!cxlr) {
>> + rc = cxl_get_supported_feature_entry(cxlds, &cxl_ecs_uuid,
>> + &feat_entry);
>> + if (rc < 0)
>> + goto feat_register;
>> +
>> + if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
>> + goto feat_register;
>> + num_media_frus = feat_entry.get_feat_size /
>> + sizeof(struct cxl_ecs_rd_attrs);
>> + if (!num_media_frus)
>> + goto feat_register;
>> +
>> + cxl_ecs_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ecs_ctx),
>> + GFP_KERNEL);
>> + if (!cxl_ecs_ctx)
>> + goto feat_register;
>> + *cxl_ecs_ctx = (struct cxl_ecs_context) {
>> + .get_feat_size = feat_entry.get_feat_size,
>> + .set_feat_size = feat_entry.set_feat_size,
>> + .get_version = feat_entry.get_feat_ver,
>> + .set_version = feat_entry.set_feat_ver,
>> + .set_effects = feat_entry.set_effects,
>> + .num_media_frus = num_media_frus,
>> + .cxlmd = cxlmd,
>> + };
>> +
>> + ras_features[num_ras_features].ft_type = RAS_FEAT_ECS;
>> + ras_features[num_ras_features].ecs_ops = &cxl_ecs_ops;
>> + ras_features[num_ras_features].ctx = cxl_ecs_ctx;
>> + ras_features[num_ras_features].ecs_info.num_media_frus = num_media_frus;
>> + num_ras_features++;
>> + }
>> +
>> +feat_register:
>> return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
>> num_ras_features, ras_features);
>> }
>> --
>> 2.34.1
>>
>
>--
>Fan Ni
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v12 13/17] ACPI:RAS2: Add ACPI RAS2 driver
2024-09-11 9:04 ` [PATCH v12 13/17] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
@ 2024-10-01 15:47 ` Fan Ni
0 siblings, 0 replies; 39+ messages in thread
From: Fan Ni @ 2024-10-01 15:47 UTC (permalink / raw)
To: shiju.jose
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel, bp,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, jgroves, vsalve, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm
On Wed, Sep 11, 2024 at 10:04:42AM +0100, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> Add support for ACPI RAS2 feature table (RAS2) defined in the
> ACPI 6.5 Specification, section 5.2.21.
> Driver contains RAS2 Init, which extracts the RAS2 table and driver
> adds platform device for each memory features which binds to the
s/features/feature/
Fan
> RAS2 memory driver.
>
> Driver uses PCC mailbox to communicate with the ACPI HW and the
> driver adds OSPM interfaces to send RAS2 commands.
>
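[Editorial note: the per-feature platform-device creation described above must collapse the table to a single device when every memory-feature descriptor shares one PCC channel, which is what the driver's double scan decides. A hedged user-space sketch of that decision; the struct and values are illustrative stand-ins, not the ACPI table layout.]

```c
#include <assert.h>
#include <stddef.h>

#define RAS2_FEATURE_TYPE_MEMORY 0x00

/* Simplified stand-in for struct acpi_ras2_pcc_desc: only the two
 * fields the scan logic reads. */
struct pcc_desc {
	unsigned char feature_type;
	unsigned char channel_id;
};

/*
 * Mirror of the driver's double scan: the first pass counts memory
 * descriptors whose channel differs from the first one seen; if that
 * count is 1, a single platform device covers the table, otherwise one
 * device is created per memory-feature descriptor.
 */
static size_t ras2_count_pdevs(const struct pcc_desc *d, size_t n)
{
	int first_chan = -1;
	size_t count = 0, mem_descs = 0, i;

	for (i = 0; i < n; i++) {
		if (d[i].feature_type != RAS2_FEATURE_TYPE_MEMORY)
			continue;
		mem_descs++;
		if (first_chan == -1) {
			first_chan = d[i].channel_id;
			count++;
		} else if (d[i].channel_id != first_chan) {
			count++;
		}
	}
	/* One distinct channel -> one device; else one per descriptor. */
	return (count == 1) ? 1 : mem_descs;
}
```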
> Co-developed-by: A Somasundaram <somasundaram.a@hpe.com>
> Signed-off-by: A Somasundaram <somasundaram.a@hpe.com>
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> drivers/acpi/Kconfig | 10 +
> drivers/acpi/Makefile | 1 +
> drivers/acpi/ras2.c | 391 +++++++++++++++++++++++++++++++++++++++
> include/acpi/ras2_acpi.h | 60 ++++++
> 4 files changed, 462 insertions(+)
> create mode 100755 drivers/acpi/ras2.c
> create mode 100644 include/acpi/ras2_acpi.h
>
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index e3a7c2aedd5f..482080f1f0c5 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -284,6 +284,16 @@ config ACPI_CPPC_LIB
> If your platform does not support CPPC in firmware,
> leave this option disabled.
>
> +config ACPI_RAS2
> + bool "ACPI RAS2 driver"
> + select MAILBOX
> + select PCC
> + help
> + The driver adds support for the ACPI RAS2 feature table (extracts the
> + RAS2 table from the ACPI tables) and OSPM interfaces to send RAS2
> + commands via the PCC mailbox subspace. The driver also adds a platform
> + device for each RAS2 memory feature, which binds to the RAS2 memory
> + driver.
> +
> config ACPI_PROCESSOR
> tristate "Processor"
> depends on X86 || ARM64 || LOONGARCH || RISCV
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 61ca4afe83dc..84e2a2519bae 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -100,6 +100,7 @@ obj-$(CONFIG_ACPI_EC_DEBUGFS) += ec_sys.o
> obj-$(CONFIG_ACPI_BGRT) += bgrt.o
> obj-$(CONFIG_ACPI_CPPC_LIB) += cppc_acpi.o
> obj-$(CONFIG_ACPI_SPCR_TABLE) += spcr.o
> +obj-$(CONFIG_ACPI_RAS2) += ras2.o
> obj-$(CONFIG_ACPI_DEBUGGER_USER) += acpi_dbg.o
> obj-$(CONFIG_ACPI_PPTT) += pptt.o
> obj-$(CONFIG_ACPI_PFRUT) += pfr_update.o pfr_telemetry.o
> diff --git a/drivers/acpi/ras2.c b/drivers/acpi/ras2.c
> new file mode 100755
> index 000000000000..5daf1510d19e
> --- /dev/null
> +++ b/drivers/acpi/ras2.c
> @@ -0,0 +1,391 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Implementation of ACPI RAS2 driver.
> + *
> + * Copyright (c) 2024 HiSilicon Limited.
> + *
> + * Support for RAS2 - ACPI 6.5 Specification, section 5.2.21
> + *
> + * Driver contains ACPI RAS2 init, which extracts the ACPI RAS2 table and
> + * gets the PCC channel subspace for communicating with the ACPI compliant
> + * HW platform which supports ACPI RAS2. Driver adds platform devices
> + * for each RAS2 memory feature which binds to the memory ACPI RAS2 driver.
> + */
> +
> +#define pr_fmt(fmt) "ACPI RAS2: " fmt
> +
> +#include <linux/delay.h>
> +#include <linux/export.h>
> +#include <linux/ktime.h>
> +#include <linux/platform_device.h>
> +#include <acpi/pcc.h>
> +#include <acpi/ras2_acpi.h>
> +
> +/*
> + * Arbitrary Retries for PCC commands because the
> + * remote processor could be much slower to reply.
> + */
> +#define RAS2_NUM_RETRIES 600
> +
> +#define RAS2_FEATURE_TYPE_MEMORY 0x00
> +
> +/* global variables for the RAS2 PCC subspaces */
> +static DEFINE_MUTEX(ras2_pcc_subspace_lock);
> +static LIST_HEAD(ras2_pcc_subspaces);
> +
> +static int ras2_report_cap_error(u32 cap_status)
> +{
> + switch (cap_status) {
> + case ACPI_RAS2_NOT_VALID:
> + case ACPI_RAS2_NOT_SUPPORTED:
> + return -EPERM;
> + case ACPI_RAS2_BUSY:
> + return -EBUSY;
> + case ACPI_RAS2_FAILED:
> + case ACPI_RAS2_ABORTED:
> + case ACPI_RAS2_INVALID_DATA:
> + return -EINVAL;
> + default: /* 0 or other, Success */
> + return 0;
> + }
> +}
> +
> +static int ras2_check_pcc_chan(struct ras2_pcc_subspace *pcc_subspace)
> +{
> + struct acpi_ras2_shared_memory __iomem *generic_comm_base = pcc_subspace->pcc_comm_addr;
> + ktime_t next_deadline = ktime_add(ktime_get(), pcc_subspace->deadline);
> + u32 cap_status;
> + u16 status;
> + int ret;
> +
> + while (!ktime_after(ktime_get(), next_deadline)) {
> + /*
> + * As per ACPI spec, the PCC space will be initialized by
> + * platform and should have set the command completion bit when
> + * PCC can be used by OSPM
> + */
> + status = readw_relaxed(&generic_comm_base->status);
> + if (status & RAS2_PCC_CMD_ERROR) {
> + cap_status = readw_relaxed(&generic_comm_base->set_capabilities_status);
> + ret = ras2_report_cap_error(cap_status);
> +
> + status &= ~RAS2_PCC_CMD_ERROR;
> + writew_relaxed(status, &generic_comm_base->status);
> + return ret;
> + }
> + if (status & RAS2_PCC_CMD_COMPLETE)
> + return 0;
> + /*
> + * Reducing the bus traffic in case this loop takes longer than
> + * a few retries.
> + */
> + msleep(10);
> + }
> +
> + return -EIO;
> +}
> +
> +/**
> + * ras2_send_pcc_cmd() - Send RAS2 command via PCC channel
> + * @ras2_ctx: pointer to the RAS2 context structure
> + * @cmd: command to send
> + *
> + * Returns: 0 on success, an error otherwise
> + */
> +int ras2_send_pcc_cmd(struct ras2_scrub_ctx *ras2_ctx, u16 cmd)
> +{
> + struct ras2_pcc_subspace *pcc_subspace = ras2_ctx->pcc_subspace;
> + struct acpi_ras2_shared_memory *generic_comm_base = pcc_subspace->pcc_comm_addr;
> + static ktime_t last_cmd_cmpl_time, last_mpar_reset;
> + struct mbox_chan *pcc_channel;
> + unsigned int time_delta;
> + static int mpar_count;
> + int ret;
> +
> + guard(mutex)(&ras2_pcc_subspace_lock);
> + ret = ras2_check_pcc_chan(pcc_subspace);
> + if (ret < 0)
> + return ret;
> + pcc_channel = pcc_subspace->pcc_chan->mchan;
> +
> + /*
> + * Handle the Minimum Request Turnaround Time(MRTT)
> + * "The minimum amount of time that OSPM must wait after the completion
> + * of a command before issuing the next command, in microseconds"
> + */
> + if (pcc_subspace->pcc_mrtt) {
> + time_delta = ktime_us_delta(ktime_get(), last_cmd_cmpl_time);
> + if (pcc_subspace->pcc_mrtt > time_delta)
> + udelay(pcc_subspace->pcc_mrtt - time_delta);
> + }
> +
> + /*
> + * Handle the non-zero Maximum Periodic Access Rate(MPAR)
> + * "The maximum number of periodic requests that the subspace channel can
> + * support, reported in commands per minute. 0 indicates no limitation."
> + *
> + * This parameter should be ideally zero or large enough so that it can
> + * handle maximum number of requests that all the cores in the system can
> + * collectively generate. If it is not, we will follow the spec and just
> + * not send the request to the platform after hitting the MPAR limit in
> + * any 60s window
> + */
> + if (pcc_subspace->pcc_mpar) {
> + if (mpar_count == 0) {
> + time_delta = ktime_ms_delta(ktime_get(), last_mpar_reset);
> + if (time_delta < 60 * MSEC_PER_SEC) {
> + dev_dbg(ras2_ctx->dev,
> + "PCC cmd not sent due to MPAR limit");
> + return -EIO;
> + }
> + last_mpar_reset = ktime_get();
> + mpar_count = pcc_subspace->pcc_mpar;
> + }
> + mpar_count--;
> + }
> +
> + /* Write to the shared comm region. */
> + writew_relaxed(cmd, &generic_comm_base->command);
> +
> + /* Flip CMD COMPLETE bit */
> + writew_relaxed(0, &generic_comm_base->status);
> +
> + /* Ring doorbell */
> + ret = mbox_send_message(pcc_channel, &cmd);
> + if (ret < 0) {
> + dev_err(ras2_ctx->dev,
> + "Err sending PCC mbox message. cmd:%d, ret:%d\n",
> + cmd, ret);
> + return ret;
> + }
> +
> + /*
> + * If Minimum Request Turnaround Time is non-zero, we need
> + * to record the completion time of both READ and WRITE
> + * command for proper handling of MRTT, so we need to check
> + * for pcc_mrtt in addition to CMD_READ
> + */
> + if (cmd == RAS2_PCC_CMD_EXEC || pcc_subspace->pcc_mrtt) {
> + ret = ras2_check_pcc_chan(pcc_subspace);
> + if (pcc_subspace->pcc_mrtt)
> + last_cmd_cmpl_time = ktime_get();
> + }
> +
> + if (pcc_channel->mbox->txdone_irq)
> + mbox_chan_txdone(pcc_channel, ret);
> + else
> + mbox_client_txdone(pcc_channel, ret);
> +
> + return ret >= 0 ? 0 : ret;
> +}
> +EXPORT_SYMBOL_GPL(ras2_send_pcc_cmd);
> +
> +static int ras2_register_pcc_channel(struct device *dev, struct ras2_scrub_ctx *ras2_ctx,
> + int pcc_subspace_id)
> +{
> + struct acpi_pcct_hw_reduced *ras2_ss;
> + struct mbox_client *ras2_mbox_cl;
> + struct pcc_mbox_chan *pcc_chan;
> + struct ras2_pcc_subspace *pcc_subspace;
> +
> + if (pcc_subspace_id < 0)
> + return -EINVAL;
> +
> + mutex_lock(&ras2_pcc_subspace_lock);
> + list_for_each_entry(pcc_subspace, &ras2_pcc_subspaces, elem) {
> + if (pcc_subspace->pcc_subspace_id == pcc_subspace_id) {
> + ras2_ctx->pcc_subspace = pcc_subspace;
> + pcc_subspace->ref_count++;
> + mutex_unlock(&ras2_pcc_subspace_lock);
> + return 0;
> + }
> + }
> + mutex_unlock(&ras2_pcc_subspace_lock);
> +
> + pcc_subspace = kcalloc(1, sizeof(*pcc_subspace), GFP_KERNEL);
> + if (!pcc_subspace)
> + return -ENOMEM;
> + pcc_subspace->pcc_subspace_id = pcc_subspace_id;
> + ras2_mbox_cl = &pcc_subspace->mbox_client;
> + ras2_mbox_cl->dev = dev;
> + ras2_mbox_cl->knows_txdone = true;
> +
> + pcc_chan = pcc_mbox_request_channel(ras2_mbox_cl, pcc_subspace_id);
> + if (IS_ERR(pcc_chan)) {
> + kfree(pcc_subspace);
> + return PTR_ERR(pcc_chan);
> + }
> + pcc_subspace->pcc_chan = pcc_chan;
> + ras2_ss = pcc_chan->mchan->con_priv;
> + pcc_subspace->comm_base_addr = ras2_ss->base_address;
> +
> + /*
> + * ras2_ss->latency is just a Nominal value. In reality
> + * the remote processor could be much slower to reply.
> + * So add an arbitrary amount of wait on top of Nominal.
> + */
> + pcc_subspace->deadline = ns_to_ktime(RAS2_NUM_RETRIES * ras2_ss->latency *
> + NSEC_PER_USEC);
> + pcc_subspace->pcc_mrtt = ras2_ss->min_turnaround_time;
> + pcc_subspace->pcc_mpar = ras2_ss->max_access_rate;
> + pcc_subspace->pcc_comm_addr = acpi_os_ioremap(pcc_subspace->comm_base_addr,
> + ras2_ss->length);
> + /* Set flag so that we don't come here for each CPU. */
> + pcc_subspace->pcc_channel_acquired = true;
> +
> + mutex_lock(&ras2_pcc_subspace_lock);
> + list_add(&pcc_subspace->elem, &ras2_pcc_subspaces);
> + pcc_subspace->ref_count++;
> + mutex_unlock(&ras2_pcc_subspace_lock);
> + ras2_ctx->pcc_subspace = pcc_subspace;
> +
> + return 0;
> +}
> +
> +static void ras2_unregister_pcc_channel(void *ctx)
> +{
> + struct ras2_scrub_ctx *ras2_ctx = ctx;
> + struct ras2_pcc_subspace *pcc_subspace = ras2_ctx->pcc_subspace;
> +
> + if (!pcc_subspace || !pcc_subspace->pcc_chan)
> + return;
> +
> + guard(mutex)(&ras2_pcc_subspace_lock);
> + if (pcc_subspace->ref_count > 0)
> + pcc_subspace->ref_count--;
> + if (!pcc_subspace->ref_count) {
> + list_del(&pcc_subspace->elem);
> + pcc_mbox_free_channel(pcc_subspace->pcc_chan);
> + kfree(pcc_subspace);
> + }
> +}
> +
> +/**
> + * devm_ras2_register_pcc_channel() - Register RAS2 PCC channel
> + * @dev: pointer to the RAS2 device
> + * @ras2_ctx: pointer to the RAS2 context structure
> + * @pcc_subspace_id: identifier of the RAS2 PCC channel.
> + *
> + * Returns: 0 on success, an error otherwise
> + */
> +int devm_ras2_register_pcc_channel(struct device *dev, struct ras2_scrub_ctx *ras2_ctx,
> + int pcc_subspace_id)
> +{
> + int ret;
> +
> + ret = ras2_register_pcc_channel(dev, ras2_ctx, pcc_subspace_id);
> + if (ret)
> + return ret;
> +
> + return devm_add_action_or_reset(dev, ras2_unregister_pcc_channel, ras2_ctx);
> +}
> +EXPORT_SYMBOL_NS_GPL(devm_ras2_register_pcc_channel, ACPI_RAS2);
> +
> +static struct platform_device *ras2_add_platform_device(char *name, int channel)
> +{
> + int ret;
> + struct platform_device *pdev __free(platform_device_put) =
> + platform_device_alloc(name, PLATFORM_DEVID_AUTO);
> + if (!pdev)
> + return ERR_PTR(-ENOMEM);
> +
> + ret = platform_device_add_data(pdev, &channel, sizeof(channel));
> + if (ret)
> + return ERR_PTR(ret);
> +
> + ret = platform_device_add(pdev);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + return_ptr(pdev);
> +}
> +
> +static int __init ras2_acpi_init(void)
> +{
> + struct acpi_table_header *pAcpiTable = NULL;
> + struct acpi_ras2_pcc_desc *pcc_desc_list;
> + struct acpi_table_ras2 *pRas2Table;
> + struct platform_device *pdev;
> + int pcc_subspace_id;
> + acpi_size ras2_size;
> + acpi_status status;
> + u8 count = 0, i;
> + int ret;
> +
> + status = acpi_get_table("RAS2", 0, &pAcpiTable);
> + if (ACPI_FAILURE(status) || !pAcpiTable) {
> + pr_err("ACPI RAS2 driver failed to initialize, get table failed\n");
> + return -EINVAL;
> + }
> +
> + ras2_size = pAcpiTable->length;
> + if (ras2_size < sizeof(struct acpi_table_ras2)) {
> + pr_err("ACPI RAS2 table present but broken (too short #1)\n");
> + ret = -EINVAL;
> + goto free_ras2_table;
> + }
> +
> + pRas2Table = (struct acpi_table_ras2 *)pAcpiTable;
> + if (pRas2Table->num_pcc_descs <= 0) {
> + pr_err("ACPI RAS2 table does not contain PCC descriptors\n");
> + ret = -EINVAL;
> + goto free_ras2_table;
> + }
> +
> + struct platform_device **pdev_list __free(kfree) =
> + kcalloc(pRas2Table->num_pcc_descs, sizeof(*pdev_list),
> + GFP_KERNEL);
> + if (!pdev_list) {
> + ret = -ENOMEM;
> + goto free_ras2_table;
> + }
> +
> + pcc_desc_list = (struct acpi_ras2_pcc_desc *)(pRas2Table + 1);
> + /* Double scan for the case of only one actual controller */
> + pcc_subspace_id = -1;
> + count = 0;
> + for (i = 0; i < pRas2Table->num_pcc_descs; i++, pcc_desc_list++) {
> + if (pcc_desc_list->feature_type != RAS2_FEATURE_TYPE_MEMORY)
> + continue;
> + if (pcc_subspace_id == -1) {
> + pcc_subspace_id = pcc_desc_list->channel_id;
> + count++;
> + }
> + if (pcc_desc_list->channel_id != pcc_subspace_id)
> + count++;
> + }
> + if (count == 1) {
> + pdev = ras2_add_platform_device("acpi_ras2", pcc_subspace_id);
> + if (IS_ERR(pdev)) {
> + ret = PTR_ERR(pdev);
> + goto free_ras2_pdev;
> + }
> + pdev_list[0] = pdev;
> + acpi_put_table(pAcpiTable);
> + return 0;
> + }
> +
> + count = 0;
> + pcc_desc_list = (struct acpi_ras2_pcc_desc *)(pRas2Table + 1);
> + for (i = 0; i < pRas2Table->num_pcc_descs; i++, pcc_desc_list++) {
> + if (pcc_desc_list->feature_type != RAS2_FEATURE_TYPE_MEMORY)
> + continue;
> + pcc_subspace_id = pcc_desc_list->channel_id;
> + /* Add the platform device and bind ACPI RAS2 memory driver */
> + pdev = ras2_add_platform_device("acpi_ras2", pcc_subspace_id);
> + if (IS_ERR(pdev)) {
> + ret = PTR_ERR(pdev);
> + goto free_ras2_pdev;
> + }
> + pdev_list[count++] = pdev;
> + }
> +
> + acpi_put_table(pAcpiTable);
> + return 0;
> +
> +free_ras2_pdev:
> + while (count--)
> + platform_device_put(pdev_list[count]);
> +
> +free_ras2_table:
> + acpi_put_table(pAcpiTable);
> +
> + return ret;
> +}
> +late_initcall(ras2_acpi_init)
> diff --git a/include/acpi/ras2_acpi.h b/include/acpi/ras2_acpi.h
> new file mode 100644
> index 000000000000..edfca253d88a
> --- /dev/null
> +++ b/include/acpi/ras2_acpi.h
> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * RAS2 ACPI driver header file
> + *
> + * (C) Copyright 2014, 2015 Hewlett-Packard Enterprises
> + *
> + * Copyright (c) 2024 HiSilicon Limited
> + */
> +
> +#ifndef _RAS2_ACPI_H
> +#define _RAS2_ACPI_H
> +
> +#include <linux/acpi.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/mutex.h>
> +#include <linux/types.h>
> +
> +#define RAS2_PCC_CMD_COMPLETE BIT(0)
> +#define RAS2_PCC_CMD_ERROR BIT(2)
> +
> +/* RAS2 specific PCC commands */
> +#define RAS2_PCC_CMD_EXEC 0x01
> +
> +struct device;
> +
> +/* Data structures for PCC communication and RAS2 table */
> +struct pcc_mbox_chan;
> +
> +struct ras2_pcc_subspace {
> + int pcc_subspace_id;
> + struct mbox_client mbox_client;
> + struct pcc_mbox_chan *pcc_chan;
> + struct acpi_ras2_shared_memory __iomem *pcc_comm_addr;
> + u64 comm_base_addr;
> + bool pcc_channel_acquired;
> + ktime_t deadline;
> + unsigned int pcc_mpar;
> + unsigned int pcc_mrtt;
> + struct list_head elem;
> + u16 ref_count;
> +};
> +
> +struct ras2_scrub_ctx {
> + struct device *dev;
> + struct ras2_pcc_subspace *pcc_subspace;
> + int id;
> + u8 instance;
> + struct device *scrub_dev;
> + bool bg;
> + u64 base, size;
> + u8 scrub_cycle_hrs, min_scrub_cycle, max_scrub_cycle;
> + /* Lock to provide mutually exclusive access to PCC channel */
> + struct mutex lock;
> +};
> +
> +int ras2_send_pcc_cmd(struct ras2_scrub_ctx *ras2_ctx, u16 cmd);
> +int devm_ras2_register_pcc_channel(struct device *dev, struct ras2_scrub_ctx *ras2_ctx,
> + int pcc_subspace_id);
> +
> +#endif /* _RAS2_ACPI_H */
> --
> 2.34.1
>
--
Fan Ni
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v12 10/17] cxl/memfeature: Add CXL memory device patrol scrub control feature
2024-09-11 9:04 ` [PATCH v12 10/17] cxl/memfeature: Add CXL memory device patrol scrub control feature shiju.jose
2024-09-30 17:38 ` Fan Ni
@ 2024-10-01 19:47 ` Fan Ni
1 sibling, 0 replies; 39+ messages in thread
From: Fan Ni @ 2024-10-01 19:47 UTC (permalink / raw)
To: shiju.jose
Cc: linux-edac, linux-cxl, linux-acpi, linux-mm, linux-kernel, bp,
tony.luck, rafael, lenb, mchehab, dan.j.williams, dave,
jonathan.cameron, dave.jiang, alison.schofield, vishal.l.verma,
ira.weiny, david, Vilas.Sridharan, leo.duran, Yazen.Ghannam,
rientjes, jiaqiyan, Jon.Grimm, dave.hansen, naoya.horiguchi,
james.morse, jthoughton, somasundaram.a, erdemaktas, pgonda,
duenwen, mike.malvestuto, gthelen, wschwartz, dferguson, wbs,
nifan.cxl, jgroves, vsalve, tanxiaofei, prime.zeng,
roberto.sassu, kangkang.shen, wanghuiqiang, linuxarm
On Wed, Sep 11, 2024 at 10:04:39AM +0100, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub control
> feature. The device patrol scrub proactively locates and corrects errors
> on a regular cycle.
>
> Allow specifying the number of hours within which the patrol scrub must be
> completed, subject to minimum and maximum limits reported by the device.
> Also allow disabling scrub, trading off error rates against performance.
>
> Add support for CXL memory device based patrol scrub control.
> Register with the EDAC RAS control feature driver, which gets the scrub
> attr descriptors from the EDAC scrub and exposes sysfs scrub control
> attributes to userspace.
> For example CXL device based scrub control for the CXL mem0 device is
> exposed in /sys/bus/edac/devices/cxl_mem0/scrub*/
>
> Also add support for region based CXL memory patrol scrub control.
> CXL memory region may be interleaved across one or more CXL memory devices.
> For example region based scrub control for CXL region1 is exposed in
> /sys/bus/edac/devices/cxl_region1/scrub*/
>
> Open Questions:
> Q1: The CXL 3.1 spec defines the patrol scrub control feature at the CXL
> memory device level, with support for setting the scrub cycle and
> enabling/disabling scrub, but not based on an HPA range. Thus scrub
> control for a region is presently implemented based on all associated
> CXL memory devices.
> What is the exact use case for the CXL region based scrub control?
> How is the HPA range, which Dan asked for region based scrubbing, used?
> Is a spec change required for the patrol scrub control feature to
> support setting an HPA range?
>
> Q2: Would both CXL device based and CXL region based scrub control be
> enabled at the same time in a system?
>
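[Editorial note: the patrol scrub write path described above must convert the sysfs value (seconds) into the hours granularity the device reports and reject out-of-range requests before issuing a Set Feature command. A rough user-space sketch under assumed names; `scrub_limits` and `scrub_cycle_secs_to_hrs` are illustrative, not driver API.]

```c
#include <assert.h>
#include <stdint.h>

#define CXL_DEV_HOUR_IN_SECS 3600

/* Illustrative device-reported scrub cycle limits, in hours. */
struct scrub_limits {
	uint8_t min_cycle_hrs;
	uint8_t max_cycle_hrs;
};

/*
 * Convert a requested cycle in seconds (what the sysfs attribute takes)
 * to hours and validate it against the device limits, as the driver's
 * cycle_duration write path must. Returns the value in hours, or -1
 * (the driver would return -EINVAL) when the request is out of range.
 */
static int scrub_cycle_secs_to_hrs(const struct scrub_limits *l, uint32_t secs)
{
	uint32_t hrs = secs / CXL_DEV_HOUR_IN_SECS;

	if (hrs < l->min_cycle_hrs || hrs > l->max_cycle_hrs)
		return -1;
	return (int)hrs;
}
```

This matches the documented sysfs example below, where writing 54000 seconds is accepted because 15 hours lies between the reported min (3600 s = 1 h) and max (918000 s = 255 h) cycle durations.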
> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
> Documentation/edac/edac-scrub.rst | 74 ++++++
> drivers/cxl/Kconfig | 18 ++
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/memfeature.c | 372 ++++++++++++++++++++++++++++++
> drivers/cxl/core/region.c | 6 +
> drivers/cxl/cxlmem.h | 7 +
> drivers/cxl/mem.c | 4 +
> 7 files changed, 482 insertions(+)
> create mode 100644 Documentation/edac/edac-scrub.rst
> create mode 100644 drivers/cxl/core/memfeature.c
>
> diff --git a/Documentation/edac/edac-scrub.rst b/Documentation/edac/edac-scrub.rst
> new file mode 100644
> index 000000000000..243035957e99
> --- /dev/null
> +++ b/Documentation/edac/edac-scrub.rst
> @@ -0,0 +1,74 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===================
> +EDAC Scrub control
> +===================
> +
> +Copyright (c) 2024 HiSilicon Limited.
> +
> +:Author: Shiju Jose <shiju.jose@huawei.com>
> +:License: The GNU Free Documentation License, Version 1.2
> + (dual licensed under the GPL v2)
> +:Original Reviewers:
> +
> +- Written for: 6.12
> +- Updated for:
> +
> +Introduction
> +------------
> +The EDAC enhancement for RAS features exposes interfaces for controlling
> +the memory scrubbers in the system. The scrub device drivers in the
> +system register with the EDAC scrub. The driver exposes the
> +scrub controls to the user via sysfs.
> +
> +The File System
> +---------------
> +
> +The control attributes of the registered scrubber instance can be
> +accessed under /sys/bus/edac/devices/<dev-name>/scrub*/
> +
> +sysfs
> +-----
> +
> +Sysfs files are documented in
> +`Documentation/ABI/testing/sysfs-edac-scrub-control`.
> +
> +Example
> +-------
> +
> +The usage takes the form shown in this example::
> +
> +1. CXL memory device patrol scrubber
> +1.1 device based
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/min_cycle_duration
> +3600
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/max_cycle_duration
> +918000
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
> +43200
> +root@localhost:~# echo 54000 > /sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/current_cycle_duration
> +54000
> +root@localhost:~# echo 1 > /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
> +1
> +root@localhost:~# echo 0 > /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_mem0/scrub0/enable_background
> +0
> +
> +1.2. region based
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/min_cycle_duration
> +3600
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/max_cycle_duration
> +918000
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
> +43200
> +root@localhost:~# echo 54000 > /sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/current_cycle_duration
> +54000
> +root@localhost:~# echo 1 > /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
> +1
> +root@localhost:~# echo 0 > /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
> +root@localhost:~# cat /sys/bus/edac/devices/cxl_region0/scrub0/enable_background
> +0
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 99b5c25be079..394bdbc4de87 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -145,4 +145,22 @@ config CXL_REGION_INVALIDATION_TEST
> If unsure, or if this kernel is meant for production environments,
> say N.
>
> +config CXL_RAS_FEAT
> + bool "CXL: Memory RAS features"
If EDAC is compiled as a module, this leads to a situation where EDAC is a
module while CXL_RAS_FEAT is built in, causing a link failure like the one below
----
aarch64-linux-gnu-ld: drivers/cxl/core/memfeature.o: in function `cxl_mem_ras_features_init':
/home/fan/cxl/linux-edac/drivers/cxl/core/memfeature.c:1133:(.text+0x3ac): undefined reference to `edac_dev_register'
----
I think it should be "tristate" instead of "bool", like below.
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 394bdbc4de87..b717a152d2a5 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -146,7 +146,7 @@ config CXL_REGION_INVALIDATION_TEST
say N.
config CXL_RAS_FEAT
- bool "CXL: Memory RAS features"
+ tristate "CXL: Memory RAS features"
depends on CXL_PCI
depends on CXL_MEM
depends on EDAC
Fan
> + depends on CXL_PCI
> + depends on CXL_MEM
> + depends on EDAC
> + help
> + The CXL memory RAS feature control is optional and allows the host to
> + control the RAS feature configurations of CXL Type 3 devices.
> +
> + Registers with the EDAC device subsystem to expose control attributes
> + of CXL memory device's RAS features to the user.
> + Provides interface functions to support configuring the CXL memory
> + device's RAS features.
> +
> + Say 'y/n' to enable/disable control of the CXL.mem device's RAS features.
> + See section 8.2.9.9.11 of CXL 3.1 specification for the detailed
> + information of CXL memory device features.
> +
> endif
> diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
> index 9259bcc6773c..2a3c7197bc23 100644
> --- a/drivers/cxl/core/Makefile
> +++ b/drivers/cxl/core/Makefile
> @@ -16,3 +16,4 @@ cxl_core-y += pmu.o
> cxl_core-y += cdat.o
> cxl_core-$(CONFIG_TRACING) += trace.o
> cxl_core-$(CONFIG_CXL_REGION) += region.o
> +cxl_core-$(CONFIG_CXL_RAS_FEAT) += memfeature.o
> diff --git a/drivers/cxl/core/memfeature.c b/drivers/cxl/core/memfeature.c
> new file mode 100644
> index 000000000000..90c68d20b02b
> --- /dev/null
> +++ b/drivers/cxl/core/memfeature.c
> @@ -0,0 +1,372 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * CXL memory RAS feature driver.
> + *
> + * Copyright (c) 2024 HiSilicon Limited.
> + *
> + * - Supports functions to configure RAS features of the
> + * CXL memory devices.
> + * - Registers with the EDAC device subsystem driver to expose
> + * the features sysfs attributes to the user for configuring
> + * CXL memory RAS feature.
> + */
> +
> +#define pr_fmt(fmt) "CXL MEM FEAT: " fmt
> +
> +#include <cxlmem.h>
> +#include <linux/cleanup.h>
> +#include <linux/limits.h>
> +#include <cxl.h>
> +#include <linux/edac.h>
> +
> +#define CXL_DEV_NUM_RAS_FEATURES 1
> +#define CXL_DEV_HOUR_IN_SECS 3600
> +
> +#define CXL_SCRUB_NAME_LEN 128
> +
> +/* CXL memory patrol scrub control definitions */
> +static const uuid_t cxl_patrol_scrub_uuid =
> + UUID_INIT(0x96dad7d6, 0xfde8, 0x482b, 0xa7, 0x33, 0x75, 0x77, 0x4e, \
> + 0x06, 0xdb, 0x8a);
> +
> +/* CXL memory patrol scrub control functions */
> +struct cxl_patrol_scrub_context {
> + u8 instance;
> + u16 get_feat_size;
> + u16 set_feat_size;
> + u8 get_version;
> + u8 set_version;
> + u16 set_effects;
> + struct cxl_memdev *cxlmd;
> + struct cxl_region *cxlr;
> +};
> +
> +/**
> + * struct cxl_memdev_ps_params - CXL memory patrol scrub parameter data structure.
> + * @enable: [IN & OUT] enable(1)/disable(0) patrol scrub.
> + * @scrub_cycle_changeable: [OUT] scrub cycle attribute of patrol scrub is changeable.
> + * @scrub_cycle_hrs: [IN] Requested patrol scrub cycle in hours.
> + * [OUT] Current patrol scrub cycle in hours.
> + * @min_scrub_cycle_hrs:[OUT] minimum patrol scrub cycle in hours supported.
> + */
> +struct cxl_memdev_ps_params {
> + bool enable;
> + bool scrub_cycle_changeable;
> + u16 scrub_cycle_hrs;
> + u16 min_scrub_cycle_hrs;
> +};
> +
> +enum cxl_scrub_param {
> + CXL_PS_PARAM_ENABLE,
> + CXL_PS_PARAM_SCRUB_CYCLE,
> +};
> +
> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK BIT(0)
> +#define CXL_MEMDEV_PS_SCRUB_CYCLE_REALTIME_REPORT_CAP_MASK BIT(1)
> +#define CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK GENMASK(7, 0)
> +#define CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK GENMASK(15, 8)
> +#define CXL_MEMDEV_PS_FLAG_ENABLED_MASK BIT(0)
> +
> +struct cxl_memdev_ps_rd_attrs {
> + u8 scrub_cycle_cap;
> + __le16 scrub_cycle_hrs;
> + u8 scrub_flags;
> +} __packed;
> +
> +struct cxl_memdev_ps_wr_attrs {
> + u8 scrub_cycle_hrs;
> + u8 scrub_flags;
> +} __packed;
> +
> +static int cxl_mem_ps_get_attrs(struct cxl_dev_state *cxlds,
> + struct cxl_memdev_ps_params *params)
> +{
> + size_t rd_data_size = sizeof(struct cxl_memdev_ps_rd_attrs);
> + size_t data_size;
> + struct cxl_memdev_ps_rd_attrs *rd_attrs __free(kfree) =
> + kmalloc(rd_data_size, GFP_KERNEL);
> + if (!rd_attrs)
> + return -ENOMEM;
> +
> + data_size = cxl_get_feature(cxlds, cxl_patrol_scrub_uuid,
> + CXL_GET_FEAT_SEL_CURRENT_VALUE,
> + rd_attrs, rd_data_size);
> + if (!data_size)
> + return -EIO;
> +
> + params->scrub_cycle_changeable = FIELD_GET(CXL_MEMDEV_PS_SCRUB_CYCLE_CHANGE_CAP_MASK,
> + rd_attrs->scrub_cycle_cap);
> + params->enable = FIELD_GET(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
> + rd_attrs->scrub_flags);
> + params->scrub_cycle_hrs = FIELD_GET(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
> + rd_attrs->scrub_cycle_hrs);
> + params->min_scrub_cycle_hrs = FIELD_GET(CXL_MEMDEV_PS_MIN_SCRUB_CYCLE_MASK,
> + rd_attrs->scrub_cycle_hrs);
> +
> + return 0;
> +}
> +
> +static int cxl_ps_get_attrs(struct device *dev, void *drv_data,
> + struct cxl_memdev_ps_params *params)
> +{
> + struct cxl_patrol_scrub_context *cxl_ps_ctx = drv_data;
> + struct cxl_memdev *cxlmd;
> + struct cxl_dev_state *cxlds;
> + u16 min_scrub_cycle = 0;
> + int i, ret;
> +
> + if (cxl_ps_ctx->cxlr) {
> + struct cxl_region *cxlr = cxl_ps_ctx->cxlr;
> + struct cxl_region_params *p = &cxlr->params;
> +
> + for (i = p->interleave_ways - 1; i >= 0; i--) {
> + struct cxl_endpoint_decoder *cxled = p->targets[i];
> +
> + cxlmd = cxled_to_memdev(cxled);
> + cxlds = cxlmd->cxlds;
> + ret = cxl_mem_ps_get_attrs(cxlds, params);
> + if (ret)
> + return ret;
> +
> + if (params->min_scrub_cycle_hrs > min_scrub_cycle)
> + min_scrub_cycle = params->min_scrub_cycle_hrs;
> + }
> + params->min_scrub_cycle_hrs = min_scrub_cycle;
> + return 0;
> + }
> + cxlmd = cxl_ps_ctx->cxlmd;
> + cxlds = cxlmd->cxlds;
> +
> + return cxl_mem_ps_get_attrs(cxlds, params);
> +}
> +
> +static int cxl_mem_ps_set_attrs(struct device *dev, void *drv_data,
> + struct cxl_dev_state *cxlds,
> + struct cxl_memdev_ps_params *params,
> + enum cxl_scrub_param param_type)
> +{
> + struct cxl_patrol_scrub_context *cxl_ps_ctx = drv_data;
> + struct cxl_memdev_ps_wr_attrs wr_attrs;
> + struct cxl_memdev_ps_params rd_params;
> + int ret;
> +
> + ret = cxl_mem_ps_get_attrs(cxlds, &rd_params);
> + if (ret) {
> + dev_err(dev, "Get cxlmemdev patrol scrub params failed ret=%d\n",
> + ret);
> + return ret;
> + }
> +
> + switch (param_type) {
> + case CXL_PS_PARAM_ENABLE:
> + wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
> + params->enable);
> + wr_attrs.scrub_cycle_hrs = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
> + rd_params.scrub_cycle_hrs);
> + break;
> + case CXL_PS_PARAM_SCRUB_CYCLE:
> + if (params->scrub_cycle_hrs < rd_params.min_scrub_cycle_hrs) {
> + dev_err(dev, "Invalid CXL patrol scrub cycle (%d) to set\n",
> + params->scrub_cycle_hrs);
> + dev_err(dev, "Minimum supported CXL patrol scrub cycle in hours: %d\n",
> + rd_params.min_scrub_cycle_hrs);
> + return -EINVAL;
> + }
> + wr_attrs.scrub_cycle_hrs = FIELD_PREP(CXL_MEMDEV_PS_CUR_SCRUB_CYCLE_MASK,
> + params->scrub_cycle_hrs);
> + wr_attrs.scrub_flags = FIELD_PREP(CXL_MEMDEV_PS_FLAG_ENABLED_MASK,
> + rd_params.enable);
> + break;
> + }
> +
> + ret = cxl_set_feature(cxlds, cxl_patrol_scrub_uuid,
> + cxl_ps_ctx->set_version,
> + &wr_attrs, sizeof(wr_attrs),
> + CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET);
> + if (ret) {
> + dev_err(dev, "CXL patrol scrub set feature failed ret=%d\n", ret);
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static int cxl_ps_set_attrs(struct device *dev, void *drv_data,
> + struct cxl_memdev_ps_params *params,
> + enum cxl_scrub_param param_type)
> +{
> + struct cxl_patrol_scrub_context *cxl_ps_ctx = drv_data;
> + struct cxl_memdev *cxlmd;
> + struct cxl_dev_state *cxlds;
> + int ret, i;
> +
> + if (cxl_ps_ctx->cxlr) {
> + struct cxl_region *cxlr = cxl_ps_ctx->cxlr;
> + struct cxl_region_params *p = &cxlr->params;
> +
> + for (i = p->interleave_ways - 1; i >= 0; i--) {
> + struct cxl_endpoint_decoder *cxled = p->targets[i];
> +
> + cxlmd = cxled_to_memdev(cxled);
> + cxlds = cxlmd->cxlds;
> + ret = cxl_mem_ps_set_attrs(dev, drv_data, cxlds,
> + params, param_type);
> + if (ret)
> + return ret;
> + }
> + } else {
> + cxlmd = cxl_ps_ctx->cxlmd;
> + cxlds = cxlmd->cxlds;
> +
> + return cxl_mem_ps_set_attrs(dev, drv_data, cxlds, params, param_type);
> + }
> +
> + return 0;
> +}
> +
> +static int cxl_patrol_scrub_get_enabled_bg(struct device *dev, void *drv_data, bool *enabled)
> +{
> + struct cxl_memdev_ps_params params;
> + int ret;
> +
> + ret = cxl_ps_get_attrs(dev, drv_data, &params);
> + if (ret)
> + return ret;
> +
> + *enabled = params.enable;
> +
> + return 0;
> +}
> +
> +static int cxl_patrol_scrub_set_enabled_bg(struct device *dev, void *drv_data, bool enable)
> +{
> + struct cxl_memdev_ps_params params = {
> + .enable = enable,
> + };
> +
> + return cxl_ps_set_attrs(dev, drv_data, &params, CXL_PS_PARAM_ENABLE);
> +}
> +
> +static int cxl_patrol_scrub_read_min_scrub_cycle(struct device *dev, void *drv_data,
> + u32 *min)
> +{
> + struct cxl_memdev_ps_params params;
> + int ret;
> +
> + ret = cxl_ps_get_attrs(dev, drv_data, &params);
> + if (ret)
> + return ret;
> + *min = params.min_scrub_cycle_hrs * CXL_DEV_HOUR_IN_SECS;
> +
> + return 0;
> +}
> +
> +static int cxl_patrol_scrub_read_max_scrub_cycle(struct device *dev, void *drv_data,
> + u32 *max)
> +{
> + *max = U8_MAX * CXL_DEV_HOUR_IN_SECS; /* Max set by register size */
> +
> + return 0;
> +}
> +
> +static int cxl_patrol_scrub_read_scrub_cycle(struct device *dev, void *drv_data,
> + u32 *scrub_cycle_secs)
> +{
> + struct cxl_memdev_ps_params params;
> + int ret;
> +
> + ret = cxl_ps_get_attrs(dev, drv_data, &params);
> + if (ret)
> + return ret;
> +
> + *scrub_cycle_secs = params.scrub_cycle_hrs * CXL_DEV_HOUR_IN_SECS;
> +
> + return 0;
> +}
> +
> +static int cxl_patrol_scrub_write_scrub_cycle(struct device *dev, void *drv_data,
> + u32 scrub_cycle_secs)
> +{
> + struct cxl_memdev_ps_params params = {
> + .scrub_cycle_hrs = scrub_cycle_secs / CXL_DEV_HOUR_IN_SECS,
> + };
> +
> + return cxl_ps_set_attrs(dev, drv_data, &params, CXL_PS_PARAM_SCRUB_CYCLE);
> +}
> +
> +static const struct edac_scrub_ops cxl_ps_scrub_ops = {
> + .get_enabled_bg = cxl_patrol_scrub_get_enabled_bg,
> + .set_enabled_bg = cxl_patrol_scrub_set_enabled_bg,
> + .min_cycle_read = cxl_patrol_scrub_read_min_scrub_cycle,
> + .max_cycle_read = cxl_patrol_scrub_read_max_scrub_cycle,
> + .cycle_duration_read = cxl_patrol_scrub_read_scrub_cycle,
> + .cycle_duration_write = cxl_patrol_scrub_write_scrub_cycle,
> +};
> +
> +int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
> +{
> + struct edac_dev_feature ras_features[CXL_DEV_NUM_RAS_FEATURES];
> + struct cxl_dev_state *cxlds;
> + struct cxl_patrol_scrub_context *cxl_ps_ctx;
> + struct cxl_feat_entry feat_entry;
> + char cxl_dev_name[CXL_SCRUB_NAME_LEN];
> + int rc, i, num_ras_features = 0;
> +
> + if (cxlr) {
> + struct cxl_region_params *p = &cxlr->params;
> +
> + for (i = p->interleave_ways - 1; i >= 0; i--) {
> + struct cxl_endpoint_decoder *cxled = p->targets[i];
> +
> + cxlmd = cxled_to_memdev(cxled);
> + cxlds = cxlmd->cxlds;
> + memset(&feat_entry, 0, sizeof(feat_entry));
> + rc = cxl_get_supported_feature_entry(cxlds, &cxl_patrol_scrub_uuid,
> + &feat_entry);
> + if (rc < 0)
> + return rc;
> + if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
> + return -EOPNOTSUPP;
> + }
> + } else {
> + cxlds = cxlmd->cxlds;
> + rc = cxl_get_supported_feature_entry(cxlds, &cxl_patrol_scrub_uuid,
> + &feat_entry);
> + if (rc < 0)
> + return rc;
> +
> + if (!(feat_entry.attr_flags & CXL_FEAT_ENTRY_FLAG_CHANGABLE))
> + return -EOPNOTSUPP;
> + }
> +
> + cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
> + if (!cxl_ps_ctx)
> + return -ENOMEM;
> +
> + *cxl_ps_ctx = (struct cxl_patrol_scrub_context) {
> + .instance = cxl_ps_ctx->instance,
> + .get_feat_size = feat_entry.get_feat_size,
> + .set_feat_size = feat_entry.set_feat_size,
> + .get_version = feat_entry.get_feat_ver,
> + .set_version = feat_entry.set_feat_ver,
> + .set_effects = feat_entry.set_effects,
> + };
> + if (cxlr) {
> + snprintf(cxl_dev_name, sizeof(cxl_dev_name),
> + "cxl_region%d", cxlr->id);
> + cxl_ps_ctx->cxlr = cxlr;
> + } else {
> + snprintf(cxl_dev_name, sizeof(cxl_dev_name),
> + "%s_%s", "cxl", dev_name(&cxlmd->dev));
> + cxl_ps_ctx->cxlmd = cxlmd;
> + }
> +
> + ras_features[num_ras_features].ft_type = RAS_FEAT_SCRUB;
> + ras_features[num_ras_features].scrub_ops = &cxl_ps_scrub_ops;
> + ras_features[num_ras_features].ctx = cxl_ps_ctx;
> + num_ras_features++;
> +
> + return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL,
> + num_ras_features, ras_features);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_mem_ras_features_init, CXL);
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 21ad5f242875..1cc29ec9ffac 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -3434,6 +3434,12 @@ static int cxl_region_probe(struct device *dev)
> p->res->start, p->res->end, cxlr,
> is_system_ram) > 0)
> return 0;
> +
> + rc = cxl_mem_ras_features_init(NULL, cxlr);
> + if (rc)
> + dev_warn(&cxlr->dev, "CXL RAS features init for region_id=%d failed\n",
> + cxlr->id);
> +
> return devm_cxl_add_dax_region(cxlr);
> default:
> dev_dbg(&cxlr->dev, "unsupported region mode: %d\n",
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index b565a061a4e3..2187c3378eaa 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -889,6 +889,13 @@ int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
> int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
> int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
>
> +#ifdef CONFIG_CXL_RAS_FEAT
> +int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr);
> +#else
> +static inline int cxl_mem_ras_features_init(struct cxl_memdev *cxlmd, struct cxl_region *cxlr)
> +{ return 0; }
> +#endif
> +
> #ifdef CONFIG_CXL_SUSPEND
> void cxl_mem_active_inc(void);
> void cxl_mem_active_dec(void);
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 7de232eaeb17..be2e69548909 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -117,6 +117,10 @@ static int cxl_mem_probe(struct device *dev)
> if (!cxlds->media_ready)
> return -EBUSY;
>
> + rc = cxl_mem_ras_features_init(cxlmd, NULL);
> + if (rc)
> + dev_warn(&cxlmd->dev, "CXL RAS features init failed\n");
> +
> /*
> * Someone is trying to reattach this device after it lost its port
> * connection (an endpoint port previously registered by this memdev was
> --
> 2.34.1
>
--
Fan Ni