Re: [LSF/MM/BPF TOPIC] CXL Development Discussions

From: Adam Manzanares <a.manzanares@samsung.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"jonathan.cameron@huawei.com" <jonathan.cameron@huawei.com>,
	"dave@stgolabs.net" <dave@stgolabs.net>,
	Fan Ni <fan.ni@samsung.com>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>,
	"alison.schofield@intel.com" <alison.schofield@intel.com>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"gourry.memverge@gmail.com" <gourry.memverge@gmail.com>,
	"wj28.lee@gmail.com" <wj28.lee@gmail.com>,
	"rientjes@google.com" <rientjes@google.com>,
	"ruansy.fnst@fujitsu.com" <ruansy.fnst@fujitsu.com>,
	"shradha.t@samsung.com" <shradha.t@samsung.com>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>,
	Jim Harris <jim.harris@samsung.com>,
	"mhocko@suse.com" <mhocko@suse.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
Date: Wed, 8 May 2024 18:26:23 +0000
Message-ID: <e51cc411-a05a-4b0c-b43f-bc99a94208eb@nmtadam.samsung>
In-Reply-To: <66396c1938726_2f63a29443@dwillia2-mobl3.amr.corp.intel.com.notmuch>

On Mon, May 06, 2024 at 04:47:37PM -0700, Dan Williams wrote:
> Adam Manzanares wrote:
> > Hello all,
> > 
> > I would like to have a discussion with the CXL development community about
> > current outstanding issues and also invite developers interested in RAS and
> > memory tiering to participate.
> 
> Thanks for putting this together Adam!

NP, it's been great working together in the community.

> 
> > The first topic I believe we should discuss is how we can ensure as a group
> > that we are prioritizing upstream work. On a recent upstream CXL development
> > discussion call there was a call to review more work. I apologize for not
> > grabbing the link, but I believe Dave Jiang is leveraging patchwork and this
> > link should be shared with others so we can help get more reviews where needed.
> 
> Dave already replied here, but one thing I will add is a request for help
> keeping an eye out for things that should be in the queue. Likely a good way
> to do that is to send a note along with a review so both get reflected in the
> tracking.
> 

Noted.

> > The second topic I would like to discuss is how we integrate RAS features that
> > have similar equivalents in the kernel. A CXL device can provide info about 
> > memory media errors in a similar fashion to memory controllers that have EDAC
> > support. Discussions have been put on the list and I would like to hear thoughts
> > from the community about where this should go [1]. On the same topic, CXL has
> > port-level RAS features, and the PCIe DW series touched on this issue [2].
> 
> If I could uplevel this a bit there are multiple efforts in memory RAS
> that likely want to figure out a cohesive story, or at least make
> conscious decisions about implementation divergence. Some related work
> that caught my eye:
> 
> * AMD MI300 specific poison handling that sounds similar to the CXL List
>   Poison facility:
>   http://lore.kernel.org/r/20240214033516.1344948-3-yazen.ghannam@amd.com
> 
> * Scrub subsystem that has both ACPI and CXL intercepts:
>   http://lore.kernel.org/r/20240419164720.1765-1-shiju.jose@huawei.com
> 
> * Inconsistencies between firmware reported fatal errors and native
>   error handling, compare:
> 
>   ghes_proc()::
>         if (ghes_severity(estatus->error_severity) >= GHES_SEV_PANIC)
>                 __ghes_panic(ghes, estatus, buf_paddr, FIX_APEI_GHES_IRQ);
> 
>   ...vs:
> 
>   pcie_do_recovery()::
>         /* TODO: Should kernel panic here? */
>         pci_info(bridge, "device recovery failed\n");
> 
>   Also the inconsistencies between EXTLOG, GHES, BERT, and native error
>   reporting.
> 

Thanks for pointing these out. I will try to put all of these references
in context for discussion.

> > The third topic I would like to discuss is how we can get a set of common
> > benchmarks for memory tiering evaluations. Our team has done some initial
> > work in this space, but we want to hear more from end users about their 
> > workloads of concern. There was a proposal related to this topic, but from what 
> > I understand no meeting has been held [3]. 
> > 
> > The last topic that I believe is worth discussion is how we come up with
> > a baseline for testing. I am aware of 3 efforts that could be used: cxl_test,
> > qemu, and uunit testing framework [4].
> 
> I think benchmarking for memory-tiering is orthogonal to patch
> unit, function, and integration testing.
> 

Agreed. 

> For testing I think it is an "all of the above plus hardware testing if
> possible" situation. My hope is to get to a point where CXL patchwork
> lights up "S/W/F" columns with backend tests similar to NETDEV
> patchwork:
> 
> https://patchwork.kernel.org/project/netdevbpf/list/
> 
> There are some initial discussions about how to do this; likely we can
> grab some folks to discuss more.
> 
> I think Paul and Song would be useful to have for this discussion. Can
> you recommend others that would be useful for this or other CXL
> topics to help with timeslot conflict resolution?
> 

Luis already chimed in and he is definitely our expert in terms of
establishing baselines for new functionalities.
