From: Adam Manzanares <a.manzanares@samsung.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "lsf-pc@lists.linux-foundation.org"
<lsf-pc@lists.linux-foundation.org>,
"jonathan.cameron@huawei.com" <jonathan.cameron@huawei.com>,
"dave@stgolabs.net" <dave@stgolabs.net>,
Fan Ni <fan.ni@samsung.com>,
"dave.jiang@intel.com" <dave.jiang@intel.com>,
"ira.weiny@intel.com" <ira.weiny@intel.com>,
"alison.schofield@intel.com" <alison.schofield@intel.com>,
"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
"gourry.memverge@gmail.com" <gourry.memverge@gmail.com>,
"wj28.lee@gmail.com" <wj28.lee@gmail.com>,
"rientjes@google.com" <rientjes@google.com>,
"ruansy.fnst@fujitsu.com" <ruansy.fnst@fujitsu.com>,
"shradha.t@samsung.com" <shradha.t@samsung.com>,
"mcgrof@kernel.org" <mcgrof@kernel.org>,
Jim Harris <jim.harris@samsung.com>,
"mhocko@suse.com" <mhocko@suse.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: [LSF/MM/BPF TOPIC] CXL Development Discussions
Date: Wed, 8 May 2024 18:26:23 +0000
Message-ID: <e51cc411-a05a-4b0c-b43f-bc99a94208eb@nmtadam.samsung>
In-Reply-To: <66396c1938726_2f63a29443@dwillia2-mobl3.amr.corp.intel.com.notmuch>
On Mon, May 06, 2024 at 04:47:37PM -0700, Dan Williams wrote:
> Adam Manzanares wrote:
> > Hello all,
> >
> > I would like to have a discussion with the CXL development community about
> > current outstanding issues and also invite developers interested in RAS and
> > memory tiering to participate.
>
> Thanks for putting this together Adam!
NP, it's been great working together in the community.
>
> > The first topic I believe we should discuss is how we can ensure as a group
> > that we are prioritizing upstream work. On a recent upstream CXL development
> > discussion call there was a call to review more work. I apologize for not
> > grabbing the link, but I believe Dave Jiang is leveraging patchwork and this
> > link should be shared with others so we can help get more reviews where needed.
>
> Dave already replied here but one thing I will add is help keeping an
> eye out for things that should be in queue. Likely a good way to
> do that is send a note along with a review so both get reflected in the
> tracking.
>
Noted.
> > The second topic I would like to discuss is how we integrate RAS features that
> > have similar equivalents in the kernel. A CXL device can provide info about
> > memory media errors in a similar fashion to memory controllers that have EDAC
> > support. Discussions have been put on the list and I would like to hear thoughts
> > from the community about where this should go [1]. On the same topic CXL has
> > port-level RAS features and the PCIe DW series touched on this issue [2].
>
> If I could uplevel this a bit there are multiple efforts in memory RAS
> that likely want to figure out a cohesive story, or at least make
> conscious decisions about implementation divergence. Some related work
> that caught my eye:
>
> * AMD MI300 specific poison handling that sounds similar to CXL List
> Poison facility:
> http://lore.kernel.org/r/20240214033516.1344948-3-yazen.ghannam@amd.com
>
> * Scrub subsystem that has both ACPI and CXL intercepts:
> http://lore.kernel.org/r/20240419164720.1765-1-shiju.jose@huawei.com
>
> * Inconsistencies between firmware reported fatal errors and native
> error handling, compare:
>
>   ghes_proc()::
>       if (ghes_severity(estatus->error_severity) >= GHES_SEV_PANIC)
>               __ghes_panic(ghes, estatus, buf_paddr, FIX_APEI_GHES_IRQ);
>
> ...vs:
>
>   pcie_do_recovery()::
>       /* TODO: Should kernel panic here? */
>       pci_info(bridge, "device recovery failed\n");
>
> Also the inconsistencies between EXTLOG, GHES, BERT, and native error
> reporting.
>
Thanks for pointing these out. I will try to put all of these references
in context for discussion.
> > The third topic I would like to discuss is how we can get a set of common
> > benchmarks for memory tiering evaluations. Our team has done some initial
> > work in this space, but we want to hear more from end users about their
> > workloads of concern. There was a proposal related to this topic, but from what
> > I understand no meeting has been held [3].
> >
> > The last topic that I believe is worth discussing is how we come up with
> > a baseline for testing. I am aware of 3 efforts that could be used: cxl_test,
> > qemu, and the uunit testing framework [4].
>
> I think benchmarking for memory-tiering is orthogonal to patch
> unit, function, and integration testing.
>
Agreed.
> For testing I think it is an "all of the above plus hardware testing if
> possible" situation. My hope is to get to a point where CXL patchwork
> lights up "S/W/F" columns with backend tests similar to NETDEV
> patchwork:
>
> https://patchwork.kernel.org/project/netdevbpf/list/
>
> There are some initial discussions about how to do this; likely we can
> grab some folks to discuss more.
>
> I think Paul and Song would be useful to have for this discussion. Can
> you recommend others that would be useful for this or other CXL
> topics to help with timeslot conflict resolution?
>
Luis already chimed in and he is definitely our expert in terms of
establishing baselines for new functionalities.