linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Alison Schofield <alison.schofield@intel.com>,
	Dan Williams <dan.j.williams@intel.com>
Cc: Nathan Fontenot <nathan.fontenot@amd.com>,
	<linux-cxl@vger.kernel.org>, <linux-mm@kvack.org>
Subject: Re: [PATCH] cxl: Update Soft Reserved resources upon region creation
Date: Wed, 11 Dec 2024 19:20:33 -0800	[thread overview]
Message-ID: <675a5681fc03_10a083294ba@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <Z1pFXFGjkMMXgFy6@aschofie-mobl2.lan>

Alison Schofield wrote:
[..]
> > Hi Nathan, this patch hit on all the mechanisms I would expect, but upon
> > reading it there is an opportunity to zoom out and do something blunter
> > than the surgical precision of this current proposal.
> > 
> > In other words, I appreciate the consideration of potential corner
> > cases, but for overall maintainability this should aim to be an all or
> > nothing approach.
> > 
> > Specifically, at the first sign of trouble, any CXL sub-driver probe
> > failure or region enumeration timeout, that the entire CXL topology be
> > torn down (trigger the equivalent of ->remove() on the ACPI0017 device),
> > and the deferred Soft Reserved ranges registered as if cxl_acpi was not
> > present (implement a fallback equivalent to hmem_register_devices()).
> > 
> > No need to trim resources as regions arrive, just tear down everything
> > setup in the cxl_acpi_probe() path with devres_release_all().
> > 
> > So, I am thinking export a flag from the CXL core that indicates whether
> > any conflict with platform-firmware established CXL regions has
> > occurred.
> > 
> > Read that flag from an cxl_acpi-driver-launched deferred workqueue that
> > is awaiting initial device probing to quiesce. If that flag indicates a
> > CXL enumeration failure then trigger devres_release_all() on the
> > ACPI0017 platform device and follow that up by walking the deferred Soft
> > Reserve resources to register raw (unparented by CXL regions) dax
> > devices.
> > 
> 
> This reads like a 'poison pill' case that is in addition to the original
> use cases that inspired this patch. I'm not sure I'm getting the 'all or
> nothing' language.

Read through this recent regression thread:

   http://lore.kernel.org/20241202111941.2636613-1-raghavendra.kt@amd.com

That is evidence that the ways in which platform CXL configurations can
confuse the CXL subsystem are myriad and seemingly growing.

So this change of heart from me is realizing that the solution needed
here is less about how to recover the Soft Reserved ranges after CXL
successfully takes over, it is about abandoning the notion that the
driver should continue to best-effort fumble along after discovering
that a fundamental assumption has been violated.

For example this would also mitigate the "Extended Linear Cache"
enumeration problem:

    http://lore.kernel.org/20241204224827.2097263-1-dave.jiang@intel.com

...where the driver fools itself into thinking it knows what is
happening just based on reading hardware decoders, but not realizing
that DDR aliases into the same physical address range. Yes, you lose
important CXL subsystem services like address translation, but it is
arguably worse to confidently offer a bad translation based on wrong
assumption than offer no translation with an undecorated dax device, and
in the meantime the memory is usable.

The driver should stop guessing and say, "there is reason to believe I
have no idea what this platform is actually doing. The platform firmware
seems to think it knows what is coherent / enabled memory range. In the
interests of not dropping precious resources on the floor, or proceeding
with potential misconfiguration of the memory map, just revert to 100%
memory-mapped enumerated resources without CXL decoration."

> The patch was doing Case 1) and 2). This poison-pill approach is 3)

So it is not "poison-pill" or "sour grapes", it is "irresponsible to
continue when fundamental assumptions are violated, and wasteful to
leave the system with reduced memory capacity when platform firmware put
it in the memory map".

The "unwind SR on CXL region destruction" use case comes along for the
ride as a bonus feature.

> 
> 1) SR aligns exactly with a CXL Window.
> This patch removes that SR so the address space is available for reuse if
> an auto-region is torn down.
> 
> 2) SR is larger than a CXL Window.
> Where the SR aligns with the CXL Window the SR is handled as in 1).
> The SR leftovers are released to DAX.
> 
> 3) Failure in auto-region assembly.
> New case: tear down ACPI0017 and release all SRs.

release all SRs back to default enumeration behavior.

> So, after writing the above, maybe I get what the 'all' case is.
> Are you suggesting we stop trimming and ignore leftovers and just
> consider both 1) and 2) the 'all' case? No SR ever makes it into
> the iomem resource tree so all resources are available for reuse
> after teardown. And then there is 3), the nothing case! I get that.
> 
> It will waste some leftovers that could have been handed to DAX.
> I know the SR greater than CXL Window came in handy for hotplug when
> the SRs were not released, but I never understood if SRs greater
> than CXL Window intended to serve any purpose.

...and that is another example to prove the point. If the CXL subsystem
can not definitively say why it is trimming too large SR resources based
on the CXL Window size it should default to giving the end user all the
memory promised in the EFI memory map.


  reply	other threads:[~2024-12-12  3:21 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-02 15:55 Nathan Fontenot
2024-12-02 18:56 ` Fan Ni
2024-12-04 16:33   ` Fontenot, Nathan
2024-12-02 19:00 ` kernel test robot
2024-12-03  0:31 ` kernel test robot
2024-12-03  1:12 ` kernel test robot
2024-12-11 11:31 ` Andy Shevchenko
2024-12-11 20:11   ` Fontenot, Nathan
2024-12-11 22:30 ` Dan Williams
2024-12-11 22:53   ` Dan Williams
2024-12-12  2:07   ` Alison Schofield
2024-12-12  3:20     ` Dan Williams [this message]
2024-12-12 18:12   ` Fontenot, Nathan
2024-12-12 19:57     ` Dan Williams
2024-12-12 22:42 ` Gregory Price
2024-12-13  1:01   ` Alison Schofield
2024-12-13 16:33     ` Fontenot, Nathan
2024-12-26 19:25     ` Gregory Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=675a5681fc03_10a083294ba@dwillia2-xfh.jf.intel.com.notmuch \
    --to=dan.j.williams@intel.com \
    --cc=alison.schofield@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nathan.fontenot@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox