From: Gregory Price <gourry@gourry.net>
To: Dan Williams <dan.j.williams@intel.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: CXL Boot to Bash - Section 2: The Drivers
Date: Thu, 6 Feb 2025 10:59:37 -0500
Message-ID: <Z6TcaaScBWzZvLWW@gourry-fedora-PF4VCD3F>
In-Reply-To: <67a4069572eab_2d2c294d4@dwillia2-xfh.jf.intel.com.notmuch>

On Wed, Feb 05, 2025 at 04:47:17PM -0800, Dan Williams wrote:
> Gregory Price wrote:
> > [/sys/bus/cxl/devices]# ls
> > dax_region0 decoder0.0 decoder1.0 decoder2.0 .....
> > dax_region1 decoder0.1 decoder1.1 decoder3.0 .....
> >
> > ^^^ These dax regions require `CONFIG_DEV_DAX_CXL` enabled to fully
> > surface as dax devices, which can then be converted to system ram.
>
> At least for this problem the plan is to fall back to
> CONFIG_DEV_DAX_HMEM [1] which skips all of the RAS and device
> enumeration benefits and just shunts EFI_MEMORY_SP over to device_dax.
>
Hm, would this actually happen in the scenario where CONFIG_DEV_DAX_CXL
is not enabled but everything else is? The region0 still gets created
and associated with the resource, but the dax_region0 never gets
created.
On one of my systems, I see the following:

c050000000-fcefffffff : Soft Reserved
  c050000000-fcefffffff : CXL Window 0
    c050000000-fcefffffff : region0
      c050000000-fcefffffff : dax0.0
        c050000000-fcefffffff : System RAM (kmem)
fcf0000000-ffffffffff : Reserved
10000000000-1035fffffff : Soft Reserved
  10000000000-1035fffffff : CXL Window 1
    10000000000-1035fffffff : region1
      10000000000-1035fffffff : dax1.0
        10000000000-1035fffffff : System RAM (kmem)

I would expect the HMEM shunt above to work only if everything down
through CXL Window 0 is torn down.

But if CONFIG_DEV_DAX_CXL is not enabled, everything "succeeds"; it just
doesn't "Do What You Want"(TM): the dax0.0 and System RAM entries are
absent. It makes me wonder whether the driver build has been
over-componentized.
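To make the silent failure above visible, one could scan /proc/iomem for
CXL regions that never gained a "System RAM (kmem)" child. This is a
minimal sketch, assuming the usual two-spaces-per-level nesting of
/proc/iomem; the function names and the region1 entry in the sample are
hypothetical. Note that a region without kmem may also be intentional,
e.g. capacity deliberately left as a device-dax target.

```python
import re
from typing import List, Tuple

# One /proc/iomem line: indentation, "start-end : name".
LINE_RE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")


def parse_iomem(text: str) -> List[Tuple[int, int, int, str]]:
    """Return (depth, start, end, name) tuples, assuming 2 spaces per level."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        indent, start, end, name = m.groups()
        entries.append((len(indent) // 2, int(start, 16), int(end, 16), name.strip()))
    return entries


def regions_without_kmem(text: str) -> List[str]:
    """Names of 'regionN' entries that have no 'System RAM (kmem)' descendant."""
    entries = parse_iomem(text)
    missing = []
    for i, (depth, _start, _end, name) in enumerate(entries):
        if not re.fullmatch(r"region\d+", name):
            continue
        has_kmem = False
        # Descendants are the entries that follow with a strictly greater depth.
        for child_depth, _s, _e, child_name in entries[i + 1:]:
            if child_depth <= depth:
                break
            if child_name == "System RAM (kmem)":
                has_kmem = True
                break
        if not has_kmem:
            missing.append(name)
    return missing


# Hypothetical sample: region0 fully onlined, region1 left without kmem.
SAMPLE = """\
c050000000-fcefffffff : Soft Reserved
  c050000000-fcefffffff : CXL Window 0
    c050000000-fcefffffff : region0
      c050000000-fcefffffff : dax0.0
        c050000000-fcefffffff : System RAM (kmem)
10000000000-1035fffffff : Soft Reserved
  10000000000-1035fffffff : CXL Window 1
    10000000000-1035fffffff : region1
"""

if __name__ == "__main__":
    print(regions_without_kmem(SAMPLE))
```

On a live system the same function can be fed the contents of /proc/iomem;
without root the addresses read as zeros, but the nesting and names used
here are still present.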
> I am otherwise open to suggestions about a better model for how to
> handle a type of memory capacity that elicits diverging opinions on
> whether it should be treated as System RAM, dedicated application
> memory, or some kind of cold-memory swap target.
>
My gut tells me there's no "elegant solution" here, given that user
intent is fairly unknowable; i.e., the best we can do is make the build
and boot options easier to understand.
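As one small step in that direction, a script could check whether a
kernel .config enables every link on the CXL region -> dax -> kmem path.
This is a minimal sketch, assuming that path consists of
CONFIG_CXL_REGION, CONFIG_DEV_DAX_CXL and CONFIG_DEV_DAX_KMEM (only
CONFIG_DEV_DAX_CXL is named in this thread; the other two are assumed
neighbouring links). The helper names are hypothetical.

```python
from typing import Dict, List

# Assumed set of options a region needs to end up as System RAM via kmem.
CXL_PATH = ["CONFIG_CXL_REGION", "CONFIG_DEV_DAX_CXL", "CONFIG_DEV_DAX_KMEM"]


def parse_kconfig(text: str) -> Dict[str, str]:
    """Map CONFIG_* names to values; '# CONFIG_X is not set' maps to 'n'."""
    opts: Dict[str, str] = {}
    suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def missing_for_cxl_ram(text: str) -> List[str]:
    """Options on the assumed path that are neither built in (y) nor modular (m)."""
    opts = parse_kconfig(text)
    return [name for name in CXL_PATH if opts.get(name, "n") not in ("y", "m")]


if __name__ == "__main__":
    sample = (
        "CONFIG_CXL_REGION=y\n"
        "# CONFIG_DEV_DAX_CXL is not set\n"
        "CONFIG_DEV_DAX_KMEM=m\n"
    )
    print(missing_for_cxl_ram(sample))
```

On many distributions the running kernel's config is installed as
/boot/config-<release>, which can be passed to missing_for_cxl_ram()
directly.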
> > ---------------------------------------------------------------
> > Step 6: DAX surfacing Memory Blocks - First bit of User Policy.
> > ---------------------------------------------------------------
> >
> > The last step in surfacing memory to allocators is to convert a dax
> > device into memory blocks. On most default kernel builds, dax devices
> > are not automatically converted to SystemRAM.
>
> I thought most distributions are shipping with
> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE, or the default online udev rule?
> For example Fedora is CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y and RHEL is
> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n, but with the udev hotplug rule.
>
Good point; my bias is showing in the notes here. I didn't know RHEL
had already gotten as far as a udev rule. I'll adjust my notes.

This also hides some nuance, though: with DEFAULT_ONLINE, the default
behavior onlines memory into ZONE_NORMAL (more on that in the next
section).
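For comparison with the udev rule and DEFAULT_ONLINE, here is what
onlining by hand looks like through the memory hotplug sysfs interface,
choosing ZONE_MOVABLE explicitly instead of accepting the default zone.
This is a minimal sketch: the function name is hypothetical, while the
per-block "state" file and the online/online_kernel/online_movable values
are the standard memory hotplug interface. The sysfs root is a parameter
so the logic can be exercised against a fake tree.

```python
import glob
import os
from typing import List

MEMORY_ROOT = "/sys/devices/system/memory"
ONLINE_TYPES = ("online", "online_kernel", "online_movable")


def online_offline_blocks(online_type: str = "online_movable",
                          root: str = MEMORY_ROOT) -> List[str]:
    """Write online_type to every memory block whose state reads 'offline'.

    Returns the names of the blocks it changed. Needs root on real sysfs.
    """
    if online_type not in ONLINE_TYPES:
        raise ValueError(f"unsupported online type: {online_type}")
    changed = []
    for state_path in sorted(glob.glob(os.path.join(root, "memory[0-9]*", "state"))):
        with open(state_path) as f:
            if f.read().strip() != "offline":
                continue  # already online (or in some other state); leave it
        with open(state_path, "w") as f:
            f.write(online_type)
        changed.append(os.path.basename(os.path.dirname(state_path)))
    return changed
```

On real hardware the kernel may refuse a given block (for example when it
cannot be placed in ZONE_MOVABLE), in which case the write raises OSError;
a production tool would catch that per block rather than abort.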
> > Alternatively, this can be done at Build or Boot time using
> > CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE (v6.13 or below)
> > CONFIG_MHP_DEFAULT_ONLINE_TYPE_* (v6.14 or above)
> > memhp_default_state=* (boot param predating cxl)
>
> Oh, TIL the new CONFIG_MHP_DEFAULT_ONLINE_TYPE_* option.
>
It was only just added:
https://lore.kernel.org/linux-mm/20241226182918.648799-1-gourry@gourry.net/

It basically creates parity between the memhp_default_state boot
parameter and the build options.
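The precedence between these knobs can be summarised in a short sketch:
the build option selects a default, and a valid memhp_default_state= on
the command line overrides it. This assumes the v6.14 choice symbols map
onto the boot-parameter strings as shown below; the function names are
hypothetical. On a running system, the resulting policy should be visible
in /sys/devices/system/memory/auto_online_blocks.

```python
from typing import Dict

# Assumed mapping from the v6.14+ Kconfig choice to memhp_default_state strings.
CONFIG_TO_STATE = {
    "CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE": "offline",
    "CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO": "online",
    "CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_KERNEL": "online_kernel",
    "CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE": "online_movable",
}
VALID_STATES = {"offline", "online", "online_kernel", "online_movable"}


def build_default(config: Dict[str, str]) -> str:
    """Default online type chosen at build time (v6.14+ choice or older bool)."""
    for name, state in CONFIG_TO_STATE.items():
        if config.get(name) == "y":
            return state
    if config.get("CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE") == "y":
        return "online"
    return "offline"


def effective_default(config: Dict[str, str], cmdline: str) -> str:
    """A valid memhp_default_state= on the command line overrides the build default."""
    state = build_default(config)
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "memhp_default_state" and sep and value in VALID_STATES:
            state = value  # in this sketch the last valid occurrence wins
    return state
```

Invalid values are ignored here rather than treated as errors, so a typo
on the command line silently falls back to the build default.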
> > The base is 256MB aligned (the minimum for the CXL Spec), and the
> > window size is 512MB. This results in a loss of almost a full memory
> > block worth of memory (~1280MB on the front, and ~512MB on the back).
> >
> > This is a loss of ~0.7% of capacity (1.5GB) for that region (121.25GB).
>
> This feels like an example, of "hey platform vendors, I understand
> that spec grants you the freedom to misalign, please refrain from taking
> advantage of that freedom".
>
Presently, only x86 appears to do this, so is this a real constraint or
just a quirk of how the x86 arch code has chosen to "optimize memory
block size"?
Granted, I'm a platform consumer, not a vendor, but I wouldn't even know
where to look to find where this constraint is defined (if it is defined
at all). All I'd know is "CXL says I can align to 256MB, and the minimum
memory block size on Linux is 256MB, so allons-y!"
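To put numbers on this trade-off, the loss for any region can be computed
by rounding its start up and its end down to the memory block size. This
is a minimal sketch, assuming that only whole, block-aligned memory blocks
can be hot-added; the region in the example (a 256MB-aligned base, 16GB
long, with 2GB blocks) is hypothetical and is not the platform discussed
above.

```python
from typing import Tuple

MiB = 1 << 20
GiB = 1 << 30


def lost_capacity(base: int, size: int, block_size: int) -> Tuple[int, int]:
    """Bytes left unused at the (front, back) of [base, base + size).

    Only whole, block-aligned memory blocks can be hot-added, so the start
    is rounded up and the end rounded down to block_size.
    """
    start = -(-base // block_size) * block_size       # round base up
    end = ((base + size) // block_size) * block_size  # round end down
    if start >= end:
        return (size, 0)  # not even one full block fits: everything is lost
    return (start - base, base + size - end)


def loss_percent(base: int, size: int, block_size: int) -> float:
    """Total misalignment loss as a percentage of the region size."""
    front, back = lost_capacity(base, size, block_size)
    return 100.0 * (front + back) / size


if __name__ == "__main__":
    # Hypothetical region: 256 MiB-aligned base, 16 GiB long, 2 GiB blocks.
    base, size = 64 * GiB + 256 * MiB, 16 * GiB
    front, back = lost_capacity(base, size, 2 * GiB)
    print(f"front={front // MiB}MiB back={back // MiB}MiB "
          f"({loss_percent(base, size, 2 * GiB):.2f}%)")
```

With a 256MB block size the same region loses nothing; the loss appears
only when the block size the arch code picks exceeds the alignment the
platform actually provides.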
On the Linux side, these platforms are now out there in the wild, so the
surface impression is that Linux just throws away ~0.5% of your CXL
capacity on these platforms for no reason.
That said, I also understand that more memory blocks might affect
allocation performance when the system is under memory pressure, but
losing gigabytes of memory can also reduce performance. (This is a
preview of one of the nuances I'll be adding in Section 3.)
If this change is unwelcome and we stick with advising vendors instead,
then we should be emitting a really loud warning somewhere, so that
consumers have a signal to pass back to their vendors.
~Gregory