From: Dan Williams <dan.j.williams@intel.com>
To: Gregory Price <gourry@gourry.net>, Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
<lsf-pc@lists.linux-foundation.org>, <linux-mm@kvack.org>,
<linux-cxl@vger.kernel.org>, Byungchul Park <byungchul@sk.com>,
Honggyu Kim <honggyu.kim@sk.com>
Subject: Re: [LSF/MM/BPF TOPIC] Restricting or migrating unmovable kernel allocations from slow tier
Date: Mon, 3 Feb 2025 14:09:26 -0800
Message-ID: <67a13e969daf_2d2c29412@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <Z55MILLApIoPh0A1@gourry-fedora-PF4VCD3F>

Gregory Price wrote:
> On Sun, Feb 02, 2025 at 12:13:23AM +0900, Hyeonggon Yoo wrote:
> > On Sat, Feb 1, 2025 at 11:04 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > This all seems like a grand waste of time. Don't do that. Don't allow
> > > kernel allocations from CXL at all. Don't build systems that have
> > > vast quantities of CXL memory (or if you do, expose it as really fast
> > > swap, not as memory).
> > >
> >
> > Hi, Matthew. Thank you for sharing your opinion.
> >
> > I don't want to introduce too much complexity to MM due to CXL madness either,
> > but I think at least we need to guide users who buy CXL hardware to avoid
> > doing stupid things.
> >
> > My initial subject was "Clearly documenting the use cases of
> > memhp_default_state=online{,_kernel}" because at first glance,
> > it was deemed usable for allowing kernel allocations from CXL,
> > which, after some evaluation, turned out not to be the case.
> >
>
> This was the motivation for implementing the build-time switch for
> memhp_default_state. Distros and builders can now have flexibility
> to make this their default policy for hotplug memory blocks.
>
> https://lore.kernel.org/linux-mm/20241226182918.648799-1-gourry@gourry.net/
>
> I don't normally agree with Willy's hard takes on CXL, but I do agree
> that it's generally not fit for kernel use - and I share general skepticism
> that movement-based tiering is fundamentally better than reclaim/swap
> semantics (though I have been convinced otherwise in some scenarios,
> and I think some clear performance benefits in many scenarios are lost
> by treating it as super-fast-swap).

It is also the case that CXL topologies enumerate their performance
characteristics; "CXL" is not a latency characteristic unto itself.
Like "PCI", "CXL" by itself does not imply a performance profile. For
example, you could have CPU-attached DDR that presents as a
"CXL"-enumerated device just to take advantage of the now-standardized
RAS interfaces.

Unless and until this whole heterogeneous memory experiment fails, all
the kernel can do is give userspace the ability to include/exclude
memory ranges that are marked as outside the default pool. That is what
EFI_MEMORY_SP is all about: setting aside memory that is either too
precious for the default pool (e.g. HBM) or too slow for it (potentially
CXL and PMEM).

A kernel default policy, or better yet a distribution policy, that more
aggressively excludes CXL memory based on its performance relative to
the default pool would be a welcome improvement.

> Rather than ask whether we can make portions of the kernel more amenable
> to movable allocations, I think it's more beneficial to focus on whether
> we can reduce the ZONE_NORMAL cost of ZONE_MOVABLE capacity. That seems
> (to me) like the actual crux of this particular issue.

Yes, I like this line of thinking. Even if CXL-attached memory struggles
to graduate out of cold-memory tier use cases, that struggle can yield
other general improvements that are welcome independent of CXL.

Thread overview: 27+ messages
2025-02-01 13:29 Hyeonggon Yoo
2025-02-01 14:04 ` Matthew Wilcox
2025-02-01 15:13 ` Hyeonggon Yoo
2025-02-01 16:30 ` Gregory Price
2025-02-01 18:48 ` Matthew Wilcox
2025-02-03 22:09 ` Dan Williams [this message]
2025-02-07 7:20 ` Byungchul Park
2025-02-07 8:57 ` Gregory Price
2025-02-07 9:27 ` Gregory Price
2025-02-07 9:34 ` Honggyu Kim
2025-02-07 9:54 ` Gregory Price
2025-02-07 10:49 ` Byungchul Park
2025-02-10 2:33 ` Harry (Hyeonggon) Yoo
2025-02-10 3:19 ` Matthew Wilcox
2025-02-10 6:00 ` Gregory Price
2025-02-10 7:17 ` Byungchul Park
2025-02-10 15:47 ` Gregory Price
2025-02-10 15:55 ` Matthew Wilcox
2025-02-10 16:06 ` Gregory Price
2025-02-11 1:53 ` Byungchul Park
2025-02-21 1:52 ` Harry Yoo
2025-02-25 4:54 ` [LSF/MM/BPF TOPIC] Gathering ideas to reduce ZONE_NORMAL cost Byungchul Park
2025-02-25 5:06 ` [LSF/MM/BPF TOPIC] Restricting or migrating unmovable kernel allocations from slow tier Byungchul Park
2025-03-03 15:55 ` Gregory Price
2025-02-07 10:14 ` Byungchul Park
2025-02-10 7:02 ` Byungchul Park
2025-02-04 9:59 ` David Hildenbrand