* [Linux Memory Hotness and Promotion] Notes from September 25, 2025
From: David Rientjes @ 2025-09-28 3:26 UTC (permalink / raw)
To: Davidlohr Bueso, Fan Ni, Gregory Price, Jonathan Cameron,
Joshua Hahn, Raghavendra K T, Rao, Bharata Bhasker,
SeongJae Park, Wei Xu, Xuezheng Chu, Yiannis Nikolakopoulos,
Zi Yan
Cc: linux-mm
Hi everybody,
Here are the notes from the last Linux Memory Hotness and Promotion call
that happened on Thursday, September 25. Thanks to everybody who was
involved!
These notes are intended to bring people up to speed who could not attend
the call as well as keep the conversation going in between meetings.
----->o-----
Bharata updated that he is awaiting further reviews on his patch series
upstream. In the next revision, he plans to prune hot page records that
accumulate in the hash lists but have since gone cold. Today, these
records go cold over time and are never pruned. This is not expected to
be an invasive change.
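As a rough illustration of the pruning idea (not code from Bharata's
series), the sketch below models the hash lists as plain Python lists and
drops records whose last access is older than a cutoff. The record fields,
the prune_cold() helper, and the max_age threshold are all hypothetical
names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class HotPageRecord:
    """Hypothetical record; the real series defines its own layout."""
    pfn: int
    last_access: int  # time of the most recent recorded access


def prune_cold(buckets, now, max_age):
    """Drop records not accessed within max_age; return how many were removed."""
    removed = 0
    for i, bucket in enumerate(buckets):
        kept = [r for r in bucket if now - r.last_access <= max_age]
        removed += len(bucket) - len(kept)
        buckets[i] = kept
    return removed


buckets = [
    [HotPageRecord(pfn=0x1000, last_access=95),
     HotPageRecord(pfn=0x2000, last_access=10)],
    [HotPageRecord(pfn=0x3000, last_access=40)],
]
removed = prune_cold(buckets, now=100, max_age=50)
print(removed, [[hex(r.pfn) for r in b] for b in buckets])
```

With now=100 and max_age=50, only the record last accessed at time 95
survives; the other two are treated as cold and removed.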
We talked about the hotness abstraction and its current level of
complexity. Bharata noted that the current hot page record takes up 40
bytes and this was definitely something to shrink. Bharata suggested
having this discussion on the upstream mailing list so others can chime in.
Jonathan Cameron also volunteered to review.
Raghu was investigating how klruscand and his patch series could mesh
together. Kinsey further updated that he is working on the resume issue
for klruscand that we chatted about in the last meeting. The API is
included in the most recent version of Kinsey's series so that should not
be blocking. Raghu is looking at that API for integration.
----->o-----
We chatted about overall CXL performance and the kernel's role in memory
tiering, including current support with NUMA Balancing. My suggestion was
that the kernel would always need to be the source of truth for memory
hotness based on any mechanisms that it can derive that information from
including Accessed bit harvesting, hardware assists, CHMU, etc.
Additionally, we discussed the shared AMD and Google vision for kthread
based migration of memory for optimal placement.
Yiannis asked that we separate the topic into two parts, memory tiering
and its use cases and then kernel driven migration of memory. He said
that memory tiering is shown to clearly make sense.
----->o-----
Jonathan suggested that this may not be a discussion to be focused only on
tiering, but rather that this is just NUMA. If we address it as just NUMA, this
is a different approach than specialized support needed for memory tiering
and grouping nodes together. Wei Xu noted that NUMA Balancing assumes all
NUMA nodes have cpus today. He also discussed the implications of both
latency and bandwidth for CXL devices; for lower cost memory attach,
devices often intentionally have lower bandwidth.
Jonathan noted that with traditional NUMA, bandwidth for the remote socket
is normally already an order of magnitude lower than local bandwidth. It
may not be the case that this will always be slower for CXL. I pivoted
the discussion toward how we would achieve optimal page placement for this
memory.
Wei noted that for memory tiering, demotion is a type of reclaim and this
is based on memory tiering, not NUMA. In other words, we wouldn't want to
migrate cold memory simply to another NUMA node in top tier memory because
that could be detrimental to other workloads on that socket. My read of
Jonathan's comment was that we should optimize for page placement based on
hot memory for both NUMA and tiering.
Jonathan noted that for memory bandwidth we have to consider weighted
interleave and distributing hot pages, i.e. if we have 10% of the
bandwidth, we want 10% of the hot pages in that memory. If we migrate
everything to the fastest memory, we don't get optimal bandwidth.
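To make the proportion concrete, here is a minimal sketch of splitting a
set of hot pages across nodes by bandwidth share. It is only arithmetic
for illustration, not kernel code; the hot_page_targets() helper, the node
names, and the bandwidth figures are assumptions made up for the example,
and bandwidths are taken to be integers.

```python
def hot_page_targets(bandwidth, hot_pages):
    """Split hot_pages across nodes in proportion to each node's bandwidth."""
    total = sum(bandwidth.values())
    targets = {node: hot_pages * bw // total for node, bw in bandwidth.items()}
    # Hand any pages lost to integer rounding to the highest-bandwidth node.
    fastest = max(bandwidth, key=bandwidth.get)
    targets[fastest] += hot_pages - sum(targets.values())
    return targets


# Hypothetical system: DRAM supplies 90% of total bandwidth, CXL 10%.
targets = hot_page_targets({"dram": 90, "cxl": 10}, hot_pages=1000)
print(targets)  # {'dram': 900, 'cxl': 100}
```

With 10% of the bandwidth, the CXL node ends up with 10% of the hot pages,
so neither node's bandwidth is left idle while the other saturates.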
Ravi Shankar noted that a patch series[1] was recently pushed for
interleaving of hot pages and not for the entire memory allocation. It
identifies hot pages and applies the interleave only for that set of
memory. For other pages, we do demotion. This is trying to optimize for
bandwidth expansion while doing cold page demotion. If we can do hot page
tracking and apply interleaved weights only for those pages, this would
optimize for bandwidth.
Joshua Hahn asked if this results in too much concentration in lower
memory tiers. Ravi said demotion in this case was only triggered when you
hit a low bound of free memory on the top tier. Before that, hot pages
are interleaved based on the weights.
Jonathan noted the optimal strategy would be that if you have too much in
the slow memory then you have to promote and if you have too much in the
fast memory then you need to demote -- the goal is to move the smallest
number of pages to optimize for this. The goal is to move the hottest
pages, and NUMA Balancing has never addressed this.
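A minimal sketch of that strategy, assuming each page carries a hotness
score and that there is some target count of pages for the fast tier:
move only the difference between the current and target counts, promoting
the hottest pages from slow memory or demoting from fast memory. The
plan_moves() helper, the scores, and the choice to demote the coldest
pages first are assumptions of this example, not something settled on
the call.

```python
def plan_moves(fast, slow, fast_target):
    """fast/slow map page id -> hotness score; return (direction, page ids).

    Moves only the difference between the current and target fast-tier
    count, which is the fewest pages that reaches the target.
    """
    delta = fast_target - len(fast)
    if delta > 0:
        # Too much in slow memory: promote the hottest slow-tier pages.
        picks = sorted(slow, key=slow.get, reverse=True)[:delta]
        return "promote", picks
    if delta < 0:
        # Too much in fast memory: demote the coldest fast-tier pages
        # (choosing the coldest is an assumption of this sketch).
        picks = sorted(fast, key=fast.get)[:-delta]
        return "demote", picks
    return "none", []


fast = {"a": 9, "b": 7}
slow = {"c": 8, "d": 2, "e": 5}
print(plan_moves(fast, slow, fast_target=3))  # ('promote', ['c'])
print(plan_moves(fast, slow, fast_target=1))  # ('demote', ['b'])
```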
----->o-----
Next meeting will be on Thursday, October 9 at 8:30am PDT (UTC-7),
everybody is welcome: https://meet.google.com/jak-ytdx-hnm
Topics for the next meeting:
- update on cold page pruning from Bharata's patch series and any
opportunities to shrink the hotness tracking
- update on the resume fix for klruscand and timelines for sharing
upstream
- update on integrating klruscand into Raghu's series of patches and
whether the API is stable or needs to be changed
- discussion on how to optimize page placement for bandwidth, and not
simply for access latency, based on weighted interleave
- update on non-temporal stores enlightenment for memory tiering
- enlightening migrate_pages() for hardware assists and how this work
will be charged to userspace
- discuss proactive demotion interface as an extension to memory.reclaim
- discuss overall testing and benchmarking methodology for various
approaches as we go along
Please let me know if you'd like to propose additional topics for
discussion, thank you!
[1]
https://lore.kernel.org/linux-mm/20250923174752.35701-1-shivankg@amd.com
* Re: [Linux Memory Hotness and Promotion] Notes from September 25, 2025
From: Ravi Jonnalagadda @ 2025-10-01 5:33 UTC (permalink / raw)
To: rientjes
Cc: Jonathan.Cameron, bharata, dave, gourry, joshua.hahnjy, linux-mm,
nifan.cxl, rkodsara, sj, weixugc, xuezhengchu, yiannis, ziy
Hi David, all,
I'm writing to correct the reference mentioned in the notes from the last meeting.
> Ravi Shankar noted a recent patch series[1] was recently pushed for
> interleaving of hot pages and not for the entire memory allocation.
I must have mistakenly provided the wrong reference during the discussion.
The correct link is here:
https://lore.kernel.org/all/20250709005952.17776-14-bijan311@gmail.com/T/#m374b66a5195cbd24a848665ce214d0c248a4efea
This functionality is already merged in mainline.
Apologies for the mix-up and any confusion this may have caused.
Best regards,
Ravi Jonnalagadda