[Linux Memory Hotness and Promotion] Notes from September 25, 2025

linux-mm.kvack.org archive mirror

From: David Rientjes @ 2025-09-28  3:26 UTC (permalink / raw)
  To: Davidlohr Bueso, Fan Ni, Gregory Price, Jonathan Cameron,
	Joshua Hahn, Raghavendra K T, Rao, Bharata Bhasker,
	SeongJae Park, Wei Xu, Xuezheng Chu, Yiannis Nikolakopoulos,
	Zi Yan
  Cc: linux-mm

Hi everybody,

Here are the notes from the last Linux Memory Hotness and Promotion call
that happened on Thursday, September 25.  Thanks to everybody who was 
involved!

These notes are intended to bring people up to speed who could not attend 
the call as well as keep the conversation going in between meetings.

----->o-----
Bharata updated that he is awaiting further reviews on his patch series 
upstream.  In the next revision, he plans to prune hot page records that 
accumulate in the hash lists but end up being cold.  Today, such records 
go cold over time and are never pruned.  This is not expected to be an 
invasive change.
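To make the pruning step concrete, here is a minimal sketch in C of aging cold records out of a chained hash table.  The record layout, NR_BUCKETS, record_access() and prune_cold() are hypothetical names for illustration, not the code in Bharata's series; the sketch assumes each record carries a last-access timestamp and that a record untouched for longer than max_age counts as cold.

```c
#include <stdint.h>
#include <stdlib.h>

#define NR_BUCKETS 64   /* hypothetical table size */

/* Hypothetical hot page record; not the layout used in the actual series. */
struct hot_rec {
	uint64_t pfn;
	uint64_t last_access;   /* time of the most recent recorded access */
	uint32_t freq;          /* accesses seen since the record was created */
	struct hot_rec *next;   /* hash-chain link */
};

static struct hot_rec *buckets[NR_BUCKETS];

/* Record an access, creating the record the first time a pfn is seen. */
static int record_access(uint64_t pfn, uint64_t now)
{
	struct hot_rec **head = &buckets[pfn % NR_BUCKETS];
	struct hot_rec *r;

	for (r = *head; r; r = r->next) {
		if (r->pfn == pfn) {
			r->freq++;
			r->last_access = now;
			return 0;
		}
	}
	r = malloc(sizeof(*r));
	if (!r)
		return -1;
	r->pfn = pfn;
	r->freq = 1;
	r->last_access = now;
	r->next = *head;
	*head = r;
	return 0;
}

/* Unlink and free every record not accessed within max_age of now. */
static size_t prune_cold(uint64_t now, uint64_t max_age)
{
	size_t pruned = 0;

	for (size_t i = 0; i < NR_BUCKETS; i++) {
		struct hot_rec **pp = &buckets[i];

		while (*pp) {
			struct hot_rec *r = *pp;

			if (now - r->last_access > max_age) {
				*pp = r->next;   /* unlink without a prev pointer */
				free(r);
				pruned++;
			} else {
				pp = &r->next;
			}
		}
	}
	return pruned;
}
```

Walking each chain through a pointer-to-pointer lets a record be unlinked without a back pointer, which also helps if the record itself is being shrunk.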

We talked about the hotness abstraction and its current level of 
complexity.  Bharata noted that each hot page record currently takes up 40 
bytes, which is definitely something to shrink.  He suggested having this 
discussion on the upstream mailing list so others can chime in.  Jonathan 
Cameron also volunteered to review.
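As an illustration of where 40 bytes can come from and how much packing might save, here is a sketch comparing a hypothetical pointer-linked record with a packed one.  Both layouts and the 40/10/8/6 bit split are assumptions, not the layout in the series; a 40-bit PFN covers 4 PiB of 4 KiB pages.  The wide layout is 40 bytes on a 64-bit build, and the compact one carries 12 bytes of payload, padded to 16 on common 64-bit ABIs.

```c
#include <stdint.h>

/* Hypothetical "wide" record: 40 bytes on a 64-bit build. */
struct hot_rec_wide {
	uint64_t pfn;
	uint64_t last_access;
	uint32_t freq;
	int32_t nid;
	struct hot_rec_wide *next;
	struct hot_rec_wide *prev;
};

/* Hypothetical packed layout: 40 + 10 + 8 + 6 = 64 bits. */
#define PFN_BITS   40
#define NID_BITS   10
#define FREQ_BITS   8
#define AGE_BITS    6

#define PFN_SHIFT  0
#define NID_SHIFT  (PFN_SHIFT + PFN_BITS)
#define FREQ_SHIFT (NID_SHIFT + NID_BITS)
#define AGE_SHIFT  (FREQ_SHIFT + FREQ_BITS)

#define FIELD_MASK(bits) ((UINT64_C(1) << (bits)) - 1)

struct hot_rec_compact {
	uint64_t packed;   /* pfn | nid | freq | coarse age */
	uint32_t next;     /* index into a record array instead of a pointer */
};

static inline uint64_t field_get(uint64_t w, unsigned int shift, unsigned int bits)
{
	return (w >> shift) & FIELD_MASK(bits);
}

static inline uint64_t field_set(uint64_t w, unsigned int shift, unsigned int bits,
				 uint64_t val)
{
	w &= ~(FIELD_MASK(bits) << shift);
	return w | ((val & FIELD_MASK(bits)) << shift);
}

/* Saturating increment: a very hot page must not wrap around to "cold". */
static inline uint64_t bump_freq(uint64_t w)
{
	uint64_t f = field_get(w, FREQ_SHIFT, FREQ_BITS);

	return f < FIELD_MASK(FREQ_BITS) ? field_set(w, FREQ_SHIFT, FREQ_BITS, f + 1) : w;
}
```

Explicit shifts and masks are used instead of C bitfields because bitfield layout is implementation-defined.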

Raghu was investigating how klruscand and his patch series could mesh 
together.  Kinsey added that he is working on the klruscand resume issue 
we chatted about in the last meeting.  The API is included in the most 
recent version of Kinsey's series, so that should not be blocking.  Raghu 
is looking at that API for integration.

----->o-----
We chatted about overall CXL performance and the kernel's role in memory 
tiering, including current support with NUMA Balancing.  My suggestion was 
that the kernel will always need to be the source of truth for memory 
hotness, using whatever mechanisms it can derive that information from, 
including Accessed bit harvesting, hardware assists, CHMU, etc.  
Additionally, we discussed the shared AMD and Google vision for 
kthread-based migration of memory for optimal placement.

Yiannis asked that we separate the topic into two parts: memory tiering 
and its use cases, and then kernel-driven migration of memory.  He said 
that memory tiering has clearly been shown to make sense.

----->o-----
Jonathan suggested that this discussion should not be focused only on 
tiering; rather, this is just NUMA.  Addressing it as just NUMA is a 
different approach than the specialized support needed for memory tiering 
and grouping nodes together.  Wei Xu noted that NUMA Balancing today 
assumes all NUMA nodes have CPUs.  He also discussed the implications of 
both latency and bandwidth for CXL devices; for lower cost memory attach, 
devices often intentionally have lower bandwidth.

Jonathan noted that with traditional NUMA, bandwidth to the remote socket 
is normally already an order of magnitude lower than local bandwidth.  It 
is not necessarily the case that CXL will always be slower.  I pivoted the 
discussion toward how we would achieve optimal page placement for this 
memory.

Wei noted that, for memory tiering, demotion is a type of reclaim and is 
driven by memory tiers, not NUMA.  In other words, we wouldn't want to 
migrate cold memory simply to another NUMA node in top tier memory because 
that could be detrimental to other workloads on that socket.  My read of 
Jonathan's comment was that we should optimize page placement based on 
hot memory for both NUMA and tiering.
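Wei's point can be illustrated with a small sketch of demotion target selection: only nodes in a slower tier qualify, so a same-tier node on another socket is never chosen even when it has more free memory.  struct node_info, pick_demotion_target() and the tie-breaking rule (nearest slower tier, then most free pages) are hypothetical; this is not the kernel's demotion path.

```c
#include <stddef.h>

/* Hypothetical per-node view; tier 0 is the fastest. */
struct node_info {
	int nid;
	int tier;
	unsigned long free_pages;
};

/*
 * Pick a demotion target for memory in src_tier.  Nodes in the same or a
 * faster tier are skipped: moving there is plain NUMA migration, not
 * demotion.  Prefer the nearest slower tier, then the most free pages.
 * Returns -1 if no slower tier exists.
 */
static int pick_demotion_target(const struct node_info *nodes, size_t n,
				int src_tier)
{
	int best = -1, best_tier = 0;
	unsigned long best_free = 0;

	for (size_t i = 0; i < n; i++) {
		const struct node_info *ni = &nodes[i];

		if (ni->tier <= src_tier)
			continue;
		if (best < 0 || ni->tier < best_tier ||
		    (ni->tier == best_tier && ni->free_pages > best_free)) {
			best = ni->nid;
			best_tier = ni->tier;
			best_free = ni->free_pages;
		}
	}
	return best;
}
```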

Jonathan noted that for memory bandwidth we have to consider weighted 
interleave and how hot pages are distributed, i.e. if a memory node 
provides 10% of the bandwidth, we want 10% of the hot pages in that 
memory.  If we migrate everything to the fastest memory, we don't get 
optimal bandwidth.
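The proportional rule Jonathan described can be written down directly: each node's share of the hot pages equals its share of the total weight, here treated as relative bandwidth.  This is a minimal sketch; hot_page_targets() and MAX_NODES are hypothetical, and largest-remainder rounding is just one way to make the per-node targets add up to the total.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 64   /* hypothetical upper bound for the sketch */

/*
 * Split nr_hot pages across n nodes in proportion to weight[] (e.g. relative
 * bandwidth), writing per-node targets to target[].  Pages lost to integer
 * rounding go to the nodes with the largest remainders, so the targets always
 * sum to nr_hot.  Assumes nr_hot * weight[i] does not overflow.
 */
static void hot_page_targets(uint64_t nr_hot, const unsigned int *weight,
			     size_t n, uint64_t *target)
{
	uint64_t rem[MAX_NODES];
	uint64_t sum = 0, assigned = 0;

	assert(n > 0 && n <= MAX_NODES);
	for (size_t i = 0; i < n; i++)
		sum += weight[i];
	assert(sum > 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t num = nr_hot * weight[i];

		target[i] = num / sum;
		rem[i] = num % sum;
		assigned += target[i];
	}
	while (assigned < nr_hot) {
		size_t best = 0;

		for (size_t i = 1; i < n; i++)
			if (rem[i] > rem[best])
				best = i;
		target[best]++;
		rem[best] = 0;
		assigned++;
	}
}
```

With weights {9, 1} and 1000 hot pages, the targets come out as 900 and 100, i.e. the node with 10% of the bandwidth holds 10% of the hot pages.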

Ravi Shankar noted that a patch series[1] was recently pushed for 
interleaving only hot pages rather than the entire memory allocation.  It 
identifies hot pages and applies the interleave only to that set of 
memory; other pages are demoted.  This tries to optimize for bandwidth 
expansion while still doing cold page demotion.  If we can track hot pages 
and apply interleave weights only to those pages, this would optimize for 
bandwidth.
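A rough sketch of the idea: hot pages get a destination from a weighted round-robin over the nodes, while cold pages are left in place as demotion candidates.  The hotness threshold, struct wrr, wrr_init(), wrr_next() and place_pages() are assumptions for illustration and are not taken from the referenced series.

```c
#include <assert.h>
#include <stddef.h>

#define NODE_KEEP (-1)   /* cold page: leave it, it is a demotion candidate */

/* Weighted round-robin state: node i receives weight[i] pages per round. */
struct wrr {
	const unsigned int *weight;
	size_t n;
	size_t cur;          /* node currently being filled */
	unsigned int used;   /* pages already given to cur in this round */
};

static void wrr_init(struct wrr *w, const unsigned int *weight, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += weight[i];
	assert(n > 0 && sum > 0);   /* otherwise wrr_next() would spin forever */
	w->weight = weight;
	w->n = n;
	w->cur = 0;
	w->used = 0;
}

static size_t wrr_next(struct wrr *w)
{
	/* Advance past nodes whose quota for this round is used up (or zero). */
	while (w->used >= w->weight[w->cur]) {
		w->cur = (w->cur + 1) % w->n;
		w->used = 0;
	}
	w->used++;
	return w->cur;
}

/* Interleave only pages at or above hot_thresh; everything else stays put. */
static void place_pages(const unsigned int *freq, size_t nr,
			unsigned int hot_thresh, struct wrr *w, int *dest)
{
	for (size_t i = 0; i < nr; i++)
		dest[i] = freq[i] >= hot_thresh ? (int)wrr_next(w) : NODE_KEEP;
}
```

With weights {3, 1}, every four hot pages land as three on node 0 and one on node 1, regardless of how many cold pages sit between them.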

Joshua Hahn asked whether this results in too much concentration in lower 
memory tiers.  Ravi said demotion in this case is only triggered when free 
memory on the top tier hits a low bound.  Before that, hot pages are 
interleaved based on the weights.
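The trigger Ravi described can be sketched as a watermark check: no demotion while top-tier free memory stays at or above a low bound, and once it dips below, enough cold pages are demoted to recover.  The function name and the separate high watermark (added here for hysteresis) are assumptions, not details from the discussion.

```c
#include <assert.h>

/*
 * Hypothetical trigger: demote nothing while free_pages >= low.  Below that,
 * demote enough cold pages to climb back to high (low <= high gives some
 * hysteresis), but never more cold pages than are available.
 */
static unsigned long nr_to_demote(unsigned long free_pages, unsigned long low,
				  unsigned long high, unsigned long nr_cold)
{
	unsigned long want;

	assert(low <= high);
	if (free_pages >= low)
		return 0;   /* above the low bound: hot pages keep interleaving */
	want = high - free_pages;
	return want < nr_cold ? want : nr_cold;
}
```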

Jonathan noted that the optimal strategy is: if too much of the hot memory 
is in slow memory, promote; if too much is in fast memory, demote.  The 
goal is to move the smallest number of pages to get there, which means 
moving the hottest pages first; NUMA Balancing has never addressed this.
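A greedy sketch of that strategy: compute how far the fast tier's share of total hotness is from a target, then move the hottest candidates from the over-served side first, skipping any page whose move would not reduce the error.  struct page_stat, rebalance() and the percentage target are hypothetical; a real implementation would also weigh sampling noise and migration cost.

```c
#include <stdlib.h>

/* Hypothetical per-page summary. */
struct page_stat {
	int id;
	unsigned long hot;   /* e.g. sampled accesses in the last interval */
	int in_fast;         /* 1 if the page currently lives on the fast tier */
};

/* qsort() comparator: hottest first. */
static int by_hot_desc(const void *a, const void *b)
{
	unsigned long ha = ((const struct page_stat *)a)->hot;
	unsigned long hb = ((const struct page_stat *)b)->hot;

	return (ha < hb) - (ha > hb);
}

/*
 * Move pages so the fast tier carries about fast_pct percent of total
 * hotness.  Promotes from slow if the fast tier is short, demotes from fast
 * if it has too much, hottest first, and stops once the error reaches zero
 * or changes sign.  Sorts pages in place, flips in_fast on moved pages and
 * returns the number of moves.
 */
static size_t rebalance(struct page_stat *pages, size_t n, unsigned int fast_pct)
{
	long long total = 0, fast = 0, gap;
	size_t moves = 0;
	int promote;

	for (size_t i = 0; i < n; i++) {
		total += (long long)pages[i].hot;
		if (pages[i].in_fast)
			fast += (long long)pages[i].hot;
	}
	gap = total * fast_pct / 100 - fast;   /* > 0: fast tier is short */
	promote = gap > 0;
	qsort(pages, n, sizeof(*pages), by_hot_desc);

	for (size_t i = 0; i < n && (promote ? gap > 0 : gap < 0); i++) {
		struct page_stat *p = &pages[i];
		long long h = (long long)p->hot;

		if (p->in_fast == promote)
			continue;   /* already on the destination side */
		if (h == 0 || h >= 2 * llabs(gap))
			continue;   /* moving it would not shrink the error */
		p->in_fast = promote;
		gap += promote ? -h : h;
		moves++;
	}
	return moves;
}
```

Taking the hottest candidates first is what keeps the number of migrations low: each move shifts as much access traffic as possible toward the target split.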

----->o-----
Next meeting will be on Thursday, October 9 at 8:30am PDT (UTC-7),
everybody is welcome: https://meet.google.com/jak-ytdx-hnm

Topics for the next meeting:

 - update on cold page pruning from Bharata's patch series and any
   opportunities to shrink the hotness tracking
 - update on the resume fix for klruscand and timelines for sharing
   upstream
 - update on integrating klruscand into Raghu's series of patches and
   whether the API is stable or needs to be changed
 - discussion on how to optimize page placement for bandwidth, and not
   simply latency, based on access hotness and weighted interleave
 - update on non-temporal stores enlightenment for memory tiering
 - enlightening migrate_pages() for hardware assists and how this work
   will be charged to userspace
 - discuss proactive demotion interface as an extension to memory.reclaim
 - discuss overall testing and benchmarking methodology for various
   approaches as we go along

Please let me know if you'd like to propose additional topics for
discussion, thank you!

[1] 
https://lore.kernel.org/linux-mm/20250923174752.35701-1-shivankg@amd.com 


* Re: [Linux Memory Hotness and Promotion] Notes from September 25, 2025
From: Ravi Jonnalagadda @ 2025-10-01  5:33 UTC (permalink / raw)
  To: rientjes
  Cc: Jonathan.Cameron, bharata, dave, gourry, joshua.hahnjy, linux-mm,
	nifan.cxl, rkodsara, sj, weixugc, xuezhengchu, yiannis, ziy

Hi David, all,

I'm writing to correct the reference mentioned in the notes from the last meeting.

> Ravi Shankar noted a recent patch series[1] was recently pushed for
> interleaving of hot pages and not for the entire memory allocation.

I must have mistakenly provided the wrong reference during the discussion.

The correct link is here:
https://lore.kernel.org/all/20250709005952.17776-14-bijan311@gmail.com/T/#m374b66a5195cbd24a848665ce214d0c248a4efea

This functionality is already merged in mainline.
Apologies for the mix-up and any confusion this may have caused.

Best regards,
Ravi Jonnalagadda


