[Linux Memory Hotness and Promotion] Notes from December 18, 2025
From: David Rientjes @ 2025-12-21  4:10 UTC
  To: Davidlohr Bueso, Fan Ni, Gregory Price, Jonathan Cameron,
	Joshua Hahn, Raghavendra K T, Rao, Bharata Bhasker,
	SeongJae Park, Wei Xu, Xuezheng Chu, Yiannis Nikolakopoulos,
	Zi Yan
  Cc: linux-mm

Hi everybody,

Here are the notes from the last Linux Memory Hotness and Promotion call
that happened on Thursday, December 18.  Thanks to everybody who was 
involved!

These notes are intended to bring people up to speed who could not attend 
the call as well as keep the conversation going in between meetings.

----->o-----
Raghu provided an update on his progress: he was trying to fit klruscand 
into his current approach, but that would require some redesign; since 
one approach, klruscand + pghot, is already working, he will be going 
slow on this.  He was planning to post his latest set of patches for the 
record so that they can be revisited later, mainly addressing Jonathan's 
feedback and adding new optimizations; these were likely to be posted by 
the end of the year.  Mainline development would continue with klruscand 
plus pghot.

Raghu had a question about klruscand, however: would the latest cleanup 
and MGLRU changes proposed on the mailing list affect anything?  Wei 
said they would not affect klruscand, since the core of MGLRU is the 
per-page LRU, which those proposed changes leave untouched.

----->o-----
We moved on to discussing the memory overhead of storing page hotness, 
especially since this state would come out of very expensive top-tier 
memory; we felt it was best to align here so that we could determine the 
minimal viable upstream opportunity for a landing.  Gregory had a short 
discussion about this at LPC; the current proposal was around 64 bits 
per tracked page, limited to the CXL memory tier, so the shorthand would 
be 2GB of overhead per 1TB of memory tracked (worked arithmetic below).  
He was interested in seeing how this would generalize to supporting N 
tiers, which would be a minimal viable upstream requirement (HBM, DRAM, 
CXL tier).  Jonathan suggested that HBM had nothing to do with hotness 
but was rather focused only on bandwidth.  The consensus of the group 
was that we still need to be able to support N tiers.
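
For reference, the 2GB-per-1TB shorthand follows directly from 64 bits 
of state per base page; a minimal sketch of the arithmetic, assuming 
4KiB base pages (tracking at hugepage granularity would shrink this 
proportionally):

#include <stdio.h>

int main(void)
{
	unsigned long long tracked = 1ULL << 40;	/* 1 TiB tracked */
	unsigned long long pages = tracked / 4096;	/* 4 KiB base pages */
	unsigned long long state = pages * 8;		/* 64 bits per page */

	/* 8 / 4096 = 1/512 of capacity: prints "2 GiB per TiB" */
	printf("%llu GiB per TiB\n", state >> 30);
	return 0;
}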

I asked how this overlaps with NUMAB; there has been a lot of discussion 
about NUMAB=2 in this series of meetings, but it's likely worthwhile to 
also consider NUMAB=1.  Raghu suggested that, in that case, we could 
update the VMAs for the DRAM tier and only track the hotness of memory 
for those VMAs.  I said that would still be operating on the sliding 
window.
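
For anybody catching up: NUMAB=1 and NUMAB=2 refer to the existing 
numa_balancing sysctl modes defined in include/linux/sched/sysctl.h:

#define NUMA_BALANCING_DISABLED		0x0
#define NUMA_BALANCING_NORMAL		0x1
#define NUMA_BALANCING_MEMORY_TIERING	0x2

NUMAB=1 is classic NUMA balancing, which migrates pages toward the 
accessing task; NUMAB=2 enables the memory tiering mode that instead 
promotes hot pages from slow tiers to fast tiers.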

We aligned that any support landed upstream must be extensible to 
additional memory tiers in the future.  Jonathan generalized this: we 
need to be able to turn the support off with no overhead.  I said this 
would be required when virtualizing the lower memory tiers into a guest, 
where you may not care to track hotness for optimal page placement.
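
In-kernel, "off with no overhead" typically means a static branch that 
patches the hot path down to a nop while the feature is disabled; a 
minimal sketch of that pattern (the key and helper names here are 
hypothetical, not from any posted series):

#include <linux/jump_label.h>
#include <linux/mm_types.h>

void __hotness_record_access(struct page *page);	/* hypothetical */

DEFINE_STATIC_KEY_FALSE(hotness_tracking_enabled);

static inline void hotness_note_access(struct page *page)
{
	/*
	 * static_branch_unlikely() compiles to a single nop while the
	 * key is false, so a disabled fast path costs nothing.
	 */
	if (static_branch_unlikely(&hotness_tracking_enabled))
		__hotness_record_access(page);
}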

I suggested that 2GB per 1TB of tracked memory sounded fine, but that it 
was likely also the ceiling.  Gregory said that colleagues were 
surprised by the amount of overhead, so we should discuss this on the 
mailing list.  Yiannis understood the pushback and said that we should 
show what this additional overhead is getting us: we need to demonstrate 
the value of the hotness tracking to justify the cost.

Wei said that internally he is using one byte per page for hotness 
tracking and that this is a simple solution.  Jonathan said we could 
allow a precision vs. cost trade-off; some mechanisms use even less than 
one byte per page, and they work, but sometimes they promote the wrong 
thing.  We agreed this could be configurable.  Gregory made the good 
point that on single-socket systems, for example, we don't need to 
capture source or accessor information; it does drive configuration 
complexity, however.

Gregory suggested we may want to omit the accessor information on some 
systems and, when we get placement wrong, pay for a double migration to 
get it right.  He was on board with limiting this to eight bits per page 
to start and adding a precision mode later.

----->o-----
The next meeting is canceled for New Year's Day; we'll come back two
weeks after that.  Happy New Year!

Next meeting will be on Thursday, January 15 at 8:30am PST (UTC-8),
everybody is welcome: https://meet.google.com/jak-ytdx-hnm

Topics for the next meeting:
 - updates on Bharata's patch series with new benchmarks and consolidation
   of tunables
 - workloads to use as the industry standard beyond just memcached, such
   as redis-memtier
 - later: Gregory's analysis of more production-like workloads
 - discuss generalized subsystem for providing bandwidth information
   independent of the underlying platform, ideally through resctrl,
   otherwise utilizing bandwidth information will be challenging
   + preferably this bandwidth monitoring is not per NUMA node but
     rather per slow and fast tier
 - similarly, discuss generalized subsystem for providing memory hotness
   information
 - determine minimal viable upstream opportunity to optimize for tiering
   that is extensible for future use cases and optimizations
   + extensible for multiple tiers
   + suggestion: limited to 8 bits per page to start, add a precision mode
     later
   + limited to 64 bits per page as a ceiling, may be less
   + must be possible to disable with no memory or performance overhead
 - update on non-temporal stores enlightenment for memory tiering
 - enlightening migrate_pages() for hardware assists and how this work
   will be charged to userspace, including for memory compaction

Please let me know if you'd like to propose additional topics for
discussion, thank you!

