Subject: [Linux Memory Hotness and Promotion] Notes from December 4, 2025
From: David Rientjes @ 2025-12-16 3:16 UTC
To: Davidlohr Bueso, Fan Ni, Gregory Price, Jonathan Cameron,
Joshua Hahn, Raghavendra K T, Rao, Bharata Bhasker,
SeongJae Park, Wei Xu, Xuezheng Chu, Yiannis Nikolakopoulos,
Zi Yan
Cc: linux-mm
Hi everybody,
Here are the notes from the last Linux Memory Hotness and Promotion call
that happened on Thursday, December 4. Thanks to everybody who was
involved!
These notes are intended to bring people who could not attend the call up
to speed, as well as to keep the conversation going between meetings.
----->o-----
Bharata reported that he is ready to post v4 of his series with two main
changes: a per-section indicator for hotness, which reduces the work
required of kmigrated, and the use of folio_mark_accessed(), which shows
good results in initial testing. Bharata is also working on finding the
right redis-memtier configuration to ensure that memory is moved back and
forth for promotion and demotion.
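As a rough illustration of the folio_mark_accessed() change (a sketch
only, not code from Bharata's series; the function name and pfn-based
entry point are assumptions), a hotness sampler could feed observed
accesses into the existing LRU aging like this:

#include <linux/mm.h>
#include <linux/swap.h>

/*
 * Illustrative only: record a software-observed access so the folio
 * ages toward the active LRU like any other reference.  Reference
 * counting and validity checks are elided for brevity.
 */
static void hotness_note_access(unsigned long pfn)
{
	struct folio *folio = pfn_folio(pfn);

	if (folio_test_lru(folio))
		folio_mark_accessed(folio);
}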
On the consolidation of tunables, the current plan is for these to live
under sysfs. There was discussion about starting in debugfs, but the API
should be solid enough by the time of upstream inclusion that this won't
be necessary.
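For reference, exposing such a tunable directly in sysfs is only a few
lines; the sketch below is illustrative only (the attribute name, default
value, and eventual location, e.g. under /sys/kernel/mm/, are assumptions
rather than anything agreed on the call):

#include <linux/kernel.h>
#include <linux/kobject.h>
#include <linux/sysfs.h>

/* Hypothetical tunable and default. */
static unsigned int hotness_scan_period_ms = 1000;

static ssize_t hotness_scan_period_ms_show(struct kobject *kobj,
		struct kobj_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "%u\n", hotness_scan_period_ms);
}

static ssize_t hotness_scan_period_ms_store(struct kobject *kobj,
		struct kobj_attribute *attr, const char *buf, size_t count)
{
	int err = kstrtouint(buf, 10, &hotness_scan_period_ms);

	return err ? err : count;
}

static struct kobj_attribute hotness_scan_period_ms_attr =
	__ATTR_RW(hotness_scan_period_ms);

/* Registered at init with, e.g., sysfs_create_file(mm_kobj,
 * &hotness_scan_period_ms_attr.attr). */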
----->o-----
I pivoted the discussion to single threaded vs multi threaded promotion.
Wei Xu noted that we likely need more than one thread, probably at least
one per NUMA node. He said that memory bandwidth contention was so severe
that promotion throughput was heavily reduced. Hardware assist can help,
and so can memory bandwidth QoS, but another option is multi threaded
promotion. Jonathan Cameron asked if multiple threads were being used for
QoS; Wei said that by moving memory off the low tier we are shifting the
bandwidth pressure elsewhere. The ideal scenario, Wei said, is for
hardware based memory bandwidth QoS to ensure that the promotion threads
can make steady progress.
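As a sketch of what Wei's suggestion of at least one promotion thread per
NUMA node could look like (names and structure below are purely
illustrative, not a proposal from the call), each memory node would get
its own worker created on the node it services:

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/nodemask.h>
#include <linux/printk.h>
#include <linux/sched.h>

/* Illustrative per-node promotion worker. */
static int promotion_thread_fn(void *data)
{
	int nid = (long)data;

	while (!kthread_should_stop()) {
		/*
		 * Scan nid's lower-tier memory and migrate hot folios
		 * to the fast tier; the actual scan is elided here.
		 */
		pr_debug("promotion scan on node %d\n", nid);
		schedule_timeout_interruptible(HZ);
	}
	return 0;
}

static int __init start_promotion_threads(void)
{
	int nid;

	for_each_node_state(nid, N_MEMORY) {
		struct task_struct *t;

		/* Create the worker on the node it will service. */
		t = kthread_create_on_node(promotion_thread_fn,
					   (void *)(long)nid, nid,
					   "kpromote/%d", nid);
		if (IS_ERR(t))
			return PTR_ERR(t);	/* unwinding elided */
		wake_up_process(t);
	}
	return 0;
}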
Gregory suggested this may be a transient factor; we've had the
discussion before about how aggressive tiering should be to improve
overall latency. Moving things as fast as possible may not always be the
desired effect: the goal is to converge on stability, so you may not
maximize bandwidth to balance memory but rather eventually reach an ideal
steady state. Wei shared that in some cases promotion was limited to less
than 100MB/s as a result of bandwidth saturation. Gregory asked how
realistic this scenario actually is, because the fact that we have gotten
into this situation is already problematic: the only scenario where this
"should" happen is if the DRAM tier bandwidth is already capped and the
amount of hot memory exceeds the DRAM tier capacity, in which case we
would have thrashing. The goal is to ensure this scenario doesn't happen
at all.
Jonathan said boosting the migration threads may not be the ultimate
solution; it may be better to keep the workload that is consuming all of
the bandwidth from being scheduled. Wei believed that a single promotion
thread could still be a bottleneck. Gregory suggested that 100MB/s of
promotion may not be too slow: the goal is to eventually get there, but
not by promoting memory over a very short window -- if we promote as fast
as possible, we could find that the memory does not remain hot, in which
case the promotion may not have been justified.
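To put the 100MB/s figure in perspective, a back-of-the-envelope
calculation (the scan interval below is made up) shows how many base
pages even a saturated promotion budget still moves per interval:

#include <stdio.h>

int main(void)
{
	const long budget_bytes_per_sec = 100L * 1024 * 1024; /* ~100MB/s */
	const long page_size = 4096;
	const long interval_ms = 100;	/* hypothetical scan interval */

	long pages_per_sec = budget_bytes_per_sec / page_size;
	long pages_per_interval = pages_per_sec * interval_ms / 1000;

	/* 100MB/s at 4KiB pages: 25600 pages/s, 2560 per 100ms. */
	printf("%ld pages/s, %ld pages per %ldms interval\n",
	       pages_per_sec, pages_per_interval, interval_ms);
	return 0;
}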
----->o-----
We shifted to discussing workloads that can be used for experimentation.
Gregory suggested we have reached critical mass on the topic, such that
we would really benefit from production data or actual deployments. He
took the action item to do this, although admittedly it may take some
time.
Wei had been working primarily with memcached, although production data
was not imminent.
----->o-----
Next meeting will be on Thursday, December 18 at 8:30am PST (UTC-8),
everybody is welcome: https://meet.google.com/jak-ytdx-hnm
Topics for the next meeting:
- updates on Bharata's RFC v4 with new benchmarks and consolidation of
tunables
- continued discussion on memory overheads used to save the memory
hotness state and the list of promotion targets
- workloads to use as the industry standard beyond just memcached, such as
redis
- later: Gregory's analysis of more production-like workloads
- discuss generalized subsystem for providing bandwidth information
independent of the underlying platform, ideally through resctrl,
otherwise utilizing bandwidth information will be challenging
+ preferably this bandwidth monitoring is not per NUMA node but rather
per slow and fast tier
- similarly, discuss generalized subsystem for providing memory hotness
information
- determine minimal viable upstream opportunity to optimize for tiering
that is extensible for future use cases and optimizations
- update on non-temporal stores enlightenment for memory tiering
- enlightening migrate_pages() for hardware assists and how this work
will be charged to userspace, including for memory compaction
Please let me know if you'd like to propose additional topics for
discussion, thank you!