* [Linux Memory Hotness and Promotion] Notes from May 8, 2025
From: David Rientjes @ 2025-05-12  2:08 UTC (permalink / raw)
  To: Davidlohr Bueso, Fan Ni, Gregory Price, Jonathan Cameron,
	Joshua Hahn, Raghavendra K T, Rao, Bharata Bhasker,
	SeongJae Park, Xuezheng Chu, Yiannis Nikolakopoulos, Zi Yan
  Cc: linux-mm

Hi everybody,

Here are the notes from the inaugural Linux Memory Hotness and Promotion
call that happened on Thursday, May 8.  Thanks to everybody who was
involved!

These notes are intended to bring people who could not attend the call up
to speed, as well as to keep the conversation going between meetings.

----->o-----
Bharata referred to the "Kernel daemon for detecting and promoting hot
pages" patch series[1] and the previous industry-wide MM alignment
meeting on it.  Given the feedback from that session, Bharata is working
on separating migration out of the existing NUMA Balancing code.  An
early prototype is available that does this asynchronously and with
batching.

After a folio is isolated, it is added to a list hanging off the task
structure.  This requires tracking the target node id for each folio;
today the last_cpupid field is reused for that purpose.  The migration
itself is done in task work context with task_numa_work(), which fetches
the target node id and then migrates in batches, similar to Gregory
Price's unmapped folio promotion patch series[2].

One obvious problem is that the last_cpupid field is carried over from
the old folio to the new folio as part of migration.  It may not be
important to preserve it, since this is a fresh start for the folio on
the new node.

Wei Xu asked whether we really want to use last_cpupid for this, since
NUMA Balancing will not be the only way to trigger this migration in the
future.  Additionally, regarding isolation, Google noted that too many
pages can become isolated with an approach like this.

Gregory Price noted that task->migrate_list was originally implemented
the way it was in order to limit the number of folios isolated at any
given time.  If every task isolates every folio, this causes other
issues.  We will likely need such a limit in his patch series[2].

Wei expressed a concern about the number of bits that can be used from
struct page.  Davidlohr noted that we no longer need the information
once the folio has been queued for promotion: last_cpupid is only needed
for NUMA Balancing, and the asynchronous migration context can store the
target node id anywhere.  Gregory threw out the possibility of a per-cpu
migrate list instead of task->migrate_list, which would naturally
capture the accessing cpu.

----->o-----
I asked whether these asynchronous promotion kthreads would still be
single threaded or whether they would be per NUMA node.  Bharata
clarified that in his current design it is one thread per node.  If the
length of the promotion list is limited, hitting the limit is treated
the same as a migration failure from the kthread: the folio gets left
behind.

Gregory noted that in his implementation of task->migrate_list, this is
done in task_work.  Davidlohr said this sounds very expensive.  Gregory
agreed and said that it opens us up to long-running isolations,
depending on how long the kthread takes to migrate.

I asked whether this has to be a single kthread per NUMA node or whether
it could just be a kworker.  Raghavendra said it is better off as a
kthread so that throttling can be centrally managed.  Davidlohr said the
per-node approach should work, based on precedent like kswapd, kcompactd,
etc.

----->o-----
Zi Yan asked how the amount of work to handle the migration would be
charged if this is done by the kernel in a kthread.  Previously, this
would be done in process context for NUMA Balancing.

Wei asked how this is different from kswapd, where the kernel does the
work transparently.  I said that work is done on behalf of the system as
a whole, just like reclaim; the kernel has to be the source of truth for
memory placement, and there are examples like khugepaged that optimize
for individual process performance.

For Zi's work, he wants the ability to charge the cost of the migration
back to the process itself.  If userspace decides to call move_pages(),
the cost would be charged to the process rather than to the kernel doing
it through kpromoted.

Bharata noted that one cheap way to do this would be to track, and
charge for, how many folios a particular process queues for promotion.

----->o-----
Next meeting will be on Thursday, May 22 at 8:30am PDT (UTC-7), everybody
is welcome: https://meet.google.com/jak-ytdx-hnm

Topics for the next meeting:

 - go through the cover letter and shared drive; if you are not included,
   send me your email address.  Email addresses must be registered as
   Google accounts, like using your corporate email to sign up for a
   gmail account or providing a personal email account that will not be
   shared publicly
 - update from Bharata on separating out migration from NUMA balancing
   patch series
   + Bharata is looking to have the next patch series posted before the
     next meeting
 - discussion on limiting the number of folios that can be isolated at
   any given time to not interfere with other parts of the system
 - following up with Raghavendra on fixing issues identified by Davidlohr
   in earlier series[3]
 - enlightening migrate_pages() for hardware assists and how this work
   will be charged to userspace

Please let me know if you'd like to propose additional topics for
discussion, thank you!

[1] https://lore.kernel.org/lkml/20250306054532.221138-1-bharata@amd.com/
[2] https://lkml.iu.edu/hypermail/linux/kernel/2504.1/08111.html
[3] https://lore.kernel.org/linux-mm/ff53d70a-7d59-4f0d-aad0-03628f9d8b67@amd.com/

