From: David Rientjes <rientjes@google.com>
To: Davidlohr Bueso <dave@stgolabs.net>, Fan Ni <nifan.cxl@gmail.com>,
Gregory Price <gourry@gourry.net>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Joshua Hahn <joshua.hahnjy@gmail.com>,
Raghavendra K T <rkodsara@amd.com>,
"Rao, Bharata Bhasker" <bharata@amd.com>,
SeongJae Park <sj@kernel.org>, Wei Xu <weixugc@google.com>,
Xuezheng Chu <xuezhengchu@huawei.com>,
Yiannis Nikolakopoulos <yiannis@zptcorp.com>,
Zi Yan <ziy@nvidia.com>
Cc: linux-mm@kvack.org
Subject: [Linux Memory Hotness and Promotion] Notes from April 9, 2026
Date: Sun, 12 Apr 2026 17:30:20 -0700 (PDT)
Message-ID: <4b9961f6-8571-1d45-6a67-2c9896ac04ef@google.com>
Hi everybody,
Here are the notes from the last Linux Memory Hotness and Promotion call
that happened on Thursday, April 9. Thanks to everybody who was involved!
These notes are intended to bring people who could not attend the call up
to speed, as well as to keep the conversation going in between meetings.
----->o-----
Shivank updated offline that he is working on addressing the v4 review
feedback for his patch series and that he posted a compaction benchmark
result on the v4 thread showing DMA offload freeing ~6% more CPU cycles
for competing workloads on a busy system.
----->o-----
Bharata updated on the status of v6 of his patch series. He has addressed
all of the review comments and is getting ready to post v7. This series
will drop the RFC tag. It will also include another source of page
hotness information: the IBS-based memory profiler. This is a new
instance of IBS that is available on Zen6 and later. Earlier revisions
used the standard IBS subsystem, and there was an open question about
how this would be shared with the Linux perf subsystem.
He is also working on migration from non-process context, where there is
no access to VMAs or VMA flags. This poses a limitation for shared
executable pages and prevents them from getting promoted/migrated. He was
specifically looking at migrate_misplaced_folio_prepare(), which has a
folio; with traditional NUMA Balancing it also has a VMA because it runs
in process context. The VMA will not be available in the asynchronous
promotion path through kmigrated. In process context, you would be able
to check if VM_EXEC is set or if the folio is mapped shared.
The v7 of this series should be available within two weeks.
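To make the VMA dependency above concrete, here is a minimal userspace
sketch of the decision. It is not kernel code: every model_* name and
MODEL_VM_EXEC are hypothetical stand-ins, and returning "cannot tell" when
there is no VMA is an assumption made for illustration, not a description
of what the series does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's VM_EXEC flag bit. */
#define MODEL_VM_EXEC 0x4UL

struct model_vma {
	unsigned long vm_flags;
};

struct model_folio {
	bool file_backed;	/* page cache folio rather than anonymous */
	bool mapped_shared;	/* mapped by more than one process */
};

enum model_verdict {
	MODEL_MIGRATE,		/* fine to isolate and migrate */
	MODEL_SKIP,		/* probably a shared library, leave in place */
	MODEL_CANNOT_TELL,	/* no VMA, so the exec check cannot be done */
};

static enum model_verdict model_prepare(const struct model_folio *folio,
					const struct model_vma *vma)
{
	assert(folio != NULL);

	/* Only shared, file-backed folios need the executable check. */
	if (!folio->file_backed || !folio->mapped_shared)
		return MODEL_MIGRATE;

	/* Asynchronous caller: there is a folio but no faulting VMA. */
	if (vma == NULL)
		return MODEL_CANNOT_TELL;

	/* Process context: the VMA flags say whether it is executable. */
	return (vma->vm_flags & MODEL_VM_EXEC) ? MODEL_SKIP : MODEL_MIGRATE;
}
```

Passing a NULL VMA models the asynchronous path: for a shared file-backed
folio the function cannot decide, which is the limitation described above.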
----->o-----
Joshua updated on tier-aware memcg limits and suggested that v2 is going
to look very different from v1. This is a byproduct of being the first
project to bring the concept of tiering to memcg, which has required a
lot of prerequisite work. There is an awkward interaction between per-cpu
stock and limit checking for top tier memory. He started looking into how
stock could work with different page counter metrics.
Wei suggested treating all the stock as top tier, as this should be the
default location where memory originates. Joshua tried to rework the
stock mechanism, which is per-memcg per-cpu, but this is now pushed to
the page counter level where each page counter has its own stock. There
are other tangential benefits to doing it that way,
including for non-tiered users. We will have two different page counters:
one for top tier and one for low tier; this is not user visible, however.
Instead of passing a number of pages to charge, we'd either pass a folio
or an indication if the node is top tier or not. This also requires
converting all the memcg stat items to lruvec stat items.
Joshua noted that there are configurations where kernel memory would come
from lower tiers if set as ZONE_NORMAL. In very stressed situations, he
has observed socket memory getting demoted to the low tier.
The page counter addition would be sent out soon and then we can decide
how to manage stock for top tier memory.
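As a rough illustration of the per-page-counter stock idea discussed in
this section, here is a single-threaded userspace sketch. It is not the
proposed patch: the model_* names, the batch size of 64, and the fallback
to an exact charge are all assumptions, and the per-cpu aspect is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed batch size for refilling the stock; illustrative only. */
#define MODEL_BATCH 64UL

struct model_counter {
	unsigned long usage;	/* pages charged, including stocked pages */
	unsigned long max;	/* limit for this counter */
	unsigned long stock;	/* pre-charged pages not yet handed out */
};

/* One counter per tier, as described in the notes; not user visible. */
struct model_memcg {
	struct model_counter top;
	struct model_counter low;
};

static bool model_counter_charge(struct model_counter *c, unsigned long nr)
{
	assert(nr > 0);

	/* Fast path: satisfy the charge from this counter's own stock. */
	if (c->stock >= nr) {
		c->stock -= nr;
		return true;
	}

	/* Slow path: charge a whole batch and keep the remainder as stock. */
	unsigned long want = nr > MODEL_BATCH ? nr : MODEL_BATCH;
	if (c->usage + want <= c->max) {
		c->usage += want;
		c->stock += want - nr;
		return true;
	}

	/* Near the limit: fall back to charging exactly what was asked. */
	if (c->usage + nr <= c->max) {
		c->usage += nr;
		return true;
	}
	return false;
}

/* The caller passes the tier of the memory rather than only a count. */
static bool model_charge(struct model_memcg *m, bool top_tier,
			 unsigned long nr)
{
	return model_counter_charge(top_tier ? &m->top : &m->low, nr);
}
```

Note that stocked pages count toward usage in this model, so a counter can
sit closer to its limit than the pages actually handed out would suggest.
That is one way to picture the interaction between stock and limit
checking mentioned above.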
----->o-----
Yiannis updated that he was looking into non-temporal stores for memory
tiering. He has prepared a follow-up to the patch series that was
previously shared with this group; it should be posted upstream by Monday.
Preferably this would include performance numbers to share.
He is slightly concerned about the duplication of arch/x86 code that is
called into for memory copy from the migrate_pages() path. The next
proposal may not be the cleanest implementation but he was still looking
to solicit upstream feedback.
Bharata asked if the non-temporal store work is happening in parallel to
Shivank's work for DMA offload. Yiannis looked into the first version of
Shivank's series but hasn't looked recently. The goal was to get
non-temporal store feedback even independent of other work happening.
Bharata noted that he was doing experiments with non-temporal writes in
the page clearing path. These show promising throughput results with
handwritten benchmarks, but when running upstream benchmarks the gain
was not as significant. Yiannis noted that his main motivation was for
compression backends. Wei noted that using non-temporal writes should
reduce bandwidth consumption to the device.
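For readers less familiar with non-temporal stores, here is a small
userspace sketch of a page copy that uses them. It is not taken from
Yiannis's series or from arch/x86: model_copy_nt() and MODEL_PAGE_SIZE are
hypothetical names, and the SSE2 intrinsics are only the userspace
analogue of what a kernel copy routine might do. On builds without SSE2
the sketch falls back to memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define MODEL_PAGE_SIZE 4096

/*
 * Copy len bytes using non-temporal (streaming) stores where available.
 * dst must be 16-byte aligned and len must be a multiple of 16.
 */
static void model_copy_nt(void *dst, const void *src, size_t len)
{
	assert(((uintptr_t)dst & 15) == 0 && (len & 15) == 0);
#if defined(__SSE2__)
	__m128i *d = (__m128i *)dst;
	const __m128i *s = (const __m128i *)src;

	/* Streaming stores write around the cache for the destination. */
	for (size_t i = 0; i < len / 16; i++)
		_mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));

	/* Order the streaming stores before any later stores. */
	_mm_sfence();
#else
	/* Assumption: no SSE2 available, so use an ordinary copy. */
	memcpy(dst, src, len);
#endif
}
```

Because the destination lines are not first read into the cache, a copy
like this can avoid some read traffic to the destination memory, which
lines up with Wei's comment about bandwidth consumption to the device.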
----->o-----
Next meeting will be on Thursday, April 23 at 8:30am PDT (UTC-7),
everybody is welcome: https://meet.google.com/jak-ytdx-hnm
Topics for the next meeting:
- upcoming non-RFC v7 of Bharata's patch series, including new IBS
hotness data separated from the general IBS subsystem
- v4 of Shivank's series for enlightening migrate_pages() for hardware
assists and how this work will be charged to userspace, including for
memory compaction
- v2 of tier-aware memcg limits, including new page counters and rework
to pass folios into the charge path
- Yiannis's patch series for non-temporal stores support
- discuss generalized subsystem for providing bandwidth information
independent of the underlying platform, ideally through resctrl,
otherwise utilizing bandwidth information will be challenging
+ preferably this bandwidth monitoring is not per NUMA node but rather
slow and fast
- later: testing of tier-aware memcg limits with Bharata's changes once
tier-aware memcg limits are stable and further along
Please let me know if you'd like to propose additional topics for
discussion, thank you!