From: SeongJae Park <sj@kernel.org>
To: Raghavendra K T <raghavendra.kt@amd.com>
Cc: SeongJae Park <sj@kernel.org>,
linux-mm@kvack.org, akpm@linux-foundation.org,
lsf-pc@lists.linux-foundation.org, bharata@amd.com,
gourry@gourry.net, nehagholkar@meta.com, abhishekd@meta.com,
ying.huang@linux.alibaba.com, nphamcs@gmail.com,
hannes@cmpxchg.org, feng.tang@intel.com, kbusch@meta.com,
Hasan.Maruf@amd.com, david@redhat.com, willy@infradead.org,
k.shutemov@gmail.com, mgorman@techsingularity.net,
vbabka@suse.cz, hughd@google.com, rientjes@google.com,
shy828301@gmail.com, liam.howlett@oracle.com,
peterz@infradead.org, mingo@redhat.com, nadav.amit@gmail.com,
shivankg@amd.com, ziy@nvidia.com, jhubbard@nvidia.com,
AneeshKumar.KizhakeVeetil@arm.com, linux-kernel@vger.kernel.org,
jon.grimm@amd.com, santosh.shukla@amd.com, Michael.Day@amd.com,
riel@surriel.com, weixugc@google.com, leesuyeon0506@gmail.com,
honggyu.kim@sk.com, leillc@google.com, kmanaouil.dev@gmail.com,
rppt@kernel.org, dave.hansen@intel.com
Subject: Re: [LSF/MM/BPF TOPIC] Overhauling hot page detection and promotion based on PTE A bit scanning
Date: Thu, 23 Jan 2025 10:20:50 -0800
Message-ID: <20250123182050.53941-1-sj@kernel.org>
In-Reply-To: <20250123105721.424117-1-raghavendra.kt@amd.com>
Hi Raghavendra,
On Thu, 23 Jan 2025 10:57:21 +0000 Raghavendra K T <raghavendra.kt@amd.com> wrote:
> Bharata and I would like to propose the following topic for LSFMM.
>
> Topic: Overhauling hot page detection and promotion based on PTE A bit scanning.
Thank you for proposing this. I'm interested in this!
>
> In the Linux kernel, hot page information can potentially be obtained from
> multiple sources:
>
> a. PROT_NONE faults (NUMA balancing)
> b. PTE Access bit (LRU scanning)
> c. Hardware provided page hotness info (like AMD IBS)
>
> This information is further used to migrate (or promote) pages from slow memory
> tier to top tier to increase performance.
>
> In the current hot page promotion mechanism, all the activities including the
> process address space scanning, NUMA hint fault handling and page migration are
> performed in the process context. i.e., scanning overhead is borne by the
> applications.
I understand that you're referring only to fully in-kernel solutions. Just
for readers' context, SK hynix's HMSDK capacity expansion[1] does this work in
two asynchronous threads (one for promotion and the other for demotion), using
DAMON in the kernel as the core worker and controlling DAMON from user space.
>
> I had recently posted a patch [1] to improve this in the context of slow-tier
> page promotion. Here, Scanning is done by a global kernel thread which routinely
> scans all the processes' address spaces and checks for accesses by reading the
> PTE A bit. The hot pages thus identified are maintained in list and subsequently
> are promoted to a default top-tier node. Thus, the approach pushes overhead of
> scanning, NUMA hint faults and migrations off from process context.
>
> The topic was presented in the MM alignment session hosted by David Rientjes [2].
> The topic also finds a mention in S J Park's LSFMM proposal [3].
>
> Here is the list of potential discussion points:
Great discussion points, thank you. I'm adding how DAMON tries to deal with
some of the points below.
> 1. Other improvements and enhancements to PTE A bit scanning approach. Use of
> multiple kernel threads,
DAMON allows use of multiple kernel threads for different monitoring scopes.
There have also been ideas for splitting the monitoring part and the
migration-like system operation part into different threads.
> throttling improvements,
DAMON provides features called "adaptive regions adjustment" and "DAMOS quotas"
for throttling overheads from access monitoring and migration-like system
operation actions.
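For readers' context, the general idea behind quota-style throttling can be
sketched as below. This is a toy model under stated assumptions, not DAMON's
implementation: the class name SizeQuota, its fields and the window handling
are all hypothetical, chosen only to show how a per-window budget caps the
amount of migration-like work.

```python
from dataclasses import dataclass


@dataclass
class SizeQuota:
    """Toy quota: at most `limit_bytes` of work per `window_ms` window.

    Hypothetical illustration only; not DAMON's data structure.
    """
    limit_bytes: int
    window_ms: int
    charged: int = 0
    window_start_ms: int = 0

    def try_charge(self, now_ms: int, nbytes: int) -> bool:
        # Start a fresh window once the current one has expired.
        if now_ms - self.window_start_ms >= self.window_ms:
            self.window_start_ms = now_ms
            self.charged = 0
        # Refuse work that would exceed the budget of this window.
        if self.charged + nbytes > self.limit_bytes:
            return False
        self.charged += nbytes
        return True
```

A caller would ask try_charge() before migrating each batch of pages and skip
the batch when it returns False, so the work is deferred to a later window
instead of being done immediately.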
> promotion policies,
DAMON's access-aware system operation feature (DAMOS) allows setting this kind
of system operation policy based on the access pattern and additional
information, including page-level information such as whether the page is
anonymous, which cgroup it belongs to, and a page-granularity A bit recheck.
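As a rough illustration of an access-pattern-based policy, the sketch below
selects regions by size, access count and age, then applies an optional extra
predicate. It is a simplified toy model and not the DAMOS code: Region, Scheme,
select_targets and the "promote" action label are hypothetical names, and the
extra predicate is applied per region here only to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A monitored address range with aggregated access statistics."""
    start: int
    end: int
    nr_accesses: int  # accesses observed in the last aggregation window
    age: int          # number of windows the access level stayed similar


@dataclass
class Scheme:
    """Toy policy: act on regions at or above all three thresholds."""
    action: str
    min_size: int
    min_nr_accesses: int
    min_age: int

    def matches(self, region: Region) -> bool:
        size = region.end - region.start
        return (size >= self.min_size
                and region.nr_accesses >= self.min_nr_accesses
                and region.age >= self.min_age)


def select_targets(regions, scheme, extra_check=lambda region: True):
    """Return the regions the scheme would act on.

    `extra_check` stands in for additional information such as a cgroup
    or anonymous-page test (hypothetical, per region in this toy).
    """
    return [r for r in regions if scheme.matches(r) and extra_check(r)]


# Usage: only the frequently accessed, stable region is selected.
regions = [Region(0, 2 << 20, nr_accesses=18, age=5),
           Region(2 << 20, 4 << 20, nr_accesses=1, age=30)]
promote = Scheme("promote", min_size=4096, min_nr_accesses=10, min_age=3)
targets = select_targets(regions, promote)
```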
> per-process opt-in via prctl,
DAMON allows applying the system operation action only to pages belonging to
specific cgroups, using a feature called DAMOS filters. It is not integrated
with prctl and works at cgroup scope, but it may be usable for this. Extending
DAMOS filters to match on the owning process may also be doable.
> virtual vs physical address based scanning,
DAMON supports monitoring both virtual and physical address spaces. DAMON's
page migration is currently not supported for virtual address spaces, though I
believe adding the support is not difficult.
I'm a bit in favor of the physical address space, probably because I'm biased
toward what DAMON currently supports, but also because of edge cases such as
promotion of unmapped pages.
> tuning hot page detection algorithm etc.
DAMON requires users to manually tune some important parameters for hot page
detection. We recently provided a tuning guide[2], and are working on making
the tuning automated. I believe the essential problem is similar across many
use cases regardless of the type of low-level access check primitive, so I want
to learn whether the tuning automation idea can be used generally.
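For readers' context, here is a small arithmetic sketch of why such tuning
matters. It assumes the parameters in question are a sampling interval and an
aggregation interval, with one access check per sampling interval; the
function names and the example values (5 ms and 100 ms) are illustrative
assumptions, not taken from this thread.

```python
def max_nr_accesses(sample_us: int, aggr_us: int) -> int:
    """Largest access count one aggregation window can record,
    assuming one access check per sampling interval."""
    if sample_us <= 0 or aggr_us < sample_us:
        raise ValueError("need 0 < sample_us <= aggr_us")
    return aggr_us // sample_us


def access_rate_percent(nr_accesses: int, sample_us: int, aggr_us: int) -> float:
    """Express an observed access count as a share of the maximum."""
    return 100.0 * nr_accesses / max_nr_accesses(sample_us, aggr_us)


# With 5 ms sampling and 100 ms aggregation, a window holds 20 checks,
# so a region observed as accessed in 5 of them is rated at 25%.
checks = max_nr_accesses(5_000, 100_000)
rate = access_rate_percent(5, 5_000, 100_000)
```

Under these assumptions, a short aggregation interval leaves few checks per
window, so hot and cold regions become hard to tell apart, while a long one
delays detection.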
>
> 2. Possibility of maintaining single source of truth for page hotness that would
> maintain hot page information from multiple sources and let other sub-systems
> use that info.
DAMON is currently using the PTE A bit as the essential access check primitive.
We designed DAMON to be extensible to other access check primitives, such as
page faults and h/w features like AMD IBS. We are now planning such an
extension, though it is still at a very early, low-priority planning stage.
DAMON also provides a kernel API.
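The "single source of truth" idea in point 2 can be pictured with the toy
model below, in which several access check primitives feed one shared table.
Everything here is a hypothetical sketch: HotnessTable, the source names and
the equal weighting of all sources are assumptions for illustration, not an
existing kernel or DAMON interface.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

# A source returns the page frame numbers it saw accessed since last asked.
AccessSource = Callable[[], Iterable[int]]


class HotnessTable:
    """Toy aggregator of access reports from pluggable sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, AccessSource] = {}
        self.counts: Counter = Counter()

    def register(self, name: str, source: AccessSource) -> None:
        self._sources[name] = source

    def collect(self) -> None:
        # Every report counts once, whichever source produced it.
        for source in self._sources.values():
            for pfn in source():
                self.counts[pfn] += 1

    def hottest(self, n: int) -> List[int]:
        # Sort by descending count, then by page frame number for ties.
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [pfn for pfn, _ in ranked[:n]]


# Usage with two stand-in sources (real ones would read the PTE A bit,
# fault events, or hardware samples).
table = HotnessTable()
table.register("pte_a_bit", lambda: [1, 2])
table.register("hw_sampling", lambda: [2, 3])
table.collect()
```

Whether reports from different sources should really be weighted equally is
one of the open questions such a design would need to answer.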
>
> 3. Discuss how hardware provided hotness info (like AMD IBS) can further aid
> promotion. Bharata had posted an RFC [4] on this a while back.
Maybe CXL Hotness Monitoring Unit could also be an interesting thing to discuss
together.
>
> 4. Overlap with DAMON and potential reuse.
I confess that, to my biased eyes, some of this work might overlap with DAMON.
I'm looking forward to attending this session, to make my view less biased and
more aligned with people :)
>
> Links:
>
> [1] https://lore.kernel.org/all/20241201153818.2633616-1-raghavendra.kt@amd.com/
> [2] https://lore.kernel.org/linux-mm/20241226012833.rmmbkws4wdhzdht6@ed.ac.uk/T/
> [3] https://lore.kernel.org/lkml/Z4XUoWlU-UgRik18@gourry-fedora-PF4VCD3F/T/
> [4] https://lore.kernel.org/lkml/20230208073533.715-2-bharata@amd.com/
Again, thank you for proposing this topic, and I hope to see you in Montreal!
[1] https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion
[2] https://lkml.kernel.org/r/20250110185232.54907-1-sj@kernel.org
Thanks,
SJ