* [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
From: SeongJae Park @ 2025-01-01 22:20 UTC (permalink / raw)
To: lsf-pc
Cc: SeongJae Park, damon, linux-mm, linux-kernel, kernel-team,
Raghavendra K T, Yuanchu Xie, Jonathan Cameron, Gregory Price,
Kaiyang Zhao, Jiaming Yan, Honggyu Kim
Hi all,
I find a few interesting and promising projects that aim to do efficient access
pattern-aware memory management in the near future, including those below
(alphabetically sorted).
- CXL hotness monitoring unit
(https://lore.kernel.org/20241121101845.1815660-1-Jonathan.Cameron@huawei.com)
- Memory tiering fairness by per-cgroup control of promotion and demotion
(https://lore.kernel.org/20241108190152.3587484-1-kaiyang2@cs.cmu.edu)
- Promotion of unmapped page cache folios
(https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
- Slow-tier page promotion based on PTE A bit
(https://lore.kernel.org/20241201153818.2633616-1-raghavendra.kt@amd.com)
- Workingset reporting
(https://lore.kernel.org/20241127025728.3689245-1-yuanchu@google.com)
The goal of DAMON is to help accelerate such developments by being a
framework that reduces the fundamental effort of monitoring memory access
patterns and managing memory using that information. AWS Aurora Serverless v2
and SK hynix are successfully using DAMON in this way for proactive memory
reclamation[1] and CXL memory tiering[2].
To further deliver such benefits for the ongoing and future projects, we need
to better understand what the projects really need, how DAMON can provide those
now or in the future, and whether there are alternatives better than DAMON.
Regardless of the conclusion about DAMON, the works apparently have common
parts, so the discussion will benefit all.
I propose to have the discussion at LSF/MM/BPF. In the session, I will briefly
introduce the works and possible DAMON usages, and continue with an open
discussion to better understand each other. The discussion will not be limited
to DAMON and the above-mentioned projects, but will also cover possible
alternatives and general access-aware memory management projects. After the
discussion, we will hopefully find ways to collaborate efficiently, or at least
not disturb each other.
[1] https://assets.amazon.science/ee/a4/41ff11374f2f865e5e24de11bd17/resource-management-in-aurora-serverless.pdf
[2] https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion
Thanks,
SJ

* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
From: Matthew Wilcox @ 2025-01-02 4:09 UTC
To: SeongJae Park
Cc: lsf-pc, damon, linux-mm, linux-kernel, kernel-team,
Raghavendra K T, Yuanchu Xie, Jonathan Cameron, Gregory Price,
Kaiyang Zhao, Jiaming Yan, Honggyu Kim

On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> Hi all,
>
>
> I find a few interesting and promising projects that aim to do efficient access
> pattern-aware memory management in the near future, including those below
> (alphabetically sorted).
>
> - CXL hotness monitoring unit
>   (https://lore.kernel.org/20241121101845.1815660-1-Jonathan.Cameron@huawei.com)
> - Memory tiering fairness by per-cgroup control of promotion and demotion
>   (https://lore.kernel.org/20241108190152.3587484-1-kaiyang2@cs.cmu.edu)
> - Promotion of unmapped page cache folios
>   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)

I'm not sure how DAMON can help with this one. As I understand DAMON,
it monitors accesses to user addresses. This patchset is trying to solve
the problem for file pages which aren't mapped to userspace at all,
i.e. only accessed through read() and write().

* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
From: Gregory Price @ 2025-01-02 15:22 UTC
To: Matthew Wilcox
Cc: SeongJae Park, lsf-pc, damon, linux-mm, linux-kernel, kernel-team,
Raghavendra K T, Yuanchu Xie, Jonathan Cameron, Kaiyang Zhao,
Jiaming Yan, Honggyu Kim

On Thu, Jan 02, 2025 at 04:09:38AM +0000, Matthew Wilcox wrote:
> On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> > Hi all,
> >
> >
> > I find a few interesting and promising projects that aim to do efficient access
> > pattern-aware memory management in the near future, including those below
> > (alphabetically sorted).
> >
> > - CXL hotness monitoring unit
> >   (https://lore.kernel.org/20241121101845.1815660-1-Jonathan.Cameron@huawei.com)
> > - Memory tiering fairness by per-cgroup control of promotion and demotion
> >   (https://lore.kernel.org/20241108190152.3587484-1-kaiyang2@cs.cmu.edu)
> > - Promotion of unmapped page cache folios
> >   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
>
> I'm not sure how DAMON can help with this one. As I understand DAMON,
> it monitors accesses to user addresses. This patchset is trying to solve
> the problem for file pages which aren't mapped to userspace at all,
> i.e. only accessed through read() and write().

DAMON can monitor physical addresses too, though the mechanism is
different.

I haven't assessed this as a solution, yet.

~Gregory

* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
From: SeongJae Park @ 2025-01-02 18:00 UTC
To: Gregory Price
Cc: SeongJae Park, Matthew Wilcox, lsf-pc, damon, linux-mm, linux-kernel,
kernel-team, Raghavendra K T, Yuanchu Xie, Jonathan Cameron,
Kaiyang Zhao, Jiaming Yan, Honggyu Kim

On Thu, 2 Jan 2025 10:22:14 -0500 Gregory Price <gourry@gourry.net> wrote:

> On Thu, Jan 02, 2025 at 04:09:38AM +0000, Matthew Wilcox wrote:
> > On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> > > Hi all,
> > >
> > >
> > > I find a few interesting and promising projects that aim to do efficient access
> > > pattern-aware memory management in the near future, including those below
> > > (alphabetically sorted).
> > >
> > > - CXL hotness monitoring unit
> > >   (https://lore.kernel.org/20241121101845.1815660-1-Jonathan.Cameron@huawei.com)
> > > - Memory tiering fairness by per-cgroup control of promotion and demotion
> > >   (https://lore.kernel.org/20241108190152.3587484-1-kaiyang2@cs.cmu.edu)
> > > - Promotion of unmapped page cache folios
> > >   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
> >
> > I'm not sure how DAMON can help with this one. As I understand DAMON,
> > it monitors accesses to user addresses. This patchset is trying to solve
> > the problem for file pages which aren't mapped to userspace at all,
> > i.e. only accessed through read() and write().
>
> DAMON can monitor physical addresses too, though the mechanism is
> different.

Thank you for answering this, Gregory. As Gregory explained, users can use
the physical address monitoring mode of DAMON for this. For unmapped pages,
DAMON sets and reads PG_idle to check whether the page has been accessed.
Since, to my understanding, PG_idle is respected by the read() and write()
paths, DAMON should be able to check accesses to unmapped pages.

> I haven't assessed this as a solution, yet.

To quickly check this, I ran the simple test below. First, I start DAMON in
physical address space monitoring mode, wait for one minute to let it monitor
the accesses of the system, and show the access pattern of the system in
access temperature histogram format.

    $ sudo ./damo start
    $ sleep 60
    $ sudo ./damo report access --style temperature-sz-hist
    <temperature>                    <total size>
    [-7,480,000,000, -7,479,999,999) 59.868 GiB |                    |
    total size: 59.868 GiB

The access temperature histogram format shows the size of memory in each
access temperature range. Access temperature is a metric that represents
access hotness. If accesses to the region are continuously found, the value
increases. If no access to the region is found, the temperature becomes zero.
If it continues showing no access, the temperature decreases further (goes
negative). Refer to the document[1] for more details.

So from the above output, we can see that no memory of the system was accessed
at all during the last minute.

Now I start a program that continuously overwrites a 10 GiB file in the
background. The source code is attached (Attachment 0, dd_like.c) at the
bottom of this mail. After a few seconds, I show the temperature histogram
again.

    $ sudo ./damo report access --style temperature-sz-hist
    <temperature>                      <total size>
    [-12,590,000,000, -11,699,000,000) 42.038 GiB |********************|
    [-11,699,000,000, -10,808,000,000) 0 B        |                    |
    [-10,808,000,000, -9,917,000,000)  0 B        |                    |
    [-9,917,000,000, -9,026,000,000)   0 B        |                    |
    [-9,026,000,000, -8,135,000,000)   5.986 GiB  |***                 |
    [-8,135,000,000, -7,244,000,000)   0 B        |                    |
    [-7,244,000,000, -6,353,000,000)   0 B        |                    |
    [-6,353,000,000, -5,462,000,000)   0 B        |                    |
    [-5,462,000,000, -4,571,000,000)   0 B        |                    |
    [-4,571,000,000, -3,680,000,000)   5.951 GiB  |***                 |
    [-3,680,000,000, -2,789,000,000)   5.893 GiB  |***                 |
    total size: 59.868 GiB

We can see that DAMON found about 10 GiB of relatively hot regions.

This is a very simple test that is not well tuned. Maybe because of that,
there are details to investigate, including why the 10 GiB regions have
different and negative access temperatures. I'll skip those for now, since
the point of this test is that DAMON at least somehow reacts to accesses to
unmapped pages.

[1] https://github.com/damonitor/damo/blob/next/USAGE.md#access-temperature


Thanks,
SJ

>
> ~Gregory

==== Attachment 0 (dd_like.c) ====
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

int main(int argc, char *argv[])
{
	int block_size, count;
	char *dest_path;

	if (argc != 4) {
		printf("Usage: %s <block_size> <count> <destination_file>\n",
		       argv[0]);
		return 1;
	}

	block_size = atoi(argv[1]);
	count = atoi(argv[2]);
	dest_path = argv[3];

	// Validate input parameters
	if (block_size <= 0 || count <= 0) {
		fprintf(stderr, "Invalid block size or count\n");
		return 1;
	}

	// Allocate one zero-filled block that is reused for every write
	char *zeroes = calloc(block_size, sizeof(char));
	if (!zeroes) {
		fprintf(stderr, "Memory allocation failed\n");
		return 1;
	}

	// Rewrite the file forever; stop the program with a signal
	while (1) {
		// Open destination file in write mode (truncates it)
		FILE *dest_file = fopen(dest_path, "w");
		if (!dest_file) {
			fprintf(stderr, "Failed to open %s for writing: %s\n",
				dest_path, strerror(errno));
			free(zeroes);
			return 1;
		}

		// Write block size of zeroes 'count' times
		for (int i = 0; i < count; i++)
			fwrite(zeroes, block_size, 1, dest_file);

		fclose(dest_file);
		printf("one pass\n");
	}

	// Not reached
	free(zeroes);
	return 0;
}
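
The access temperature behaviour described in the message above (rises while
accesses keep being found, drops to zero on the first idle interval, then goes
further negative) can be pictured with a toy update rule. This is only a
minimal sketch under assumptions: it is not damo's actual formula (see the
USAGE document referenced as [1] for that), and toy_temperature_update(), its
step parameter, and the "restart from zero when a cold region is accessed"
rule are all hypothetical names and choices made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of an access temperature, updated once per
 * monitoring interval.  Not damo's real computation.
 */
long toy_temperature_update(long temperature, bool accessed, long step)
{
	assert(step > 0);

	if (accessed) {
		/* Assumption: a cold region restarts from zero when accessed. */
		if (temperature < 0)
			temperature = 0;
		/* Continuous accesses keep raising the temperature. */
		return temperature + step;
	}

	/* First interval without access: a warm region drops to zero. */
	if (temperature > 0)
		return 0;

	/* Continued idleness: the temperature goes further below zero. */
	return temperature - step;
}
```

Under this model, a region that is never touched keeps sinking, which matches
the all-negative histogram of the idle system shown above.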

* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
From: SeongJae Park @ 2025-01-02 18:04 UTC
To: SeongJae Park
Cc: Gregory Price, Matthew Wilcox, lsf-pc, damon, linux-mm, linux-kernel,
kernel-team, Raghavendra K T, Yuanchu Xie, Jonathan Cameron,
Kaiyang Zhao, Jiaming Yan, Honggyu Kim

On Thu, 2 Jan 2025 10:00:19 -0800 SeongJae Park <sj@kernel.org> wrote:

> On Thu, 2 Jan 2025 10:22:14 -0500 Gregory Price <gourry@gourry.net> wrote:
>
> > On Thu, Jan 02, 2025 at 04:09:38AM +0000, Matthew Wilcox wrote:
> > > On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> > > > Hi all,
> > > >
> > > >
> > > > I find a few interesting and promising projects that aim to do efficient access
> > > > pattern-aware memory management in the near future, including those below
> > > > (alphabetically sorted).
> > > >
> > > > - CXL hotness monitoring unit
> > > >   (https://lore.kernel.org/20241121101845.1815660-1-Jonathan.Cameron@huawei.com)
> > > > - Memory tiering fairness by per-cgroup control of promotion and demotion
> > > >   (https://lore.kernel.org/20241108190152.3587484-1-kaiyang2@cs.cmu.edu)
> > > > - Promotion of unmapped page cache folios
> > > >   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
> > >
> > > I'm not sure how DAMON can help with this one. As I understand DAMON,
> > > it monitors accesses to user addresses. This patchset is trying to solve
> > > the problem for file pages which aren't mapped to userspace at all,
> > > i.e. only accessed through read() and write().
> >
> > DAMON can monitor physical addresses too, though the mechanism is
> > different.
>
> Thank you for answering this, Gregory. As Gregory explained, users can use
> the physical address monitoring mode of DAMON for this. For unmapped pages,
> DAMON sets and reads PG_idle to check whether the page has been accessed.
> Since, to my understanding, PG_idle is respected by the read() and write()
> paths, DAMON should be able to check accesses to unmapped pages.
>
> > I haven't assessed this as a solution, yet.
>
> To quickly check this, I ran the simple test below.
[...]
> the point of this test is that DAMON at least somehow reacts to accesses to
> unmapped pages.

I forgot to clarify this point, sorry. My test shows that DAMON can detect
accesses to unmapped pages, but it does not assess whether DAMON is feasible
as the unmapped page promotion solution. More work and discussion would be
needed for that.


Thanks,
SJ

[...]

* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
From: Gregory Price @ 2025-01-14 3:06 UTC
To: SeongJae Park
Cc: lsf-pc, damon, linux-mm, linux-kernel, kernel-team,
Raghavendra K T, Yuanchu Xie, Jonathan Cameron, Kaiyang Zhao,
Jiaming Yan, Honggyu Kim

On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> Hi all,
>
>
> I find a few interesting and promising projects that aim to do efficient access
> pattern-aware memory management in the near future, including those below
> (alphabetically sorted).
>
> - Promotion of unmapped page cache folios
>   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)


I'll break down a few observations I made while hacking on unmapped
page cache promotion - and my concerns about leveraging DAMON here.

Additionally, some other concerns I've seen raised about duplicating
promotion logic across various kernel components.


Latest RFC:
https://lore.kernel.org/linux-mm/20250107000346.1338481-1-gourry@gourry.net/

Basic Premise:
   Use folio_mark_accessed() as a measure of hotness for promotion.
   Defer promotion to task_work due to locking complexities.

My major concerns / lessons learned from this exercise include:

1) The cost of checking promotion candidacy can be problematic

   In my microbenchmark in the last RFC version, I showed that while
   the performance upside (~22-25%) is substantial, there was a
   non-trivial cost associated with injecting even a single global
   boolean check in the file_read() path. This was unexpected.

   I can probably optimize the disabled case with a likely() clause,
   but I did not expect such sensitivity.
   This tells me injecting
   an unconditional call into DAMON may be too much overhead.

   I would need to explore this further - including whether it is
   feasible to inject such a large dependency into swap.c

   This may not affect all cases, but it does affect at least this one.

2) The complexity of "when it is safe" to promote a folio is subtle
   at best, and "actively hostile" at worst.

   I learned in v1 of the RFC that promotion inline with fma() is not
   feasible due to a few contexts (task dying in particular) in which
   migration is not safe. I deferred to task work because I noticed
   prior attempts (in development notes) had seen similar issues.

   Adding a folio reference and/or page flag to defer that migration to
   another context (e.g. async kthread) solves this at the expense of
   implementation complexity. (leaked folios if done wrong)

   I'd have to look at whether it's worth the increased complexity to
   aggregate this (particular) identification mechanism - but I think
   there is clear value to aggregating promotion.

   I could see some value in pumping tracking bits into DAMON - but I
   also see value in making tasks handle promotion as a form of fairness.

3) There were expressed opinions on runtime fairness WRT promotion.

   There are two competing thoughts:
   A) Making accessing tasks eat inline promotion cost captures that
      cost in their runtime slice, promoting fairness in scheduling.

   B) Aggregating promotion to an external thread can reduce inline
      faults and tail latencies, but may hide per-task cost. This
      is a concern if one task drives all the promotions, effectively
      stealing an entire core by nature of the async design.

   I don't have a good answer to this, just an observation that charging
   promotion time to the identifying task was a concern that was raised.


4) TPP and Unmapped Page Promotion may affect each other.

   There is a rate-limiting mechanism in the migration path that was
   intended to prevent over-pressuring bandwidth with aggressive
   migrations - prevent major memory stalls.

   By adding more pressure on this limit from an additional source,
   we're obviously increasing the time it takes to converge.

   This is probably the greatest argument for creating a new, aggregated
   promotion mechanism to serve all of these identification mechanisms.

   This would make it easier for us to determine whether/what
   identification mechanisms can be aggregated while enabling forward
   progress on each of them separately.

5) Scarce resources

   We need to be careful not to consume excessive amounts of resources
   in an attempt to track all these identifying mechanisms. Even 1 byte
   per folio is 256MB on a 1TB machine. This gets out of hand quick.

   With task-work, I was able to add no additional resource consumption,
   but deferring to a fully async scenario means needing to track things
   like last-accessing CPU, timestamps, etc.

   We'll need to examine this closely if we decide to aggregate either
   of these mechanisms.

~Gregory
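
The "1 byte per folio is 256MB on a 1TB machine" figure in point 5 above can
be checked with a few lines of arithmetic. The sketch below assumes that every
folio is a single 4 KiB base page, which is the worst case for per-folio
metadata; tracking_overhead_bytes() is a hypothetical helper written for this
illustration, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of tracking metadata needed to cover mem_bytes of memory when
 * every folio of folio_bytes carries meta_bytes_per_folio of metadata.
 * Hypothetical helper for illustration only.
 */
uint64_t tracking_overhead_bytes(uint64_t mem_bytes, uint64_t folio_bytes,
				 uint64_t meta_bytes_per_folio)
{
	assert(folio_bytes > 0);
	return (mem_bytes / folio_bytes) * meta_bytes_per_folio;
}
```

With 1 TiB of memory and 4 KiB folios there are 2^40 / 2^12 = 2^28 folios, so
one byte each costs 256 MiB, and an 8-byte timestamp per folio would already
cost 2 GiB. Larger folios shrink the cost proportionally.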

* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
From: SeongJae Park @ 2025-01-24 2:11 UTC
To: Gregory Price
Cc: SeongJae Park, lsf-pc, damon, linux-mm, linux-kernel, kernel-team,
Raghavendra K T, Yuanchu Xie, Jonathan Cameron, Kaiyang Zhao,
Jiaming Yan, Honggyu Kim

Hello Gregory,

On Mon, 13 Jan 2025 22:06:09 -0500 Gregory Price <gourry@gourry.net> wrote:

> On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> > Hi all,
> >
> >
> > I find a few interesting and promising projects that aim to do efficient access
> > pattern-aware memory management in the near future, including those below
> > (alphabetically sorted).
> >
> > - Promotion of unmapped page cache folios
> >   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
>
>
> I'll break down a few observations I made while hacking on unmapped
> page cache promotion - and my concerns about leveraging DAMON here.

Thank you for sharing this!

>
> Additionally, some other concerns I've seen raised about duplicating
> promotion logic across various kernel components.
>
>
> Latest RFC:
> https://lore.kernel.org/linux-mm/20250107000346.1338481-1-gourry@gourry.net/
>
> Basic Premise:
>    Use folio_mark_accessed() as a measure of hotness for promotion.
>    Defer promotion to task_work due to locking complexities.
>
> My major concerns / lessons learned from this exercise include:
>
> 1) The cost of checking promotion candidacy can be problematic
>
>    In my microbenchmark in the last RFC version, I showed that while
>    the performance upside (~22-25%) is substantial, there was a
>    non-trivial cost associated with injecting even a single global
>    boolean check in the file_read() path. This was unexpected.
>
>    I can probably optimize the disabled case with a likely() clause,
>    but I did not expect such sensitivity. This tells me injecting
>    an unconditional call into DAMON may be too much overhead.

I cannot agree more with you that the mechanism for finding the
promotion/demotion (and any access-aware system operation) candidates should
induce only modest, or at least controllable, overhead. Actually, it was one
of the biggest motivations of DAMON's design, and I haven't imagined adding
unconditional calls to DAMON here.

That said, shouldn't injecting an unconditional call here be avoided not only
for DAMON calls but for any expensive call? I'm also not quite sure which
DAMON call you are thinking about.

>
>    I would need to explore this further - including whether it is
>    feasible to inject such a large dependency into swap.c

I understand DAMON is not small in terms of code size, and has many
limitations that make it unusable in many use cases. But, again, I'm not
quite sure what kind of DAMON usage in swap.c you're thinking about, and
therefore it is not easy to understand what part of DAMON is considered a
large dependency that concerns you. It would be great if we can come up with
a more concrete example as a result of this topic session at LSFMMBPF.

FYI, I also don't have a specific idea for helping unmapped page promotion for
now. That's my homework to do by LSFMMBPF. But a few ways in which I naively
think DAMON might be able to help unmapped page promotion are:

1. Using DAMON for profiling how many hot and cold unmapped pages are in which
   tier, and using the information for unmapped page promotion optimization.
2. Using DAMOS to target-promote hot unmapped pages while using page
   faults-based promotion for mapped pages.
3. Using DAMOS to promote both mapped and unmapped hot pages.

For the first and second ideas, DAMON needs to target unmapped pages.
I think DAMOS filters can be extended for that, and I posted an RFC before:
https://lore.kernel.org/20241127205624.86986-1-sj@kernel.org

Using the RFC-applied kernel and a version of the DAMON user-space tool that
adds the support, idea one could be done like below.

    $ sudo ./damo report access --snapshot_damos_filter reject none unmapped --style recency-sz-hist
    # damos filters (df): reject none unmapped
    <last accessed time (us)> <df-passed size>
    [-36.300 s, -32.670 s)    10.297 MiB  |*                   |
    [-32.670 s, -29.040 s)    7.297 MiB   |*                   |
    [-29.040 s, -25.410 s)    0 B         |                    |
    [-25.410 s, -21.780 s)    0 B         |                    |
    [-21.780 s, -18.150 s)    0 B         |                    |
    [-18.150 s, -14.520 s)    0 B         |                    |
    [-14.520 s, -10.890 s)    0 B         |                    |
    [-10.890 s, -7.260 s)     0 B         |                    |
    [-7.260 s, -3.630 s)      3.088 GiB   |********************|
    [-3.630 s, -0 ns)         80.000 KiB  |*                   |
    [-0 ns, --3630000000 ns)  16.000 KiB  |*                   |
    <last accessed time (us)> <total size>
    [-36.300 s, -32.670 s)    24.493 GiB  |********************|
    [-32.670 s, -29.040 s)    5.869 GiB   |*****               |
    [-29.040 s, -25.410 s)    5.568 GiB   |*****               |
    [-25.410 s, -21.780 s)    0 B         |                    |
    [-21.780 s, -18.150 s)    5.899 GiB   |*****               |
    [-18.150 s, -14.520 s)    5.807 GiB   |*****               |
    [-14.520 s, -10.890 s)    0 B         |                    |
    [-10.890 s, -7.260 s)     0 B         |                    |
    [-7.260 s, -3.630 s)      12.231 GiB  |**********          |
    [-3.630 s, -0 ns)         356.000 KiB |*                   |
    [-0 ns, --3630000000 ns)  396.000 KiB |*                   |
    total size: 59.868 GiB

The above output was retrieved while a kernel build was running in the
background. It says that among the 24.493 GiB of cold memory that was last
accessed more than 32.67 seconds ago, 10.297 MiB are unmapped pages.

For the third idea, whether and how to collaborate with page faults-based
promotion of mapped pages could be something to discuss. Some ideas off the
top of my head are that we can simply make them exclusive, or use DAMOS for
proactive promotion under peaceful situations but use page faults-based
promotion for more urgent situations, somewhat like kswapd and direct reclaim.

For all three ideas, DAMON will do the monitoring and promotions on the DAMON
thread, so no change to swap.c or the file I/O path would be required.

Again, these are just not-yet-settled, brainstorming-level ideas, and I will
try to make them more specific and settled by LSFMMBPF. Please feel free to
add comments on this thread rather than waiting for LSFMMBPF, though!

>
>    This may not affect all cases, but it does affect at least this one.
>
> 2) The complexity of "when it is safe" to promote a folio is subtle
>    at best, and "actively hostile" at worst.
>
>    I learned in v1 of the RFC that promotion inline with fma() is not
>    feasible due to a few contexts (task dying in particular) in which
>    migration is not safe. I deferred to task work because I noticed
>    prior attempts (in development notes) had seen similar issues.
>
>    Adding a folio reference and/or page flag to defer that migration to
>    another context (e.g. async kthread) solves this at the expense of
>    implementation complexity. (leaked folios if done wrong)
>
>    I'd have to look at whether it's worth the increased complexity to
>    aggregate this (particular) identification mechanism - but I think
>    there is clear value to aggregating promotion.
>
>    I could see some value in pumping tracking bits into DAMON -

I agree with all the points and am willing to make DAMON serve the purpose
well.

> but I
>    also see value in making tasks handle promotion as a form of fairness.

I agree that could be good in terms of fairness. I want to learn more about
its significance, though.

>
> 3) There were expressed opinions on runtime fairness WRT promotion.
>
>    There are two competing thoughts:
>    A) Making accessing tasks eat inline promotion cost captures that
>       cost in their runtime slice, promoting fairness in scheduling.
>
>    B) Aggregating promotion to an external thread can reduce inline
>       faults and tail latencies, but may hide per-task cost.
>       This is a concern if one task drives all the promotions, effectively
>       stealing an entire core by nature of the async design.
>
>    I don't have a good answer to this, just an observation that charging
>    promotion time to the identifying task was a concern that was raised.

I think we might be able to pursue two ways in parallel? Use an asynchronous
external thread in more peaceful situations, and let tasks do inline promotion
with fairness under more urgent situations, like kswapd and direct reclaim.

DAMON may fit well for the proactive solutions under less urgent situations.
DAMON_RECLAIM was made in that direction, and has been working without
significant issues in products for years.

>
>
> 4) TPP and Unmapped Page Promotion may affect each other.
>
>    There is a rate-limiting mechanism in the migration path that was
>    intended to prevent over-pressuring bandwidth with aggressive
>    migrations - prevent major memory stalls.
>
>    By adding more pressure on this limit from an additional source,
>    we're obviously increasing the time it takes to converge.
>
>    This is probably the greatest argument for creating a new, aggregated
>    promotion mechanism to serve all of these identification mechanisms.
>
>    This would make it easier for us to determine whether/what
>    identification mechanisms can be aggregated while enabling forward
>    progress on each of them separately.

I agree. DAMON allows combining multiple different mechanisms with its core
logic, so I believe it might be a place that can aggregate the different
identification mechanisms.

DAMOS, DAMON's feature for system operations based on access monitoring
results, also has its own aggressiveness control logic. It resides in the
core layer, so it could be used consistently with different promotion
candidate identification mechanisms.

>
> 5) Scarce resources
>
>    We need to be careful not to consume excessive amounts of resources
>    in an attempt to track all these identifying mechanisms.
>    Even 1 byte
>    per folio is 256MB on a 1TB machine. This gets out of hand quick.
>
>    With task-work, I was able to add no additional resource consumption,
>    but deferring to a fully async scenario means needing to track things
>    like last-accessing CPU, timestamps, etc.
>
>    We'll need to examine this closely if we decide to aggregate either
>    of these mechanisms.

Agreed again. DAMON tries to keep such resources in its own data structures.
The resource consumption of those data structures can also be problematic,
but it at least allows setting an upper bound, regardless of the system size.
So it is controllable and scalable.

I wish to continue more detailed discussions at LSFMMBPF and on this thread!

Thank you again for sharing your experiences and thoughts on this topic. I
find they make the discussion much more informative and helpful.


Thanks,
SJ

>
> ~Gregory
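
Point 4 in the discussion above (several identification mechanisms pushing
against one migration rate limit) and the mention of an aggressiveness control
can be pictured as a single budget that every promotion source has to charge.
The following is a minimal sketch under assumptions: struct promo_budget and
promo_budget_try_charge() are hypothetical names, and the fixed-window policy
is a toy model, not the kernel's migration rate limiting and not the DAMOS
quota implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared promotion budget: limit_bytes per window_ms. */
struct promo_budget {
	uint64_t window_ms;       /* length of one accounting window */
	uint64_t limit_bytes;     /* bytes that may be promoted per window */
	uint64_t window_start_ms; /* start time of the current window */
	uint64_t used_bytes;      /* bytes already charged in this window */
};

/*
 * Try to charge a promotion of `bytes` at time now_ms (assumed to be
 * monotonic).  Returns true if the promotion fits the current window.
 */
bool promo_budget_try_charge(struct promo_budget *b, uint64_t now_ms,
			     uint64_t bytes)
{
	assert(b && b->window_ms > 0);

	/* Start a fresh window once the current one has expired. */
	if (now_ms - b->window_start_ms >= b->window_ms) {
		b->window_start_ms = now_ms;
		b->used_bytes = 0;
	}

	/* used_bytes never exceeds limit_bytes, so this cannot underflow. */
	if (bytes > b->limit_bytes - b->used_bytes)
		return false;

	b->used_bytes += bytes;
	return true;
}
```

If both an inline path and an asynchronous thread call the same charge
function, neither can starve the limit without the other seeing it, which is
the convergence concern raised in point 4.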

* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
From: Gregory Price @ 2025-01-24 17:21 UTC
To: SeongJae Park
Cc: lsf-pc, damon, linux-mm, linux-kernel, kernel-team,
Raghavendra K T, Yuanchu Xie, Jonathan Cameron, Kaiyang Zhao,
Jiaming Yan, Honggyu Kim

On Thu, Jan 23, 2025 at 06:11:53PM -0800, SeongJae Park wrote:
> Hello Gregory,
>
> > 1) The cost of checking promotion candidacy can be problematic
> >
> >    In my microbenchmark in the last RFC version, I showed that while
> >    the performance upside (~22-25%) is substantial, there was a
> >    non-trivial cost associated with injecting even a single global
> >    boolean check in the file_read() path. This was unexpected.
> >
> >    I can probably optimize the disabled case with a likely() clause,
> >    but I did not expect such sensitivity. This tells me injecting
> >    an unconditional call into DAMON may be too much overhead.
>
> I cannot agree more with you that the mechanism for finding the
> promotion/demotion (and any access-aware system operation) candidates should
> induce only modest, or at least controllable, overhead. Actually, it was one
> of the biggest motivations of DAMON's design, and I haven't imagined adding
> unconditional calls to DAMON here.
>
> That said, shouldn't injecting an unconditional call here be avoided not only
> for DAMON calls but for any expensive call? I'm also not quite sure which
> DAMON call you are thinking about.
>

Just any call, DAMON or otherwise. The explicit check injecting ~2-3%
overhead on my microbench was a simple

+       } else if (!folio_test_isolated(folio) &&
+                  (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&

If this is causing additional overhead, call me skeptical that trying
anything more complicated will turn out better.

> >
> >    I would need to explore this further - including whether it is
> >    feasible to inject such a large dependency into swap.c
>
> I understand DAMON is not small in terms of code size, and has many
> limitations that make it unusable in many use cases. But, again, I'm not
> quite sure what kind of DAMON usage in swap.c you're thinking about, and
> therefore it is not easy to understand what part of DAMON is considered a
> large dependency that concerns you. It would be great if we can come up with
> a more concrete example as a result of this topic session at LSFMMBPF.
>

It's not a matter of code size - it's a matter of tightly coupling core
components of the kernel to extraneous ones. Adding additional
dependencies between components increases overall system complexity and
makes it hard to reason about the behavior of the system.

For example, in the prior snippet:

+       } else if (!folio_test_isolated(folio) &&
+                  (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
+                  numa_pagecache_promotion_enabled) {
+               promotion_candidate(folio);

This amounts to

    if (some condition && feature enabled)
        mark a folio as a candidate for promotion

The promotion_candidate() function is contained within migrate.c and
uses (mostly) migrate.c mechanisms (from a task_work). All you need to
understand the behavior is between swap.c and migrate.c.

If instead you aggregate this to DAMON, understanding the behavior of
swap can require you to understand what DAMON is actually doing with
this information. Now you need to understand swap.c, migrate.c, AND
DAMON. It makes it more difficult to reason about the system when
something goes wrong.

This increases the maintenance burden for maintainers (and onboarding
complexity for anyone new to the kernel, for that matter).

That doesn't mean we shouldn't consider doing this - it just means that
the benefit needs to outweigh the complexity/maintenance cost.

> FYI, I also don't have a specific idea for helping unmapped page promotion for
> now.
That's my assignment that I will do by LSFMMBPF. But, a few things that > I naively thinking DAMON might be able to help unmapped promotions are, > > 1. Using DAMON for profiling how much hot and cold unmapped pages are in which > tier, and use the information for unmapped pages promotion optimization. > 2. Using DAMOS to target-promote hot unmapped pages while using page > faults-based promotion for mapped pages. > 3. Using DAMOS to promote both mapped and unmapped hot pages. > This missing the scenario where DAMOS/DAMON is not suitible for deployment in someone's environment. The kernel should still do *something*. And that is kind of the point - we can expose more complexity to the users with DAMON, but the kernel should be able to do some reasonable promotion action without this additional system. > > but I > > also see value is making tasks handle promotion as a form of fairness. > > I agree that could be good in terms of fairness. I want to learn more about > the significance of it, though. > Fairness in this scenario is simple. If one task is causing an outsizes number of promotions to occur, and it causes some ASYNC system to handle those promotions, it is effectively acquiring more CPU time via that ASYNC system than other residents. Trying to charge this time back to the noisey task is harder than just having the task incur the cost of migration. But doing it inline can cause the task to slow down. So it's difficult to predict how it's going to pan out. Need evidence. > I agree. DAMON allows combining multiple different mechanisms with its core > logic, so I beleive it migt be a place that can aggregate the different > identification mechanisms. > > DAMON's access monitoring results based system operations feature, namely > DAMOS, also has its own aggressiveness control logic, and resides in the core > layer, so could be used consistently with different promotion candidates > identification mechanisms. 
> Without data this is a nice thought, but we have existing mechanisms that work and can be improved - lets not disrupt that. Finding an aggregated promotion solution helps everyone move forward without disrupting development in these areas (and makes the different indentification mechanisms play nice with each other). Trying to also create a voltron "one indentification system to rule them all" is a nice thought, but it's heavy-weight compared to adding a folio flag check and a call to mpol_migrate_misplaced(). We need to respect that reality and not regress the existing mechanisms by trying to over-engineer a generalized solution. ~Gregory ^ permalink raw reply [flat|nested] 14+ messages in thread
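The gated check discussed in the message above can be modeled in user space. This is only a sketch under stated assumptions: `struct fake_folio`, `TIERING_MODE_BIT`, `balancing_mode`, `pagecache_promotion_enabled` and `mark_accessed()` are hypothetical stand-ins for the kernel symbols in the quoted diff, and `unlikely()` is defined locally on top of the GCC/Clang `__builtin_expect` builtin. It does not reproduce the measured overhead; it only shows the shape of "feature check first, then mark the candidate".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define TIERING_MODE_BIT 0x2u
#define unlikely(x) __builtin_expect(!!(x), 0)

struct fake_folio {
	bool isolated;             /* models folio_test_isolated() */
	bool promotion_candidate;  /* set when the folio is marked */
};

static unsigned int balancing_mode;       /* models the sysctl mode word */
static bool pagecache_promotion_enabled;  /* models the feature toggle */

/* The whole feature gate folded into one predicate. */
static bool promotion_enabled(void)
{
	return (balancing_mode & TIERING_MODE_BIT) &&
	       pagecache_promotion_enabled;
}

/*
 * Models the access path: the disabled case is hinted as the common one,
 * so the only added work there is evaluating the gate.
 * Returns true when the folio was marked as a promotion candidate.
 */
static bool mark_accessed(struct fake_folio *folio)
{
	assert(folio != NULL);
	if (unlikely(promotion_enabled()) && !folio->isolated) {
		folio->promotion_candidate = true;
		return true;
	}
	return false;
}
```

With both toggles at their zero defaults, `mark_accessed()` returns false and leaves the folio untouched. In the kernel itself, static branches (jump labels) are another way to keep a disabled feature cheap, but whether that removes the reported sensitivity would need measurement.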
* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
2025-01-24 17:21 ` Gregory Price
@ 2025-01-25 1:17 ` SeongJae Park
0 siblings, 0 replies; 14+ messages in thread
From: SeongJae Park @ 2025-01-25 1:17 UTC (permalink / raw)
To: Gregory Price
Cc: SeongJae Park, lsf-pc, damon, linux-mm, linux-kernel,
kernel-team, Raghavendra K T, Yuanchu Xie, Jonathan Cameron,
Kaiyang Zhao, Jiaming Yan, Honggyu Kim

On Fri, 24 Jan 2025 12:21:31 -0500 Gregory Price <gourry@gourry.net> wrote:

> On Thu, Jan 23, 2025 at 06:11:53PM -0800, SeongJae Park wrote:
[...]
> > > This tells me injecting
> > > an unconditional call into DAMON may be too much overhead.
[...]
> > I'm also not quite sure what DAMON
> > call you are thinking about.
> >
>
> Just any call, DAMON or otherwise.

Thanks for clarifying.

[...]
> It's not a matter of code size - it's a matter of tightly coupling core
> components of the kernel to extraneous ones.  Adding additional
> dependencies between components increases overall system complexity and
> makes it hard to reason about the behavior of the system.
[...]
> Now you need to understand swap.c, migrate.c, AND DAMON.
>
> It makes it more difficult to reason about the system when something
> goes wrong.  This increases the maintenance burden for maintainers (and
> onboarding complexity for anyone new to the kernel, for that matter).

Thank you for this kind clarification.  This is very helpful for better
understanding your point.  I cannot agree more with your point that tightly
coupling multiple components makes things complex.

Let me emphasize your points from the other side, too.  This doesn't mean we
should avoid using multiple components together.  If the interface is well
designed and being used correctly, using multiple components together rather
reduces the complexity and maintenance burden.  In the example, the swap.c
maintainer should easily know when something in migrate.c that is being used by
swap.c is not working as documented or expected, and ask the migrate.c
maintainer to fix it.

I'm trying to make DAMON be designed and used in such a way.  I'm proposing
this LSFMMBPF topic to help that by discussing in depth, including specific
examples of current or potential DAMON usages and DAMON interfaces that are not
well designed for those.

>
> That doesn't mean we shouldn't consider doing this - it just means that
> benefit needs to outweigh the complexity/maintenance cost.

I agree with this too, of course :)

[...]
> This is missing the scenario where DAMOS/DAMON is not suitable for
> deployment in someone's environment.

I understand that you are describing a scenario where deploying out-of-kernel
components such as the DAMON user-space tool is impossible, while those are
essential for a given usage.  And I agree that such cases can really exist.

> The kernel should still do
> *something*.
>
> And that is kind of the point - we can expose more complexity to the
> users with DAMON, but the kernel should be able to do some reasonable
> promotion action without this additional system.

I understand that you mean using DAMON for promotion requires user control
through additional systems such as the DAMON user-space tool (damo).  That's
correct, at least for today's DAMON usages for CXL memory tiering.  HMSDK[1] is
such an additional system.

Nevertheless, that's not necessarily the case in future.  DAMON aims to allow
flexible custom usages, while also just transparently working fairly well.  I
shared the humble ambition at last year's LPC[2].  We will pursue the direction
for memory tiering-purpose DAMON usage, too.

[...]
> > > but I
> > > also see value in making tasks handle promotion as a form of fairness.
> >
> > I agree that could be good in terms of fairness.  I want to learn more about
> > the significance of it, though.
> >
>
> Fairness in this scenario is simple.
>
> If one task is causing an outsized number of promotions to occur, and it
> causes some ASYNC system to handle those promotions, it is effectively
> acquiring more CPU time via that ASYNC system than other residents.
>
> Trying to charge this time back to the noisy task is harder than just
> having the task incur the cost of migration.  But doing it inline can
> cause the task to slow down.
>
> So it's difficult to predict how it's going to pan out.  Need evidence.

Yes, I agree that we need more data to say more about this topic.
Nonetheless, I understand you are saying that's something better to have in
future, and that we need to be aware of its potential risk, not that it is a
strict blocker of async approach exploration.

>
> > I agree.  DAMON allows combining multiple different mechanisms with its core
> > logic, so I believe it might be a place that can aggregate the different
> > identification mechanisms.
> >
> > DAMON's access monitoring results based system operations feature, namely
> > DAMOS, also has its own aggressiveness control logic, and resides in the core
> > layer, so could be used consistently with different promotion candidates
> > identification mechanisms.
> >
>
> Without data this is a nice thought, but we have existing mechanisms
> that work and can be improved - let's not disrupt that.

Cannot agree more.  My intention is not to disrupt that but to ensure people
who are looking into such improvements are on the same page regarding available
current and future options.

>
> Finding an aggregated promotion solution helps everyone move forward
> without disrupting development in these areas (and makes the different
> identification mechanisms play nice with each other).
>
> Trying to also create a voltron "one identification system to rule them
> all" is a nice thought, but it's heavy-weight compared to adding a folio
> flag check and a call to mpol_migrate_misplaced().  We need to respect
> that reality and not regress the existing mechanisms by trying to
> over-engineer a generalized solution.

100% agreed.  This point is, and should always be, in DAMON hackers' minds.
Thank you for kindly clarifying your points and the nice advice :)

[1] https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion
[2] https://lpc.events/event/18/contributions/1768/


Thanks,
SJ

* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
2025-01-14 3:06 ` Gregory Price
2025-01-24 2:11 ` SeongJae Park
@ 2025-01-30 2:15 ` Yuanchu Xie
2025-01-30 3:47 ` SeongJae Park
1 sibling, 1 reply; 14+ messages in thread
From: Yuanchu Xie @ 2025-01-30 2:15 UTC (permalink / raw)
To: Gregory Price
Cc: SeongJae Park, lsf-pc, damon, linux-mm, linux-kernel,
kernel-team, Raghavendra K T, Jonathan Cameron, Kaiyang Zhao,
Jiaming Yan, Honggyu Kim

On Mon, Jan 13, 2025 at 7:06 PM Gregory Price <gourry@gourry.net> wrote:
> 5) Scarce resources
>
>    We need to be careful not to consume excessive amounts of resources
>    in an attempt to track all these identifying mechanisms.  Even 1 byte
>    per folio is 256MB on a 1TB machine.  This gets out of hand quick.
>
>    With task-work, I was able to add no additional resource consumption,
>    but deferring to a fully async scenario and needing to track things
>    like last-accessing CPU, timestamps, and etc.
>
>    We'll need to examine this closely if we decide to aggregate either
>    of these mechanisms.

My concern with physical address space monitoring is fragmentation.  I
ran some numbers on a few prod machines.  Grouping by regions with the
same memcg and ignoring any unmapped memory to be generous, machines
with higher utilization can have a region/total pages ratio of ~40%,
and even those with lower utilization (<50%) can also reach 20%.
Accurately tracking these regions would require quite the region
metadata, on the order of GBs.

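The two estimates in the message above can be reproduced with a few lines of arithmetic. A minimal sketch, assuming 4 KiB base pages, one folio per base page, binary units (1TB read as 2^40 bytes), and a hypothetical 64 bytes of metadata per region; the 64-byte figure and the function names are illustrative assumptions, not the size of any real kernel structure. The region/total pages ratio is read here as "number of regions divided by number of pages".

```c
#include <assert.h>
#include <stdint.h>

/* Number of base pages covering mem_bytes. */
static uint64_t nr_pages(uint64_t mem_bytes, uint64_t page_size)
{
	assert(page_size != 0);
	return mem_bytes / page_size;
}

/* Metadata cost when every page carries bytes_per_page of tracking state. */
static uint64_t per_page_metadata_bytes(uint64_t mem_bytes,
					uint64_t page_size,
					uint64_t bytes_per_page)
{
	return nr_pages(mem_bytes, page_size) * bytes_per_page;
}

/*
 * Metadata cost when the number of regions is ratio_percent of the page
 * count and each region costs region_bytes (hypothetical value).
 */
static uint64_t region_metadata_bytes(uint64_t mem_bytes,
				      uint64_t page_size,
				      unsigned int ratio_percent,
				      uint64_t region_bytes)
{
	uint64_t regions = nr_pages(mem_bytes, page_size) * ratio_percent / 100;

	return regions * region_bytes;
}
```

Under these assumptions, `per_page_metadata_bytes(1ULL << 40, 4096, 1)` gives 268435456 bytes (256 MiB), matching the quoted figure, and `region_metadata_bytes(1ULL << 40, 4096, 40, 64)` gives 6871947648 bytes (about 6.4 GiB), which is consistent with "on the order of GBs".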
* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
2025-01-30 2:15 ` Yuanchu Xie
@ 2025-01-30 3:47 ` SeongJae Park
2025-01-31 10:05 ` Jonathan Cameron
0 siblings, 1 reply; 14+ messages in thread
From: SeongJae Park @ 2025-01-30 3:47 UTC (permalink / raw)
To: Yuanchu Xie
Cc: SeongJae Park, Gregory Price, lsf-pc, damon, linux-mm,
linux-kernel, kernel-team, Raghavendra K T, Jonathan Cameron,
Kaiyang Zhao, Jiaming Yan, Honggyu Kim

Hi Yuanchu,

On Wed, 29 Jan 2025 18:15:08 -0800 Yuanchu Xie <yuanchu@google.com> wrote:

> On Mon, Jan 13, 2025 at 7:06 PM Gregory Price <gourry@gourry.net> wrote:
> > 5) Scarce resources
> >
> >    We need to be careful not to consume excessive amounts of resources
> >    in an attempt to track all these identifying mechanisms.  Even 1 byte
> >    per folio is 256MB on a 1TB machine.  This gets out of hand quick.
> >
> >    With task-work, I was able to add no additional resource consumption,
> >    but deferring to a fully async scenario and needing to track things
> >    like last-accessing CPU, timestamps, and etc.
> >
> >    We'll need to examine this closely if we decide to aggregate either
> >    of these mechanisms.
> My concern with physical address space monitoring is fragmentation.  I
> ran some numbers on a few prod machines.  Grouping by regions with the
> same memcg and ignoring any unmapped memory to be generous, machines
> with higher utilization can have a region/total pages ratio of ~40%,
> and even those with lower utilization (<50%) can also reach 20%.
> Accurately tracking these regions would require quite the region
> metadata, on the order of GBs.

You're right, if we need page level accuracy access monitoring and want to use
DAMON with its regions-based mechanism for that, the memory overhead of
damon_region could be high.  That's mainly because DAMON's regions-based
mechanism was not designed for such usage.  It is more for a best-effort
tradeoff between the overhead and the accuracy.

The regions-based mechanism is not necessarily the only mechanism of future
DAMON, though.  If there are use cases where regions-based best-effort accuracy
cannot be used and exact page level accuracy is really required, we can think
about optimizing the regions-based mechanism or developing a new one.

But, IMHO, the page level accurate access pattern is not always essential.  In
many cases, being able to distinguish some amount of regions against others
based on access pattern is practical enough.  Indeed, DAMON has been used on
real-world products with physical address based monitoring mode for years with
no significant problem.  Also I think the physical address space based
monitoring results[1] on a real server workload that I shared recently seem not
very bad.

Of course your use case could be different from what I have experienced so far.
I'm curious if and why you really need page level accuracy.

[1] https://lore.kernel.org/20250110185232.54907-3-sj@kernel.org


Thanks,
SJ

* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
2025-01-30 3:47 ` SeongJae Park
@ 2025-01-31 10:05 ` Jonathan Cameron
0 siblings, 0 replies; 14+ messages in thread
From: Jonathan Cameron @ 2025-01-31 10:05 UTC (permalink / raw)
To: SeongJae Park
Cc: Yuanchu Xie, Gregory Price, lsf-pc, damon, linux-mm,
linux-kernel, kernel-team, Raghavendra K T, Kaiyang Zhao,
Jiaming Yan, Honggyu Kim

On Wed, 29 Jan 2025 19:47:49 -0800
SeongJae Park <sj@kernel.org> wrote:

> Hi Yuanchu,
>
> On Wed, 29 Jan 2025 18:15:08 -0800 Yuanchu Xie <yuanchu@google.com> wrote:
>
> > On Mon, Jan 13, 2025 at 7:06 PM Gregory Price <gourry@gourry.net> wrote:
> > > 5) Scarce resources
> > >
> > >    We need to be careful not to consume excessive amounts of resources
> > >    in an attempt to track all these identifying mechanisms.  Even 1 byte
> > >    per folio is 256MB on a 1TB machine.  This gets out of hand quick.
> > >
> > >    With task-work, I was able to add no additional resource consumption,
> > >    but deferring to a fully async scenario and needing to track things
> > >    like last-accessing CPU, timestamps, and etc.
> > >
> > >    We'll need to examine this closely if we decide to aggregate either
> > >    of these mechanisms.
> > My concern with physical address space monitoring is fragmentation.  I
> > ran some numbers on a few prod machines.  Grouping by regions with the
> > same memcg and ignoring any unmapped memory to be generous, machines
> > with higher utilization can have a region/total pages ratio of ~40%,
> > and even those with lower utilization (<50%) can also reach 20%.
> > Accurately tracking these regions would require quite the region
> > metadata, on the order of GBs.

I'd second this.  Some cases are reasonably well behaved and regions
'kind of work' for PA based tracking, some very much not.  Add anything
like overcommitted VMs on top and contiguity of 'hotness' beyond very
small regions goes out the window very quickly (unfortunately I'm not
able to share specific data).

So there are definitely cases where I'd expect something else to be needed.

There are plenty of approximate tracking methods in the literature that
might be good enough with much lower overhead than precise tracking
(sketches etc) if we can feed them the right data.

Typically we don't need the answer on how hot all memory is, just some
info on 'this lot are particularly hot' and 'this lot are reasonably cold'.

Damon (as it currently stands) can sometimes give this info so to me
it's a possible producer of data for another layer that focuses on
abstracting the data to what we want only.  Hopefully we can make that
work for all the forms of tracking temperature that people are looking at.
I'm biased in favor of hardware units but not everyone will have those
toys available for a while yet :)

>
> You're right, if we need page level accuracy access monitoring and want to use
> DAMON with its regions-based mechanism for that, the memory overhead of
> damon_region could be high.  That's mainly because DAMON's regions-based
> mechanism was not designed for such usage.  It is more for a best-effort
> tradeoff between the overhead and the accuracy.
>
> The regions-based mechanism is not necessarily the only mechanism of future
> DAMON, though.  If there are use cases where regions-based best-effort accuracy
> cannot be used and exact page level accuracy is really required, we can think
> about optimizing the regions-based mechanism or developing a new one.
>
> But, IMHO, the page level accurate access pattern is not always essential.  In
> many cases, being able to distinguish some amount of regions against others
> based on access pattern is practical enough.  Indeed, DAMON has been used on
> real-world products with physical address based monitoring mode for years with
> no significant problem.  Also I think the physical address space based
> monitoring results[1] on a real server workload that I shared recently seem not
> very bad.
>
> Of course your use case could be different from what I have experienced so far.
> I'm curious if and why you really need page level accuracy.
>
> [1] https://lore.kernel.org/20250110185232.54907-3-sj@kernel.org
>
>
> Thanks,
> SJ

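As one concrete instance of the "sketches" mentioned in the message above, a count-min sketch keeps approximate per-page access counts in a fixed amount of memory. The choice of this particular sketch is an assumption made for illustration, since the message does not name one, and every identifier below (`struct cms`, `cms_record()`, `cms_estimate()`, the table dimensions) is hypothetical. The estimate never undercounts; hash collisions can only inflate it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CMS_ROWS 4
#define CMS_COLS 1024 /* must be a power of two */

/* Fixed-size table: 4 * 1024 * 4 bytes = 16 KiB, however many pages exist. */
struct cms {
	uint32_t counts[CMS_ROWS][CMS_COLS];
};

/* 64-bit mixer (the splitmix64 finalizer) used as the hash function. */
static uint64_t mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/* Column for a page frame number in a given row; each row is seeded apart. */
static unsigned int cms_col(uint64_t pfn, unsigned int row)
{
	uint64_t seed = 0x9e3779b97f4a7c15ULL * (row + 1);

	return (unsigned int)(mix64(pfn + seed) & (CMS_COLS - 1));
}

static void cms_init(struct cms *s)
{
	assert(s != NULL);
	memset(s, 0, sizeof(*s));
}

/* Record one observed access to pfn: bump one counter per row. */
static void cms_record(struct cms *s, uint64_t pfn)
{
	for (unsigned int r = 0; r < CMS_ROWS; r++)
		s->counts[r][cms_col(pfn, r)]++;
}

/* Estimated access count: the minimum over rows, never below the truth. */
static uint32_t cms_estimate(const struct cms *s, uint64_t pfn)
{
	uint32_t min = UINT32_MAX;

	for (unsigned int r = 0; r < CMS_ROWS; r++) {
		uint32_t c = s->counts[r][cms_col(pfn, r)];

		if (c < min)
			min = c;
	}
	return min;
}
```

The design trade is the one the thread is circling: memory cost is fixed up front rather than growing with the number of pages or regions, at the price of answers that are only upper bounds. A consumer that only needs "this lot are particularly hot" could compare estimates against a threshold instead of keeping exact counters.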
* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
2025-01-01 22:20 [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future SeongJae Park
2025-01-02 4:09 ` Matthew Wilcox
2025-01-14 3:06 ` Gregory Price
@ 2025-01-20 18:46 ` Jonathan Cameron
2025-03-25 21:01 ` SeongJae Park
3 siblings, 0 replies; 14+ messages in thread
From: Jonathan Cameron @ 2025-01-20 18:46 UTC (permalink / raw)
To: SeongJae Park
Cc: lsf-pc, damon, linux-mm, linux-kernel, kernel-team,
Raghavendra K T, Yuanchu Xie, Gregory Price, Kaiyang Zhao,
Jiaming Yan, Honggyu Kim

On Wed, 1 Jan 2025 14:20:39 -0800
SeongJae Park <sj@kernel.org> wrote:

> Hi all,
>
>
> I find a few interesting and promising projects that aim to do efficient access
> pattern-aware memory management of near future, including below (alphabetically
> sorted).
>

Hi SJ,

> - CXL hotness monitoring unit
>   (https://lore.kernel.org/20241121101845.1815660-1-Jonathan.Cameron@huawei.com)

For hardware hotness monitors the type of data has relatively little
connection to what I understand Damon provides and the control schemes
are somewhat different.

Hotness tracking units should provide a simple list of hot fixed size
granules (hot 'pages') to whatever is using the hotness engine.  Damon and
other in kernel schemes might also be able to provide such outputs, but
the underlying schemes seem very different as the outputs of these
trackers neither map to Damon regions, nor to dense sets of page counters.

So to me the commonality looks to be one layer up: We get lists of stuff
to consider moving and control paths to whatever is providing those lists
to indicate:
* More or fewer suggestions please (bandwidth controls etc)
* Minimum 'hotness' below which it should not suggest moving them.

For CXL Hotness monitoring units, there are open questions about how
to get good data given the limited resources likely to be found on devices.
In the simplest sense it can be thought of as a fixed set of counters, but
typically it will be more complex than that with statistical accuracy
tradeoffs rather than did we count it or not.

We need to do some work to find out what works best across many workloads
considering options (depending on hardware capabilities) such as
a) coarse to fine
b) random subsampling of 256MiB chunks of PA space.
c) scanning across PA space looking at a smallish region (16Gig maybe)
   at a time.
Also need to be flexible to use multiple parallel trackers if available on a
given device or time slices on a single tracker.

I'm not yet seeing enough different engines to figure out if there is
commonality in that control scheme between CXL style interfaces and those
that we may see from other places etc.  If anyone is in a position to share
info on other hotness monitoring offloaded units that are targeting
real products + their interfaces that would be great.

For now I think we are going to end up with something specific in the
CXL HMU driver with the rest of the kernel just seeing a list of
'hot PA address chunks / pages in PA space'.

Given we will need a virtualized solution as well for guests that are running
on a fixed mix of tiers, I'd expect a "virtio-hotness" or similar that only
provides these sorts of generalized controls leaving the host to figure
out how to control the particular hotness trackers.  The controls to that
would be in line with what I'd expect to be exposed to other layers of the
kernel from a given hotness tracker.

For me it feels like we are a bit early wrt hardware trackers to
come to firm conclusions, but perhaps others are further ahead with answering
some of the precursor questions.  I am keen that we don't end up with
a solution that doesn't work with them so this discussion is of interest to me.

> - Memory tiering fairness by per-cgroup control of promotion and demotion
>   (https://lore.kernel.org/20241108190152.3587484-1-kaiyang2@cs.cmu.edu)
> - Promotion of unmapped page cache folios
>   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
> - Slow-tier page promotion based on PTE A bit
>   (https://lore.kernel.org/20241201153818.2633616-1-raghavendra.kt@amd.com)
> - Workingset reporting
>   (https://lore.kernel.org/20241127025728.3689245-1-yuanchu@google.com)
>
> The goal of DAMON is to help accelerating such developments by being a
> framework that can reduce fundamental efforts for monitoring memory access
> patterns and managing memory using the information.  AWS Aurora Serverless v2
> and SK hynix are successfully using DAMON in the way for proactive memory
> reclamation[1] and CXL memory tiering[2].
>
> To further deliver such benefits for the ongoing and future projects, we need
> to better understand what the projects really need, how DAMON can provide those
> now or in future, and if there are alternatives better than DAMON.  Regardless
> of the conclusion about DAMON, the works apparently have common parts, so the
> discussion will benefit all.
>
> I propose to have the discussion at LSF/MM/BPF.  In the session, I will briefly
> introduce the works and possible DAMON usages, and continue the open discussion
> for better understanding each other.  The discussion will not be limited to
> DAMON and above-mentioned projects but possible alternatives and general
> access-aware memory management projects.  After the discussion, we will
> hopefully find ways to efficiently collaborate, or at least do not disturb each
> other.

I like that last comment :)

Jonathan

>
> [1] https://assets.amazon.science/ee/a4/41ff11374f2f865e5e24de11bd17/resource-management-in-aurora-serverless.pdf
> [2] https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion
>
>
> Thanks,
> SJ

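The "one layer up" commonality described in the message above, a list of hot chunks plus two controls, can be sketched as a small filter. Everything here is hypothetical and for illustration only: `struct hot_chunk`, `struct hot_controls` and `select_suggestions()` are not taken from the CXL HMU series or from DAMON. The two fields model the two controls listed in the message: a minimum hotness and a cap on how many suggestions are passed on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One reported hot chunk: a physical address and a hotness score. */
struct hot_chunk {
	uint64_t pa;
	uint32_t hotness;
};

/* The two generic controls named in the message above. */
struct hot_controls {
	uint32_t min_hotness;    /* do not suggest chunks colder than this */
	size_t max_suggestions;  /* "more or fewer suggestions please" */
};

/*
 * Copy chunks from 'in' that pass the controls into 'out', in input order.
 * 'out' must have room for min(n_in, ctl->max_suggestions) entries.
 * Returns the number of suggestions written.
 */
static size_t select_suggestions(const struct hot_chunk *in, size_t n_in,
				 const struct hot_controls *ctl,
				 struct hot_chunk *out)
{
	size_t n_out = 0;

	assert(ctl != NULL);
	assert(n_in == 0 || (in != NULL && out != NULL));

	for (size_t i = 0; i < n_in && n_out < ctl->max_suggestions; i++) {
		if (in[i].hotness >= ctl->min_hotness)
			out[n_out++] = in[i];
	}
	return n_out;
}
```

In this shape the producer of the list, whether a hardware tracker, a guest-facing device, or a software monitor, stays opaque to the consumer, which only turns the two knobs. How a real tracker would honor those knobs internally is exactly the open question the message raises.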
* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
2025-01-01 22:20 [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future SeongJae Park
` (2 preceding siblings ...)
2025-01-20 18:46 ` Jonathan Cameron
@ 2025-03-25 21:01 ` SeongJae Park
3 siblings, 0 replies; 14+ messages in thread
From: SeongJae Park @ 2025-03-25 21:01 UTC (permalink / raw)
To: SeongJae Park
Cc: lsf-pc, damon, linux-mm, linux-kernel, kernel-team,
Raghavendra K T, Yuanchu Xie, Jonathan Cameron, Gregory Price,
Kaiyang Zhao, Jiaming Yan, Honggyu Kim

Hello,

On Wed, 1 Jan 2025 14:20:39 -0800 SeongJae Park <sj@kernel.org> wrote:

> Hi all,
>
>
> I find a few interesting and promising projects that aim to do efficient access
> pattern-aware memory management of near future, including below (alphabetically
> sorted).
>
> - CXL hotness monitoring unit
>   (https://lore.kernel.org/20241121101845.1815660-1-Jonathan.Cameron@huawei.com)
> - Memory tiering fairness by per-cgroup control of promotion and demotion
>   (https://lore.kernel.org/20241108190152.3587484-1-kaiyang2@cs.cmu.edu)
> - Promotion of unmapped page cache folios
>   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
> - Slow-tier page promotion based on PTE A bit
>   (https://lore.kernel.org/20241201153818.2633616-1-raghavendra.kt@amd.com)
> - Workingset reporting
>   (https://lore.kernel.org/20241127025728.3689245-1-yuanchu@google.com)
>
> The goal of DAMON is to help accelerating such developments by being a
> framework that can reduce fundamental efforts for monitoring memory access
> patterns and managing memory using the information.  AWS Aurora Serverless v2
> and SK hynix are successfully using DAMON in the way for proactive memory
> reclamation[1] and CXL memory tiering[2].
>
> To further deliver such benefits for the ongoing and future projects, we need
> to better understand what the projects really need, how DAMON can provide those
> now or in future, and if there are alternatives better than DAMON.  Regardless
> of the conclusion about DAMON, the works apparently have common parts, so the
> discussion will benefit all.
>
> I propose to have the discussion at LSF/MM/BPF.  In the session, I will briefly
> introduce the works and possible DAMON usages, and continue the open discussion
> for better understanding each other.  The discussion will not be limited to
> DAMON and above-mentioned projects but possible alternatives and general
> access-aware memory management projects.  After the discussion, we will
> hopefully find ways to efficiently collaborate, or at least do not disturb each
> other.
>
> [1] https://assets.amazon.science/ee/a4/41ff11374f2f865e5e24de11bd17/resource-management-in-aurora-serverless.pdf
> [2] https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion

A draft of the slides for this session is now available at
https://github.com/damonitor/talks/blob/master/2025/lsfmmbpf/damon_requirements_lsfmmbpf_2025.pdf

I may make more last-minute changes to the slides, but the final version should
also be available at the same URL.


Thanks,
SJ

[...]
