[LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future


* [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
@ 2025-01-01 22:20 SeongJae Park
  2025-01-02  4:09 ` Matthew Wilcox
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: SeongJae Park @ 2025-01-01 22:20 UTC (permalink / raw)
  To: lsf-pc
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, kernel-team,
	Raghavendra K T, Yuanchu Xie, Jonathan Cameron, Gregory Price,
	Kaiyang Zhao, Jiaming Yan, Honggyu Kim

Hi all,

I find a few interesting and promising projects that aim to do efficient access
pattern-aware memory management in the near future, including those below
(alphabetically sorted).

- CXL hotness monitoring unit
  (https://lore.kernel.org/20241121101845.1815660-1-Jonathan.Cameron@huawei.com)
- Memory tiering fairness by per-cgroup control of promotion and demotion
  (https://lore.kernel.org/20241108190152.3587484-1-kaiyang2@cs.cmu.edu)
- Promotion of unmapped page cache folios
  (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
- Slow-tier page promotion based on PTE A bit
  (https://lore.kernel.org/20241201153818.2633616-1-raghavendra.kt@amd.com)
- Workingset reporting
  (https://lore.kernel.org/20241127025728.3689245-1-yuanchu@google.com)

The goal of DAMON is to help accelerate such developments by being a
framework that reduces the fundamental effort of monitoring memory access
patterns and managing memory using that information.  AWS Aurora Serverless v2
and SK hynix are successfully using DAMON in this way for proactive memory
reclamation[1] and CXL memory tiering[2].

To further deliver such benefits to ongoing and future projects, we need
to better understand what the projects really need, how DAMON can provide that
now or in the future, and whether there are better alternatives to DAMON.
Regardless of the conclusion about DAMON, the works apparently have common
parts, so the discussion will benefit all.

I propose to have the discussion at LSF/MM/BPF.  In the session, I will briefly
introduce the works and possible DAMON usages, and then open the discussion
so that we can better understand each other.  The discussion will not be limited
to DAMON and the above-mentioned projects; it will also cover possible
alternatives and general access-aware memory management projects.  After the
discussion, we will hopefully find ways to collaborate efficiently, or at least
to avoid disturbing each other.

[1] https://assets.amazon.science/ee/a4/41ff11374f2f865e5e24de11bd17/resource-management-in-aurora-serverless.pdf
[2] https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion

Thanks,
SJ


* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
  2025-01-01 22:20 [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future SeongJae Park
@ 2025-01-02  4:09 ` Matthew Wilcox
  2025-01-02 15:22   ` Gregory Price
  2025-01-14  3:06 ` Gregory Price
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Matthew Wilcox @ 2025-01-02  4:09 UTC (permalink / raw)
  To: SeongJae Park
  Cc: lsf-pc, damon, linux-mm, linux-kernel, kernel-team,
	Raghavendra K T, Yuanchu Xie, Jonathan Cameron, Gregory Price,
	Kaiyang Zhao, Jiaming Yan, Honggyu Kim

On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> Hi all,
> 
> 
> I find a few interesting and promising projects that aim to do efficient access
> pattern-aware memory management of near future, including below (alphabetically
> sorted).
> 
> - CXL hotness monitoring unit
>   (https://lore.kernel.org/20241121101845.1815660-1-Jonathan.Cameron@huawei.com)
> - Memory tiering fairness by per-cgroup control of promotion and demotion
>   (https://lore.kernel.org/20241108190152.3587484-1-kaiyang2@cs.cmu.edu)
> - Promotion of unmapped page cache folios
>   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)

I'm not sure how DAMON can help with this one.  As I understand DAMON,
it monitors accesses to user addresses.  This patchset is trying to solve
the problem for file pages which aren't mapped to userspace at all.
ie only accessed through read() and write().



* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
  2025-01-02  4:09 ` Matthew Wilcox
@ 2025-01-02 15:22   ` Gregory Price
  2025-01-02 18:00     ` SeongJae Park
  0 siblings, 1 reply; 14+ messages in thread
From: Gregory Price @ 2025-01-02 15:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: SeongJae Park, lsf-pc, damon, linux-mm, linux-kernel,
	kernel-team, Raghavendra K T, Yuanchu Xie, Jonathan Cameron,
	Kaiyang Zhao, Jiaming Yan, Honggyu Kim

On Thu, Jan 02, 2025 at 04:09:38AM +0000, Matthew Wilcox wrote:
> On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> > Hi all,
> > 
> > 
> > I find a few interesting and promising projects that aim to do efficient access
> > pattern-aware memory management of near future, including below (alphabetically
> > sorted).
> > 
> > - CXL hotness monitoring unit
> >   (https://lore.kernel.org/20241121101845.1815660-1-Jonathan.Cameron@huawei.com)
> > - Memory tiering fairness by per-cgroup control of promotion and demotion
> >   (https://lore.kernel.org/20241108190152.3587484-1-kaiyang2@cs.cmu.edu)
> > - Promotion of unmapped page cache folios
> >   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
> 
> I'm not sure how DAMON can help with this one.  As I understand DAMON,
> it monitors accesses to user addresses.  This patchset is trying to solve
> the problem for file pages which aren't mapped to userspace at all.
> ie only accessed through read() and write().

DAMON can monitor physical addresses too, though the mechanism is
different.  I haven't assessed this as a solution yet.

~Gregory



* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
  2025-01-02 15:22   ` Gregory Price
@ 2025-01-02 18:00     ` SeongJae Park
  2025-01-02 18:04       ` SeongJae Park
  0 siblings, 1 reply; 14+ messages in thread
From: SeongJae Park @ 2025-01-02 18:00 UTC (permalink / raw)
  To: Gregory Price
  Cc: SeongJae Park, Matthew Wilcox, lsf-pc, damon, linux-mm,
	linux-kernel, kernel-team, Raghavendra K T, Yuanchu Xie,
	Jonathan Cameron, Kaiyang Zhao, Jiaming Yan, Honggyu Kim

On Thu, 2 Jan 2025 10:22:14 -0500 Gregory Price <gourry@gourry.net> wrote:

> On Thu, Jan 02, 2025 at 04:09:38AM +0000, Matthew Wilcox wrote:
> > On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> > > Hi all,
> > > 
> > > 
> > > I find a few interesting and promising projects that aim to do efficient access
> > > pattern-aware memory management of near future, including below (alphabetically
> > > sorted).
> > > 
> > > - CXL hotness monitoring unit
> > >   (https://lore.kernel.org/20241121101845.1815660-1-Jonathan.Cameron@huawei.com)
> > > - Memory tiering fairness by per-cgroup control of promotion and demotion
> > >   (https://lore.kernel.org/20241108190152.3587484-1-kaiyang2@cs.cmu.edu)
> > > - Promotion of unmapped page cache folios
> > >   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
> > 
> > I'm not sure how DAMON can help with this one.  As I understand DAMON,
> > it monitors accesses to user addresses.  This patchset is trying to solve
> > the problem for file pages which aren't mapped to userspace at all.
> > ie only accessed through read() and write().
> 
> DAMON can monitor physical addresses too, though the mechanism is
> different.

Thank you for answering this, Gregory.  As Gregory explained, users can use
the physical address monitoring mode of DAMON for this.  For unmapped pages,
DAMON sets and reads PG_idle to check whether a page has been accessed.  Since,
to my understanding, the read() and write() paths respect PG_idle, DAMON should
be able to detect accesses to unmapped pages.
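
As a rough illustration of the set-then-check idea above, here is a toy
userspace model in C.  It is only a sketch: struct toy_page and the three
helper names are made up for illustration, and the real kernel logic works on
folios and page flags rather than a plain bool.

```c
#include <stdbool.h>

/* Toy stand-in for a page; 'idle' plays the role of PG_idle. */
struct toy_page {
	bool idle;
};

/* Monitor side: mark the page idle at the start of a sampling interval. */
static void toy_monitor_arm(struct toy_page *page)
{
	page->idle = true;
}

/* Access side: any access path that respects the flag clears it. */
static void toy_access(struct toy_page *page)
{
	page->idle = false;
}

/* Monitor side: the page was accessed if the flag is no longer set. */
static bool toy_monitor_was_accessed(const struct toy_page *page)
{
	return !page->idle;
}
```

The monitor calls toy_monitor_arm() first and toy_monitor_was_accessed() one
interval later; whether a mapping exists does not matter in this model, only
whether the access path clears the flag.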

> I haven't assessed this as a solution, yet.

To quickly check this, I ran the simple test below.

First, I start DAMON in physical address space monitoring mode, wait for one
minute to let it monitor the accesses of the system, and show the access
pattern of the system in access temperature histogram format.

$ sudo ./damo start
$ sleep 60
$ sudo ./damo report access --style temperature-sz-hist
<temperature> <total size>
[-7,480,000,000, -7,479,999,999) 59.868 GiB |                    |
total size: 59.868 GiB

The access temperature histogram format shows the size of memory in each access
temperature range.  Access temperature is a metric that represents access
hotness.  If accesses to a region are continuously found, the value
increases.  If no access to the region is found, the temperature becomes zero.
If the region continues to show no access, the temperature decreases further
(becomes negative).  Refer to the document[1] for more details.
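
The behavior described above can be pictured with a toy update rule in C.
This is only a sketch of the three cases in the paragraph, not damo's actual
formula (see the document[1] for that); struct toy_region, toy_update() and
the 'step' parameter are hypothetical.

```c
#include <stdbool.h>

/* Toy region carrying only a temperature value. */
struct toy_region {
	long temperature;
};

/* Update the temperature after one monitoring interval. */
static void toy_update(struct toy_region *r, bool accessed, long step)
{
	if (accessed) {
		/* A cold region restarts from zero before warming up. */
		if (r->temperature < 0)
			r->temperature = 0;
		r->temperature += step;
	} else if (r->temperature > 0) {
		/* First interval without access: drop to zero. */
		r->temperature = 0;
	} else {
		/* Continued idleness: go further below zero. */
		r->temperature -= step;
	}
}
```

In this model, a region that is never accessed only ever moves in the negative
direction, which matches the shape of an all-negative histogram.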

So from the above output, we can see that no memory of the system was accessed
at all during the last minute.

Now I start a program that continuously overwrites a 10 GiB file in the
background.  The source code is attached (Attachment 0, dd_like.c) at the bottom
of this mail.

After a few seconds, I show the temperature histogram again.

$ sudo ./damo report access --style temperature-sz-hist
<temperature> <total size>
[-12,590,000,000, -11,699,000,000) 42.038 GiB |********************|
[-11,699,000,000, -10,808,000,000) 0 B        |                    |
[-10,808,000,000, -9,917,000,000)  0 B        |                    |
[-9,917,000,000, -9,026,000,000)   0 B        |                    |
[-9,026,000,000, -8,135,000,000)   5.986 GiB  |***                 |
[-8,135,000,000, -7,244,000,000)   0 B        |                    |
[-7,244,000,000, -6,353,000,000)   0 B        |                    |
[-6,353,000,000, -5,462,000,000)   0 B        |                    |
[-5,462,000,000, -4,571,000,000)   0 B        |                    |
[-4,571,000,000, -3,680,000,000)   5.951 GiB  |***                 |
[-3,680,000,000, -2,789,000,000)   5.893 GiB  |***                 |
total size: 59.868 GiB

We can see that DAMON found about 10 GiB of relatively hot regions.

This is a very simple test that is not well tuned.  Maybe because of that, there
are details to investigate, including why the 10 GiB regions have
different and negative access temperatures.  I'll skip those for now, since the
point of this test is that DAMON at least somehow reacts to accesses to
unmapped pages.

[1] https://github.com/damonitor/damo/blob/next/USAGE.md#access-temperature


Thanks,
SJ

> 
> ~Gregory

==== Attachment 0 (dd_like.c) ====
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>

int main(int argc, char *argv[]) {
	int block_size, count;
	char *dest_path;

	if (argc != 4) {
		fprintf(stderr, "Usage: %s <block_size> <count> <destination_file>\n", argv[0]);
		return 1;
	}

	block_size = atoi(argv[1]);
	count = atoi(argv[2]);
	dest_path = argv[3];

	// Validate input parameters
	if (block_size <= 0 || count <= 0) {
		fprintf(stderr, "Invalid block size or count\n");
		return 1;
	}

	// Allocate one zero-filled block that is written repeatedly
	char *zeroes = calloc(block_size, sizeof(char));
	if (!zeroes) {
		fprintf(stderr, "Memory allocation failed\n");
		return 1;
	}

	// Overwrite the file forever; stop the program with a signal
	while (1) {
		// Open destination file in write mode (truncates existing content)
		FILE *dest_file = fopen(dest_path, "w");
		if (!dest_file) {
			fprintf(stderr, "Failed to open %s for writing: %s\n", dest_path, strerror(errno));
			free(zeroes);
			return 1;
		}

		// Write block size of zeroes 'count' times
		for (int i = 0; i < count; i++) {
			if (fwrite(zeroes, block_size, 1, dest_file) != 1) {
				fprintf(stderr, "Failed to write to %s: %s\n", dest_path, strerror(errno));
				fclose(dest_file);
				free(zeroes);
				return 1;
			}
		}

		fclose(dest_file);
		printf("one pass\n");
	}
	// Not reached; kept for completeness
	free(zeroes);
	return 0;
}



* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
  2025-01-02 18:00     ` SeongJae Park
@ 2025-01-02 18:04       ` SeongJae Park
  0 siblings, 0 replies; 14+ messages in thread
From: SeongJae Park @ 2025-01-02 18:04 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Gregory Price, Matthew Wilcox, lsf-pc, damon, linux-mm,
	linux-kernel, kernel-team, Raghavendra K T, Yuanchu Xie,
	Jonathan Cameron, Kaiyang Zhao, Jiaming Yan, Honggyu Kim

On Thu, 2 Jan 2025 10:00:19 -0800 SeongJae Park <sj@kernel.org> wrote:

> On Thu, 2 Jan 2025 10:22:14 -0500 Gregory Price <gourry@gourry.net> wrote:
> 
> > On Thu, Jan 02, 2025 at 04:09:38AM +0000, Matthew Wilcox wrote:
> > > On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> > > > Hi all,
> > > > 
> > > > 
> > > > I find a few interesting and promising projects that aim to do efficient access
> > > > pattern-aware memory management of near future, including below (alphabetically
> > > > sorted).
> > > > 
> > > > - CXL hotness monitoring unit
> > > >   (https://lore.kernel.org/20241121101845.1815660-1-Jonathan.Cameron@huawei.com)
> > > > - Memory tiering fairness by per-cgroup control of promotion and demotion
> > > >   (https://lore.kernel.org/20241108190152.3587484-1-kaiyang2@cs.cmu.edu)
> > > > - Promotion of unmapped page cache folios
> > > >   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
> > > 
> > > I'm not sure how DAMON can help with this one.  As I understand DAMON,
> > > it monitors accesses to user addresses.  This patchset is trying to solve
> > > the problem for file pages which aren't mapped to userspace at all.
> > > ie only accessed through read() and write().
> > 
> > DAMON can monitor physical addresses too, though the mechanism is
> > different.
> 
> Thank you for answering this, Gregory.  As Gregory explained, users can use
> physical address monitoring mode of DAMON for this.  For unmapped pages, DAMON
> sets and reads PG_idle to check if it is accessed or not.  Since PG_idle is
> respected by read() and write() use case to my understanding, DAMON should be
> able to check accesses to unmapped pages.
> 
> > I haven't assessed this as a solution, yet.
> 
> To quickly see this, I ran below simple test.
[...]
> the point of this test is that DAMON at least somehow react to accesses for
> unmapped pages.

Sorry, I forgot to clarify this point.  My test shows that DAMON can detect
accesses to unmapped pages, but it does not assess whether DAMON is feasible as
the unmapped pages promotion solution.  More work and discussion would be
needed for that.


Thanks,
SJ

[...]



* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
  2025-01-01 22:20 [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future SeongJae Park
  2025-01-02  4:09 ` Matthew Wilcox
@ 2025-01-14  3:06 ` Gregory Price
  2025-01-24  2:11   ` SeongJae Park
  2025-01-30  2:15   ` Yuanchu Xie
  2025-01-20 18:46 ` Jonathan Cameron
  2025-03-25 21:01 ` SeongJae Park
  3 siblings, 2 replies; 14+ messages in thread
From: Gregory Price @ 2025-01-14  3:06 UTC (permalink / raw)
  To: SeongJae Park
  Cc: lsf-pc, damon, linux-mm, linux-kernel, kernel-team,
	Raghavendra K T, Yuanchu Xie, Jonathan Cameron, Kaiyang Zhao,
	Jiaming Yan, Honggyu Kim

On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> Hi all,
> 
> 
> I find a few interesting and promising projects that aim to do efficient access
> pattern-aware memory management of near future, including below (alphabetically
> sorted).
> 
> - Promotion of unmapped page cache folios
>   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)

I'll break down a few observations I made while hacking on unmapped
page cache promotion - and my concerns about leveraging DAMON here.

Additionally, I'll cover some other concerns I've seen raised about duplicating
promotion logic across various kernel components.

Latest RFC:
https://lore.kernel.org/linux-mm/20250107000346.1338481-1-gourry@gourry.net/

Basic Premise:
   Use folio_mark_accessed() as a measure of hotness for promotion.
   Defer promotion to task_work due to locking complexities.

My major concerns / lessons learned from this exercise include:

1) The cost of checking promotion candidacy can be problematic

   In my microbenchmark in the last RFC version, I showed that while
   the performance upside (~22-25%) is substantial, there was a
   non-trivial cost associated with injecting even a single global
   boolean check in the file_read() path.  This was unexpected.

   I can probably optimize the disabled case with a likely() clause,
   but I did not expect such sensitivity.  This tells me injecting
   an unconditional call into DAMON may be too much overhead. 

   I would need to explore this further - including whether it is
   feasible to inject such a large dependency into swap.c

   This may not affect all cases, but it does affect at least this one.

2) The complexity of "when it is safe" to promote a folio is subtle
   at best, and "actively hostile" at worst.

   I learned in v1 of the RFC that promotion inline with fma() is not
   feasible due to a few contexts (task dying in particular) in which
   migration is not safe.  I deferred to task work because I noticed
   prior attempts (in development notes) had seen similar issues.

   Adding a folio reference and/or page flag to defer that migration to
   another context (e.g. async kthread) solves this at the expense of
   implementation complexity (leaked folios if done wrong).

   I'd have to look at whether it's worth the increased complexity to
   aggregate this (particular) identification mechanism - but I think
   there is clear value to aggregating promotion.

   I could see some value in pumping tracking bits into DAMON - but I
   also see value in making tasks handle promotion as a form of fairness.

3) There were expressed opinions on runtime fairness WRT promotion.

   There are two competing thoughts:
   A) Making accessing tasks eat inline promotion cost captures that
      cost in their runtime slice, promoting fairness in scheduling.

   B) Aggregating promotion to an external thread can reduce inline
      faults and tail latencies, but may hide per-task cost. This
      is a concern if one task drives all the promotions, effectively
      stealing an entire core by nature of the async design.

   I don't have a good answer to this, just an observation that charging
   promotion time to the identifying task was a concern that was raised.

4) TPP and Unmapped Page Promotion may affect each other.

   There is a rate-limiting mechanism in the migration path that was
   intended to prevent over-pressuring bandwidth with aggressive
   migrations, and thereby prevent major memory stalls.

   By adding more pressure on this limit from an additional source,
   we're obviously increasing the time it takes to converge.

   This is probably the greatest argument for creating a new, aggregated
   promotion mechanism to serve all of these identification mechanisms.

   This would make it easier for us to determine whether/what
   identification mechanisms can be aggregated while enabling forward
   progress on each of them separately.

5) Scarce resources

   We need to be careful not to consume excessive amounts of resources
   in an attempt to track all these identifying mechanisms.  Even 1 byte
   per folio is 256MB on a 1TB machine (assuming 4KB folios).  This gets
   out of hand quickly.

   With task-work, I was able to add no additional resource consumption,
   but deferring to a fully async scenario would mean needing to track things
   like last-accessing CPU, timestamps, etc.

   We'll need to examine this closely if we decide to aggregate either
   of these mechanisms.
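
The arithmetic behind the 256MB figure in point 5 can be checked with a small
helper.  This is just a sketch: tracking_overhead() is a hypothetical name,
and the 4 KiB folio size used in the usage note is an assumption implied by
the figure, not something stated above.  Larger folios reduce the count.

```c
#include <stdint.h>

/*
 * Bytes of tracking metadata needed for mem_bytes of memory, assuming
 * every folio is folio_bytes large and carries per_folio_bytes of state.
 */
static uint64_t tracking_overhead(uint64_t mem_bytes, uint64_t folio_bytes,
				  uint64_t per_folio_bytes)
{
	return mem_bytes / folio_bytes * per_folio_bytes;
}
```

With 1 TiB of memory and 4 KiB folios there are 2^28 folios, so 1 byte each
costs 256 MiB, and a 16-byte record (say, a timestamp plus a CPU id) costs
4 GiB.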

~Gregory


* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
  2025-01-14  3:06 ` Gregory Price
@ 2025-01-24  2:11   ` SeongJae Park
  2025-01-24 17:21     ` Gregory Price
  2025-01-30  2:15   ` Yuanchu Xie
  1 sibling, 1 reply; 14+ messages in thread
From: SeongJae Park @ 2025-01-24  2:11 UTC (permalink / raw)
  To: Gregory Price
  Cc: SeongJae Park, lsf-pc, damon, linux-mm, linux-kernel,
	kernel-team, Raghavendra K T, Yuanchu Xie, Jonathan Cameron,
	Kaiyang Zhao, Jiaming Yan, Honggyu Kim

Hello Gregory,

On Mon, 13 Jan 2025 22:06:09 -0500 Gregory Price <gourry@gourry.net> wrote:

> On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> > Hi all,
> > 
> > 
> > I find a few interesting and promising projects that aim to do efficient access
> > pattern-aware memory management of near future, including below (alphabetically
> > sorted).
> > 
> > - Promotion of unmapped page cache folios
> >   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
> 
> 
> I'll break down a few observations I made while hacking on unmapped
> page cache promotion - and my concerns about leveraging DAMON here.

Thank you for sharing this!

> 
> Additionally some other concerns I've seen raised about duplicating
> promotion logic across various kernel components.
> 
> 
> Latest RFC:
> https://lore.kernel.org/linux-mm/20250107000346.1338481-1-gourry@gourry.net/
> 
> Basic Premise:
>    Use folio_mark_accessed() as a measure of hotness for promotion.
>    Defer promotion to task_work due to locking complexities.
> 
> My major concerns / lessons learned from this exercise include:
> 
> 1) The cost of checking promotion candidacy can be problematic
> 
>    In my microbenchmark in the last RFC version, I showed that while
>    the performance upside (~22-25%) is substantial, there was a
>    non-trivial cost associated with injecting even a single global
>    boolean check in the file_read() path.  This was unexpected.
> 
>    I can probably optimize the disabled case with a likely() clause,
>    but I did not expect such sensitivity.  This tells me injecting
>    an unconditional call into DAMON may be too much overhead. 

I cannot agree more with you that the mechanism for finding the
promotion/demotion (and any access-aware system operation) candidates should
induce only modest or at least controllable overhead.  Actually, it was one of
the biggest motivations of DAMON's design, and I haven't imagined adding
unconditional calls to DAMON here.

Nonetheless, shouldn't injecting an unconditional call here be avoided for not
only DAMON calls but any expensive calls?  I'm also not quite sure what DAMON
call you are thinking about.

> 
>    I would need to explore this further - including whether it is
>    feasible to inject such a large dependency into swap.c

I understand DAMON is not small in terms of code size, and has many
limitations that make it unusable in many use cases.  But, again, I'm not
quite sure what kind of DAMON usage in swap.c you're thinking about, and
therefore it is not easy to understand what part of DAMON is considered a large
dependency that concerns you.  It would be great if we can make a more concrete
example as a result of this topic session at LSFMMBPF.

FYI, I also don't have a specific idea for helping unmapped pages promotion for
now.  That's my assignment that I will do by LSFMMBPF.  But a few ways that
I naively think DAMON might be able to help unmapped page promotion are:

1. Using DAMON to profile how many hot and cold unmapped pages are in which
   tier, and using the information for unmapped pages promotion optimization.
2. Using DAMOS to target-promote hot unmapped pages while using page
   faults-based promotion for mapped pages.
3. Using DAMOS to promote both mapped and unmapped hot pages.

For the first and second ideas, DAMON needs to target unmapped pages.  I think
DAMOS filters can be extended for that, and I posted an RFC before:
https://lore.kernel.org/20241127205624.86986-1-sj@kernel.org

Using the RFC-applied kernel and a version of the DAMON user-space tool that
adds the support, idea one could be done like below.

    $ sudo ./damo report access --snapshot_damos_filter reject none unmapped --style recency-sz-hist
    # damos filters (df): reject none unmapped
    <last accessed time (us)> <df-passed size>
    [-36.300 s, -32.670 s)   10.297 MiB |*                   |
    [-32.670 s, -29.040 s)   7.297 MiB  |*                   |
    [-29.040 s, -25.410 s)   0 B        |                    |
    [-25.410 s, -21.780 s)   0 B        |                    |
    [-21.780 s, -18.150 s)   0 B        |                    |
    [-18.150 s, -14.520 s)   0 B        |                    |
    [-14.520 s, -10.890 s)   0 B        |                    |
    [-10.890 s, -7.260 s)    0 B        |                    |
    [-7.260 s, -3.630 s)     3.088 GiB  |********************|
    [-3.630 s, -0 ns)        80.000 KiB |*                   |
    [-0 ns, --3630000000 ns) 16.000 KiB |*                   |

    <last accessed time (us)> <total size>
    [-36.300 s, -32.670 s)   24.493 GiB  |********************|
    [-32.670 s, -29.040 s)   5.869 GiB   |*****               |
    [-29.040 s, -25.410 s)   5.568 GiB   |*****               |
    [-25.410 s, -21.780 s)   0 B         |                    |
    [-21.780 s, -18.150 s)   5.899 GiB   |*****               |
    [-18.150 s, -14.520 s)   5.807 GiB   |*****               |
    [-14.520 s, -10.890 s)   0 B         |                    |
    [-10.890 s, -7.260 s)    0 B         |                    |
    [-7.260 s, -3.630 s)     12.231 GiB  |**********          |
    [-3.630 s, -0 ns)        356.000 KiB |*                   |
    [-0 ns, --3630000000 ns) 396.000 KiB |*                   |
    total size: 59.868 GiB

The above output was retrieved while a kernel build was running in the
background.  It says that, among the 24.493 GiB of cold memory that was last
accessed more than 32.67 seconds ago, 10.297 MiB are unmapped pages.

For the third idea, whether and how to collaborate with page faults-based
promotion of mapped pages could be something to discuss.  Some ideas off the top
of my head are that we can simply make them exclusive, or use DAMOS for
proactive promotion in peaceful situations but use page faults-based promotion
in more urgent situations, somewhat like kswapd and direct reclaim.

For all three ideas, DAMON will do the monitoring and promotions on the DAMON
thread, so no change to swap.c or the file I/O path would be required.

Again, these are just not-yet-settled brainstorming-level ideas, and I will try
to make these more specific and settled by LSFMMBPF.  Please feel free to add
comments on this thread rather than waiting for LSFMMBPF, though!

> 
>    This may not affect all cases, but it does affect at least this one.
> 
> 2) The complexity of "when it is safe" to promote a folio is subtle
>    at best, and "actively hostile" at worst.
> 
>    I learned in v1 of the RFC that promotion inline with fma() is not
>    feasible due to a few contexts (task dying in particular) in which
>    migration is not safe.  I deferred to task work because I noticed
>    prior attempts (in development notes) had seen similar issues.
> 
>    Adding a folio reference and/or page flag to defer that migration to
>    another context (e.g. async kthread) solves this at the expense of
>    implementation complexity. (leaked folios if done wrong)
> 
>    I'd have to look at whether it's worth the increased complexity to
>    aggregate this (particular) identification mechanism - but I think
>    there is clear value to aggregating promotion.
> 
>    I could see some value in pumping tracking bits into DAMON -

I agree with all the points and am willing to make DAMON serve the purpose well.

>    but I
>    also see value in making tasks handle promotion as a form of fairness.

I agree that could be good in terms of fairness.  I want to learn more about
the significance of it, though.

> 
> 3) There were expressed opinions on runtime fairness WRT to promotion.
> 
>    There's two competing thoughts:
>    A) Making accessing tasks eat inline promotion cost captures that
>       cost in their runtime slice, promoting fairness in scheduling.
> 
>    B) Aggregating promotion to an external thread can reduce inline
>       faults and tail latencies, but may hide per-task cost. This
>       is a concern if one task drives all the promotions, effectively
>       stealing an entire core by nature of the async design.
> 
>    I don't have a good answer to this, just an observation that charging
>    promotion time to the identifying task was a concern that was raised.

I think we might be able to pursue the two ways in parallel?  Use an
asynchronous external thread in more peaceful situations, and let tasks do
inline promotion with fairness in more urgent situations, like kswapd and direct
reclaim.

DAMON may fit well for the proactive solutions in less urgent situations.
DAMON_RECLAIM was made in that direction, and has been working without
significant issues in products for years.

> 
> 
> 4) TPP and Unmapped Page Promotion may affect each other.
> 
>    There is a rate-limiting mechanism in the migration path that was
>    intended to prevent over-pressuring bandwidth with aggressive
>    migrations - prevent major memory stalls.
> 
>    By adding more pressure on this limit from an additional source,
>    we're obviously increasing the time it takes to converge.
> 
>    This is probably the greatest argument for creating a new, aggregated
>    promotion mechanism to serve all of these identification mechanism.
> 
>    This would make it easier for us to determine whether/what
>    identification mechanisms can be aggregated while enabling forward
>    progress on each of them separately.

I agree.  DAMON allows combining multiple different mechanisms with its core
logic, so I believe it might be a place that can aggregate the different
identification mechanisms.

DAMOS, DAMON's feature for system operations based on access monitoring results,
also has its own aggressiveness control logic, and resides in the core
layer, so it could be used consistently with different promotion candidate
identification mechanisms.
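
To picture how one aggressiveness control can be shared by several
identification mechanisms, below is a toy budget-per-window sketch in C.  It
is not DAMOS code: struct toy_quota and its helpers are hypothetical names,
and the real logic has more inputs than a single page count.

```c
#include <stdbool.h>
#include <stdint.h>

/* One budget shared by every source of promotion candidates. */
struct toy_quota {
	uint64_t limit_pages;	/* pages allowed per window */
	uint64_t used_pages;	/* pages already charged in this window */
};

/* Charge nr_pages to the budget; refuse if it would exceed the limit. */
static bool toy_quota_charge(struct toy_quota *q, uint64_t nr_pages)
{
	/* used_pages never exceeds limit_pages, so this cannot underflow. */
	if (nr_pages > q->limit_pages - q->used_pages)
		return false;
	q->used_pages += nr_pages;
	return true;
}

/* Start a new window with a fresh budget. */
static void toy_quota_reset(struct toy_quota *q)
{
	q->used_pages = 0;
}
```

If both a fault-based source and an unmapped-page source charge the same
struct toy_quota, whichever asks first consumes the window's budget, which is
the convergence concern raised in point 4 in miniature.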

> 
> 5) Scarce resources
> 
>    We need to be careful not to consume excessive amounts of resources
>    in an attempt to track all these identifying mechanisms.  Even 1 byte
>    per folio is 256MB on a 1TB machine.  This gets out of hand quick.
> 
>    With task-work, I was able to add no additional resource consumption,
>    but deferring to a fully async scenario and needing to track things
>    like last-accessing CPU, timestamps, and etc.
> 
>    We'll need to examine this closely if we decide to aggregate either
>    of these mechanisms.

Agreed again.  DAMON tries to keep the resources in its own data
structure.  The resource consumption of its own data structure can also be
problematic, but it at least allows setting an upper bound, regardless of the
system size.  So it is controllable and scalable.
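
As a rough sketch of why a region-based structure has an upper bound, the
helper below caps the number of tracked regions before multiplying by a
per-region cost.  region_overhead() is a hypothetical name, and region_bytes
is an assumed per-region cost for illustration, not the size of any real
kernel structure.

```c
#include <stdint.h>

/*
 * Metadata bytes for region-based tracking: the monitor never keeps more
 * than max_nr_regions regions, however many the workload would "want".
 */
static uint64_t region_overhead(uint64_t wanted_regions,
				uint64_t max_nr_regions,
				uint64_t region_bytes)
{
	uint64_t nr = wanted_regions < max_nr_regions ?
		wanted_regions : max_nr_regions;

	return nr * region_bytes;
}
```

For example, with a cap of 1000 regions and an assumed 64 bytes per region,
the overhead stays at 64000 bytes whether the system has 64 GiB or 1 TiB of
memory; the price is coarser granularity per region.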

I wish to continue more detailed discussions at LSFMMBPF and on this thread!

Thank you again for sharing your experiences and thoughts on this topic.  I find
they are making the discussion much more informative and helpful.

Thanks,
SJ

> 
> ~Gregory


* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
  2025-01-24  2:11   ` SeongJae Park
@ 2025-01-24 17:21     ` Gregory Price
  2025-01-25  1:17       ` SeongJae Park
  0 siblings, 1 reply; 14+ messages in thread
From: Gregory Price @ 2025-01-24 17:21 UTC (permalink / raw)
  To: SeongJae Park
  Cc: lsf-pc, damon, linux-mm, linux-kernel, kernel-team,
	Raghavendra K T, Yuanchu Xie, Jonathan Cameron, Kaiyang Zhao,
	Jiaming Yan, Honggyu Kim

On Thu, Jan 23, 2025 at 06:11:53PM -0800, SeongJae Park wrote:
> Hello Gregory,
> 
> > 1) The cost of checking promotion candidacy can be problematic
> > 
> >    In my microbenchmark in the last RFC version, I showed that while
> >    the performance upside (~22-25%) is substantial, there was a
> >    non-trivial cost associated with injecting even a single global
> >    boolean check in the file_read() path.  This was unexpected.
> > 
> >    I can probably optimize the disabled case with a likely() clause,
> >    but I did not expect such sensitivity.  This tells me injecting
> >    an unconditional call into DAMON may be too much overhead. 
> 
> I cannot agree more with you about the point that the mechanism for finding the
> promotion/demotion (and any access-aware system operation) candidates should
> induce only modest or at least controllable overhead.  Actually it was one of
> the biggest motivations of the DAMON design, and I haven't imagined adding
> unconditional calls to DAMON here.
> 
> Nonetheless, shouldn't injecting an unconditional call here be avoided for not
> only DAMON calls but any expensive calls?  I'm also not quite sure what DAMON
> call you are thinking about.
> 

Just any call, DAMON or otherwise.  The explicit check injecting ~2-3%
overhead on my microbench was a simple

+       } else if (!folio_test_isolated(folio) &&
+                  (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&

If this is causing additional overhead, call me skeptical that trying
anything more complicated will turn out better.

> > 
> >    I would need to explore this further - including whether it is
> >    feasible to inject such a large dependency into swap.c
> 
> I understand DAMON is not small in terms of the code size, and has many
> limitations that make it unusable in many use cases.  But, again, I'm not
> quite sure what kind of DAMON usage in swap.c you're thinking about, and
> therefore it is not easy to understand what part of DAMON is considered a large
> dependency that concerns you.  It would be great if we can make a more concrete
> example as a result of this topic session at LSFMMBPF.
> 

It's not a matter of code size - it's a matter of tightly coupling core
components of the kernel to extraneous ones.  Adding additional
dependencies between components increases overall system complexity and
makes it hard to reason about the behavior of the system.

For example, in the prior snippet:

+       } else if (!folio_test_isolated(folio) &&
+                  (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
+                  numa_pagecache_promotion_enabled) {
+               promotion_candidate(folio);

This amounts to

if (some condition && feature enabled)
	mark a folio as a candidate for promotion

The promotion_candidate() function is contained within migrate.c and
uses (mostly) migrate.c mechanisms (from a task_work).

All you need to understand the behavior is between swap.c and migrate.c.

If instead you aggregate this to DAMON, understanding the behavior of
swap can require you to understand what DAMON is actually doing with
this information.

Now you need to understand swap.c, migrate.c, AND DAMON.

It makes it more difficult to reason about the system when something
goes wrong. This increases the maintenance burden for maintainers (and
onboarding complexity for anyone new to the kernel, for that matter).

That doesn't mean we shouldn't consider doing this - it just means that
the benefit needs to outweigh the complexity/maintenance cost.

> FYI, I also don't have a specific idea for helping unmapped page promotion for
> now.  That's my assignment that I will do by LSFMMBPF.  But a few ways that I
> naively think DAMON might be able to help unmapped promotions are,
> 
> 1. Using DAMON for profiling how much hot and cold unmapped pages are in which
>    tier, and use the information for unmapped pages promotion optimization.
> 2. Using DAMOS to target-promote hot unmapped pages while using page
>    faults-based promotion for mapped pages.
> 3. Using DAMOS to promote both mapped and unmapped hot pages.
>

This is missing the scenario where DAMOS/DAMON is not suitable for
deployment in someone's environment.  The kernel should still do
*something*.

And that is kind of the point - we can expose more complexity to the
users with DAMON, but the kernel should be able to do some reasonable
promotion action without this additional system.

> >    but I
> >    also see value in making tasks handle promotion as a form of fairness.
> 
> I agree that could be good in terms of fairness.  I want to learn more about
> the significance of it, though.
>

Fairness in this scenario is simple.

If one task is causing an outsized number of promotions to occur, and it
causes some ASYNC system to handle those promotions, it is effectively
acquiring more CPU time via that ASYNC system than other residents.

Trying to charge this time back to the noisy task is harder than just
having the task incur the cost of migration.  But doing it inline can
cause the task to slow down.

So it's difficult to predict how it's going to pan out.  Need evidence.

> I agree.  DAMON allows combining multiple different mechanisms with its core
> logic, so I believe it might be a place that can aggregate the different
> identification mechanisms.
> 
> DAMON's access monitoring results based system operations feature, namely
> DAMOS, also has its own aggressiveness control logic, and resides in the core
> layer, so could be used consistently with different promotion candidates
> identification mechanisms.
> 

Without data this is a nice thought, but we have existing mechanisms
that work and can be improved - let's not disrupt that.

Finding an aggregated promotion solution helps everyone move forward
without disrupting development in these areas (and makes the different
identification mechanisms play nice with each other).

Trying to also create a voltron "one identification system to rule them
all" is a nice thought, but it's heavy-weight compared to adding a folio
flag check and a call to mpol_migrate_misplaced().  We need to respect
that reality and not regress the existing mechanisms by trying to
over-engineer a generalized solution.

~Gregory


* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
  2025-01-24 17:21     ` Gregory Price
@ 2025-01-25  1:17       ` SeongJae Park
  0 siblings, 0 replies; 14+ messages in thread
From: SeongJae Park @ 2025-01-25  1:17 UTC (permalink / raw)
  To: Gregory Price
  Cc: SeongJae Park, lsf-pc, damon, linux-mm, linux-kernel,
	kernel-team, Raghavendra K T, Yuanchu Xie, Jonathan Cameron,
	Kaiyang Zhao, Jiaming Yan, Honggyu Kim

On Fri, 24 Jan 2025 12:21:31 -0500 Gregory Price <gourry@gourry.net> wrote:

> On Thu, Jan 23, 2025 at 06:11:53PM -0800, SeongJae Park wrote:
[...]
> > >    This tells me injecting
> > >    an unconditional call into DAMON may be too much overhead. 
[...]
> > I'm also not pretty sure what DAMON
> > call you are thinking about.
> > 
> 
> Just any call, DAMON or otherwise.

Thanks for clarifying.

[...]
> It's not a matter of code size - it's a matter of tightly coupling core
> components of the kernel to extraneous ones.  Adding additional
> dependencies between components increases overall system complexity and
> makes it hard to reason about the behavior of the system.
[...]
> Now you need to understand swap.c, migrate.c, AND DAMON.
> 
> It makes it more difficult to reason about the system when something
> goes wrong. This increases the maintenance burden for maintainers (and
> onboarding complexity for anyone new to the kernel, for that matter).

Thank you for this kind clarification.  This is very helpful for better
understanding your point.  I cannot agree more that tightly coupling multiple
components makes things complex.  Let me emphasize your point from the other
side, too.  This doesn't mean we should avoid using multiple components
together.  If the interface is well designed and used correctly, using multiple
components together rather reduces the complexity and maintenance burden.

In the example, the swap.c maintainer should easily notice when something in
migrate.c that swap.c uses is not working as documented or expected, and ask
the migrate.c maintainer to fix it.  I'm trying to make DAMON be designed and
used in such a way.  I'm proposing this LSFMMBPF session to help with that by
discussing in depth, including specific examples of current or potential DAMON
usages and the DAMON interfaces that are not well designed for those.

> 
> That doesn't mean we shouldn't consider doing this - it just means that
> the benefit needs to outweigh the complexity/maintenance cost.

I agree with this too, of course :)

[...]
> This is missing the scenario where DAMOS/DAMON is not suitable for
> deployment in someone's environment.

I understand that you are describing a scenario where deploying out-of-kernel
components such as the DAMON user-space tool is impossible, while those are
essential for a given usage.  And I agree that such cases can exist in practice.

> The kernel should still do
> *something*.
> 
> And that is kind of the point - we can expose more complexity to the
> users with DAMON, but the kernel should be able to do some reasonable
> promotion action without this additional system.

I understand that you mean using DAMON for promotion requires users to control
it with additional systems such as the DAMON user-space tool (damo).  That's
correct, at least for today's DAMON usages for CXL memory tiering.  HMSDK[1] is
such an additional system.

Nevertheless, that's not necessarily the case in the future.  DAMON aims to
allow flexible custom usages while also just working fairly well transparently.
I shared this humble ambition at last year's LPC[2].  We will pursue the same
direction for memory tiering-purpose DAMON usage, too.

[...]
> > >    but I
> > >    also see value in making tasks handle promotion as a form of fairness.
> > 
> > I agree that could be good in terms of fairness.  I want to learn more about
> > the significance of it, though.
> >
> 
> Fairness in this scenario is simple.
> 
> If one task is causing an outsized number of promotions to occur, and it
> causes some ASYNC system to handle those promotions, it is effectively
> acquiring more CPU time via that ASYNC system than other residents.
> 
> Trying to charge this time back to the noisy task is harder than just
> having the task incur the cost of migration.  But doing it inline can
> cause the task to slow down.
> 
> So it's difficult to predict how it's going to pan out.  Need evidence.

Yes, I agree that we need more data to say more about this topic.  Nonetheless,
I understand you are saying that fairness is something better to have in the
future and that we need to be aware of the potential risk, not that it is a
strict blocker of exploring the async approach.

> 
> > I agree.  DAMON allows combining multiple different mechanisms with its core
> > logic, so I believe it might be a place that can aggregate the different
> > identification mechanisms.
> > 
> > DAMON's access monitoring results based system operations feature, namely
> > DAMOS, also has its own aggressiveness control logic, and resides in the core
> > layer, so could be used consistently with different promotion candidates
> > identification mechanisms.
> > 
> 
> Without data this is a nice thought, but we have existing mechanisms
> that work and can be improved - let's not disrupt that.

Cannot agree more.  My intention is not to disrupt that, but to ensure that
people who are looking into such improvements are on the same page regarding
the available current and future options.

> 
> Finding an aggregated promotion solution helps everyone move forward
> without disrupting development in these areas (and makes the different
> identification mechanisms play nice with each other).
> 
> Trying to also create a voltron "one identification system to rule them
> all" is a nice thought, but it's heavy-weight compared to adding a folio
> flag check and a call to mpol_migrate_misplaced().  We need to respect
> that reality and not regress the existing mechanisms by trying to
> over-engineer a generalized solution.

100% agreed.  This point is, and always should be, in DAMON hackers' minds.

Thank you for kindly clarifying your points and for the nice advice :)

[1] https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion
[2] https://lpc.events/event/18/contributions/1768/

Thanks,
SJ


* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
  2025-01-14  3:06 ` Gregory Price
  2025-01-24  2:11   ` SeongJae Park
@ 2025-01-30  2:15   ` Yuanchu Xie
  2025-01-30  3:47     ` SeongJae Park
  1 sibling, 1 reply; 14+ messages in thread
From: Yuanchu Xie @ 2025-01-30  2:15 UTC (permalink / raw)
  To: Gregory Price
  Cc: SeongJae Park, lsf-pc, damon, linux-mm, linux-kernel,
	kernel-team, Raghavendra K T, Jonathan Cameron, Kaiyang Zhao,
	Jiaming Yan, Honggyu Kim

On Mon, Jan 13, 2025 at 7:06 PM Gregory Price <gourry@gourry.net> wrote:
> 5) Scarce resources
>
>    We need to be careful not to consume excessive amounts of resources
>    in an attempt to track all these identifying mechanisms.  Even 1 byte
>    per folio is 256MB on a 1TB machine.  This gets out of hand quick.
>
>    With task-work, I was able to add no additional resource consumption,
>    but deferring to a fully async scenario and needing to track things
>    like last-accessing CPU, timestamps, and etc.
>
>    We'll need to examine this closely if we decide to aggregate either
>    of these mechanisms.
My concern with physical address space monitoring is fragmentation. I
ran some numbers on a few prod machines. Grouping by regions with the
same memcg and ignoring any unmapped memory to be generous, machines
with higher utilization can have a region/total pages ratio of ~40%,
and even those with lower utilization (<50%) can also reach 20%.
Accurately tracking these regions would require quite the region
metadata, on the order of GBs.
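
For illustration only, one way such a region count could be derived and turned
into a metadata estimate is sketched below.  The per-page owner array, the
function names and the record size used in the note afterwards are
hypothetical, not taken from the measurement described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count maximal runs of consecutive pages that share the same owner id.
 * owner[i] is a hypothetical per-page memcg id, indexed by page frame.
 */
size_t count_regions(const uint32_t *owner, size_t nr_pages)
{
	size_t regions = 0;

	assert(owner != NULL || nr_pages == 0);
	for (size_t i = 0; i < nr_pages; i++) {
		/* A new region starts wherever the owner changes. */
		if (i == 0 || owner[i] != owner[i - 1])
			regions++;
	}
	return regions;
}

/* Metadata needed to track every region exactly. */
uint64_t region_metadata_bytes(uint64_t nr_regions, uint64_t bytes_per_region)
{
	return nr_regions * bytes_per_region;
}
```

As a worked example under assumed numbers: 1 TiB of 4 KiB pages is 2^28 pages,
so a 40% ratio means about 107 million regions.  At an assumed 64 bytes of
metadata per region that is roughly 6.4 GiB.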



* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
  2025-01-30  2:15   ` Yuanchu Xie
@ 2025-01-30  3:47     ` SeongJae Park
  2025-01-31 10:05       ` Jonathan Cameron
  0 siblings, 1 reply; 14+ messages in thread
From: SeongJae Park @ 2025-01-30  3:47 UTC (permalink / raw)
  To: Yuanchu Xie
  Cc: SeongJae Park, Gregory Price, lsf-pc, damon, linux-mm,
	linux-kernel, kernel-team, Raghavendra K T, Jonathan Cameron,
	Kaiyang Zhao, Jiaming Yan, Honggyu Kim

Hi Yuanchu,

On Wed, 29 Jan 2025 18:15:08 -0800 Yuanchu Xie <yuanchu@google.com> wrote:

> On Mon, Jan 13, 2025 at 7:06 PM Gregory Price <gourry@gourry.net> wrote:
> > 5) Scarce resources
> >
> >    We need to be careful not to consume excessive amounts of resources
> >    in an attempt to track all these identifying mechanisms.  Even 1 byte
> >    per folio is 256MB on a 1TB machine.  This gets out of hand quick.
> >
> >    With task-work, I was able to add no additional resource consumption,
> >    but deferring to a fully async scenario and needing to track things
> >    like last-accessing CPU, timestamps, and etc.
> >
> >    We'll need to examine this closely if we decide to aggregate either
> >    of these mechanisms.
> My concern with physical address space monitoring is fragmentation. I
> ran some numbers on a few prod machines. Grouping by regions with the
> same memcg and ignoring any unmapped memory to be generous, machines
> with higher utilization can have a region/total pages ratio of ~40%,
> and even those with lower utilization (<50%) can also reach 20%.
> Accurately tracking these regions would require quite the region
> metadata, on the order of GBs.

You're right, if we need page-level accuracy in access monitoring and want to
use DAMON with its regions-based mechanism for that, the memory overhead of
damon_region could be high.  That's mainly because DAMON's regions-based
mechanism was not designed for such usage.  It is more for a best-effort
tradeoff between the overhead and the accuracy.

The regions-based mechanism is not necessarily the only mechanism of future
DAMON, though.  If there are use cases where regions-based best-effort accuracy
cannot be used and exact page-level accuracy is really required, we can think
about optimizing the regions-based mechanism or developing a new one.

But, IMHO, page-level accurate access patterns are not always essential.  In
many cases, being able to distinguish some regions against others based on
access pattern is practical enough.  Indeed, DAMON has been used in real-world
products with the physical address based monitoring mode for years with no
significant problem.  Also, I think the physical address space based monitoring
results[1] on a real server workload that I shared recently are not very bad.

Of course your use case could be different from what I have experienced so far.
I'm curious if and why you really need page level accuracy.

[1] https://lore.kernel.org/20250110185232.54907-3-sj@kernel.org

Thanks,
SJ


* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
  2025-01-30  3:47     ` SeongJae Park
@ 2025-01-31 10:05       ` Jonathan Cameron
  0 siblings, 0 replies; 14+ messages in thread
From: Jonathan Cameron @ 2025-01-31 10:05 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Yuanchu Xie, Gregory Price, lsf-pc, damon, linux-mm,
	linux-kernel, kernel-team, Raghavendra K T, Kaiyang Zhao,
	Jiaming Yan, Honggyu Kim

On Wed, 29 Jan 2025 19:47:49 -0800
SeongJae Park <sj@kernel.org> wrote:

> Hi Yuanchu,
> 
> On Wed, 29 Jan 2025 18:15:08 -0800 Yuanchu Xie <yuanchu@google.com> wrote:
> 
> > On Mon, Jan 13, 2025 at 7:06 PM Gregory Price <gourry@gourry.net> wrote:  
> > > 5) Scarce resources
> > >
> > >    We need to be careful not to consume excessive amounts of resources
> > >    in an attempt to track all these identifying mechanisms.  Even 1 byte
> > >    per folio is 256MB on a 1TB machine.  This gets out of hand quick.
> > >
> > >    With task-work, I was able to add no additional resource consumption,
> > >    but deferring to a fully async scenario and needing to track things
> > >    like last-accessing CPU, timestamps, and etc.
> > >
> > >    We'll need to examine this closely if we decide to aggregate either
> > >    of these mechanisms.  
> > My concern with physical address space monitoring is fragmentation. I
> > ran some numbers on a few prod machines. Grouping by regions with the
> > same memcg and ignoring any unmapped memory to be generous, machines
> > with higher utilization can have a region/total pages ratio of ~40%,
> > and even those with lower utilization (<50%) can also reach 20%.
> > Accurately tracking these regions would require quite the region
> > metadata, on the order of GBs.  

I'd second this. Some cases are reasonably well behaved and
regions 'kind of work' for PA based tracking; some very much do not.
Add anything like overcommitted VMs on top and contiguity of 'hotness'
beyond very small regions goes out the window very quickly (unfortunately
I'm not able to share specific data).

So there are definitely cases where I'd expect something else to be needed.

There are plenty of approximate tracking methods in the literature
that might be good enough with much lower overhead than precise tracking
(sketches etc) if we can feed them the right data.
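
As one example of what such a method could look like, below is a minimal
count-min sketch keyed by page frame number.  It is a userspace illustration
only; the table dimensions and the hash mixer are arbitrary choices, not
something proposed in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CMS_ROWS 4
#define CMS_COLS 1024	/* must be a power of two */

/* Fixed-size table: 4 * 1024 * 4 bytes = 16 KiB, however many pages exist. */
struct cms {
	uint32_t cnt[CMS_ROWS][CMS_COLS];
};

/* Cheap 64-bit mixer (splitmix64-style finalizer), seeded per row. */
static uint64_t cms_hash(uint64_t key, uint64_t seed)
{
	uint64_t x = key + seed * 0x9E3779B97F4A7C15ULL;

	x ^= x >> 30;
	x *= 0xBF58476D1CE4E5B9ULL;
	x ^= x >> 27;
	x *= 0x94D049BB133111EBULL;
	x ^= x >> 31;
	return x;
}

void cms_init(struct cms *s)
{
	assert(s);
	memset(s, 0, sizeof(*s));
}

/* Record one access to `pfn`: bump one counter in every row. */
void cms_record(struct cms *s, uint64_t pfn)
{
	assert(s);
	for (unsigned int r = 0; r < CMS_ROWS; r++)
		s->cnt[r][cms_hash(pfn, r + 1) & (CMS_COLS - 1)]++;
}

/* Estimated access count: the minimum over rows, never an underestimate. */
uint32_t cms_estimate(const struct cms *s, uint64_t pfn)
{
	uint32_t min = UINT32_MAX;

	assert(s);
	for (unsigned int r = 0; r < CMS_ROWS; r++) {
		uint32_t c = s->cnt[r][cms_hash(pfn, r + 1) & (CMS_COLS - 1)];

		if (c < min)
			min = c;
	}
	return min;
}
```

Collisions can only inflate a counter, so a cold page may look warmer than it
is but a hot page is never missed, which fits the 'particularly hot' question
better than exact per-page counters would.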

Typically we don't need the answer on how hot all memory is, just some info
on 'this lot are particularly hot' and 'this lot are reasonably cold'.

DAMON (as it currently stands) can sometimes give this info, so to me
it's a possible producer of data for another layer that focuses on
abstracting the data to only what we want.  Hopefully we can make
that work for all the forms of tracking temperature that people are
looking at. I'm biased in favor of hardware units but not everyone will
have those toys available for a while yet :)

> 
> You're right, if we need page level accuracy access monitoring and want to use
> DAMON with its regions based mechanism for that, the memory overhead of
> damon_region could be high.  That's mainly because DAMON's regions-based
> mechanism has not designed for such usage.  It is more for a best-effort
> tradeoff between the overhead and the accuracy.
> 
> Regions-based mechanism is not necessarily the only mechanism of future DAMON,
> though.  If there are use cases that regions-based best-effort accuracy cannot
> be used while exactly the page level accuracy is really required, we can think
> about optimizing regions based mechanism or developing new one.
> 
> But, IMHO, the page level accurate access pattern is not always essential.  In
> many cases, being able to distinguish some amount of regions against others
> based on access pattern is practical enough.  Indeed, DAMON has been used on
> real-world products with physical address based monitoring mode for years with
> no significant problem.  Also I think physical address space based monitoring
> results[1] on a real server workload that I shared recently seems not very bad.
> 
> Of course your use case could be different from what I have experienced so far.
> I'm curious if and why you really need page level accuracy.
> 
> [1] https://lore.kernel.org/20250110185232.54907-3-sj@kernel.org
> 
> 
> Thanks,
> SJ




* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
  2025-01-01 22:20 [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future SeongJae Park
  2025-01-02  4:09 ` Matthew Wilcox
  2025-01-14  3:06 ` Gregory Price
@ 2025-01-20 18:46 ` Jonathan Cameron
  2025-03-25 21:01 ` SeongJae Park
  3 siblings, 0 replies; 14+ messages in thread
From: Jonathan Cameron @ 2025-01-20 18:46 UTC (permalink / raw)
  To: SeongJae Park
  Cc: lsf-pc, damon, linux-mm, linux-kernel, kernel-team,
	Raghavendra K T, Yuanchu Xie, Gregory Price, Kaiyang Zhao,
	Jiaming Yan, Honggyu Kim

On Wed,  1 Jan 2025 14:20:39 -0800
SeongJae Park <sj@kernel.org> wrote:

> Hi all,
> 
> 
> I find a few interesting and promising projects that aim to do efficient access
> pattern-aware memory management of near future, including below (alphabetically
> sorted).
> 

Hi SJ,

> - CXL hotness monitoring unit
>   (https://lore.kernel.org/20241121101845.1815660-1-Jonathan.Cameron@huawei.com)

For hardware hotness monitors the type of data has relatively little connection
to what I understand DAMON provides, and the control schemes are somewhat different.
Hotness tracking units should provide a simple list of hot fixed-size granules
(hot 'pages') to whatever is using the hotness engine.  DAMON and other in-kernel
schemes might also be able to provide such outputs, but the underlying schemes
seem very different as the outputs of these trackers map neither to DAMON regions
nor to dense sets of page counters.

So to me the commonality looks to be one layer up: We get lists of stuff
to consider moving and control paths to whatever is providing those lists
to indicate:
* More or fewer suggestions please (bandwidth controls etc)
* Minimum 'hotness' below which it should not suggest moving them.
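
A minimal sketch of what that shared layer could look like is below.  All
struct, field and function names are hypothetical; this is not an existing
kernel interface, only an illustration of the two controls listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One suggestion from any hotness provider. */
struct hot_candidate {
	uint64_t pfn;		/* start of a fixed-size chunk in PA space */
	uint32_t hotness;	/* provider-defined score, higher is hotter */
};

/* The two controls a consumer hands back to the provider. */
struct hot_controls {
	size_t max_suggestions;	/* "more or fewer suggestions please" */
	uint32_t min_hotness;	/* do not suggest anything colder than this */
};

/*
 * Copy the candidates from `in` that pass the controls into `out`, in
 * order.  `out` must have room for ctl->max_suggestions entries.
 * Returns the number of entries written.
 */
size_t hot_filter(const struct hot_candidate *in, size_t nr_in,
		  const struct hot_controls *ctl, struct hot_candidate *out)
{
	size_t n = 0;

	assert(in && ctl && out);
	for (size_t i = 0; i < nr_in && n < ctl->max_suggestions; i++) {
		if (in[i].hotness >= ctl->min_hotness)
			out[n++] = in[i];
	}
	return n;
}
```

The point of the shape is that the consumer never sees how the list was
produced, so a hardware unit, DAMON or a PTE scanner could all sit behind it.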

For CXL Hotness monitoring units, there are open questions about how to get good
data given the limited resources likely to be found on devices. In the simplest
sense a unit can be thought of as a fixed set of counters, but typically it will
be more complex than that, with statistical accuracy tradeoffs rather than a
simple 'did we count it or not'.

We need to do some work to find out what works best across many workloads
considering options (depending on hardware capabilities) such as
a) coarse to fine
b) random subsampling of 256MiB chunks of PA space.
c) scanning across PA space looking at a smallish region (16Gig maybe) at
   a time.
Also need to be flexible to use multiple parallel trackers if available on
a given device or time slices on a single tracker.
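
To show what options (b) and (c) amount to in terms of choosing where a
tracker looks next, here is a small userspace sketch.  The function names are
made up, and the caller is assumed to supply the random number and an aligned
base address.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_BYTES (256ULL << 20)	/* 256 MiB, as in option (b) */

/*
 * Option (b): map a caller-supplied random number to the start of one
 * 256 MiB chunk inside [base, base + len).  `base` is assumed aligned.
 */
uint64_t pick_random_chunk(uint64_t base, uint64_t len, uint64_t rnd)
{
	uint64_t nr_chunks = len / CHUNK_BYTES;

	assert(nr_chunks > 0);
	return base + (rnd % nr_chunks) * CHUNK_BYTES;
}

/*
 * Option (c): advance a scan window of `win` bytes across [0, len),
 * wrapping to the start once the next window would run past the end.
 */
uint64_t next_scan_window(uint64_t cur, uint64_t win, uint64_t len)
{
	uint64_t next = cur + win;

	assert(win > 0 && win <= len);
	return (next + win <= len) ? next : 0;
}
```

With several trackers on one device, each could be handed a different chunk or
window from these helpers; with a single tracker, they would be called once
per time slice.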

I'm not yet seeing enough different engines to figure out if there is
commonality in that control scheme between CXL style interfaces and those that
we may see from other places etc. If anyone is in a position to share info
on other hotness monitoring offloaded units that are targeting real products +
their interfaces that would be great. For now I think we are going to end
up with something specific in the CXL HMU driver with the rest of the kernel just
seeing a list of 'hot PA address chunks / pages in PA space'.

Given we will need a virtualized solution as well for guests that are running
on a fixed mix of tiers, I'd expect a "virtio-hotness" or similar that only
provides these sorts of generalized controls leaving the host to figure
out how to control the particular hotness trackers. The controls to that
would be in line with what I'd expect to be exposed to other layers of the
kernel from a given hotness tracker.

For me it feels like we are a bit early wrt hardware trackers to come
to firm conclusions, but perhaps others are further ahead with
answering some of the precursor questions. I am keen that we don't
end up with a solution that doesn't work with them so this discussion
is of interest to me.

> - Memory tiering fairness by per-cgroup control of promotion and demotion
>   (https://lore.kernel.org/20241108190152.3587484-1-kaiyang2@cs.cmu.edu)
> - Promotion of unmapped page cache folios
>   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
> - Slow-tier page promotion based on PTE A bit
>   (https://lore.kernel.org/20241201153818.2633616-1-raghavendra.kt@amd.com)
> - Workingset reporting
>   (https://lore.kernel.org/20241127025728.3689245-1-yuanchu@google.com)
> 
> The goal of DAMON is to help accelerating such developments by being a
> framework that can reduce fundamental efforts for monitoring memory access
> patterns and managing memory using the information.  AWS Aurora Serverless v2
> and SK hynix are successfully using DAMON in the way for proactive memory
> reclamation[1] and CXL memory tiering[2].
> 
> To further deliver such benefits for the ongoing and future projects, we need
> to better understand what the projects really need, how DAMON can provide those
> now or in future, and if there are alternatives better than DAMON.  Regardless
> of the conclusion about DAMON, the works apparently have common parts, so the
> discussion will benefit all.
> 
> I propose to have the discussion at LSF/MM/BPF.  In the session, I will briefly
> introduce the works and possible DAMON usages, and continue the open discussion
> for better understanding each other.  The discussion will not be limited to
> DAMON and abovely mentioned projects but possible alternatives and general
> access-aware memory management projects.  After the discussion, we will
> hopefully find ways to efficiently collaborate, or at least do not disturb each
> other.

I like that last comment :)

Jonathan

> 
> [1] https://assets.amazon.science/ee/a4/41ff11374f2f865e5e24de11bd17/resource-management-in-aurora-serverless.pdf
> [2] https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion
> 
> 
> Thanks,
> SJ


* Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
  2025-01-01 22:20 [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future SeongJae Park
                   ` (2 preceding siblings ...)
  2025-01-20 18:46 ` Jonathan Cameron
@ 2025-03-25 21:01 ` SeongJae Park
  3 siblings, 0 replies; 14+ messages in thread
From: SeongJae Park @ 2025-03-25 21:01 UTC (permalink / raw)
  To: SeongJae Park
  Cc: lsf-pc, damon, linux-mm, linux-kernel, kernel-team,
	Raghavendra K T, Yuanchu Xie, Jonathan Cameron, Gregory Price,
	Kaiyang Zhao, Jiaming Yan, Honggyu Kim

Hello,

On Wed,  1 Jan 2025 14:20:39 -0800 SeongJae Park <sj@kernel.org> wrote:

> Hi all,
> 
> 
> I find a few interesting and promising projects that aim to do efficient access
> pattern-aware memory management of near future, including below (alphabetically
> sorted).
> 
> - CXL hotness monitoring unit
>   (https://lore.kernel.org/20241121101845.1815660-1-Jonathan.Cameron@huawei.com)
> - Memory tiering fairness by per-cgroup control of promotion and demotion
>   (https://lore.kernel.org/20241108190152.3587484-1-kaiyang2@cs.cmu.edu)
> - Promotion of unmapped page cache folios
>   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)
> - Slow-tier page promotion based on PTE A bit
>   (https://lore.kernel.org/20241201153818.2633616-1-raghavendra.kt@amd.com)
> - Workingset reporting
>   (https://lore.kernel.org/20241127025728.3689245-1-yuanchu@google.com)
> 
> The goal of DAMON is to help accelerating such developments by being a
> framework that can reduce fundamental efforts for monitoring memory access
> patterns and managing memory using the information.  AWS Aurora Serverless v2
> and SK hynix are successfully using DAMON in the way for proactive memory
> reclamation[1] and CXL memory tiering[2].
> 
> To further deliver such benefits for the ongoing and future projects, we need
> to better understand what the projects really need, how DAMON can provide those
> now or in future, and if there are alternatives better than DAMON.  Regardless
> of the conclusion about DAMON, the works apparently have common parts, so the
> discussion will benefit all.
> 
> I propose to have the discussion at LSF/MM/BPF.  In the session, I will briefly
> introduce the works and possible DAMON usages, and continue the open discussion
> for better understanding each other.  The discussion will not be limited to
> DAMON and abovely mentioned projects but possible alternatives and general
> access-aware memory management projects.  After the discussion, we will
> hopefully find ways to efficiently collaborate, or at least do not disturb each
> other.
> 
> [1] https://assets.amazon.science/ee/a4/41ff11374f2f865e5e24de11bd17/resource-management-in-aurora-serverless.pdf
> [2] https://github.com/skhynix/hmsdk/wiki/Capacity-Expansion

A draft of the slides for this session is now available at
https://github.com/damonitor/talks/blob/master/2025/lsfmmbpf/damon_requirements_lsfmmbpf_2025.pdf

I may make more last-minute changes to the slides, but the final version should
also be available at the same URL.


Thanks,
SJ

[...]

