Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future

From: Gregory Price <gourry@gourry.net>
To: SeongJae Park <sj@kernel.org>
Cc: lsf-pc@lists.linux-foundation.org, damon@lists.linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@meta.com, Raghavendra K T <raghavendra.kt@amd.com>,
	Yuanchu Xie <yuanchu@google.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Kaiyang Zhao <kaiyang2@cs.cmu.edu>,
	Jiaming Yan <jiamingy@amazon.com>,
	Honggyu Kim <honggyu.kim@sk.com>
Subject: Re: [LSF/MM/BPF TOPIC] DAMON Requirements for Access-aware MM of Future
Date: Mon, 13 Jan 2025 22:06:09 -0500
Message-ID: <Z4XUoWlU-UgRik18@gourry-fedora-PF4VCD3F>
In-Reply-To: <20250101222039.74565-1-sj@kernel.org>

On Wed, Jan 01, 2025 at 02:20:39PM -0800, SeongJae Park wrote:
> Hi all,
> 
> 
> I find a few interesting and promising projects that aim to do efficient access
> pattern-aware memory management of near future, including below (alphabetically
> sorted).
> 
> - Promotion of unmapped page cache folios
>   (https://lore.kernel.org/20241210213744.2968-1-gourry@gourry.net)

I'll break down a few observations I made while hacking on unmapped
page cache promotion - and my concerns about leveraging DAMON here.

Additionally, I'll cover some other concerns I've seen raised about
duplicating promotion logic across various kernel components.

Latest RFC:
https://lore.kernel.org/linux-mm/20250107000346.1338481-1-gourry@gourry.net/

Basic Premise:
   Use folio_mark_accessed() as a measure of hotness for promotion.
   Defer promotion to task_work due to locking complexities.

My major concerns / lessons learned from this exercise include:

1) The cost of checking promotion candidacy can be problematic

   In my microbenchmark in the last RFC version, I showed that while
   the performance upside (~22-25%) is substantial, there was a
   non-trivial cost associated with injecting even a single global
   boolean check in the file_read() path.  This was unexpected.

   I can probably optimize the disabled case with a likely() clause,
   but I did not expect such sensitivity.  This tells me injecting
   an unconditional call into DAMON may be too much overhead. 

   I would need to explore this further - including whether it is
   feasible to inject such a large dependency into swap.c

   This may not affect all cases, but it does affect at least this one.
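
To make the "disabled case" idea concrete, here is a minimal userspace
sketch of a branch hint on a global toggle.  Everything in it
(promotion_enabled, maybe_promote(), read_path()) is a hypothetical
stand-in, not code from the RFC, and the unlikely() macro is defined
locally on top of the GCC/Clang __builtin_expect() builtin.  It only
shows the shape of the check; it says nothing about the measured cost.

```c
#include <stdbool.h>

/* Local branch hint, assuming a GCC/Clang-compatible compiler. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical global toggle: is promotion enabled at all? */
static bool promotion_enabled;

/* Counts how often the (hypothetical) promotion hook was reached. */
static int promote_calls;

/* Stand-in for the promotion candidacy check. */
static void maybe_promote(int folio_id)
{
	(void)folio_id;
	promote_calls++;
}

/* Stand-in for the hot read path: one hinted check, then normal work. */
static void read_path(int folio_id)
{
	if (unlikely(promotion_enabled))
		maybe_promote(folio_id);
	/* ... the rest of the read would happen here ... */
}
```

With promotion_enabled left false, read_path() never reaches
maybe_promote(); the hint only tells the compiler to lay out the
disabled case as the fall-through path.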

2) The complexity of "when it is safe" to promote a folio is subtle
   at best, and "actively hostile" at worst.

   I learned in v1 of the RFC that promotion inline with
   folio_mark_accessed() is not feasible due to a few contexts (task
   dying in particular) in which migration is not safe.  I deferred to
   task_work because I noticed prior attempts (in development notes)
   had seen similar issues.

   Adding a folio reference and/or page flag to defer that migration to
   another context (e.g. async kthread) solves this at the expense of
   implementation complexity (leaked folios if done wrong).
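
A minimal userspace model of that deferral pattern, to show where the
leak risk sits.  All names here (fake_folio, promo_queue, promo_defer(),
promo_drain()) are hypothetical and the "migration" is just a flag; the
point is only that every reference taken at access time must be dropped
on every exit path of the deferred side.

```c
#include <stdbool.h>
#include <stddef.h>

#define PROMO_QUEUE_MAX 4

/* Hypothetical stand-in for a folio: a refcount and a result flag. */
struct fake_folio {
	int refcount;
	bool promoted;
};

/* Hypothetical fixed-size list of folios pending promotion. */
struct promo_queue {
	struct fake_folio *pending[PROMO_QUEUE_MAX];
	size_t nr;
};

/* Access-time side: pin the folio and queue it for later. */
static bool promo_defer(struct promo_queue *q, struct fake_folio *f)
{
	if (q->nr == PROMO_QUEUE_MAX)
		return false;	/* no reference taken, so nothing to leak */
	f->refcount++;
	q->pending[q->nr++] = f;
	return true;
}

/* Deferred side: "migrate" each folio, then drop the pin we took. */
static void promo_drain(struct promo_queue *q)
{
	for (size_t i = 0; i < q->nr; i++) {
		struct fake_folio *f = q->pending[i];

		f->promoted = true;	/* stand-in for the real migration */
		f->refcount--;		/* forgetting this is the leak */
	}
	q->nr = 0;
}
```

If the queue is simply discarded (say, the deferred context never runs)
without walking it and dropping each reference, every queued folio stays
pinned.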

   I'd have to look at whether it's worth the increased complexity to
   aggregate this (particular) identification mechanism - but I think
   there is clear value to aggregating promotion.

   I could see some value in pumping tracking bits into DAMON - but I
   also see value in making tasks handle promotion as a form of fairness.

3) There were expressed opinions on runtime fairness WRT promotion.

   There are two competing thoughts:
   A) Making accessing tasks eat inline promotion cost captures that
      cost in their runtime slice, promoting fairness in scheduling.

   B) Aggregating promotion to an external thread can reduce inline
      faults and tail latencies, but may hide per-task cost. This
      is a concern if one task drives all the promotions, effectively
      stealing an entire core by nature of the async design.

   I don't have a good answer to this, just an observation that charging
   promotion time to the identifying task was a concern that was raised.

4) TPP and Unmapped Page Promotion may affect each other.

   There is a rate-limiting mechanism in the migration path that was
   intended to prevent over-pressuring bandwidth with aggressive
   migrations, and so prevent major memory stalls.

   By adding more pressure on this limit from an additional source,
   we're obviously increasing the time it takes to converge.
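
A toy model of why a second source slows convergence, assuming a simple
per-window page budget shared by all promotion sources.  The struct and
function names are hypothetical and this is not the kernel's actual
rate limiter; it only illustrates the contention.

```c
#include <stddef.h>

/* Hypothetical shared promotion budget for one rate-limit window. */
struct promo_budget {
	size_t limit_pages;	/* pages allowed per window (assumed unit) */
	size_t used_pages;	/* pages already charged in this window */
};

/* Charge up to nr pages; returns how many were actually admitted. */
static size_t promo_budget_charge(struct promo_budget *b, size_t nr)
{
	size_t left = b->limit_pages - b->used_pages;
	size_t admitted = nr < left ? nr : left;

	b->used_pages += admitted;
	return admitted;
}

/* Start a new window with the full budget available again. */
static void promo_budget_reset(struct promo_budget *b)
{
	b->used_pages = 0;
}
```

With a 100-page window, if one source charges 80 pages first, a second
source asking for 50 gets only 20 and has to wait for later windows to
promote the rest.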

   This is probably the greatest argument for creating a new, aggregated
   promotion mechanism to serve all of these identification mechanisms.

   This would make it easier for us to determine whether/what
   identification mechanisms can be aggregated while enabling forward
   progress on each of them separately.

5) Scarce resources

   We need to be careful not to consume excessive amounts of resources
   in an attempt to track all these identifying mechanisms.  Even 1 byte
   per folio is 256MB on a 1TB machine (assuming 4KB folios).  This gets
   out of hand quickly.
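
The arithmetic behind that figure, as a small helper.  It assumes
uniformly sized folios (4KB here, which is what the 256MB number
implies); the function name is made up for illustration.

```c
#include <stdint.h>

/*
 * Metadata bytes needed to track every folio, assuming all folios
 * have the same size (a simplification).
 */
static uint64_t tracking_overhead_bytes(uint64_t mem_bytes,
					uint64_t folio_bytes,
					uint64_t bytes_per_folio)
{
	return (mem_bytes / folio_bytes) * bytes_per_folio;
}
```

For 1TB of memory and 4KB folios this is 2^40 / 2^12 = 2^28 folios, so
1 byte each comes to 256MB, and an 8-byte field per folio (a timestamp,
say) would come to 2GB.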

   With task_work, I was able to add no additional resource consumption,
   but deferring to a fully async scenario would mean needing to track
   things like last-accessing CPU, timestamps, etc.

   We'll need to examine this closely if we decide to aggregate either
   of these mechanisms.

~Gregory
