linux-mm.kvack.org archive mirror
From: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
To: SeongJae Park <sj@kernel.org>
Cc: damon@lists.linux.dev, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,  linux-doc@vger.kernel.org,
	akpm@linux-foundation.org, corbet@lwn.net,  bijan311@gmail.com,
	ajayjoshi@micron.com, honggyu.kim@sk.com,  yunjeong.mun@sk.com
Subject: Re: [RFC PATCH v4 4/4] mm/damon: add PA-mode cache for eligible memory detection lag
Date: Wed, 25 Feb 2026 10:58:15 -0800
Message-ID: <CALa+Y14yVxW+NSP6-G+93yHFLKhFhvKMQowGUR1MBcPgvO_q-A@mail.gmail.com>
In-Reply-To: <20260224055451.58713-1-sj@kernel.org>

On Mon, Feb 23, 2026 at 9:54 PM SeongJae Park <sj@kernel.org> wrote:
>
> On Mon, 23 Feb 2026 12:32:32 +0000 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:
>
> > In PA-mode, DAMON needs time to re-detect hot memory at new physical
> > addresses after migration. This causes the goal metrics to temporarily
> > show incorrect values until detection catches up.
>
> I agree this can happen, and could be problematic on some setups.
>

Thank you for acknowledging the issue.

> >
> > Add an eligible cache mechanism to compensate for this detection lag:
> >
> > - Track migration deltas per node using a rolling window that
> >   automatically expires old data
> > - Use direction-aware adjustment: for target nodes (receiving memory),
> >   use max(detected, predicted) to ensure migrated memory is counted
> >   even before detection catches up; for source nodes (losing memory),
> >   use predicted values when detection shows unreliable low values
> > - Maintain the zero-sum property across nodes to preserve total
> >   eligible memory
> > - Include a cooldown mechanism to keep the cache active while detection
> >   stabilizes after migration stops
> > - Add time-based expiry to clear stale cache data when no migration
> >   occurs for a configured period
> >
> > The cache uses max_eligible tracking to handle detection oscillation,
> > prioritizing peak observed values over potentially stale snapshots.
> > A threshold check prevents quota oscillation when detection swings
> > between zero and small values.
>
> But I feel this might be a solution overfitted to a specific setup.
>
> >
> > Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com>
> > ---
> >  include/linux/damon.h    |  45 +++++
> >  mm/damon/core.c          | 421 +++++++++++++++++++++++++++++++++++----
> >  mm/damon/sysfs-schemes.c |  30 +++
> >  3 files changed, 460 insertions(+), 36 deletions(-)
>
> The size of the change is quite big.  I'm now curious whether the problem is
> significant enough to justify a change of this size, and whether this solution
> is the only one, and the best one.

I understand. The cache was consciously separated as patch 4 because it
represents ONE possible approach to handling detection lag - not
necessarily THE approach. My goal was to share what was needed to achieve
equilibrium with my synthetic benchmark workload, while making it clear
that this could be dropped or replaced with alternatives.
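
For reference, the rolling window and direction-aware adjustment from the
quoted patch description can be approximated by the simplified C sketch
below. Everything in it is hypothetical: the names (node_mig_window,
eligible_estimate), the window length, and the 50% "unreliably low"
cut-off are illustrative assumptions, not code from the dropped patch. It
also leaves out the zero-sum balancing across nodes, the cooldown, and the
max_eligible tracking.

```c
#include <stdint.h>

#define WINDOW_SLOTS	8	/* hypothetical: intervals kept in the window */
#define LOW_DETECT_PCT	50	/* hypothetical: "unreliably low" cut-off */

/* Hypothetical per-node record of recent net migration, in bytes. */
struct node_mig_window {
	int64_t delta[WINDOW_SLOTS];	/* + migrated in, - migrated out */
	unsigned int head;		/* slot for the current interval */
};

/* Start a new interval; whatever the reused slot held expires. */
static void window_advance(struct node_mig_window *w)
{
	w->head = (w->head + 1) % WINDOW_SLOTS;
	w->delta[w->head] = 0;
}

/* Account bytes migrated into (+) or out of (-) the node this interval. */
static void window_record(struct node_mig_window *w, int64_t bytes)
{
	w->delta[w->head] += bytes;
}

/* Net migration over the whole window. */
static int64_t window_sum(const struct node_mig_window *w)
{
	int64_t sum = 0;

	for (unsigned int i = 0; i < WINDOW_SLOTS; i++)
		sum += w->delta[i];
	return sum;
}

/*
 * Direction-aware estimate of a node's eligible bytes.  @baseline is the
 * eligible size the node had before the windowed migrations (an assumed
 * input); @detected is what DAMON currently reports for the node.
 */
static uint64_t eligible_estimate(uint64_t baseline, uint64_t detected,
				  const struct node_mig_window *w)
{
	int64_t delta = window_sum(w);
	int64_t p = (int64_t)baseline + delta;
	uint64_t predicted = p > 0 ? (uint64_t)p : 0;

	/* Receiving node: max(detected, predicted). */
	if (delta > 0)
		return detected > predicted ? detected : predicted;
	/* Losing node: fall back to predicted only if detection looks too low. */
	if (delta < 0 && detected * 100 < predicted * LOW_DETECT_PCT)
		return predicted;
	/* No recent migration (or it expired): trust detection. */
	return detected;
}
```

For example, a node with a baseline of 100 that just received 64 is
reported as 164 until detection catches up, and once the window has
advanced WINDOW_SLOTS times the recorded delta expires and the raw detected
value is used again.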

>
> First of all, I'm curious whether the problem is that significant.  I assume you
> may have seen the issue on the test setup you shared in the cover letter.
> From my understanding of the cover letter of this patch series, however, you
> are testing this on a setup with two complementary schemes, and you use the
> TEMPORAL tuner.  The TEMPORAL tuner was motivated by setups that have no factor
> moving the quota goal value without additional intervention.  In a
> complementary-schemes setup, the schemes become such factors for each other.
> In that case, the TEMPORAL tuner might be worse in terms of the size of temporal
> oscillations.  I don't know the details of your test setup, but I suspect the use
> of the TEMPORAL tuner might have made the issue look bigger than it really is.

That's a fair point. I chose TEMPORAL because I wanted to move the required
amount of pages as quickly as possible to reach equilibrium - essentially
"migrate at full speed until target is reached, then stop." For my multiload
benchmark with uniformly hot memory, this seemed like the most direct
approach.

You're right that with complementary schemes, the schemes act as factors for
each other, and CONSIST tuner with its feedback loop might make the detection
lag problem more manageable through gradual adjustment.
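
To illustrate why gradual adjustment might make the lag more manageable,
the toy C model below compares a bang-bang controller (full quota until the
detected value reaches the goal, then zero, which is how I used TEMPORAL)
with a simple proportional controller standing in for CONSIST. It is a
hypothetical model, not DAMON code: DAMON's actual tuners use their own
formulas, the detected value is simply the true value delayed by `lag`
intervals, and the complementary scheme that would pull memory back is
ignored. Values are in basis points of the eligible memory.

```c
#include <stdbool.h>

#define GOAL      6000	/* goal: 60% of eligible memory, in basis points */
#define QMAX      1000	/* max bp the quota can migrate per interval */
#define GAIN_DIV  4	/* proportional gain of 1/4 (hypothetical) */
#define STEPS     20	/* simulated tuning intervals */

/*
 * Migrate toward GOAL for STEPS intervals while the tuner only sees the
 * value from @lag intervals ago (0 <= lag < STEPS).  Returns how far the
 * true value peaked above GOAL.
 */
static int peak_overshoot(int lag, bool proportional)
{
	int hist[STEPS];	/* true value at the start of each interval */
	int actual = 0, peak = 0;

	for (int t = 0; t < STEPS; t++) {
		hist[t] = actual;
		/* Detection lags: the tuner sees an old value. */
		int detected = t >= lag ? hist[t - lag] : hist[0];
		int quota;

		if (proportional) {
			int err = GOAL - detected;

			quota = err > 0 ? err / GAIN_DIV : 0;
			if (quota > QMAX)
				quota = QMAX;
		} else {
			/* Bang-bang: full speed until the goal looks reached. */
			quota = detected < GOAL ? QMAX : 0;
		}
		actual += quota;
		if (actual > 10000)
			actual = 10000;
		if (actual > peak)
			peak = actual;
	}
	return peak - GOAL;
}
```

In this model, with a lag of 3 intervals the bang-bang controller peaks at
9000 bp (3000 bp above the goal) while the proportional one peaks at
7500 bp (1500 bp above), and with no lag the bang-bang controller stops
exactly at the goal. With complementary schemes, that overshoot would
presumably be pushed back by the other scheme and show up as oscillation.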

>
> I also assume real-world users may use DAMON with auto-tuning mostly
> because they don't know the access pattern of the system and assume it will be
> dynamic.  In that case, even if we perfectly solve the issue, some oscillation
> will still happen.  So, I think the issue in the real world might be smaller
> than what we can find on some specific test setups.

Agreed. Real-world workloads with mixed hot/cold memory and dynamic access
patterns might behave differently from my synthetic benchmark where all memory
is uniformly hot. The uniform-hot case is essentially a worst-case scenario
that forces continuous oscillation regardless of detection lag compensation.

>
> Meanwhile, the node_[in]eligible_mem_bp concept makes sense to me.  I'm worried
> that this patch is unnecessarily delaying the progress of the main change.
>
> So, unless we have clear evidence of the significance of this issue, I'd prefer
> dropping this for now.  After that, if the issue turns out to be significant or
> this solution is proven to be significantly beneficial, from your next more
> realistic test setup, or from real world usage after upstreaming of the main
> change, we can revisit.  What do you think?

I agree with dropping this patch for now. Let's focus on getting the
core metrics merged first. The cache mechanism can be revisited later if
real-world usage shows it's needed.

Thanks,
Ravi.

>
>
> Thanks,
> SJ
>
> [...]



Thread overview: 15+ messages
2026-02-23 12:32 [RFC PATCH v3 0/4] mm/damon: Introduce node_eligible_mem_bp and node_ineligible_mem_bp Quota Goal Metrics Ravi Jonnalagadda
2026-02-23 12:32 ` [RFC PATCH v3 1/4] mm/damon/sysfs: set goal_tuner after scheme creation Ravi Jonnalagadda
2026-02-24  1:40   ` SeongJae Park
2026-02-25 18:23     ` Ravi Jonnalagadda
2026-02-23 12:32 ` [RFC PATCH v3 2/4] mm/damon: fix esz=0 quota bypass allowing unlimited migration Ravi Jonnalagadda
2026-02-24  1:54   ` SeongJae Park
2026-02-25 18:28     ` Ravi Jonnalagadda
2026-02-23 12:32 ` [RFC PATCH v3 3/4] mm/damon: add node_eligible_mem_bp and node_ineligible_mem_bp goal metrics Ravi Jonnalagadda
2026-02-24  4:27   ` SeongJae Park
2026-02-25 18:46     ` Ravi Jonnalagadda
2026-02-23 12:32 ` [RFC PATCH v4 4/4] mm/damon: add PA-mode cache for eligible memory detection lag Ravi Jonnalagadda
2026-02-24  5:54   ` SeongJae Park
2026-02-25 18:58     ` Ravi Jonnalagadda [this message]
2026-02-24  5:36 ` [RFC PATCH v3 0/4] mm/damon: Introduce node_eligible_mem_bp and node_ineligible_mem_bp Quota Goal Metrics SeongJae Park
2026-02-25 18:19   ` Ravi Jonnalagadda
